Oly PacBio Files
Exploring the different aspects of data generated for the Oly genome.
Most recently we did some PacBio sequencing that yielded about 14x coverage, assuming a 500 Mb genome size. We have a decent assembly of these data from Canu.
The raw files:
m170211_224036_42134_c101073082550000001823236402101737_s1_X0_filtered_subreads.fastq*
m170301_100013_42134_c101174162550000001823269408211761_s1_p0_filtered_subreads.fastq*
m170301_162825_42134_c101174162550000001823269408211762_s1_p0_filtered_subreads.fastq*
m170301_225711_42134_c101174162550000001823269408211763_s1_p0_filtered_subreads.fastq*
m170308_163922_42134_c101174252550000001823269408211742_s1_p0_filtered_subreads.fastq*
m170308_230815_42134_c101174252550000001823269408211743_s1_p0_filtered_subreads.fastq*
m170315_001112_42134_c101169372550000001823273008151717_s1_p0_filtered_subreads.fastq*
m170315_063041_42134_c101169382550000001823273008151700_s1_p0_filtered_subreads.fastq*
m170315_124938_42134_c101169382550000001823273008151701_s1_p0_filtered_subreads.fastq*
m170315_190851_42134_c101169382550000001823273008151702_s1_p0_filtered_subreads.fastq*
I cat'd them together and converted the result to a plain FASTA:
!cat *_filtered_subreads.fastq > m170-all.fastq
!awk 'NR%4==1{printf ">%s\n", substr($0,2)} NR%4==2{print}' m170-all.fastq > m170-all.fa
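The awk one-liner above keeps lines 1 and 2 of every four-line FASTQ record and swaps the leading `@` for `>`. For anyone who would rather do this step in a notebook cell, here is a minimal Python sketch of the same idea. The names `fastq_to_fasta` and `convert_file` are made up for illustration, and it assumes strict four-line records with no wrapped sequence lines. Unlike the awk version it also checks that each header starts with `@`, which is an addition of mine.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from FASTQ lines, assuming strict four-line records."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            # Header line: swap the leading '@' for '>'.
            if not line.startswith("@"):
                raise ValueError(
                    f"line {i + 1}: expected '@' header, got {line[:20]!r}"
                )
            yield ">" + line[1:]
        elif i % 4 == 1:
            # Sequence line: copied through unchanged.
            yield line
        # The '+' separator and quality lines are dropped.


def convert_file(fastq_path: str, fasta_path: str) -> None:
    """Stream a FASTQ file to a FASTA file without loading it into memory."""
    with open(fastq_path) as src, open(fasta_path, "w") as dst:
        for out_line in fastq_to_fasta(src):
            dst.write(out_line + "\n")
```

Calling `convert_file("m170-all.fastq", "m170-all.fa")` should give the same kind of output as the awk command, provided the input really is four lines per read.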
!perl /Users/sr320/git-repos/nb-2017/scripts/count_fasta.pl -i 10000 \
m170-all.fa
0:9999 1432866
10000:19999 310387
20000:29999 32184
30000:39999 3849
40000:49999 563
50000:59999 60
60000:69999 8
Total length of sequence: 12591188319 bp
Total number of sequences: 1779917
N25 stats: 25% of total sequence length is contained in the 181750 sequences >= 12785 bp
N50 stats: 50% of total sequence length is contained in the 486270 sequences >= 8657 bp
N75 stats: 75% of total sequence length is contained in the 912149 sequences >= 6401 bp
Total GC count: 4720472838 bp
GC %: 37.49 %
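The length bins and N25/N50/N75 lines above come from count_fasta.pl. As a rough cross-check, the same kind of numbers can be computed in Python. This is a sketch under my own assumptions, not a port of that script: the function names (`read_lengths`, `length_bins`, `nx`) are hypothetical, and the N-statistic uses the usual definition (sort lengths from longest to shortest and stop once the running total reaches the requested fraction), which may differ from the Perl script in how ties or boundaries are handled.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def read_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in a FASTA stream."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Sequence may be wrapped over several lines; add them up.
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def length_bins(lengths: Iterable[int], interval: int = 10000) -> Dict[int, int]:
    """Count sequences per bin; each key is the start of a bin of width `interval`."""
    counts = Counter((n // interval) * interval for n in lengths)
    return dict(sorted(counts.items()))


def nx(lengths: Iterable[int], fraction: float) -> Tuple[int, int]:
    """Return (number of sequences, shortest length among them) such that the
    longest sequences together hold at least `fraction` of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, n in enumerate(ordered, start=1):
        running += n
        if running >= fraction * total:
            return count, n
    raise ValueError("no sequences given")
```

For example, `nx(lengths, 0.5)` returns the sequence count and length for an N50-style line, and `length_bins(lengths)` gives counts keyed by 0, 10000, 20000 and so on, matching the `-i 10000` interval used above.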
Written on September 13, 2017