July Bits

This is a running daily of all the stuff done in July.

July 27

Losing faith in our ability to find SNPs in MBDBS Oly data.. Working in Epidiverse pipeline we see an unusual output from the Oly data, as particular as it relates to our Geoduck WGBS..

pic

The geoduck plot of SNPs is almost exactly what the tutorial plot looks like.

Given this skepticism (along with the fact that we are having to use a concatenated genome) we decided to take another approach to try to quantify the degree to which CT SNPs might influence the results.

Attempt A. I took Katherine’s ../data/HCSS_Afilt32m70_01_pp90_m75.recode.vcf (5021 lines) and determined there were 900 C->T SNPs via

fgrep "C	T" ../data/HCSS_Afilt32m70_01_pp90_m75.recode.vcf | wc -l

I wrote that file out, did a little tidying

CTbed <- select(CTsnp, X1, X2) %>%
  mutate(X3 = X2 - 1) %>%
  select(X1, X3, X2)

write_tsv(CTbed, "../analyses/CT_HCSS_Afilt32m70_01_pp90_m75.recode.bed", col_names = FALSE)

intersected with CpG in the genome..

/Applications/bioinfo/bedtools2/bin/intersectbed \
-wa \
-a ../analyses/CT_HCSS_Afilt32m70_01_pp90_m75.recode.bed \
-b ../genome-features/Olurida_v081_CG-motif.gff

to determine that 216 of the CT SNPs in that given vcf were in fact in CG loci. (24%)

My hope is this is some level of support that we can use to quantify the proportion instances where CT SNPs might influence our interpretation..

This coupled with an explanation that it is ok to include the data as a CT SNP is in fact a situation where there is no methylation (similar to what BS does.)

July 7

Got Epidiverse running though skeptical as it is really on identifying C->T SNPS

July 3

BSSNPer failed because fastat to long soo.

sr320@raven:/home/shared/8TB_HDD_01/sr320/github/paper-oly-mbdbs-gen/data/bgdata$ fasta_formatter -i Olurida_v081-mergecat99.fa -w 65000 > Olurida_v081-mergecat98.fa

July 1

Got the oly genome merged and named contig 1-14

echo "Starting genome concatenation"
awk -v i=10628 -f merge_contigs.awk.txt \
../data/bgdata/Olurida_v081.fa  > ../data/bgdata/Olurida_v081-mergecat.fa
cat ../data/bgdata/Olurida_v081-mergecat.fa | sed 's/> Contig0/>Contig0/' \
> ../data/bgdata/Olurida_v081-mergecat01.fa
sed '/>/s/.*/>Contig00/' ../data/bgdata/Olurida_v081-mergecat01.fa > ../data/bgdata/Olurida_v081-mergecat02.fa
awk '{for(x=1;x<=NF;x++)if($x~/00/){sub(/00/,++i)}}1' ../data/bgdata/Olurida_v081-mergecat02.fa | fgrep ">"

Written on July 1, 2022