All gigas at once

What if we start with current data and worked our way back to see if an integration of data was fruitful. Step 1 - bismark it all together..

We have what we call Ronit’s data.

samples

placing those samples in directory

/gscratch/srlab/sr320/data/cg-big/

specifically

cp /gscratch/srlab/mngeorge/data/cgigas_ronit/zr3534* .


We also have Roberto’s samples…

rb

Moving trimmed files to mox.

wget -r --no-directories --no-parent -P . --no-check-certificate -A fastp-trim.20200819.fq.gz https://gannet.fish.washington.edu/Atumefaciens/20200818_cgig_wgbs_roberto_fastp_trimming/

Haws samples

3644 prefix

wget -r \
--no-directories --no-parent \
-P . \
--no-check-certificate \
-A "20201206.fq.gz" https://gannet.fish.washington.edu/Atumefaciens/20201206_cgig_fastp-10bp-5-3-prime_ploidy-pH-wgbs/

and

wget -r \
--no-directories --no-parent \
-P . \
--no-check-certificate \
-A "*val_*.fq.gz" https://gannet.fish.washington.edu/spartina/project-oyster-oa/Haws/trimmed-data-2/poly-G/

Manchester gonad

man

3616

wget -r \
--no-directories --no-parent \
-P . \
--no-check-certificate \
-A "*val_*.fq.gz" https://gannet.fish.washington.edu/spartina/project-gigas-oa-meth/output/trimgalore/03-poly-G/

Possibly some old samples?? maybe later

Claire and Yanouk..

old


[sr320@mox1 cg-big]$ ls
0501_R1.fastp-trim.20200819.fq.gz	zr3534_3_R2.fastp-trim.20201202.fq.gz	zr3644_19_R1.fastp-trim.20201206.fq.gz
0501_R2.fastp-trim.20200819.fq.gz	zr3534_4_R1.fastp-trim.20201202.fq.gz	zr3644_19_R2.fastp-trim.20201206.fq.gz
0502_R1.fastp-trim.20200819.fq.gz	zr3534_4_R2.fastp-trim.20201202.fq.gz	zr3644_1_R1.fastp-trim.20201206.fq.gz

A little bit of renaming for fastp

for file in *.fq.gz ; do
	mv $file $(echo $file | rev | cut -f2- -d- | rev | sed s/.fastp//g).fq.gz
	done

after pulling down the trimgalore.. (Haws and Manchester) Want to save the duplicate Haws so

#zr3644_7_R2_val_2_val_2_val_2.fq.gz
for file in *R*_val* ; do
	mv $file $(echo $file | rev | cut -f7- -d_ | rev | sed s/zr/zrtg/g).fq.gz
	done

072121
Timed out at 20 days, missing about 5 samples. Will know those out in Bismark then combined all files to complete script.

Written on June 4, 2021