The Bismark Boat

I have been working through Bismark with a few Crassostrea virginica datasets. These include the BS data from the 2015 oil exposure experiment, the OA exposure gonad tissue samples (OAKL), and a full suite of library preps via Qiagen.

A few takeaways: 1) use the -u option to work out the analysis on a subset of reads, 2) the --score_min setting is important for allowing mismatches, and 3) the working directory approach seems to work well given the number of files.
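To make the --score_min point concrete: Bowtie 2 reads a value such as L,0,-1.2 as a linear function of read length, so the minimum accepted alignment score is 0 + (-1.2 x read length). In end-to-end mode a perfect alignment scores 0 and penalties are subtracted, so a more negative threshold tolerates more mismatches. Below is a minimal sketch of that arithmetic. The helper name min_score and the 100 bp read length are assumptions for illustration only; -0.2 is Bismark's default slope, shown for comparison.

```shell
#!/usr/bin/env bash
# min_score is a hypothetical helper, not part of Bismark or Bowtie 2.
# It evaluates the linear function L,<intercept>,<slope> for a given read length.
min_score() {
  awk -v a="$1" -v b="$2" -v len="$3" 'BEGIN { printf "%.1f\n", a + b * len }'
}

read_len=100   # assumed read length; substitute your own

echo "Bismark default (L,0,-0.2): $(min_score 0 -0.2 "$read_len")"
echo "Setting used here (L,0,-1.2): $(min_score 0 -1.2 "$read_len")"
```

For the assumed 100 bp read, the threshold moves from -20 to -120, which is why many more reads align with the relaxed setting.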

I have created a few notebooks for running most of the Bismark pipeline.

Most cells do not need much attention besides this one:

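%%bash
# List the R1 files, strip the shared suffix to get each sample prefix,
# then run Bismark once per prefix on the matching R1/R2 pair.
find /Users/sr320/Desktop/trim14/zr2096_*R1* \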
| xargs basename -s _s1_R1_val_1.fq.gz | xargs -I{} /Applications/bioinfo/Bismark_v0.19.0/bismark \
--path_to_bowtie /Applications/bioinfo/bowtie2-2.3.4.1-macos-x86_64 \
--genome /Users/sr320/Dropbox/wd/18-03-15/genome \
--score_min L,0,-1.2 \
-u 10000 \
-p 2 \
--non_directional \
-1 /Users/sr320/Desktop/trim14/{}_s1_R1_val_1.fq.gz \
-2 /Users/sr320/Desktop/trim14/{}_s1_R2_val_2.fq.gz \
2> bismark.err

Here you need to be aware of the file naming structure, since the suffix passed to basename must match your files exactly, and of course of where the files are located.
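A quick way to check the naming before launching a run is to print the sample prefix and the R2 path that would be derived from each R1 file. The sketch below assumes the same suffixes as the cell above. The function names and the example file name zr2096_1_s1_R1_val_1.fq.gz are hypothetical, and it uses the portable `basename path suffix` form rather than -s.

```shell
#!/usr/bin/env bash
# Suffixes copied from the Bismark cell; change them if your files differ.
R1_SUFFIX="_s1_R1_val_1.fq.gz"
R2_SUFFIX="_s1_R2_val_2.fq.gz"

# Hypothetical helper: strip the directory and the R1 suffix to get the sample prefix.
sample_prefix() {
  basename "$1" "$R1_SUFFIX"
}

# Hypothetical helper: build the R2 path expected alongside a given R1 file.
r2_for() {
  echo "$(dirname "$1")/$(sample_prefix "$1")${R2_SUFFIX}"
}

# Example with an assumed file name that follows the zr2096_*R1* pattern.
r1="/Users/sr320/Desktop/trim14/zr2096_1_s1_R1_val_1.fq.gz"
echo "prefix: $(sample_prefix "$r1")"
echo "R2:     $(r2_for "$r1")"
```

If the printed prefix still contains part of the suffix, the suffix string does not match the file names and the -1/-2 paths in the Bismark call will not resolve.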

As for results, sample 2 of the OAKL samples seems a bit off.

summary