FROGER Day 01

Step 1 is prepping 3 coral genomes

TLDR: Step 1 is about as far as we got beyond getting files up to Mox and the trimming started.


/gscratch/scrubbed/sr320/030520-fr01/job.sh

#!/bin/bash
## Job Name
#SBATCH --job-name=fr-01
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=6-00:00:00
## Memory per node
#SBATCH --mem=100G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=sr320@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/sr320/030520-fr01/

# Prepping 3 coral genomes

# Directories and programs
bismark_dir="/gscratch/srlab/programs/Bismark-0.21.0"
bowtie2_dir="/gscratch/srlab/programs/bowtie2-2.3.4.1-linux-x86_64/"
samtools="/gscratch/srlab/programs/samtools-1.9/samtools"
reads_dir="/gscratch/srlab/sr320/data/olurida-bs/decomp/"
#genome_folder="/gscratch/srlab/sr320/data/olurida-genomes/v081/"

source /gscratch/srlab/programs/scripts/paths.sh


${bismark_dir}/bismark_genome_preparation \
--verbose \
--parallel 28 \
--path_to_aligner ${bowtie2_dir} \
/gscratch/srlab/sr320/data/froger/Mcap_Genome/


${bismark_dir}/bismark_genome_preparation \
--verbose \
--parallel 28 \
--path_to_aligner ${bowtie2_dir} \
/gscratch/srlab/sr320/data/froger/Pact_Genome/



${bismark_dir}/bismark_genome_preparation \
--verbose \
--parallel 28 \
--path_to_aligner ${bowtie2_dir} \
/gscratch/srlab/sr320/data/froger/Pdam_Genome/

As that moves along we need to start thinking about a mapping strategy. Possibly going with a all 9 with a common setting, subsetting u to get a quick feel of mapping efficiency. Then customize each bismark mapping parameter for the type of library: WGBS, RRBS, and MBD-BS.

Maybe something along the lines of

find ${reads_dir}*_s456_trimmed.fq \
| xargs basename -s _s456_trimmed.fq | xargs -I{} ${bismark_dir}/bismark \
--path_to_bowtie ${bowtie2_dir} \
-genome ${genome_folder} \
-p 4 \
-u 10000 \
--non_directional \
-1 ${reads_dir}/{}_s456_trimmed.fq \
-2 ${reads_dir}/{}_s456_trimmed.fq

or maybe leverage commas!

Comma-separated list of files containing the #2 mates (filename usually includes _2), e.g. flyA_2.fq, flyB_2.fq). Sequences specified with this option must correspond file-for-file and read-for-read with those specified in mates1.


Sam trimmed some stuff up

[sr320@mox1 20200305_methcompare_fastp_trimming]$ ls
20200305_methcompare_fastp_trimming.sh	Meth3_R1_001.fastq.gz
fastq.list.txt				Meth3_R2_001.fastq.gz
Meth10_R1_001.fastq.gz			Meth4.fastp-trim.202003055221_R1_001.fastq.gz
Meth10_R2_001.fastq.gz			Meth4.fastp-trim.202003055221_R2_001.fastq.gz
Meth11_R1_001.fastq.gz			Meth4.fastp-trim.202003055221.report.html
Meth11_R2_001.fastq.gz			Meth4.fastp-trim.202003055221.report.json
Meth12_R1_001.fastq.gz			Meth4_R1_001.fastq.gz
Meth12_R2_001.fastq.gz			Meth4_R2_001.fastq.gz
Meth13_R1_001.fastq.gz			Meth5.fastp-trim.202003051832_R1_001.fastq.gz
Meth13_R2_001.fastq.gz			Meth5.fastp-trim.202003051832_R2_001.fastq.gz
Meth14_R1_001.fastq.gz			Meth5_R1_001.fastq.gz
Meth14_R2_001.fastq.gz			Meth5_R2_001.fastq.gz
Meth15_R1_001.fastq.gz			Meth6_R1_001.fastq.gz
Meth15_R2_001.fastq.gz			Meth6_R2_001.fastq.gz
Meth16_R1_001.fastq.gz			Meth7_R1_001.fastq.gz
Meth16_R2_001.fastq.gz			Meth7_R2_001.fastq.gz
Meth17_R1_001.fastq.gz			Meth8_R1_001.fastq.gz
Meth17_R2_001.fastq.gz			Meth8_R2_001.fastq.gz
Meth18_R1_001.fastq.gz			Meth9_R1_001.fastq.gz
Meth18_R2_001.fastq.gz			Meth9_R2_001.fastq.gz
Meth1_R1_001.fastq.gz			program_options.log
Meth1_R2_001.fastq.gz			slurm-2015259.out
Meth2_R1_001.fastq.gz			system_path.log
Meth2_R2_001.fastq.gz			trimmed_fastq_checksums.md5

Went with the following code for a first round test.

#!/bin/bash
## Job Name
#SBATCH --job-name=fr-02
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=1-00:00:00
## Memory per node
#SBATCH --mem=100G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=sr320@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/sr320/030520-fr01/

# Looking for mapping efficiency

# Directories and programs
bismark_dir="/gscratch/srlab/programs/Bismark-0.21.0"
bowtie2_dir="/gscratch/srlab/programs/bowtie2-2.3.4.1-linux-x86_64/"
samtools="/gscratch/srlab/programs/samtools-1.9/samtools"

reads_dir="/gscratch/scrubbed/samwhite/outputs/20200305_methcompare_fastp_trimming/"
genome_folder="/gscratch/srlab/sr320/data/froger/Mcap_Genome/"

source /gscratch/srlab/programs/scripts/paths.sh


# ${bismark_dir}/bismark_genome_preparation \
# --verbose \
# --parallel 28 \
# --path_to_aligner ${bowtie2_dir} \
# ${genome_folder}



find ${reads_dir}*_R1_001.fastq.gz \
| xargs basename -s _R1_001.fastq.gz | xargs -I{} ${bismark_dir}/bismark \
--path_to_bowtie ${bowtie2_dir} \
-genome ${genome_folder} \
-p 4 \
-u 10000 \
--non_directional \
-1 ${reads_dir}/{}_R1_001.fastq.gz \
-2 ${reads_dir}/{}_R2_001.fastq.gz \
-o Mcap_test01



genome_folder="/gscratch/srlab/sr320/data/froger/Pact_Genome/"



find ${reads_dir}*_R1_001.fastq.gz \
| xargs basename -s _R1_001.fastq.gz | xargs -I{} ${bismark_dir}/bismark \
--path_to_bowtie ${bowtie2_dir} \
-genome ${genome_folder} \
-p 4 \
-u 10000 \
--non_directional \
-1 ${reads_dir}/{}_R1_001.fastq.gz \
-2 ${reads_dir}/{}_R2_001.fastq.gz \
-o Pact_test01


genome_folder="/gscratch/srlab/sr320/data/froger/Pdam_Genome/"



find ${reads_dir}*_R1_001.fastq.gz \
| xargs basename -s _R1_001.fastq.gz | xargs -I{} ${bismark_dir}/bismark \
--path_to_bowtie ${bowtie2_dir} \
-genome ${genome_folder} \
-p 4 \
-u 10000 \
--non_directional \
-1 ${reads_dir}/{}_R1_001.fastq.gz \
-2 ${reads_dir}/{}_R2_001.fastq.gz \
-o Pdam_test01

Written on March 5, 2020