Here we take the first set of time series methylation data to CpG level percent methylation.

Data was trimmed

Notebook

https://robertslab.github.io/sams-notebook/posts/2025/2025-01-02-Trimming---A.pulchra-WGBS-with-fastp-FastQC-and-MultiQC-on-Raven/

To reproduce the trimming and quality control (QC) of Acropora pulchra Whole Genome Bisulfite Sequencing (WGBS) data, obtain the raw FastQ files and use fastp to trim 25bp from the 5’ end while removing adapters, polyG, and polyA sequences; if errors occur, attempt repair with BBTools’ repair.sh before re-trimming. Evaluate trimmed reads with FastQC, then summarize results using MultiQC. The complete workflow, including commands and scripts, is available in the project repository, and output files can be accessed here.

Reads were aligned

Notebook

https://robertslab.github.io/sams-notebook/posts/2025/2025-01-02-Bismark-Alignments---A.pulchra-Trimmed-Reads-Using-Bismark-on-Hyak/

To align Acropora pulchra WGBS reads using Bismark on the Hyak cluster, trimmed FastQ files were prepared, and a fastq_pairs.txt file was generated. The alignment process was initiated by submitting a SLURM array job using sbatch –array=1-$(wc -l < ../output/01.20-D-Apul-WGBS-trimming-fastp-FastQC-MultiQC/fastq_pairs.txt) 02.20-D-Apul-WGBS-alignment-SLURM-job.sh, which allowed each node to process a unique FastQ pair. The job script (02.20-D-Apul-WGBS-alignment-SLURM_array-bismark.sh) configured directories, allocated computational resources (bismark_threads=5), extracted FastQ filenames, and executed Bismark with Bowtie2 using the parameters –score_min L,0,-0.6 –non_directional –gzip -p 5. The SLURM script (02.20-D-Apul-WGBS-alignment-SLURM-job.sh) defined job parameters (#SBATCH –cpus-per-task=20 –mem=100G –time=72:00:00) and executed the containerized Bismark workflow via Apptainer.

Taking BAMS to tables

Deduplicated

find /gscratch/srlab/sam/gitrepos/urol-e5/timeseries_molecular/D-Apul/output/02.20-D-Apul-WGBS-alignment-SLURM_array-bismark/*.bam | \
xargs -n 1 basename -s _R1_001.fastp-trim_bismark_bt2_pe.bam | \
parallel -j 8 deduplicate_bismark \
--bam \
--paired \
--output_dir ../output/15-Apul-bismark \
/gscratch/srlab/sam/gitrepos/urol-e5/timeseries_molecular/D-Apul/output/02.20-D-Apul-WGBS-alignment-SLURM_array-bismark//{}_R1_001.fastp-trim_bismark_bt2_pe.bam

Sorted

find ../output/15-Apul-bismark/*deduplicated.bam | \
xargs basename -s _R1_001.fastp-trim_bismark_bt2_pe.deduplicated.bam | \
xargs -I{} samtools \
sort --threads 24 \
../output/15-Apul-bismark/{}_R1_001.fastp-trim_bismark_bt2_pe.deduplicated.bam \
-o ../output/15-Apul-bismark/{}.sorted.bam

Methylation Extraction

find ../output/15-Apul-bismark/*deduplicated.bam | xargs -n 1 -I{} \
bismark_methylation_extractor --bedGraph --counts --comprehensive --merge_non_CpG \
--multicore 24 --buffer_size 75% --output ../output/15-Apul-bismark "{}"

Methylation call

find ../output/15-Apul-bismark/*deduplicated.bismark.cov.gz | \
xargs -n 1 basename -s _pe.deduplicated.bismark.cov.gz | \
parallel -j 24 coverage2cytosine \
--genome_folder /gscratch/srlab/sam/gitrepos/urol-e5/timeseries_molecular/D-Apul/data/ \
-o ../output/15-Apul-bismark/{} \
--merge_CpG \
--zero_based \
../output/15-Apul-bismark/{}_pe.deduplicated.bismark.cov.gz

Rename cov files

# Read the CSV file and create an associative array - hand edited 3 filenames first
declare -A sample_map
while IFS=, read -r col1 sample_name col3 col4 azenta_sample_name col6; do
  sample_map["$azenta_sample_name"]="$sample_name"
done < ../../M-multi-species/data/e5_DNA_Azenta_metadata.csv

cd ../output/15.5-Apul-bismark/

# Iterate over all *.cov files and rename them
for file in *.cov; do
  # Extract the portion before the first underscore
  prefix="${file%%_*}"
  if [[ -n "${sample_map[$prefix]}" ]]; then
    new_file="${file//$prefix/${sample_map[$prefix]}}"
    mv "$file" "$new_file"
  fi
done

Made bedgraph

cd ../output/15.5-Apul-bismark/

for f in *merged_CpG_evidence.cov
do
  STEM=$(basename "${f}" _R1_001.fastp-trim_bismark_bt2.CpG_report.merged_CpG_evidence.cov)
  cat "${f}" | awk -F $'\t' 'BEGIN {OFS = FS} {if ($5+$6 >= 10) {print $1, $2, $3, $4}}' \
  > "${STEM}"_10x.bedgraph
done

Created modified tables

for file in ../output/15.5-Apul-bismark/*10x.bedgraph; do
    awk '{print "CpG_"_$1"_"$2, $4}' "$file" > "${file%.bedgraph}_processed.txt"
done

Now there are files for each of the Apul samples @ https://gannet.fish.washington.edu/seashell/bu-github/timeseries_molecular/D-Apul/output/15.5-Apul-bismark/

--- title: "Apul Time Series Methylation" description: "It's in the name..." categories: [e5, methylation, coral, bismark] #citation: date: 02-08-2025 image: http://gannet.fish.washington.edu/seashell/snaps/Monosnap_urol-e5timeseries_molecular_2025-02-08_04-42-58.png # finding a good image author: - name: Steven Roberts url: orcid: 0000-0001-8302-1138 affiliation: Professor, UW - School of Aquatic and Fishery Sciences affiliation-url: https://robertslab.info #url: # self-defined draft: false # setting this to `true` will prevent your post from appearing on your listing page until you're ready! format: html: code-fold: FALSE code-tools: true code-copy: true highlight-style: github code-overflow: wrap --- Here we take the first set of time series methylation data to CpG level percent methylation. # Data was trimmed ::: callout-note ## Notebook <https://robertslab.github.io/sams-notebook/posts/2025/2025-01-02-Trimming---A.pulchra-WGBS-with-fastp-FastQC-and-MultiQC-on-Raven/> ::: To reproduce the trimming and quality control (QC) of *Acropora pulchra* Whole Genome Bisulfite Sequencing (WGBS) data, obtain the raw FastQ files and use **fastp** to trim **25bp from the 5’ end** while removing adapters, polyG, and polyA sequences; if errors occur, attempt repair with **BBTools’ repair.sh** before re-trimming. Evaluate trimmed reads with **FastQC**, then summarize results using **MultiQC**. The complete workflow, including commands and scripts, is available in the project repository, and output files can be accessed [here](https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/D-Apul/output/01.20-D-Apul-WGBS-trimming-fastp-FastQC-MultiQC/). # Reads were aligned ::: callout-note ## Notebook <https://robertslab.github.io/sams-notebook/posts/2025/2025-01-02-Bismark-Alignments---A.pulchra-Trimmed-Reads-Using-Bismark-on-Hyak/> ::: To align *Acropora pulchra* WGBS reads using **Bismark** on the **Hyak** cluster, trimmed FastQ files were prepared, and a **fastq_pairs.txt** file was generated. The alignment process was initiated by submitting a SLURM array job using sbatch --array=1-\$(wc -l \< ../output/01.20-D-Apul-WGBS-trimming-fastp-FastQC-MultiQC/fastq_pairs.txt) 02.20-D-Apul-WGBS-alignment-SLURM-job.sh, which allowed each node to process a unique FastQ pair. The job script (02.20-D-Apul-WGBS-alignment-SLURM_array-bismark.sh) configured directories, allocated computational resources (bismark_threads=5), extracted FastQ filenames, and executed **Bismark** with **Bowtie2** using the parameters --score_min L,0,-0.6 --non_directional --gzip -p 5. The SLURM script (02.20-D-Apul-WGBS-alignment-SLURM-job.sh) defined job parameters (#SBATCH --cpus-per-task=20 --mem=100G --time=72:00:00) and executed the containerized Bismark workflow via **Apptainer**. # Taking BAMS to tables ## Deduplicated ``` bash find /gscratch/srlab/sam/gitrepos/urol-e5/timeseries_molecular/D-Apul/output/02.20-D-Apul-WGBS-alignment-SLURM_array-bismark/*.bam | \ xargs -n 1 basename -s _R1_001.fastp-trim_bismark_bt2_pe.bam | \ parallel -j 8 deduplicate_bismark \ --bam \ --paired \ --output_dir ../output/15-Apul-bismark \ /gscratch/srlab/sam/gitrepos/urol-e5/timeseries_molecular/D-Apul/output/02.20-D-Apul-WGBS-alignment-SLURM_array-bismark//{}_R1_001.fastp-trim_bismark_bt2_pe.bam ``` ## Sorted ``` bash find ../output/15-Apul-bismark/*deduplicated.bam | \ xargs basename -s _R1_001.fastp-trim_bismark_bt2_pe.deduplicated.bam | \ xargs -I{} samtools \ sort --threads 24 \ ../output/15-Apul-bismark/{}_R1_001.fastp-trim_bismark_bt2_pe.deduplicated.bam \ -o ../output/15-Apul-bismark/{}.sorted.bam ``` ## Methylation Extraction ``` bash find ../output/15-Apul-bismark/*deduplicated.bam | xargs -n 1 -I{} \ bismark_methylation_extractor --bedGraph --counts --comprehensive --merge_non_CpG \ --multicore 24 --buffer_size 75% --output ../output/15-Apul-bismark "{}" ``` ## Methylation call ``` bash find ../output/15-Apul-bismark/*deduplicated.bismark.cov.gz | \ xargs -n 1 basename -s _pe.deduplicated.bismark.cov.gz | \ parallel -j 24 coverage2cytosine \ --genome_folder /gscratch/srlab/sam/gitrepos/urol-e5/timeseries_molecular/D-Apul/data/ \ -o ../output/15-Apul-bismark/{} \ --merge_CpG \ --zero_based \ ../output/15-Apul-bismark/{}_pe.deduplicated.bismark.cov.gz ``` ## Rename cov files ``` bash # Read the CSV file and create an associative array - hand edited 3 filenames first declare -A sample_map while IFS=, read -r col1 sample_name col3 col4 azenta_sample_name col6; do sample_map["$azenta_sample_name"]="$sample_name" done < ../../M-multi-species/data/e5_DNA_Azenta_metadata.csv cd ../output/15.5-Apul-bismark/ # Iterate over all *.cov files and rename them for file in *.cov; do # Extract the portion before the first underscore prefix="${file%%_*}" if [[ -n "${sample_map[$prefix]}" ]]; then new_file="${file//$prefix/${sample_map[$prefix]}}" mv "$file" "$new_file" fi done ``` ## Made bedgraph ``` bash cd ../output/15.5-Apul-bismark/ for f in *merged_CpG_evidence.cov do STEM=$(basename "${f}" _R1_001.fastp-trim_bismark_bt2.CpG_report.merged_CpG_evidence.cov) cat "${f}" | awk -F $'\t' 'BEGIN {OFS = FS} {if ($5+$6 >= 10) {print $1, $2, $3, $4}}' \ > "${STEM}"_10x.bedgraph done ``` ## Created modified tables ``` bash for file in ../output/15.5-Apul-bismark/*10x.bedgraph; do awk '{print "CpG_"_$1"_"$2, $4}' "$file" > "${file%.bedgraph}_processed.txt" done ``` ![](http://gannet.fish.washington.edu/seashell/snaps/Monosnap_httpsgannet.fish.washington.eduseashellbu-githubtimeseries_molecularD-Apuloutput15.5-Apul-bismarkACR-139-TP2_10x_proc_2025-02-08_05-03-16.png) Now there are files for each of the Apul samples \@ <https://gannet.fish.washington.edu/seashell/bu-github/timeseries_molecular/D-Apul/output/15.5-Apul-bismark/>