Getting on with Nanopore cDNA data

Minion
oyster
chile
Author
Affiliation

Steven Roberts

Published

August 1, 2025

Went in a few directions but will try to layout properly here. Experiment - 3 populations, 4 pooled (n=5) libraires - thus 20 individuals represented per population.

Data as provided

Combining

Here is an example from one species

find /home/shared/16TB_HDD_01/valentina/data/Quilhua/barcode01 -name "*.fastq.gz" -print0 | xargs -0 zcat | gzip > ../output/02-RNAseq/Qui_barcode01_combined.fastq.gz

find /home/shared/16TB_HDD_01/valentina/data/Quilhua/barcode01_1 -name "*.fastq.gz" -print0 | xargs -0 zcat | gzip > ../output/02-RNAseq/Qui_barcode01_1_combined.fastq.gz

find /home/shared/16TB_HDD_01/valentina/data/Quilhua/barcode02_1 -name "*.fastq.gz" -print0 | xargs -0 zcat | gzip > ../output/02-RNAseq/Qui_barcode02_1_combined.fastq.gz

find /home/shared/16TB_HDD_01/valentina/data/Quilhua/barcode12 -name "*.fastq.gz" -print0 | xargs -0 zcat | gzip > ../output/02-RNAseq/Qui_barcode12_combined.fastq.gz

Align

eval "$(/opt/anaconda/anaconda3/bin/conda shell.bash hook)"
conda activate

# Set number of threads
threads=42

# Set paths
ref_genome="../data/Och_HapA_assembly.fa"
input_dir="../output/02-RNAseq"
output_dir="../output/02-RNAseq"

# Loop over each FASTQ file
for fq in ${input_dir}/*.fastq.gz; do
  # Extract sample name (e.g., Pum_barcode02 from Pum_barcode02_combined.fastq.gz)
  sample=$(basename "$fq" .fastq.gz)

  echo "Processing $sample..."

  # Run minimap2 and samtools sort
  minimap2 -ax splice -k14 --MD -t $threads "$ref_genome" "$fq" | \
    samtools sort -@ $threads -o "${output_dir}/${sample}.sorted.bam"

  # Index the sorted BAM
  samtools index "${output_dir}/${sample}.sorted.bam"
done

Stringtie

mkdir -p ../output/02-RNAseq/stringtie_out

for bam in ../output/02-RNAseq/*combined.sorted.bam; do
    sample=$(basename "$bam" _combined.sorted.bam)
    /home/shared/stringtie-2.2.1.Linux_x86_64/stringtie "$bam" \
        -L \
        -o ../output/02-RNAseq/stringtie_out/${sample}.gtf \
        -p 32 \
        -v
done
ls ../output/02-RNAseq/stringtie_out/*.gtf > ../output/02-RNAseq/stringtie_out/mergelist.txt

/home/shared/stringtie-2.2.1.Linux_x86_64/stringtie --merge \
    -L \
    -G ../data/GN.gene.gtf \
    -o ../output/02-RNAseq/stringtie_out/merged.gtf \
    ../output/02-RNAseq/stringtie_out/mergelist.txt
mkdir -p ../output/02-RNAseq/stringtie_quant

for bam in ../output/02-RNAseq/*combined.sorted.bam; do
    sample=$(basename "$bam" _combined.sorted.bam)
    /home/shared/stringtie-2.2.1.Linux_x86_64/stringtie "$bam" \
        -L \
        -e \
        -B \
        -G ../output/02-RNAseq/stringtie_out/merged.gtf \
        -o ../output/02-RNAseq/stringtie_quant/${sample}.gtf
done
python /home/shared/stringtie-2.2.1.Linux_x86_64/prepDE.py \
-i ../output/02-RNAseq/stringtie_quant/samplelist.txt \
-g ../output/02-RNAseq/stringtie_quant/gene_count_matrix.csv \
-t ../output/02-RNAseq/stringtie_quant/transcript_count_matrix.csv

Next?

Normalize, QC?, Annotate (many are novel genes)