Alignment Data
SAMs and BAMs
Assignment
Create and inspect alignment files.
Task 1
Looking at Alignment Files
Download alignment data
Caution
Reminder: these are big files, so be sure to ignore them on commit.
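One way to keep these files out of a commit is to list them in `.gitignore`. The following is a minimal sketch, assuming it is run from the repository root; the `GITIGNORE` variable and the pattern list are illustrative assumptions, not part of the assignment, so adjust them to your own layout.

```bash
# Assumption: run from the repository root; patterns below are illustrative.
GITIGNORE="${GITIGNORE:-.gitignore}"
for pattern in '*.bam' '*.bai' '*.sam' '*.fa' '*.fai' '*.fastq.gz'; do
  # Append each pattern only if it is not already listed verbatim
  grep -qxF -- "$pattern" "$GITIGNORE" 2>/dev/null || printf '%s\n' "$pattern" >> "$GITIGNORE"
done
```

Running it twice is harmless, since a pattern that is already present is not appended again.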
```{r, engine='bash'}
cd ../data
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam.bai
```
```{r, engine='bash'}
cd ../data
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa.fai
```
Visualize with tview
Important
Run the following in the Terminal, as tview is interactive.
/home/shared/samtools-1.12/samtools tview \
../data/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
../data/GCF_002022765.2_C_virginica-3.0_genomic.fa
Task 2
Aligning WGS data
```{r, engine='bash'}
cd ../data
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R2_001.fastq.gz
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R1_001.fastq.gz
```
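Before aligning, it can help to confirm that a downloaded FASTQ file is intact and to see how many reads it holds. The sketch below runs on a tiny synthetic file with two made-up reads so that it works anywhere; the `tmpdir` path and the `demo_R1.fastq.gz` name are placeholders, not files from this assignment.

```bash
# Build a tiny synthetic FASTQ (2 made-up reads) so the sketch runs anywhere
tmpdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip -c > "$tmpdir/demo_R1.fastq.gz"

# Each FASTQ record spans 4 lines, so reads = lines / 4
n_lines=$(gzip -dc "$tmpdir/demo_R1.fastq.gz" | wc -l)
n_reads=$((n_lines / 4))
echo "reads: $n_reads"
```

To use it on the real data, point `gzip -dc` at `../data/F143n08_R1_001.fastq.gz` and then at the R2 file; for paired-end data the two counts should match.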
```{r, engine='bash'}
cd ../data
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa.fai
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/GCF_902806645.1_cgigas_uk_roslin_v1_genomic-mito.gtf
```
Alignment
```{r, engine='bash'}
/home/shared/hisat2-2.2.1/hisat2-build \
-f ../data/cgigas_uk_roslin_v1_genomic-mito.fa \
../output/cgigas_uk_roslin_v1_genomic-mito.index
```
```{r, engine='bash'}
/home/shared/hisat2-2.2.1/hisat2 \
-x ../output/cgigas_uk_roslin_v1_genomic-mito.index \
-p 4 \
-1 ../data/F143n08_R1_001.fastq.gz \
-2 ../data/F143n08_R2_001.fastq.gz \
-S ../output/F143_cgigas.sam
```
Take a look
```{r, engine='bash'}
tail -1 ../output/F143_cgigas.sam
```
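Each alignment line in a SAM file is tab-separated, with the read name in column 1 and a bitwise FLAG in column 2; bit 0x4 of the FLAG marks an unmapped read. As a rough sketch of how to read those fields, the example below counts mapped and unmapped records in a small synthetic SAM file. The file contents, `chr1`, and the read names are made up for illustration.

```bash
tmpdir=$(mktemp -d)
sam="$tmpdir/demo.sam"
# One header line plus three made-up alignment records (11 mandatory fields each)
{
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf 'r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n'
  printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n'
  printf 'r3\t16\tchr1\t300\t42\t4M\t*\t0\t0\tGGCC\tIIII\n'
} > "$sam"

# Skip header lines (they start with "@"); test bit 0x4 of FLAG with integer arithmetic
summary=$(awk -F'\t' '
  /^@/ { next }
  { if (int($2 / 4) % 2 == 1) unmapped++; else mapped++ }
  END { printf "mapped=%d unmapped=%d", mapped, unmapped }
' "$sam")
echo "$summary"
```

The integer division is used instead of a bitwise operator because not every awk implementation provides one.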
```{r, engine='bash'}
# Convert SAM to BAM, using 4 additional threads
/home/shared/samtools-1.12/samtools view -@ 4 -bS \
../output/F143_cgigas.sam > ../output/F143_cgigas.bam
```
```{r, engine='bash'}
# Sort the BAM file, using 4 additional threads
/home/shared/samtools-1.12/samtools sort -@ 4 \
../output/F143_cgigas.bam -o ../output/F143_cgigas_sorted.bam

# Index the sorted BAM file (multi-threading is not applicable to this operation)
/home/shared/samtools-1.12/samtools index \
../output/F143_cgigas_sorted.bam
```
Find SNPs
mpileup
bcftools is now recommended for mpileup instead of samtools (the approach described in the textbook).
```{r, engine='bash'}
/home/shared/bcftools-1.14/bcftools mpileup --threads 4 --no-BAQ \
--fasta-ref ../data/cgigas_uk_roslin_v1_genomic-mito.fa \
../output/F143_cgigas_sorted.bam > ../output/F143_mpileup_output.txt
```
```{r, engine='bash'}
tail ../output/F143_mpileup_output.txt
```
```{r, engine='bash'}
cat ../output/F143_mpileup_output.txt \
| /home/shared/bcftools-1.14/bcftools call -mv -Oz \
> ../output/F143_mpile.vcf.gz
```
```{r, engine='bash'}
zgrep "^##" -v ../output/F143_mpile.vcf.gz | \
awk 'BEGIN{OFS="\t"} {split($8, a, ";"); print $1,$2,$4,$5,$6,a[1],$9,$10}' | head
```
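To see what this filter does without the full call set, the sketch below applies the same awk program to a small synthetic VCF. The variant record and the `sample1` column are made up for illustration, and `gzip -dc | grep -v` stands in for `zgrep -v` purely for portability.

```bash
tmpdir=$(mktemp -d)
vcf="$tmpdir/demo.vcf.gz"
# Two "##" meta lines, the "#CHROM" column header, and one made-up variant record
{
  printf '##fileformat=VCFv4.2\n'
  printf '##source=demo\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n'
  printf 'chr1\t101\t.\tA\tG\t50\t.\tDP=12;MQ=60\tGT:PL\t0/1:80,0,90\n'
} | gzip -c > "$vcf"

# Drop "##" meta lines, then keep CHROM, POS, REF, ALT, QUAL,
# the first INFO entry (a[1]), FORMAT, and the sample column
rows=$(gzip -dc "$vcf" | grep -v '^##' | \
  awk 'BEGIN{OFS="\t"} {split($8, a, ";"); print $1,$2,$4,$5,$6,a[1],$9,$10}')
printf '%s\n' "$rows"
```

Note that the `#CHROM` header line survives the filter because it starts with a single `#`, so the first output row labels the columns. Only the first `;`-separated INFO entry is kept (here `DP=12`), and the ID and FILTER columns are dropped.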