Alignment Data
SAMs and BAMs
Screen Recording
Assignment
Create and inspect alignment files, including visualizing them and capturing “outside” graphics. Publish your notebook on RPubs and provide the link at the top of your code.
Task 1
Looking at Alignment Files
Download alignment data
Danger
Reminder: these are big files, so be sure to ignore them on commit.
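One way to keep the large files out of a commit is to list them in `.gitignore`. The sketch below is only an illustration: `ignore_pattern` is a hypothetical helper (not part of the assignment), and the patterns you pass to it are your choice.

```bash
# Hypothetical helper: append a pattern to an ignore file only if that
# exact line is not already present. Defaults to ./.gitignore.
ignore_pattern() (
  pattern="$1"
  file="${2:-.gitignore}"
  touch "$file"
  # -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
)
```

For example, running `ignore_pattern '*.bam'` and `ignore_pattern '*.fastq.gz'` from the repository root would cover two of the file types downloaded in this assignment; repeating a call does not add a duplicate line.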
```{r, engine='bash'}
cd ../data
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam.bai
```
```{r, engine='bash'}
cd ../data
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa.fai
```
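Each download above pairs a data file with its index (`.bam` with `.bam.bai`, `.fa` with `.fa.fai`). Before moving on, it can help to confirm that both halves arrived and are not empty. This is a minimal sketch; `check_pair` is a hypothetical helper name, and it only tests for presence and non-zero size, not file integrity.

```bash
# Hypothetical helper: succeed only if a data file and its index both
# exist and are non-empty (-s is true for files with size > 0).
check_pair() {
  if [ -s "$1" ] && [ -s "$2" ]; then
    echo "ok: $1"
  else
    echo "missing or empty: $1 or $2" >&2
    return 1
  fi
}
```

A call such as `check_pair ../data/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam ../data/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam.bai` returns a non-zero status if either download failed.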
Visualize with tview
Important
Run the following in the Terminal, because tview is interactive.
/home/shared/samtools-1.12/samtools tview \
../data/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
../data/GCF_002022765.2_C_virginica-3.0_genomic.fa
Capture Image
Take a screenshot of the tview display and place it in your notebook.
Task 2
Aligning WGS data and visualizing in IGV
```{r, engine='bash'}
cd ../data
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R2_001.fastq.gz
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R1_001.fastq.gz
```
```{r, engine='bash'}
cd ../data
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa.fai
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/GCF_902806645.1_cgigas_uk_roslin_v1_genomic-mito.gtf
```
Alignment
```{r, engine='bash'}
/home/shared/hisat2-2.2.1/hisat2-build \
-f ../data/cgigas_uk_roslin_v1_genomic-mito.fa \
../output/cgigas_uk_roslin_v1_genomic-mito.index
```
```{r, engine='bash'}
/home/shared/hisat2-2.2.1/hisat2 \
-x ../output/cgigas_uk_roslin_v1_genomic-mito.index \
-p 4 \
-1 ../data/F143n08_R1_001.fastq.gz \
-2 ../data/F143n08_R2_001.fastq.gz \
-S ../output/F143_cgigas.sam
```
Take a look
```{r, engine='bash'}
tail -1 ../output/F143_cgigas.sam
```
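The line printed by `tail` is a single alignment record, and its second column is the bitwise FLAG defined by the SAM format. As a reading aid, the sketch below decodes a FLAG value into the standard bit names; `decode_flag` is a hypothetical helper, and the short labels are my own abbreviations of the SAM specification's descriptions.

```bash
# Hypothetical helper: list which SAM FLAG bits are set in a decimal value.
# Bit order follows the SAM specification: 0x1, 0x2, 0x4, ... 0x800.
decode_flag() (
  flag="$1"
  out=""
  bit=1
  for name in paired proper_pair unmapped mate_unmapped reverse \
              mate_reverse read1 read2 secondary qc_fail duplicate \
              supplementary; do
    if [ $(( flag & bit )) -ne 0 ]; then
      out="${out:+$out,}$name"   # comma-join the names of the set bits
    fi
    bit=$(( bit * 2 ))
  done
  echo "${out:-none}"
)

decode_flag 99    # paired,proper_pair,mate_reverse,read1
decode_flag 147   # paired,proper_pair,reverse,read2
```

To decode the record shown above, something like `decode_flag "$(tail -1 ../output/F143_cgigas.sam | cut -f 2)"` should work, assuming the last line is an alignment record rather than a header.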
```{r, engine='bash'}
# Convert SAM to BAM, using 4 additional threads
/home/shared/samtools-1.12/samtools view -@ 4 -bS \
../output/F143_cgigas.sam > ../output/F143_cgigas.bam
```
```{r, engine='bash'}
# Sort the BAM file, using 4 additional threads
/home/shared/samtools-1.12/samtools sort -@ 4 \
../output/F143_cgigas.bam -o ../output/F143_cgigas_sorted.bam

# Index the sorted BAM file (multi-threading is not applicable to this operation)
/home/shared/samtools-1.12/samtools index \
../output/F143_cgigas_sorted.bam
```
mpileup
bcftools is now recommended for mpileup instead of samtools (the approach described in the textbook).
```{r, engine='bash'}
/home/shared/bcftools-1.14/bcftools mpileup --threads 4 --no-BAQ \
--fasta-ref ../data/cgigas_uk_roslin_v1_genomic-mito.fa \
../output/F143_cgigas_sorted.bam > ../output/F143_mpileup_output.txt
```
```{r, engine='bash'}
tail ../output/F143_mpileup_output.txt
```
```{r, engine='bash'}
cat ../output/F143_mpileup_output.txt \
| /home/shared/bcftools-1.14/bcftools call -mv -Oz \
> ../output/F143_mpile.vcf.gz
```
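Once variants are called, a quick tally can serve as a sanity check. The sketch below is an assumption-laden illustration rather than part of the assignment: `count_variant_types` is a hypothetical helper that reads uncompressed VCF text on standard input and treats a record as a SNP only when REF and every ALT allele are a single character; everything else is counted as "other".

```bash
# Hypothetical helper: count SNP vs. non-SNP records in VCF text on stdin.
# Column 4 is REF and column 5 is ALT (comma-separated if multi-allelic).
count_variant_types() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      n = split($5, alts, ",")
      is_snp = (length($4) == 1)
      for (i = 1; i <= n; i++)
        if (length(alts[i]) != 1) is_snp = 0
      if (is_snp) snp++; else other++
    }
    END { printf "snp=%d other=%d\n", snp, other }
  '
}
```

For the file written above, a call might look like `gzip -dc ../output/F143_mpile.vcf.gz | count_variant_types`.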
```{r, engine='bash'}
zgrep "^##" -v ../output/F143_mpile.vcf.gz | \
awk 'BEGIN{OFS="\t"} {split($8, a, ";"); print $1,$2,$4,$5,$6,a[1],$9,$10}' | head
```
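To see what that `awk` program keeps, it can be run on a single made-up record. The sketch below wraps the same program in a hypothetical function, `summarize_vcf`, and feeds it one invented VCF line (the chromosome name, position, and values are illustrative only, not taken from the real data).

```bash
# Hypothetical wrapper around the awk program used above: keep CHROM, POS,
# REF, ALT, QUAL, the first INFO entry, FORMAT, and the sample column.
summarize_vcf() {
  grep -v '^##' |
    awk 'BEGIN{OFS="\t"} {split($8, a, ";"); print $1,$2,$4,$5,$6,a[1],$9,$10}'
}

# One invented record: ID (col 3) and FILTER (col 7) are dropped, and only
# the first semicolon-separated INFO entry (DP=10) survives.
printf 'chr1\t100\t.\tA\tG\t50\t.\tDP=10;AF=0.5\tGT:PL\t0/1:60,0,60\n' | summarize_vcf
```

The output has eight tab-separated columns, which is why the table printed by the chunk above is narrower than a full VCF row.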
The code below might not work. That is fine. The VCF in the above chunk can be used for visualization in IGV.
```{r, engine='bash'}
/home/shared/bcftools-1.14/bcftools call \
-v -c ../output/F143_mpile.vcf.gz \
> ../output/F143_mpile_calls.vcf
```
Visualize
Visualize these data in IGV and capture a few cool snapshots.
At a minimum, show the BAM file and at least 2 genome feature files.
Bonus for annotating screenshots.
Useful link: https://robertslab.github.io/resources/Genomic-Resources/#crassostrea-gigas-cgigas_uk_roslin_v1
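Snapshots can be taken by hand in the IGV window, but IGV also accepts batch scripts, which makes a screenshot reproducible. The sketch below only writes such a script as text; `write_igv_batch` is a hypothetical helper, and the genome path, track path, locus, and snapshot directory are all arguments you would supply yourself.

```bash
# Hypothetical helper: print an IGV batch script to stdout.
# Arguments: genome FASTA, track file (e.g. sorted BAM), locus, snapshot dir.
write_igv_batch() (
  genome="$1"; track="$2"; locus="$3"; outdir="$4"
  cat <<EOF
new
genome $genome
load $track
snapshotDirectory $outdir
goto $locus
snapshot
EOF
)
```

For instance, `write_igv_batch ../data/cgigas_uk_roslin_v1_genomic-mito.fa ../output/F143_cgigas_sorted.bam "<chromosome>:<start>-<end>" ../output > ../output/igv_batch.txt` would produce a file that can be run from IGV's batch-script menu, with the locus placeholder replaced by a region of your choosing. Additional `load` lines can be added for the genome feature files.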