BEDtools
doing arithmetic with sequences
Assignment
Run some basic sub-commands in BEDtools.
Coverage
For this sub command you will be using the same data as in Week 6, so you should already have it in you data
directory. If not or if you deleted it…
cd ../data
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam.bai
You will also need a bed file with gene information.
cd ../data
curl -O https://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_Gnomon_genes.bed
Convert bam to bed
/home/shared/bedtools2/bin/bedtools bamtobed \
\
-i ../data/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam > ../output/08-19F.bed
Warning
This is big file. What should you not do with it?
Get coverage of sequence reads on gene regions
Default behavior
After each interval in A, bedtools coverage
will report:
The number of features in B that overlapped (by at least one base pair) the A interval.
The number of bases in A that had non-zero coverage from features in B.
The length of the entry in A.
The fraction of bases in A that had non-zero coverage from features in B.
/home/shared/bedtools2/bin/bedtools coverage \
\
-a ../data/C_virginica-3.0_Gnomon_genes.bed \
-b ../output/08-19F.bed > ../output/08-gene-19F-coverage.out
Intersect
Lets grab a bed file of Transposable Elements and lncRNAs
cd ../data
curl -O http://owl.fish.washington.edu/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff
curl -O http://owl.fish.washington.edu/halfshell/genomic-databank/cgigas_uk_roslin_v1_rm.te.bed
curl -O http://owl.fish.washington.edu/halfshell/genomic-databank/cgigas_uk_roslin_v1_lncRNA.gff
/home/shared/bedtools2/bin/bedtools intersect \
\
-a ../data/cgigas_uk_roslin_v1_gene.gff \
-b ../data/cgigas_uk_roslin_v1_rm.te.bed > ../output/08-gene-TE-intersect.out
head -2 ../output/08-gene-TE-intersect.out
Closest
/home/shared/bedtools2/bin/bedtools closest \
\
-a ../data//cgigas_uk_roslin_v1_lncRNA.gff\
-b ./data/cgigas_uk_roslin_v1_gene.gff > ../output/08-lnc-gene-closet.out