BEDtools

doing arithmetic with sequences

Assignment

Run some basic sub-commands in BEDtools.

Coverage

For this sub command you will be using the same data as in Week 6, so you should already have it in you data directory. If not or if you deleted it…

cd ../data 
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam 
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam.bai

You will also need a bed file with gene information.

cd ../data 
curl -O https://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_Gnomon_genes.bed

Convert bam to bed

/home/shared/bedtools2/bin/bedtools bamtobed \
-i ../data/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
> ../output/08-19F.bed
Warning

This is big file. What should you not do with it?

Get coverage of sequence reads on gene regions

Default behavior

After each interval in A, bedtools coverage will report:

  1. The number of features in B that overlapped (by at least one base pair) the A interval.

  2. The number of bases in A that had non-zero coverage from features in B.

  3. The length of the entry in A.

  4. The fraction of bases in A that had non-zero coverage from features in B.

/home/shared/bedtools2/bin/bedtools coverage \
-a ../data/C_virginica-3.0_Gnomon_genes.bed \
-b ../output/08-19F.bed \
> ../output/08-gene-19F-coverage.out

Intersect

Lets grab a bed file of Transposable Elements and lncRNAs

cd ../data
curl -O http://owl.fish.washington.edu/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff 
curl -O http://owl.fish.washington.edu/halfshell/genomic-databank/cgigas_uk_roslin_v1_rm.te.bed 
curl -O http://owl.fish.washington.edu/halfshell/genomic-databank/cgigas_uk_roslin_v1_lncRNA.gff
/home/shared/bedtools2/bin/bedtools intersect \
-a ../data/cgigas_uk_roslin_v1_gene.gff \
-b ../data/cgigas_uk_roslin_v1_rm.te.bed \
> ../output/08-gene-TE-intersect.out

head -2 ../output/08-gene-TE-intersect.out

Closest

/home/shared/bedtools2/bin/bedtools closest \
-a ../data//cgigas_uk_roslin_v1_lncRNA.gff\
-b ./data/cgigas_uk_roslin_v1_gene.gff \
> ../output/08-lnc-gene-closet.out