Determining gene methylaiton

Here is some code for getting gene methylation. Will also add to handbook.

Will be using 10x coverage bedgraphs as spit out of our Bismark pipeline.

cd ../data/big

wget -r \
--no-check-certificate \
--quiet \
--no-directories --no-parent \
-P . \
-A R1_val_1_10x.bedgraph \
https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/

Then taking these files and intersecting with genes in genome.

 cd ../data/big/

 FILES=$(ls *bedgraph)

 cd -

 for file in ${FILES}
 do
    NAME=$(echo ${file} | awk -F "_" '{print $1}')
    echo ${NAME}
   /home/shared/bedtools2/bin/intersectBed \
   -wb \
   -a ../data/big/${NAME}_R1_val_1_10x.bedgraph \
   -b ../genome-features/C_virginica-3.0_Gnomon_genes.bed \
   | awk -v name=$NAME -v OFS="\t" '{ print $0, name}' \
   > ../output/40-gene-methylation/${NAME}_mGene.out
 done  


Individual output files look like..

head ../output/40-gene-methylation/36F_mGene.out
NC_035780.1	13597	13599	0.000000	NC_035780.1	13578	14594	gene-LOC111116054	0	+	36F
NC_035780.1	13725	13727	0.000000	NC_035780.1	13578	14594	gene-LOC111116054	0	+	36F
NC_035780.1	14144	14146	3.703704	NC_035780.1	13578	14594	gene-LOC111116054	0	+	36F
NC_035780.1	14430	14432	46.666667	NC_035780.1	13578	14594	gene-LOC111116054	0	+	36F

and then smashing all together

cat ../output/40-gene-methylation/*_mGene.out > ../output/40-gene-methylation/meth_all-samples.out

Then into tidyverse

meth_all <- read.delim("../output/40-gene-methylation/meth_all-samples.out", header = FALSE)
gm <- meth_all %>%
   mutate(art = paste(V8, V11, sep = '_')) %>%
   group_by(art) %>%
   summarize(avg_meth = mean(V4))


mc <- inner_join(gm, ic, by = "art") %>%
   separate(art, into = c("gene", "sample"), sep = "_") %>%
   separate(sample, into = c("number", "sex"), sep = -1)

Written on August 16, 2022