It is best to decide on central location on computer where software will be downloaded. Note for our purposes SOFTWARE IS ALREADY INSTALLED! Below is just an example of how you would install.
Later we will need to convert sp|Q08013|SSRG_RAT to get accession number out.
Getting more information
https://www.uniprot.org/uniprotkb
Joining blast table with annotation table
At this point we have a blast output table and annotation table both with a Uniprot accession number. Thus we can join the two tables and be able to get more functional information about thet genes.
---title: "NCBI Blast"subtitle: "Taking a set of unknown sequence files and annotating them"format: html: code-fold: false code-tools: true code-copy: true highlight-style: github code-overflow: wrap---::: callout-important## AssignmentAnnotate a provided, unknown fasta file with Gene Ontology terms. Your code should be fully reproducible.:::see also https://rpubs.com/sr320/1026094In your code directory create a file.`01-blast.Rmd`::: callout-tipRmarkdown is a good option as you can use markdown, add pictures and more!:::# Downloading softwareNCBI Blast Software is at<https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/>::: callout-tipIt is best to decide on central location on computer where software will be downloaded. Note for our purposes SOFTWARE IS ALREADY INSTALLED! Below is just an example of how you would install.:::```{bash}/home/shared/ncbi-blast-2.15.0+/bin/blastx-h```# Make Blast Databasesee <https://www.uniprot.org/downloads>https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz``` bashcd ../datacurl-O https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gzmv uniprot_sprot.fasta.gz uniprot_sprot_r2023_05.fasta.gzgunzip-k uniprot_sprot_r2023_05.fasta.gzls ../data`````` bash/home/shared/ncbi-blast-2.15.0+/bin/makeblastdb\-in ../data/uniprot_sprot_r2023_05.fasta \-dbtype prot \-out ../blastdb/uniprot_sprot_r2023_05```# Get the query sequence``` bashcurl https://eagle.fish.washington.edu/cnidarian/Ab_4denovo_CLC6_a.fa \-k \> ../data/Ab_4denovo_CLC6_a.fa``````{bash}head ../data/Ab_4denovo_CLC6_a.faecho"How many sequences are there?"grep-c">" ../data/Ab_4denovo_CLC6_a.fa```# Run Blast``` bash/home/shared/ncbi-blast-2.15.0+/bin/blastx\-query ../data/Ab_4denovo_CLC6_a.fa \-db ../blastdb/uniprot_sprot_r2023_05 \-out ../output/Ab_4-uniprot_blastx.tab \-evalue 1E-20 \-num_threads 2 \-max_target_seqs 1 \-outfmt 6``````{bash}head-2 ../output/Ab_4-uniprot_blastx.tabwc-l ../output/Ab_4-uniprot_blastx.tab```Later we will need to convert `sp|Q08013|SSRG_RAT` to get accession number out.# Getting more information```{r schemat, echo = FALSE, out.width = "70%", fig.align = "center"}knitr::include_graphics("../img/uniprot.png")```https://www.uniprot.org/uniprotkb```{r api, echo = FALSE, out.width = "70%", fig.align = "center"}knitr::include_graphics("../img/uniprot-api.png")```## Joining blast table with annotation tableAt this point we have a blast output table and annotation table both with a Uniprot accession number. Thus we can join the two tables and be able to get more functional information about thet genes.```{bash}head-2 ../output/Ab_4-uniprot_blastx.tabwc-l ../output/Ab_4-uniprot_blastx.tab``````{bash}tr'|''\t'< ../output/Ab_4-uniprot_blastx.tab |head-2``````{bash}tr'|''\t'< ../output/Ab_4-uniprot_blastx.tab \> ../output/Ab_4-uniprot_blastx_sep.tab``````{bash}head-2 ../data/uniprot_table_r2023_05.tabwc-l ../data/uniprot_table_r2023_05.tab``````{r}library(tidyverse)``````{r}bltabl <-read.csv("../output/Ab_4-uniprot_blastx_sep.tab", sep ='\t', header =FALSE)``````{bash, eval=FALSE}cd ../datacurl-O https://gannet.fish.washington.edu/seashell/snaps/uniprot_table_r2023_05.tab``````{r, eval=FALSE, cache=TRUE}spgo <-read.csv("../data/uniprot_table_r2023_05.tab", sep ='\t', header =TRUE)``````{r, eval=FALSE}str(spgo)``````{r, eval=FALSE}left_join(bltabl, spgo, by =c("V3"="Entry")) %>%select(V1, V3, V13, Protein.names, Organism, Gene.Ontology..biological.process., Gene.Ontology.IDs) %>%mutate(V1 =str_replace_all(V1, pattern ="solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed", replacement ="Ab"))```Code that joins AND writes out a tab delim table```{r, eval=FALSE}left_join(bltabl, spgo, by =c("V3"="Entry")) %>%select(V1, V3, V13, Protein.names, Organism, Gene.Ontology..biological.process., Gene.Ontology.IDs) %>%mutate(V1 =str_replace_all(V1, pattern ="solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed", replacement ="Ab")) %>%write_delim("../output/blast_annot_go.tab", delim ='\t')```