CG Motifs
big week - easy assignment
Amnesty
This week you can turn in any missing assignments to be assessed. If you choose to do this, indicate your decision and which specific assignments here.
CG Motifs
For this assignment you will take 10 sequences related to your project, identify CG motifs using `fuzznuc` from the EMBOSS package, and visualize them in IGV. You do not have to follow this workflow exactly; it is provided here for guidance. The R chunks use the `seqinr` package.
```{r}
library(seqinr)

# Replace 'input.fasta' with the name of your multi-sequence fasta file
input_file <- "input.fasta"

# read.fasta() returns a list with one element per sequence
sequences <- read.fasta(input_file)
```
```{r}
# Set the seed for reproducibility (optional)
set.seed(42)

number_of_sequences_to_select <- 10

# If the file holds fewer than 10 sequences, keep them all
if (length(sequences) < number_of_sequences_to_select) {
  warning("There are fewer than 10 sequences in the fasta file. All sequences will be selected.")
  number_of_sequences_to_select <- length(sequences)
}

# Draw random indices without replacement, then subset the list
selected_indices <- sample(length(sequences), number_of_sequences_to_select)
selected_sequences <- sequences[selected_indices]
```
```{r}
# Replace 'output.fasta' with your desired output file name
# (the bash chunks below read ../output/10_seqs.fa, so the names must match)
output_file <- "../output/output.fasta"
write.fasta(selected_sequences, names(selected_sequences), output_file, open = "w")
```
```{bash}
#likely will not need; fix issue where gff and fa name did not match
# sed -i 's/>lcl|/>/g' ../output/10_seqs.fa
```
```{bash}
#needed downstream for IGV
/home/shared/samtools-1.12/samtools faidx \
../output/10_seqs.fa
```
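`samtools faidx` writes a tab-delimited `.fai` file next to the FASTA, and its first two columns are the sequence name and the sequence length. As a rough sanity check, the sketch below computes those two columns independently with `awk`. It runs on a made-up toy FASTA in a temporary directory; the file name `toy.fa` and the variable `fai_check` are illustrative only, and you would point the `awk` command at your own FASTA to compare against your `.fai`.

```bash
# Work in a throwaway directory so nothing in the repo is touched.
tmpdir=$(mktemp -d)

# Toy two-record FASTA (made-up sequences, for illustration only).
cat > "$tmpdir/toy.fa" <<'EOF'
>seq1
ACGTACGT
ACG
>seq2
CGCG
EOF

# Print "name<TAB>length" for each record. The name is the header text up to
# the first whitespace; the length is the summed length of its sequence lines.
fai_check=$(awk '
  /^>/ { if (name != "") print name "\t" len
         name = substr($1, 2); len = 0; next }
       { len += length($0) }
  END  { if (name != "") print name "\t" len }
' "$tmpdir/toy.fa")

echo "$fai_check"

# Clean up the temporary files.
rm -r "$tmpdir"
```

If the names printed here differ from the sequence names in your GFF, IGV will not be able to place the features on the sequences.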
```{bash}
fuzznuc -sequence ../output/10_seqs.fa -pattern CG -rformat gff -outfile ../output/CGoutput.gff
```
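Before loading the GFF into IGV, it can help to confirm that it contains features for every sequence. The sketch below counts feature rows per sequence ID, assuming the standard nine-column, tab-delimited GFF layout with `#` comment lines. The toy rows are made up for illustration; real `fuzznuc` output will have its own header lines and attribute text, and `toy.gff` and `motif_counts` are hypothetical names.

```bash
# Work in a throwaway directory so nothing in the repo is touched.
tmpdir=$(mktemp -d)

# Toy GFF: one header line plus three made-up feature rows.
{
  printf '##gff-version 3\n'
  printf 'seq1\tfuzznuc\tnucleotide_motif\t2\t3\t2\t+\t.\tID=seq1.1\n'
  printf 'seq1\tfuzznuc\tnucleotide_motif\t6\t7\t2\t+\t.\tID=seq1.2\n'
  printf 'seq2\tfuzznuc\tnucleotide_motif\t1\t2\t2\t+\t.\tID=seq2.1\n'
} > "$tmpdir/toy.gff"

# Skip comment lines, keep rows with at least nine columns, and tally
# how many features each sequence ID (column 1) has.
motif_counts=$(awk -F'\t' '
  !/^#/ && NF >= 9 { n[$1]++ }
  END { for (s in n) print s "\t" n[s] }
' "$tmpdir/toy.gff" | sort)

echo "$motif_counts"

# Clean up the temporary files.
rm -r "$tmpdir"
```

To run the same count on your own results, replace `"$tmpdir/toy.gff"` with `../output/CGoutput.gff`.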
Push these files to GitHub, then grab their raw URLs to visualize them in IGV, loading the FASTA file as the “genome”. Take two screenshots and place them in your code file. At the top of your code page, be sure to provide a link to the visual report (RPubs). Alternatively, you can output to markdown.
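A raw URL can also be derived from the regular GitHub page URL of a file by swapping the host to `raw.githubusercontent.com` and dropping the `/blob` path segment. A minimal sketch with `sed` follows; `USER`, `REPO`, and the `main` branch are placeholders, not real values, so substitute your own.

```bash
# Hypothetical page URL; substitute your own user, repo, branch, and path.
blob_url="https://github.com/USER/REPO/blob/main/output/CGoutput.gff"

# Swap the host, then drop the first "/blob/" segment to get the raw-file URL.
raw_url=$(printf '%s\n' "$blob_url" |
  sed -e 's#^https://github\.com/#https://raw.githubusercontent.com/#' \
      -e 's#/blob/#/#')

echo "$raw_url"
```

The same rewrite applies to the FASTA file and its `.fai` index, which should sit side by side in the repository.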