CG Motifs

big week - easy assignment

Assignment
  1. Identify any prior assignments you would like regraded (this week only).
  2. Visualize CG motifs in 10 of your sequences.

Amnesty

This week you can turn in any missing assignments to be assessed. Indicate here whether you are doing this and which specific assignments you are submitting.

CG Motifs

For this you will take 10 sequences related to your project, identify CG motifs using fuzznuc from the EMBOSS package, and visualize them in IGV. You do not have to follow this workflow exactly; it is provided here for guidance. The R code below uses the seqinr package.


```{r}
library(seqinr)

# Replace 'input.fasta' with the name of your multi-sequence fasta file
input_file <- "input.fasta"
sequences <- read.fasta(input_file)

```


```{r}
# Set the seed for reproducibility (optional)
set.seed(42)

number_of_sequences_to_select <- 10

if (length(sequences) < number_of_sequences_to_select) {
  warning("There are fewer than 10 sequences in the fasta file. All sequences will be selected.")
  number_of_sequences_to_select <- length(sequences)
}

selected_indices <- sample(length(sequences), number_of_sequences_to_select)
selected_sequences <- sequences[selected_indices]

```


```{r}
# Output file name; the downstream steps expect '../output/10_seqs.fa'
output_file <- "../output/10_seqs.fa"
write.fasta(selected_sequences, names(selected_sequences), output_file, open = "w")
```


```{bash}
# Likely not needed: fixes an issue where the sequence names in the GFF and FASTA did not match
# sed -i 's/>lcl|/>/g' ../output/10_seqs.fa
```


```{bash}
# Index the FASTA (writes 10_seqs.fa.fai next to it); needed downstream for IGV
/home/shared/samtools-1.12/samtools faidx \
../output/10_seqs.fa
```
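
The `.fai` index written by `samtools faidx` is a tab-separated text file whose first two columns are the sequence name and its length. As a rough cross-check that does not depend on samtools, the same two values can be computed directly from the FASTA. This is only a sketch: the helper name `fasta_lengths` and the demo sequences are made up for illustration.

```bash
# Hypothetical helper: print "name<TAB>length" for each FASTA record,
# mirroring columns 1 and 2 of a .fai index.
fasta_lengths() {
  awk '
    /^>/ { if (name != "") print name "\t" len
           name = substr($1, 2); len = 0; next }   # name = header up to first whitespace
    { len += length($0) }                          # add up wrapped sequence lines
    END { if (name != "") print name "\t" len }
  ' "$1"
}

# Made-up two-record FASTA to show the output format
demo_fa=$(mktemp)
printf '>seqA some description\nACGTACGT\nACG\n>seqB\nTTTT\n' > "$demo_fa"
fasta_lengths "$demo_fa"
```

On the real data, the output of `fasta_lengths ../output/10_seqs.fa` should agree with `cut -f1,2 ../output/10_seqs.fa.fai`.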


```{bash}
fuzznuc -sequence ../output/10_seqs.fa -pattern CG -rformat gff -outfile ../output/CGoutput.gff
```
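
As a sanity check on the fuzznuc output, the CG dinucleotides in each sequence can be counted independently and compared with the number of features reported per sequence in the GFF. This is a minimal sketch and not part of the assigned workflow: the helper name `count_cg` and the demo sequences are hypothetical, and it assumes plain (uncompressed) FASTA input.

```bash
# Hypothetical helper: count CG dinucleotides per record in a FASTA file.
# Sequence lines are joined first so a CG split across a line wrap is still counted,
# and bases are uppercased because seqinr may write lowercase sequence.
count_cg() {
  awk '
    function report(   n) {
      if (name != "") { n = gsub(/CG/, "CG", seq); print name "\t" n }
    }
    /^>/ { report(); name = substr($1, 2); seq = ""; next }
    { seq = seq toupper($0) }
    END { report() }
  ' "$1"
}

# Made-up demo file: seq1 has one CG that spans the line wrap
demo_cg=$(mktemp)
printf '>seq1\nACGTC\nGGCG\n>seq2\nattta\n' > "$demo_cg"
count_cg "$demo_cg"
```

On the real data, `count_cg ../output/10_seqs.fa` can be compared with the per-sequence feature counts from `grep -v '^#' ../output/CGoutput.gff | cut -f1 | sort | uniq -c`, assuming fuzznuc writes one GFF feature line per match.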

Push these files to GitHub, then grab their raw URLs to visualize them in IGV. The FASTA file serves as the “genome”. Take 2 screenshots and place them in your code file. At the top of your code page, be sure to provide a link to the visual report (RPubs). Alternatively, you can output to markdown.
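
Raw GitHub URLs for files in a public repository follow the pattern `https://raw.githubusercontent.com/<user>/<repo>/<branch>/<path>`. The sketch below prints the URLs for the FASTA, its index, and the GFF. The helper name `raw_url` and the user, repository, and branch values are placeholders, and the paths assume the `output/` directory sits at the top level of your repository.

```bash
# Hypothetical helper: build a raw.githubusercontent.com URL for a file in a public repo.
raw_url() {
  # $1 = user, $2 = repo, $3 = branch, $4 = path within the repo
  printf 'https://raw.githubusercontent.com/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Placeholder values; substitute your own account, repository, and branch
user="your-username"
repo="your-repo"
branch="main"

for f in output/10_seqs.fa output/10_seqs.fa.fai output/CGoutput.gff; do
  raw_url "$user" "$repo" "$branch" "$f"
done
```

In IGV, the FASTA URL (with its `.fai` index alongside it) is loaded as the genome and the GFF URL as an annotation track.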