Exercises

  1. -- Print Strings --

    Write a series of print statements that print the following (include a blank line between answers):

    1. Post hoc ergo propter hoc
    2. What’s up with scientists using all of this snooty Latin?
    3. 'atgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgc'. Do this using the * operator to make 15 copies of 'atgc'.
    4. Darwin’s “On the origin of species” is a seminal work in biology.
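As a minimal sketch of the techniques this exercise calls for, the snippet below shows string repetition with the * operator, a bare print() call for a blank line, and two ways of handling quotation marks inside a string. It assumes Python 3 syntax (print as a function), and the strings are made up for illustration rather than taken from the exercise.

```python
# Illustrative strings only; these are not the exercise answers.
repeated = 'ab' * 3                # the * operator repeats a string: 'ababab'
apostrophe = "It's here"           # double quotes let ' appear unescaped
both = 'She said "it\'s fine"'     # a backslash escapes the inner '

print(repeated)
print()                            # a bare print() emits a blank line (Python 3)
print(apostrophe)
print()
print(both)
```

Choosing the outer quote character so that it differs from the quotes inside the text usually avoids the need for backslashes altogether.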
  2. -- string Functions --

    Use functions from the string module or from base Python to print the following strings.

    1. 'species' in all capital letters
    2. 'gcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgg' with all of the occurrences of 'a' replaced with 'A'
    3. "    Thank goodness it's Friday" without the leading white space (i.e., without the spaces before "Thank")
    4. The number of 'a's in 'gccgatgtacatggaatatacttttcaggaaacacatatctgtggagagg'.
    5. The length of the DNA sequence 'gccgatgtacatggaatatacttttcaggaaacacatatctgtggagagg'
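In Python 2 the string module offered function versions of these operations, such as string.upper(); those functions were removed in Python 3. A rough Python 3 sketch of the same function-call style uses the built-in len() and calls the string operations through the str type. The example strings are invented for illustration and are not the exercise data.

```python
# Assumes Python 3; the strings are illustrative, not the exercise data.
word = 'gene'
seq = 'aacgta'
padded = '   spaced out'

upper_word = str.upper(word)            # all capital letters
swapped = str.replace(seq, 'a', 'A')    # every 'a' becomes 'A'
stripped = str.lstrip(padded)           # leading white space removed
a_count = str.count(seq, 'a')           # number of 'a' characters
seq_length = len(seq)                   # built-in function, not a method

print(upper_word)
print(swapped)
print(stripped)
print(a_count)
print(seq_length)
```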
  3. -- string Methods --

    Use string methods to print the following strings. Remember that a method is called by appending it to the object's name with a period (.), like

    mystring = 'Hello World'
    print(mystring.lower())
    
    1. 'species' in all capital letters
    2. 'gcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgg' with all of the occurrences of 'a' replaced with 'A'
    3. "    Thank goodness it's Friday" without the leading white space (i.e., without the spaces before "Thank")
    4. The number of 'a's in 'gccgatgtacatggaatatacttttcaggaaacacatatctgtggagagg'.
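The method-call style described above can be sketched on a few invented strings (not the exercise data). Because each of these methods returns a new string, calls can also be chained; Python 3 print syntax is assumed.

```python
# Illustrative strings only; these are not the exercise data.
word = 'gene'
seq = 'aacgta'
padded = '   spaced out'

upper_word = word.upper()             # method call on the string object
swapped = seq.replace('a', 'A')
stripped = padded.lstrip()
a_count = seq.count('a')
shouted = padded.lstrip().upper()     # chaining: strip first, then upper-case

print(upper_word)
print(swapped)
print(stripped)
print(a_count)
print(shouted)
```

Strings are immutable, so none of these calls changes the original variable; the result has to be printed or assigned to a name.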
  4. -- Long Strings --

    For the DNA sequence below, determine the following properties and print them to the screen. You can copy and paste the sequence into your code; it is much longer than what appears on the screen, so select the whole line, and you will see the full sequence once it is pasted into Python.

    dna='ttcacctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctgtgtgtctagctaagatgtattattctgctgtggatcccactaaagatatattcactgggcttattgggccaatgaaaatatgcaagaaaggaagtttacatgcaaatgggagacagaaagatgtagacaaggaattctatttgtttcctacagtatttgatgagaatgagagtttactcctggaagataatattagaatgtttacaactgcacctgatcaggtggataaggaagatgaagactttcaggaatctaataaaatgcactccatgaatggattcatgtatgggaatcagccgggtctcactatgtgcaaaggagattcggtcgtgtggtacttattcagcgccggaaatgaggccgatgtacatggaatatacttttcaggaaacacatatctgtggagaggagaacggagagacacagcaaacctcttccctcaaacaagtcttacgctccacatgtggcctgacacagaggggacttttaatgttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcggcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgggattattccccacaaagggagtgggattaggagctgcatcatttacaagagcagaatgtttcaaatgcatttttagataagggagagttttacataggctcaaagtacaagaaagttgtgtatcggcagtatactgatagcacattccgtgttccagtggagagaaaagctgaagaagaacatctgggaattctaggtccacaacttcatgcagatgttggagacaaagtcaaaattatctttaaaaacatggccacaaggccctactcaatacatgcccatggggtacaaacagagagttctacagttactccaacattaccaggtaaactctcacttacgtatggaaaatcccagaaagatctggagctggaacagaggattctgcttgtattccatgggcttattattcaactgtggatcaagttaaggacctctacagtggattaattggccccctgattgtttgtcgaagaccttacttgaaagtattcaatcccagaaggaagctggaatttgcccttctgtttctagtttttgatgagaatgaatcttggtacttagatgacaacatcaaaacatactctgatcaccccgagaaagtaaacaaagatgatgaggaattcatagaaagcaataaaatgcatgctattaatggaagaatgtttggaaacct'

    1. How many times does 'gagg' occur in the sequence?
    2. What is the starting position of the first occurrence of 'atta'? Report the position as a human would count it, with the first base at position 1 rather than Python's 0.
    3. How long is the sequence?
    4. What is the GC content of the sequence? The GC content is the percentage of all bases in the sequence that are either G or C. Print the result as “The GC content of this sequence is XX.XX%”, where XX.XX is the actual GC content. Do this using a formatted string.
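One possible approach to these four questions, sketched on a short made-up sequence rather than the exercise sequence, combines count(), find(), len(), and a format string. The variable names are illustrative assumptions, and Python 3 print syntax is assumed.

```python
# Short made-up sequence for illustration; not the exercise sequence.
seq = 'atgcgaggttagagg'

# count() tallies non-overlapping occurrences of the motif.
motif_count = seq.count('gagg')

# find() returns a 0-based index (or -1 if the motif is absent),
# so add 1 to get the position as a human would count it.
first_tta_pos = seq.find('tta') + 1

seq_length = len(seq)

# GC content: g and c bases as a percentage of all bases.
gc_percent = 100.0 * (seq.count('g') + seq.count('c')) / seq_length

# {:.2f} rounds to two decimal places; the trailing % is a literal character.
message = 'The GC content of this sequence is {:.2f}%'.format(gc_percent)

print(motif_count)
print(first_tta_pos)
print(seq_length)
print(message)
```

Multiplying by 100.0 rather than 100 keeps the division in floating point even under Python 2.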
  5. -- GC Content 1 --

    A colleague has produced a file with one DNA sequence on each line. Download the file and load it into Python using numpy.loadtxt(). You will need to use the optional argument dtype = str to tell loadtxt() that the data is composed of strings.

    Calculate the GC content of each sequence. The GC content is the percentage of all bases in a sequence that are either G or C. Print the result for each sequence as “The GC content of the sequence is XX.XX%”, where XX.XX is the actual GC content.

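A minimal sketch of the load-and-loop pattern follows. Since the file name is not given here, an in-memory io.StringIO object with three made-up sequences stands in for the downloaded file. The helper gc_content() and the ndmin=1 argument are assumptions added for illustration, not part of the exercise text.

```python
import io
import numpy as np

# Stand-in for the downloaded file: one made-up sequence per line.
fake_file = io.StringIO('atgc\nggcc\naatt\n')

# dtype=str keeps each line as text; ndmin=1 keeps the result
# one-dimensional even if the file holds only a single sequence.
sequences = np.loadtxt(fake_file, dtype=str, ndmin=1)

def gc_content(seq):
    """Return the percentage of bases in seq that are g or c."""
    seq = seq.lower()
    return 100.0 * (seq.count('g') + seq.count('c')) / len(seq)

results = []
for seq in sequences:
    gc = gc_content(str(seq))   # convert the NumPy string to a plain str
    results.append(gc)
    print('The GC content of the sequence is {:.2f}%'.format(gc))
```

With a real file, the path to the downloaded file would replace fake_file in the loadtxt() call.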
  6. -- Split Strings --

    You have a data file with a single taxonomy column in it. This column contains the family, genus, and species for a single taxonomic group. You need to figure out how to split that information into separate values for family, genus, and species. To solve the basic problem, take a single example string, 'Ornithorhynchidae Ornithorhynchus anatinus', split it into three separate strings using a Python command, and then print the family, genus, and species, each on a separate line.

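As a hedged illustration of the splitting step, the sketch below uses the string method split() on a different taxonomy string than the one in the exercise. With no argument, split() breaks the string on white space, and the three pieces can be unpacked into separate names. Python 3 print syntax is assumed.

```python
# Illustrative taxonomy string; not the one from the exercise.
taxon = 'Hominidae Homo sapiens'

# split() with no argument separates on runs of white space.
# Unpacking into three names assumes the string has exactly three parts.
family, genus, species = taxon.split()

print(family)
print(genus)
print(species)
```

If a string had more or fewer than three parts, the unpacking line would raise a ValueError, which can be a useful early warning about malformed rows in a real data file.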