Comp 555: BioAlgorithms -- Fall 2012

Problem Set #1

Issued: Tue Sep 11   Due: In class Tue Sep 25

 


Homework Information: Some of the problems are probably too long to attempt the night before the due date, so plan accordingly. No late homework will be accepted without agreement. Feel free to work with others, but the work you hand in should be your own.

Problems


Programming Exercise

Update (9/19): The start codon offset for the rhodopsin sequence has been changed from 95 to 96.

Each organism appears to prefer certain codons over other equivalent ones (i.e., codons that code for the same amino acid). In this exercise you will write a Python program to analyze the codon biases of the human genome based on a sampling of genes. Your program should accept a list of FASTA format files, each of which encodes an mRNA nucleotide sequence for a particular gene. On the command line, each filename is followed by an integer offset giving the nucleotide position of the start codon. The output of the program should list each amino acid, plus "STOP", and for each one the codons that encode it along with the percentage of the time each codon was used, aggregated over all of the input sequences.
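To make the expected computation concrete, here is a minimal sketch of the tallying step for a single sequence. It is only one possible approach, not a reference solution. The function names are hypothetical, and it makes several assumptions that the assignment does not state: that the offset is a 0-based index of the first nucleotide of the start codon (subtract one if the offsets turn out to be 1-based), that reading stops at the first in-frame stop codon, and that sequences may use either T or U.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def count_codons(seq, offset):
    """Tally in-frame codons from `offset` through the first stop codon.

    Assumption: `offset` is a 0-based index into the sequence.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    counts = Counter()
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon not in CODON_TABLE:
            continue  # skip codons containing ambiguous bases such as N
        counts[codon] += 1
        if CODON_TABLE[codon] == "*":
            break  # assumed: the coding region ends at the first stop
    return counts

def codon_usage(counts):
    """Group codon counts by amino acid and convert them to percentages."""
    by_amino = defaultdict(dict)
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        by_amino["STOP" if aa == "*" else aa][codon] = n
    return {
        aa: {c: 100.0 * n / sum(codons.values()) for c, n in codons.items()}
        for aa, codons in by_amino.items()
    }

# Toy sequence (not one of the assignment's genes): ATG GCT GCC GCT TAA
usage = codon_usage(count_codons("GGATGGCTGCCGCTTAAGG", 2))
```

Because `Counter` objects can be added together, the counts from several genes can be summed before calling `codon_usage` to get the aggregate percentages.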

Your program should examine the following four mRNA sequences (with start codon offset in parentheses): insulin (60), hemoglobinB (51), rhodopsin (96), and collagen1 (127). So your command line should look as follows:

         python MyProg.py insulin.fa 60 hemoglobinB.fa 51 rhodopsin.fa 96 collagen1.fa 127

The .fa files are available here.
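The command line above alternates filenames and offsets, so the arguments need to be consumed in pairs. The following is a rough sketch of that step together with a simple FASTA reader. The helper names are hypothetical, and the reader assumes each file holds a single record (one header line beginning with ">" followed by sequence lines).

```python
def pair_arguments(args):
    """Turn ['a.fa', '60', 'b.fa', '51'] into [('a.fa', 60), ('b.fa', 51)]."""
    if len(args) % 2 != 0:
        raise ValueError("expected alternating FASTA filenames and offsets")
    return [(args[i], int(args[i + 1])) for i in range(0, len(args), 2)]

def read_fasta_sequence(lines):
    """Join the sequence lines of a single-record FASTA file.

    `lines` is any iterable of text lines, such as an open file.
    Header lines (starting with '>') and blank lines are skipped.
    """
    return "".join(
        line.strip()
        for line in lines
        if line.strip() and not line.startswith(">")
    )

# In-memory demonstration with made-up data, so no files are needed.
jobs = pair_arguments(["insulin.fa", "60", "hemoglobinB.fa", "51"])
sequence = read_fasta_sequence([">toy record\n", "ACGT\n", "TTGA\n"])
```

In the real program, `pair_arguments(sys.argv[1:])` would supply the (filename, offset) pairs, and each file could be passed to `read_fasta_sequence` inside a `with open(filename) as handle:` block.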


This page is maintained by prins@cs.unc.edu. Send mail if you find problems.