Each organism appears to prefer certain equivalent codons (i.e., codons that code for the same amino acid) over others. In this exercise you will write a Python program to analyze the codon bias of the human genome based on a sampling of genes. Your program should accept a list of FASTA format files, each of which encodes the mRNA nucleotide sequence of a particular gene. Each file name is followed by an integer offset giving the nucleotide position of the start codon. The output of the program should list each amino acid, plus "STOP", and for each one its associated codons together with the percentage of the time each codon was used, aggregated over all input sequences.
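The counting step described above could be sketched as follows. This is a minimal sketch, not a reference solution, and it rests on several assumptions the exercise does not state: the offset is treated as a 0-based index of the first base of the start codon, reading stops at the first in-frame stop codon, U is converted to T so that either alphabet works, and percentages are computed within each amino acid's group of codons. The names `CODON_TABLE`, `count_codons`, and `codon_percentages` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codons(sequence, offset):
    """Count codons from `offset` through the first in-frame stop codon.

    Assumption: `offset` is a 0-based index into `sequence`.
    """
    sequence = sequence.upper().replace("U", "T")
    counts = Counter()
    for i in range(offset, len(sequence) - 2, 3):
        codon = sequence[i:i + 3]
        if codon not in CODON_TABLE:
            continue  # skip codons containing ambiguous bases such as N
        counts[codon] += 1
        if CODON_TABLE[codon] == "*":
            break  # the coding region ends at the stop codon
    return counts


def codon_percentages(counts):
    """Group codon counts by amino acid and convert them to percentages."""
    grouped = defaultdict(dict)
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        grouped["STOP" if aa == "*" else aa][codon] = n
    return {
        aa: {codon: 100.0 * n / sum(codons.values())
             for codon, n in codons.items()}
        for aa, codons in grouped.items()
    }
```

Because `Counter` objects can be added together, the counts from several genes can be summed before calling `codon_percentages`, which gives the aggregate figures the exercise asks for. If the codon found at the given offset is not ATG, the offset convention is probably 1-based rather than 0-based, so checking that first codon is a cheap sanity test.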
Your program should examine the following four mRNA sequences (with start codon offset in parentheses): insulin (60), hemoglobinB (51), rhodopsin (96), and collagen1 (127). So your command line should look as follows:
python MyProg.py insulin.fa 60 hemoglobinB.fa 51 rhodopsin.fa 96 collagen1.fa 127
The .fa files are available here.
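The command line shown earlier alternates file names and offsets, so the arguments can be read in pairs. The sketch below shows one way to do that and to pull the sequence out of a FASTA file. It assumes each file holds a single record (a header line starting with ">" followed by sequence lines); the function names `parse_args` and `read_fasta_sequence` are hypothetical.

```python
def parse_args(argv):
    """Turn ['a.fa', '60', 'b.fa', '51'] into [('a.fa', 60), ('b.fa', 51)]."""
    if not argv or len(argv) % 2 != 0:
        raise ValueError("expected pairs of: <fasta file> <start codon offset>")
    return [(argv[i], int(argv[i + 1])) for i in range(0, len(argv), 2)]


def read_fasta_sequence(lines):
    """Return the sequence of the first FASTA record as a single string.

    `lines` is any iterable of text lines, for example an open file.
    """
    parts = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seen_header or parts:
                break  # a second record begins; keep only the first one
            seen_header = True
            continue
        parts.append(line)
    return "".join(parts)
```

In the program itself, `parse_args(sys.argv[1:])` would supply the (file name, offset) pairs, and each file could be read with `with open(name) as handle: sequence = read_fasta_sequence(handle)`.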