BLAST: User Question and Answer
Problem Summary:

Identify the possible function of an mRNA sequence

  Sample User Question
Analysis/Comments
Flow Chart
Additional Notes
 

Sample User Question

 

The research group you are helping has obtained the mRNA sequence shown below for a gene associated with colon cancer.

What function might this gene have?

Nucleotide sequence:
ATTGGCTGAAGGCACTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCTTCTGGCGCCAAAATGTCGTTCG
TGGCAGGGGTTATTCGGCGGCTGGACGAGACAGTGGTGAACCGCATCGCGGCGGGGGAAGTTATCCAGCG
GCCAGCTAATGCTATCAAAGAGATGATTGAGAACTGTTTAGATGCAAAATCCACAAGTATTCAAGTGATT
GTTAAAGAGGGAGGCCTGAAGTTGATTCAGATCCAAGACAATGGCACCGGGATCAGGAAAGAAGATCTGG
ATATTGTATGTGAAAGGTTCACTACTAGTAAACTGCAGTCCTTTGAGGATTTAGCCAGTATTTCTACCTA
TGGCTTTCGAGGTGAGGCTTTGGCCAGCATAAGCCATGTGGCTCATGTTACTATTACAACGAAAACAGCT
GATGGAAAGTGTGCATACAGAGCAAGTTACTCAGATGGAAAACTGAAAGCCCCTCCTAAACCATGTGCTG
GCAATCAAGGGACCCAGATCACGGTGGAGGACCTTTTTTACAACATAGCCACGAGGAGAAAAGCTTTAAA
AAATCCAAGTGAAGAATATGGGAAAATTTTGGAAGTTGTTGGCAGGTATTCAGTACACAATGCAGGCATT
AGTTTCTCAGTTAAAAAACAAGGAGAGACAGTAGCTGATGTTAGGACACTACCCAATGCCTCAACCGTGG
ACAATATTCGCTCCATCTTTGGAAATGCTGTTAGTCGAGAACTGATAGAAATTGGATGTGAGGATAAAAC
CCTAGCCTTCAAAATGAATGGTTACATATCCAATGCAAACTACTCAGTGAAGAAGTGCATCTTCTTACTC
TTCATCAACCATCGTCTGGTAGAATCAACTTCCTTGAGAAAAGCCATAGAAACAGTGTATGCAGCCTATT
TGCCCAAAAACACACACCCATTCCTGTACCTCAGTTTAGAAATCAGTCCCCAGAATGTGGATGTTAATGT
GCACCCCACAAAGCATGAAGTTCACTTCCTGCACGAGGAGAGCATCCTGGAGCGGGTGCAGCAGCACATC
GAGAGCAAGCTCCTGGGCTCCAATTCCTCCAGGATGTACTTCACCCAGACTTTGCTACCAGGACTTGCTG
GCCCCTCTGGGGAGATGGTTAAATCCACAACAAGTCTGACCTCGTCTTCTACTTCTGGAAGTAGTGATAA
GGTCTATGCCCACCAGATGGTTCGTACAGATTCCCGGGAACAGAAGCTTGATGCATTTCTGCAGCCTCTG
AGCAAACCCCTGTCCAGTCAGCCCCAGGCCATTGTCACAGAGGATAAGACAGATATTTCTAGTGGCAGGG
CTAGGCAGCAAGATGAGGAGATGCTTGAACTCCCAGCCCCTGCTGAAGTGGCTGCCAAAAATCAGAGCTT
GGAGGGGGATACAACAAAGGGGACTTCAGAAATGTCAGAGAAGAGAGGACCTACTTCCAGCAACCCCAGA
AAGAGACATCGGGAAGATTCTGATGTGGAAATGGTGGAAGATGATTCCCGAAAGGAAATGACTGCAGCTT
GTACCCCCCGGAGAAGGATCATTAACCTCACTAGTGTTTTGAGTCTCCAGGAAGAAATTAATGAGCAGGG
ACATGAGGTTCTCCGGGAGATGTTGCATAACCACTCCTTCGTGGGCTGTGTGAATCCTCAGTGGGCCTTG
GCACAGCATCAAACCAAGTTATACCTTCTCAACACCACCAAGCTTAGTGAAGAACTGTTCTACCAGATAC
TCATTTATGATTTTGCCAATTTTGGTGTTCTCAGGTTATCGGAGCCAGCACCGCTCTTTGACCTTGCCAT
GCTTGCCTTAGATAGTCCAGAGAGTGGCTGGACAGAGGAAGATGGTCCCAAAGAAGGACTTGCTGAATAC
ATTGTTGAGTTTCTGAAGAAGAAGGCTGAGATGCTTGCAGACTATTTCTCTTTGGAAATTGATGAGGAAG
GGAACCTGATTGGATTACCCCTTCTGATTGACAACTATGTGCCCCCTTTGGAGGGACTGCCTATCTTCAT
TCTTCGACTAGCCACTGAGGTGAATTGGGACGAAGAAAAGGAATGTTTTGAAAGCCTCAGTAAAGAATGC
GCTATGTTCTATTCCATCCGGAAGCAGTACATATCTGAGGAGTCGACCCTCTCAGGCCAGCAGAGTGAAG
TGCCTGGCTCCATTCCAAACTCCTGGAAGTGGACTGTGGAACACATTGTCTATAAAGCCTTGCGCTCACA
CATTCTGCCTCCTAAACATTTCACAGAAGATGGAAATATCCTGCAGCTTGCTAACCTGCCTGATCTATAC
AAAGTCTTTGAGAGGTGTTAAATATGGTTATTTATGCACTGTGGGATGTGTTCTTCTTTCTCTGTATTCC
GATACAAAGTGTTGTATCAAAGTGTGATATACAAAGTGTACCAACATAAGTGTTGGTAGCACTTAAGACT
TATACTTGCCTTCTGATAGTATTCCTTTATACACAGTGGATTGATTATAAATAAATAGATGTGTCTTAAC
ATAA

AA translation:
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNGTGIRK
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPK
PCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNA
STVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVY
AAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLP
GLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDIS
SGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVEDDSRKEM
TAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELF
YQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEI
DEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQ
QSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC
   
 

Analysis/Comments

When a lab sequences a gene, one of its first steps is to compare the new sequence against known sequences, which can shed light on the possible identity and function of the gene.

You can use either the nucleotide sequence or its protein translation as your query, although a protein query generally produces more specific results. This is because the genetic code is degenerate: multiple codons can code for the same amino acid, so related genes can diverge at the nucleotide level while their protein sequences remain similar.
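To see the degeneracy concretely, here is a minimal Python sketch. It translates the first 30 coding nucleotides of the query (starting at the ATG that begins the translation above) alongside a made-up synonymous variant. The variant sequence and the function names are illustrative assumptions, not part of the original exercise; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, listed with bases in T, C, A, G order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def count_mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# First 30 coding nucleotides of the query, starting at the ATG.
query_start = "ATGTCGTTCGTGGCAGGGGTTATTCGGCGG"
# Hypothetical variant: same amino acids, different codon choices.
synonymous = "ATGAGCTTTGTCGCCGGCGTCATCAGAAGA"

print(translate(query_start))
print(translate(synonymous))
print(count_mismatches(query_start, synonymous), "of", len(query_start),
      "nucleotides differ")
```

Both calls to translate give the same ten residues even though many nucleotides differ, so a protein-level comparison would treat the two sequences as identical while a nucleotide-level comparison would not.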

Flow Chart

Try a blastn search, using the nucleotide sequence above as the query and comparing it against the non-redundant (nr) database.

Then try a blastp search, using the protein sequence above as the query and comparing it against the non-redundant database or the Swiss-Prot database. (You might want to search against each database and compare the results.)
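If you would rather script these two searches than use the web forms, the submissions can be sketched in Python as below. This only builds the request fields and does not contact NCBI. The field names (CMD, PROGRAM, DATABASE, QUERY) follow NCBI's BLAST URL API; the database names nt, nr, and swissprot, the helper name, and the short query snippets are assumptions for illustration, so check them against the current NCBI documentation and substitute the full sequences above.

```python
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST URL API (not contacted in this sketch).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_submission(program, database, query):
    """Return the form fields for a BLAST 'Put' (submit search) request.

    Hypothetical helper: it only assembles the fields, it sends nothing.
    """
    if program not in ("blastn", "blastp"):
        raise ValueError("this sketch only covers blastn and blastp")
    return {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    }


# Short snippets of the sequences above, used here only as placeholders.
nucleotide_query = "ATGTCGTTCGTGGCAGGGGTTATTCGGCGG"
protein_query = "MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV"

searches = [
    # Step 1: nucleotide query against the nucleotide collection (assumed "nt").
    build_blast_submission("blastn", "nt", nucleotide_query),
    # Step 2: protein query against nr and against Swiss-Prot, to compare.
    build_blast_submission("blastp", "nr", protein_query),
    build_blast_submission("blastp", "swissprot", protein_query),
]

for fields in searches:
    # urlencode produces the body that would be POSTed to BLAST_URL.
    print(fields["PROGRAM"], fields["DATABASE"], urlencode(fields)[:60])
```

Keeping the request construction separate from the network call makes it easy to inspect exactly what each search would submit before running it.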

Additional Notes

Both the blastn and blastp searches will reveal high similarity between the query sequence and DNA mismatch repair genes/proteins from a variety of organisms.

If you BLAST the sequence from the RefSeq protein record for human MLH1 against the Swiss-Prot data set (which generally provides concise results), you can easily see that many of the top hits are to organisms that don't even have a colon! However, those hits provide valuable insight into the putative function of the human MLH1 gene: DNA mismatch repair, a function that is carried out in organisms regardless of whether they have colons.


Revised 11/05/2007