Introduction to Molecular Biology Information Resources

Colon Cancer Gene Exercises

Use various NCBI resources to gather a wide range of information about MLH1, a human gene associated with colon cancer.

Background
Format of Sequence Records
Entrez
BLAST
Conserved Domain Search (CD-Search)
Identify known variations in the MLH1 gene
Structures
Genomes and Maps
Genes
Learn More
 

Background

NCBI's online book about Genes and Disease provides an introduction to the genetic factors underlying cancers in general, and includes a section on colon cancer.

Online Mendelian Inheritance in Man (OMIM) provides detailed information about MLH1, a gene associated with familial nonpolyposis type 2 colon cancer. MLH1 is the subject of these exercises.

Format of Sequence Records

Compare the primary (archival) vs. reference (curated) sequence records for human MLH1

 
 
              Primary (Archival)    Reference (Curated)
              Sequence Record       Sequence Record
Nucleotide    U07343                NM_000249
Protein       AAC50285              NP_000240
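
The accession format itself hints at which kind of record you are looking at: RefSeq (reference) accessions carry a two-letter prefix followed by an underscore, such as NM_ for mRNA and NP_ for protein, while the archival accessions above have no underscore. The sketch below applies only that underscore convention to the four accessions in this exercise. The function name `record_type` is hypothetical, and treating every accession without an underscore as archival is a simplifying assumption rather than a complete rule.

```python
import re

# RefSeq-style accession: two uppercase letters, an underscore, digits,
# and an optional ".version" suffix. Only this convention is checked.
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")


def record_type(accession: str) -> str:
    """Classify an accession by the underscore convention (hypothetical helper)."""
    if REFSEQ_PATTERN.match(accession.strip()):
        return "reference (curated)"
    # Assumption: anything else is treated as a primary (archival) record.
    return "primary (archival)"


if __name__ == "__main__":
    for acc in ("U07343", "NM_000249", "AAC50285", "NP_000240"):
        print(acc, "->", record_type(acc))
```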
 

Entrez

Find a concise summary of genes pertaining to human colon cancer

 
I would like to retrieve a concise, non-redundant list of human mRNA sequences associated with colon cancer.
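
One way to express such a request programmatically is an Entrez query sent through the NCBI E-utilities ESearch service. The sketch below only builds the request URL and does not contact NCBI. The query string is an assumption about how the request might be phrased (limiting to human, to mRNA molecules, and to RefSeq records for non-redundancy) and is not necessarily the answer this exercise intends. The helper name `build_esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed query: free-text topic plus Entrez field and property filters.
QUERY = (
    "colon cancer AND human[Organism] "
    "AND biomol_mrna[PROP] AND srcdb_refseq[PROP]"
)


def build_esearch_url(term: str, db: str = "nuccore", retmax: int = 20) -> str:
    """Return an ESearch URL for the given query (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_esearch_url(QUERY))
```

The same query string can be pasted into the Entrez Nucleotide search box. The number of hits will differ from any published count because the databases keep growing.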
 

BLAST

Identify the possible function of an mRNA sequence

 

The research group you are helping has obtained the mRNA sequence shown below for a gene associated with colon cancer.

What function might this gene have?

Nucleotide sequence:
ATTGGCTGAAGGCACTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCTTCTGGCGCCAAAATGTCGTTCG
TGGCAGGGGTTATTCGGCGGCTGGACGAGACAGTGGTGAACCGCATCGCGGCGGGGGAAGTTATCCAGCG
GCCAGCTAATGCTATCAAAGAGATGATTGAGAACTGTTTAGATGCAAAATCCACAAGTATTCAAGTGATT
GTTAAAGAGGGAGGCCTGAAGTTGATTCAGATCCAAGACAATGGCACCGGGATCAGGAAAGAAGATCTGG
ATATTGTATGTGAAAGGTTCACTACTAGTAAACTGCAGTCCTTTGAGGATTTAGCCAGTATTTCTACCTA
TGGCTTTCGAGGTGAGGCTTTGGCCAGCATAAGCCATGTGGCTCATGTTACTATTACAACGAAAACAGCT
GATGGAAAGTGTGCATACAGAGCAAGTTACTCAGATGGAAAACTGAAAGCCCCTCCTAAACCATGTGCTG
GCAATCAAGGGACCCAGATCACGGTGGAGGACCTTTTTTACAACATAGCCACGAGGAGAAAAGCTTTAAA
AAATCCAAGTGAAGAATATGGGAAAATTTTGGAAGTTGTTGGCAGGTATTCAGTACACAATGCAGGCATT
AGTTTCTCAGTTAAAAAACAAGGAGAGACAGTAGCTGATGTTAGGACACTACCCAATGCCTCAACCGTGG
ACAATATTCGCTCCATCTTTGGAAATGCTGTTAGTCGAGAACTGATAGAAATTGGATGTGAGGATAAAAC
CCTAGCCTTCAAAATGAATGGTTACATATCCAATGCAAACTACTCAGTGAAGAAGTGCATCTTCTTACTC
TTCATCAACCATCGTCTGGTAGAATCAACTTCCTTGAGAAAAGCCATAGAAACAGTGTATGCAGCCTATT
TGCCCAAAAACACACACCCATTCCTGTACCTCAGTTTAGAAATCAGTCCCCAGAATGTGGATGTTAATGT
GCACCCCACAAAGCATGAAGTTCACTTCCTGCACGAGGAGAGCATCCTGGAGCGGGTGCAGCAGCACATC
GAGAGCAAGCTCCTGGGCTCCAATTCCTCCAGGATGTACTTCACCCAGACTTTGCTACCAGGACTTGCTG
GCCCCTCTGGGGAGATGGTTAAATCCACAACAAGTCTGACCTCGTCTTCTACTTCTGGAAGTAGTGATAA
GGTCTATGCCCACCAGATGGTTCGTACAGATTCCCGGGAACAGAAGCTTGATGCATTTCTGCAGCCTCTG
AGCAAACCCCTGTCCAGTCAGCCCCAGGCCATTGTCACAGAGGATAAGACAGATATTTCTAGTGGCAGGG
CTAGGCAGCAAGATGAGGAGATGCTTGAACTCCCAGCCCCTGCTGAAGTGGCTGCCAAAAATCAGAGCTT
GGAGGGGGATACAACAAAGGGGACTTCAGAAATGTCAGAGAAGAGAGGACCTACTTCCAGCAACCCCAGA
AAGAGACATCGGGAAGATTCTGATGTGGAAATGGTGGAAGATGATTCCCGAAAGGAAATGACTGCAGCTT
GTACCCCCCGGAGAAGGATCATTAACCTCACTAGTGTTTTGAGTCTCCAGGAAGAAATTAATGAGCAGGG
ACATGAGGTTCTCCGGGAGATGTTGCATAACCACTCCTTCGTGGGCTGTGTGAATCCTCAGTGGGCCTTG
GCACAGCATCAAACCAAGTTATACCTTCTCAACACCACCAAGCTTAGTGAAGAACTGTTCTACCAGATAC
TCATTTATGATTTTGCCAATTTTGGTGTTCTCAGGTTATCGGAGCCAGCACCGCTCTTTGACCTTGCCAT
GCTTGCCTTAGATAGTCCAGAGAGTGGCTGGACAGAGGAAGATGGTCCCAAAGAAGGACTTGCTGAATAC
ATTGTTGAGTTTCTGAAGAAGAAGGCTGAGATGCTTGCAGACTATTTCTCTTTGGAAATTGATGAGGAAG
GGAACCTGATTGGATTACCCCTTCTGATTGACAACTATGTGCCCCCTTTGGAGGGACTGCCTATCTTCAT
TCTTCGACTAGCCACTGAGGTGAATTGGGACGAAGAAAAGGAATGTTTTGAAAGCCTCAGTAAAGAATGC
GCTATGTTCTATTCCATCCGGAAGCAGTACATATCTGAGGAGTCGACCCTCTCAGGCCAGCAGAGTGAAG
TGCCTGGCTCCATTCCAAACTCCTGGAAGTGGACTGTGGAACACATTGTCTATAAAGCCTTGCGCTCACA
CATTCTGCCTCCTAAACATTTCACAGAAGATGGAAATATCCTGCAGCTTGCTAACCTGCCTGATCTATAC
AAAGTCTTTGAGAGGTGTTAAATATGGTTATTTATGCACTGTGGGATGTGTTCTTCTTTCTCTGTATTCC
GATACAAAGTGTTGTATCAAAGTGTGATATACAAAGTGTACCAACATAAGTGTTGGTAGCACTTAAGACT
TATACTTGCCTTCTGATAGTATTCCTTTATACACAGTGGATTGATTATAAATAAATAGATGTGTCTTAAC
ATAA

Amino acid (AA) translation:
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNGTGIRK
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPK
PCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNA
STVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVY
AAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLP
GLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDIS
SGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVEDDSRKEM
TAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELF
YQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEI
DEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQ
QSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC
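
The amino acid translation above can be reproduced from the nucleotide sequence with the standard genetic code. The sketch below is a minimal illustration, assuming that translation starts at the first ATG in the sequence and stops at the first in-frame stop codon. That assumption holds for this sequence but is not true of every mRNA. The function name `translate_from_first_atg` is hypothetical, and only the first two lines of the nucleotide sequence are included to keep the example short.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_first_atg(mrna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon."""
    # Remove whitespace and line breaks, and accept RNA (U) as well as DNA (T).
    seq = "".join(mrna.split()).upper().replace("U", "T")
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First two lines of the nucleotide sequence given above.
FIRST_TWO_LINES = """
ATTGGCTGAAGGCACTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCTTCTGGCGCCAAAATGTCGTTCG
TGGCAGGGGTTATTCGGCGGCTGGACGAGACAGTGGTGAACCGCATCGCGGCGGGGGAAGTTATCCAGCG
"""

if __name__ == "__main__":
    print(translate_from_first_atg(FIRST_TWO_LINES))
```

Run on these two lines, the function returns the first 26 residues of the translation shown above (MSFVAGVIRRLDETVVNRIAAGEVIQ). A protein sequence obtained this way can be used as a protein query, in addition to searching with the nucleotide sequence directly.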
   
 
Conserved Domain Search (CD-Search)

 
What conserved domains are found in the human MLH1 protein sequence above (from NP_000240)? What is the function of each one?
 

Identify known variations in the MLH1 gene

 
Which mutations in the MLH1 gene or protein have been found in colon cancer patients?
 

Structures

Find the 3-dimensional structure for a protein of interest or for similar protein sequences. View the residues in the active site.

 
Is there a known (resolved) three-dimensional structure for the protein encoded by the human MLH1 gene? If not, are there similar protein sequences that have known structures?

Also, one of the known MLH1 mutations in colon cancer patients, GLY67TRP, is of particular interest to me. Could that mutation lie in an active site of the protein?
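
Before looking for the residue in a structure, it helps to confirm that the mutation label agrees with the protein sequence. The sketch below parses a three-letter substitution label such as GLY67TRP, checks that position 67 of the AA translation given in the BLAST exercise really is glycine, and builds the mutant sequence. The function names are hypothetical, and only the first 70 residues of the translation are included. This check says nothing about whether the residue sits in an active site, which still has to be answered from the structure.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# First line (residues 1-70) of the AA translation from the BLAST exercise.
PROTEIN_START = (
    "MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNGTGIRK"
)


def parse_substitution(label: str) -> tuple[str, int, str]:
    """Split a label like 'GLY67TRP' into (reference, position, replacement)."""
    match = re.fullmatch(r"([A-Za-z]{3})(\d+)([A-Za-z]{3})", label.strip())
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label!r}")
    ref, pos, alt = match.group(1).upper(), int(match.group(2)), match.group(3).upper()
    return THREE_TO_ONE[ref], pos, THREE_TO_ONE[alt]


def apply_substitution(protein: str, label: str) -> str:
    """Return the mutant sequence after checking the reference residue."""
    ref, pos, alt = parse_substitution(label)
    if not 1 <= pos <= len(protein):
        raise ValueError(f"position {pos} is outside the sequence")
    if protein[pos - 1] != ref:  # positions in the label are 1-based
        raise ValueError(f"expected {ref} at {pos}, found {protein[pos - 1]}")
    return protein[:pos - 1] + alt + protein[pos:]


if __name__ == "__main__":
    print(parse_substitution("GLY67TRP"))
    print(apply_substitution(PROTEIN_START, "GLY67TRP"))
```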
 

Genomes and Maps

View the genomic context of a gene:
Find its chromosomal location and download the genomic sequence

 
Although I have the transcript (mRNA) for the MLH1 gene, I would also like to obtain the genomic sequence, along with some upstream sequence. How can I do this?
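
Once the gene's chromosomal coordinates and strand are known, the range to download can be worked out by extending the gene region on its upstream side. Which side that is depends on the strand: lower coordinates for a plus-strand gene, higher coordinates for a minus-strand gene. The sketch below shows that arithmetic only. The function name `region_with_upstream` is hypothetical, and the coordinates in the example are placeholders, not the real position of MLH1.

```python
def region_with_upstream(gene_start: int, gene_end: int,
                         strand: str, upstream: int) -> tuple[int, int]:
    """Return (start, end) of the gene plus `upstream` bases before its 5' end.

    Coordinates are 1-based and inclusive, with gene_start <= gene_end on the
    chromosome regardless of strand.
    """
    if gene_start < 1 or gene_end < gene_start or upstream < 0:
        raise ValueError("invalid coordinates or upstream length")
    if strand == "+":
        # Upstream of a plus-strand gene lies at lower coordinates.
        return max(1, gene_start - upstream), gene_end
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher coordinates.
        # The chromosome length is not known here, so the end is not clipped.
        return gene_start, gene_end + upstream
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    # Placeholder coordinates for illustration only.
    print(region_with_upstream(5000, 9000, "+", 2000))
    print(region_with_upstream(5000, 9000, "-", 2000))
```

The resulting range can then be entered wherever the sequence display lets you specify a start and end position for the region to show or download.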
 

Genes

Gather sequence data and related information on a particular genetic locus

 
Where can I find an overview of the human MLH1 gene that summarizes information about the nucleotide and amino acid sequences, map location, known sequence variations, and key literature about the gene's function?
 

Learn More

Article narrating the above Entrez demonstration on the human MLH1 gene:

Geer RC, Sayers EW. 2003. Entrez: making use of its power. Brief Bioinform. 4(2):179-84 (June). PMID: 12846398
The Entrez Tutorial page provides a brief summary of the article and a link to the full-text PDF file.

The search results (number of hits) noted in the article reflect the data that were available as of March 2003. The number of search hits will change as the databases grow, but the general search concepts will continue to apply.

NIH Office of Science Education Educational Materials:

Cell Biology and Cancer -- NIH Curriculum Supplement Series -- Grades 9-12
http://science-education.nih.gov/supplements/nih1/cancer/default.htm

Revised 01/10/2007