CGH Analysis Software

Download

CGH software sources

CGH Windows executable

README file only

 

   Overview

This file briefly describes some aspects of the software for modeling oncogenesis described in:

  • Desper R, Jiang F, Kallioniemi O-P, Moch H, Papadimitriou CH, Schäffer AA: Inferring Tree Models for Oncogenesis from Comparative Genome Hybridization Data. J. Computational Biology 6: 37-51, 1999. [PubMed]

Comparative genomic hybridization (CGH) is a laboratory method to measure gains and losses of chromosomal regions in tumor cells. It is believed that DNA gains and losses in tumor cells do not occur entirely at random, but partly through some flow of causality. Models that relate tumor progression to the occurrence of DNA gains and losses could be very useful in hunting cancer genes and in cancer diagnosis.

Cancer is associated with a sequence of genetic changes that cause cell division, cell differentiation, or cell death to go out of control. Cancer genes can be broadly classified into two types: tumor suppressor genes and oncogenes. A tumor suppressor gene leads to cancer when expression of the corresponding protein decreases; an oncogene leads to cancer when expression of the corresponding protein increases. Our aim is to infer, from the gains and losses of chromosomes and chromosome arms or from the presence or absence of other aberrations such as breakpoints, which chromosomal regions are most likely to harbor genes important for tumor initiation and which may be important for progression. We work with gain and loss information for regions of chromosomes because this kind of cytogenetic study provides a survey of the entire human genome.

In solid forms of cancer, unlike leukemia and lymphoma, the study of chromosomal alterations has been especially difficult. An important problem in solid tumors is that once a set of critical genetic alterations develops, the cancer cell goes "out of control" and starts to accumulate seemingly random alterations. Since solid tumor samples often contain over a dozen chromosomal alterations, it has proven difficult to identify the primary disease-causing events.

We assume that the user has collected data on the presence or absence of a set of events E for each tumor in a set of tumors T. Our software is designed primarily to work with CGH data, where the events are gains or losses of a chromosomal region. We have also used the software to analyze data on cytogenetic breakpoints, where the events are bands containing a breakpoint. The data format is described in the README file.
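As an illustration of the presence/absence data described above, here is a minimal Python sketch. It does not reproduce the input format of the software (see the README file for that); the tumor names, the event labels such as "+8q", and the variable names are all hypothetical.

```python
# Hypothetical toy data: each tumor in T is mapped to the set of events
# observed in it. Labels like "+8q" (gain of 8q) and "-13q" (loss of 13q)
# are illustrative only, not the file format used by the actual software.
tumors = {
    "T1": {"+8q", "-13q"},
    "T2": {"+8q"},
    "T3": {"+8q", "-13q", "+1q"},
    "T4": {"-9p"},
}

# The event set E is the union of everything seen in any tumor.
events = sorted(set().union(*tumors.values()))

# Presence/absence matrix: one row per tumor, one column per event.
matrix = {
    name: [1 if e in observed else 0 for e in events]
    for name, observed in tumors.items()
}

# Fraction of tumors in which each event is present.
frequency = {
    e: sum(e in observed for observed in tumors.values()) / len(tumors)
    for e in events
}
```

Storing each tumor as a set keeps the order of events unspecified, which matches the assumption that only presence or absence is observed.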

Once we have the list of aberrations of a particular tumor, we can think of it as a set of genetic events that took place in some unknown order. It is believed that these events do not occur in a random fashion, but they are the result of some unknown flow of causality. That is, once an event occurs, it increases the probability of other events occurring, and so on. In some cases the connection between one event and the next will be specific and directly causal, while in other cases the later event occurs seemingly at random because of the basic genetic instability in a tumor cell.
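The idea that one event raises the probability of another can be checked on data by comparing an estimated conditional probability with the corresponding marginal probability. The sketch below is only an illustration of that comparison, not part of the CGH software; the function name and the toy tumors are hypothetical.

```python
def marginal_and_conditional(tumors, first, second):
    """Estimate P(second) and P(second | first) from a list of event sets."""
    with_first = [t for t in tumors if first in t]
    p_second = sum(second in t for t in tumors) / len(tumors)
    if with_first:
        p_second_given_first = sum(second in t for t in with_first) / len(with_first)
    else:
        # The conditional probability is undefined if `first` never occurs.
        p_second_given_first = float("nan")
    return p_second, p_second_given_first


# Hypothetical tumors, each given as the set of events observed in it.
toy = [{"A", "B"}, {"A", "B"}, {"A"}, {"C"}, {"C"}, set()]
p_b, p_b_given_a = marginal_and_conditional(toy, "A", "B")
```

In this toy data, B is seen in one third of all tumors but in two thirds of the tumors that contain A, which is the kind of dependence a tree edge from A to B is meant to capture.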

Our software produces tree-shaped models of events. Events near the root are predicted to be early events. Events that are clustered together in subtrees are predicted to be associated with more genetically homogeneous subclasses. If there is an edge from event i to event j, this may predict a causal relationship. If all the events occur independently, then the preferred tree will be a star in which all events are adjacent to the root. The main mathematical result of the paper cited above is that under a set of plausible assumptions, our inference algorithm chooses the best tree model. Although there are exponentially many tree models to choose from, our software chooses one quickly by using a classical algorithm to compute a "maximum weight branching" in a graph.
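To make the branching step concrete, here is a small Python sketch. It is not taken from the CGH software. It is a minimal version of the Chu-Liu/Edmonds procedure for a maximum weight spanning arborescence with a designated root, and it assumes that every event has at least one incoming edge and can be reached from the root. The function names, the event labels A, B, C, and the edge weights are hypothetical; the weight function used in the paper is not reproduced here.

```python
def find_cycle(parent, root):
    """Return a list of nodes forming a cycle in the parent map, or None."""
    for start in parent:
        seen, v = [], start
        while v != root and v not in seen:
            seen.append(v)
            v = parent[v]
        if v != root:
            return seen[seen.index(v):]
    return None


def max_arborescence(nodes, weights, root):
    """weights maps (u, v) to the weight of edge u -> v.
    Returns a dict child -> parent describing the chosen tree."""
    # Step 1: every non-root node greedily picks its heaviest incoming edge.
    best_in = {}
    for v in nodes:
        if v == root:
            continue
        candidates = [(w, u) for (u, x), w in weights.items() if x == v and u != v]
        best_in[v] = max(candidates, key=lambda c: c[0])[1]

    # Step 2: if the greedy choice has no cycle, it is already optimal.
    cycle = find_cycle(best_in, root)
    if cycle is None:
        return best_in

    # Step 3: contract the cycle into one supernode and reweight edges
    # entering it by the cycle edge they would replace.
    in_cycle = set(cycle)
    supernode = ("cycle", frozenset(in_cycle))
    new_nodes = [n for n in nodes if n not in in_cycle] + [supernode]
    new_weights, origin = {}, {}
    for (u, v), w in weights.items():
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:
            key, w2 = (u, supernode), w - weights[(best_in[v], v)]
        elif u in in_cycle:
            key, w2 = (supernode, v), w
        else:
            key, w2 = (u, v), w
        if key not in new_weights or w2 > new_weights[key]:
            new_weights[key], origin[key] = w2, (u, v)

    # Step 4: solve the smaller problem, then expand the supernode.
    contracted = max_arborescence(new_nodes, new_weights, root)
    parent = {}
    for child, par in contracted.items():
        orig_u, orig_v = origin[(par, child)]
        parent[orig_v] = orig_u
    for v in in_cycle:
        # All cycle nodes except the entry point keep their cycle edge.
        parent.setdefault(v, best_in[v])
    return parent


# Hypothetical weights; A and B prefer each other, which forces a contraction.
weights = {
    ("root", "A"): 5, ("root", "B"): 1, ("root", "C"): 1,
    ("A", "B"): 4, ("B", "A"): 6, ("B", "C"): 3, ("A", "C"): 2,
}
tree = max_arborescence(["root", "A", "B", "C"], weights, "root")
```

With these toy weights the result is the chain root -> A -> B -> C. When the edges from the root dominate every other edge, the same function returns a star, matching the independent-events case described above.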

Our tree modeling methods were used as part of a much larger project to hunt for the third locus for susceptibility to hereditary breast cancer. See:

  • Kainu T, et al: Somatic Deletions in Hereditary Breast Cancers Implicate 13q21 as a Putative Novel Breast Cancer Susceptibility Locus. Proc. Nat. Acad. Sci. USA 97: 9603-9608, 2000. [PubMed]

We have published a case study applying our methods and other statistical methods to the analysis of cytogenetic breakpoint data in a large collection of ovarian cancer samples. Although the aberrations measured by cytogenetics and by CGH are quite different, the encoding of aberrations in our software is sufficiently general that the tree methods can also be used directly on cytogenetic breakpoint data. See:

  • Simon R, Desper R, Papadimitriou CH, Peng A, Taetle R, Alberts DS, Trent JM, Schäffer AA: Chromosome Abnormalities in Ovarian Adenocarcinoma III: Using Breakpoint Data to Infer and Test Mathematical Models for Oncogenesis. Genes Chromosomes Cancer 28: 106-120, 2000. [PubMed] [ovar1026.ps]

It is not reasonable to make tree models using all possible genetic aberrations. For example, in the most common form of CGH data, each of 41 chromosome arms may have a gain or a loss, for a total of 82 events. Therefore, some heuristic method is needed to choose a small subset of events that are most interesting. Our paper describes a clique heuristic for selecting interesting events, and this is included in our software. We also recommend, and have implemented, a well-known method described in:

  • Brodeur GM, Tsiatis AA, Williams DL, Luthardt FW, Green AA: Statistical analysis of cytogenetic abnormalities in human cancer cells. Cancer Genet Cytogenet 7: 137-152, 1982. [PubMed]
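The count of 82 candidate events mentioned above can be reproduced with a short Python sketch. The text gives only the totals (41 arms, 82 events); which arms are counted is an assumption here, namely the p and q arms of chromosomes 1 to 22 and X, without the short arms of the acrocentric chromosomes 13, 14, 15, 21, and 22 and without Y. The event labels are hypothetical.

```python
# Assumption (not stated in the text): the 41 arms are the p and q arms of
# chromosomes 1-22 and X, minus the p arms of the acrocentric chromosomes.
ACROCENTRIC = {"13", "14", "15", "21", "22"}
chromosomes = [str(n) for n in range(1, 23)] + ["X"]

arms = [
    chrom + arm
    for chrom in chromosomes
    for arm in ("p", "q")
    if not (arm == "p" and chrom in ACROCENTRIC)
]

# Each arm yields two candidate events: a gain ("+") and a loss ("-").
events = [sign + arm for arm in arms for sign in ("+", "-")]

print(len(arms), len(events))
```

Even at this coarse resolution the list is far larger than the handful of events a readable tree can hold, which is why a selection step comes first.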

Send comments, questions, and suggestions to Richard Desper and Alejandro Schäffer.

 

 

 
