Genetic Analysis Software
Richa Agarwala and Alejandro Schäffer are working, together and separately, on various software packages for the analysis of genetic data. This page briefly summarizes several ongoing projects and provides hyperlinks to a more detailed page about each project, to software downloads, and to references for papers.
Alejandro Schäffer has led the development of the FASTLINK software package for genetic linkage analysis. Genetic linkage analysis is a statistical technique used to map genes and to find the approximate locations of disease genes. FASTLINK aims to replace the main programs of the widely used LINKAGE package by performing the same computations faster. FASTLINK can also run in parallel, either on a shared-memory computer or on a network of workstations, and it adds substantial new documentation. FASTLINK has been used in over 1000 published genetic studies. It is freely available by ftp; follow the hyperlink to the FASTLINK page for more details.
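A common summary statistic in linkage analysis is the LOD score: the base-10 log of the ratio between the likelihood of the data at a recombination fraction theta and the likelihood under no linkage (theta = 0.5). FASTLINK and LINKAGE compute pedigree likelihoods for general pedigrees, which is far more involved; the sketch below only illustrates the simplest textbook case, assuming phase-known, fully informative meioses that can be counted directly as recombinant or nonrecombinant. The function name `lod_score` is hypothetical and is not part of either package.

```python
import math

def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses (illustrative only).

    Compares the likelihood of the observed meioses at recombination
    fraction theta with the likelihood under no linkage (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out complete linkage
    n = recombinants + nonrecombinants
    log_l1 = nonrecombinants * math.log10(1.0 - theta)
    if recombinants > 0:
        log_l1 += recombinants * math.log10(theta)
    log_l0 = n * math.log10(0.5)
    return log_l1 - log_l0

# Scan a grid of recombination fractions and keep the maximizing one.
grid = [i / 100 for i in range(51)]
best_theta = max(grid, key=lambda t: lod_score(2, 18, t))
print(best_theta, round(lod_score(2, 18, best_theta), 3))
```

With 2 recombinants in 20 meioses the maximum falls at theta = 0.1, the observed recombination fraction; by convention a LOD score of 3 or more is often taken as evidence for linkage.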
In collaboration with Sandeep Gupta, Alejandro Schäffer developed a significantly faster and more space-efficient version of MSA, a program for multiple sequence alignment. Follow the hyperlink to the MSA page to retrieve the paper and software.
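Multiple alignment programs in the MSA tradition score an alignment by the sum-of-pairs criterion: every pair of rows is scored column by column and the pairwise scores are added. The sketch below computes that score for a given alignment. It assumes simple hypothetical costs (match +1, mismatch -1, gap -2, gap-against-gap 0); the real program uses substitution matrices and more elaborate gap costs, and its speed comes from searching for the optimal alignment, which this sketch does not attempt.

```python
from itertools import combinations
from typing import List

def sum_of_pairs(alignment: List[str], match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum-of-pairs score of a multiple alignment; rows have equal length, '-' is a gap.

    The scoring parameters are illustrative assumptions, not MSA's defaults.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    score = 0
    for a, b in combinations(alignment, 2):  # every unordered pair of rows
        for x, y in zip(a, b):
            if x == "-" and y == "-":
                continue  # a gap aligned to a gap contributes nothing
            if x == "-" or y == "-":
                score += gap
            elif x == y:
                score += match
            else:
                score += mismatch
    return score

print(sum_of_pairs(["AC-", "ACG", "A-G"]))
```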
Richa Agarwala, Jeremy Buhler (Washington U.), and Alejandro Schäffer have developed software for conditional linkage analysis of polygenic diseases such as diabetes, asthma, and glaucoma. The software is called CASPAR (Computerized Affected Sibling Pair Analyzer and Reporter). Other participants in the design of CASPAR are Kenneth Gabbay (Baylor College of Medicine), Marek Kimmel (Rice University), and David Owerbach (Baylor College of Medicine). Follow the hyperlink to the CASPAR page to retrieve the software.
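Affected sibling pair methods rest on a simple idea: at a marker unlinked to the disease, two sibs share 0, 1, or 2 alleles identical by descent (IBD) with probabilities 1/4, 1/2, 1/4, so affected pairs sharing more than expected point toward linkage. The sketch below is the standard "mean test" for that excess sharing. It is not CASPAR's conditional analysis; it assumes IBD counts are known exactly for every pair, and the function name is hypothetical.

```python
import math
from typing import List

def asp_mean_test(ibd_counts: List[int]) -> float:
    """Mean-test z statistic for affected sib pairs with known IBD counts (0, 1 or 2).

    Under no linkage the shared proportion (0, 1/2, 1) has mean 0.5 and
    variance 0.125 per pair, so large positive z suggests excess sharing.
    """
    if not ibd_counts:
        raise ValueError("need at least one sib pair")
    if any(c not in (0, 1, 2) for c in ibd_counts):
        raise ValueError("IBD counts must be 0, 1 or 2")
    n = len(ibd_counts)
    mean_share = sum(c / 2 for c in ibd_counts) / n  # proportion of alleles shared IBD
    return (mean_share - 0.5) / math.sqrt(0.125 / n)

print(asp_mean_test([2, 1, 2, 1, 1, 2, 0, 2]))
```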
Richa Agarwala has developed software called PedHunter to query a genealogical database. Among the problems PedHunter solves is how best to connect a set of relatives with the same disease into a pedigree suitable for input to genetic linkage analysis. PedHunter is currently being used at NCBI to query the Amish Genealogy Database (AGDB), a database of over 295,000 members of the Amish and Mennonite religious groups and their relatives. Other participants in the design of PedHunter and AGDB include Leslie Biesecker (NHGRI/NIH), Clair Francomano (now at NIA/NIH), and Alejandro Schäffer. PedHunter is also being used by other research groups to query other genealogical databases. The PedHunter query software comes in two flavors, depending on how the genealogy is stored: in a SYBASE database or in ASCII text files. Follow one of the two PedHunter hyperlinks to retrieve a paper and software.
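A basic building block for connecting relatives into a pedigree is finding the ancestors they share. The sketch below assumes a hypothetical in-memory genealogy (a dictionary from person ID to a (father, mother) pair, with None for unknown parents) rather than PedHunter's SYBASE or ASCII formats. It finds the lowest common ancestors of two people: shared ancestors that do not sit above another shared ancestor. PedHunter's actual problem of building a best connecting pedigree for a whole set of relatives is harder than this.

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: person ID -> (father ID, mother ID); None = unknown/founder.
Parents = Dict[str, Tuple[Optional[str], Optional[str]]]

def ancestors(person: str, parents: Parents) -> Set[str]:
    """All recorded ancestors of person, found breadth-first through parent links."""
    seen: Set[str] = set()
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, (None, None)):
            if parent is not None and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def lowest_common_ancestors(a: str, b: str, parents: Parents) -> Set[str]:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    above: Set[str] = set()
    for c in common:
        above |= ancestors(c, parents)  # everything above c is not "lowest"
    return common - above

pedigree: Parents = {
    "A": ("F1", "M1"), "B": ("F2", "M2"),    # A and B are first cousins
    "F1": ("G1", "G2"), "F2": ("G1", "G2"),  # their fathers are brothers
    "G1": ("H1", None),                     # a great-grandfather on one side
}
print(sorted(lowest_common_ancestors("A", "B", pedigree)))
```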
Software to analyze comparative genomic hybridization data
Richard Desper and Alejandro Schäffer have developed software, called oncotrees, to analyze tumor data in order to study models of oncogenesis. The software is designed to analyze data generated by a technique called comparative genomic hybridization, but it has also been used to analyze cytogenetic breakpoint data. The focus of the software is to infer tree models that relate genetic aberrations to tumor progression. Participants in the design of the software include Olli Kallioniemi (NHGRI/NIH) and Christos Papadimitriou (UC Berkeley).
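Tree models of oncogenesis are inferred from how often each aberration occurs across tumors and how often pairs of aberrations occur together. The sketch below computes only those summary frequencies, plus a simple co-occurrence ratio, from a hypothetical list of per-tumor aberration sets; it does not reproduce the tree-building criterion used by oncotrees, and the aberration labels and function name are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

def aberration_frequencies(
    tumors: List[Set[str]],
) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Fraction of tumors showing each aberration, and each pair of aberrations together."""
    n = len(tumors)
    if n == 0:
        raise ValueError("need at least one tumor")
    single: Counter = Counter()
    pair: Counter = Counter()
    for events in tumors:
        single.update(events)
        pair.update(combinations(sorted(events), 2))  # sorted: one key per unordered pair
    p = {e: c / n for e, c in single.items()}
    p_pair = {k: c / n for k, c in pair.items()}
    return p, p_pair

# Hypothetical CGH calls: "+8q" = gain of 8q, "-13q" = loss of 13q.
tumors = [{"+8q", "-13q"}, {"+8q"}, {"+8q", "-13q", "+20q"}, set()]
p, p_pair = aberration_frequencies(tumors)
lift = p_pair[("+8q", "-13q")] / (p["+8q"] * p["-13q"])  # > 1: co-occur more than if independent
print(p, p_pair, round(lift, 3))
```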
Software for radiation hybrid mapping and map integration
Richa Agarwala and Alejandro Schäffer developed software, called rh_tsp_map, to construct radiation hybrid maps and to integrate maps that contain overlapping marker sets. Many improvements in version 3.0 of rh_tsp_map were implemented by Edward Stallknecht Rice, who is also first author of an extensive tutorial and set of man pages that now accompany the rh_tsp_map download (the link entitled Mapping software on the left).

The radiation hybrid mapping methods are based on three ingredients: a new strategy to select framework markers, a known reduction from the radiation hybrid mapping problem to the traveling salesman problem, and the existing software CONCORDE, which solves large instances of the traveling salesman problem. The version of CONCORDE that works with rh_tsp_map is available at CONCORDE for RH mapping; to use it, you also need a version of the QSopt library (QSopt). Installation instructions are in the tutorial included with rh_tsp_map, which can be downloaded from the Mapping software link on the left.

The map construction software was used at NCBI to construct dense human radiation hybrid maps; follow the link on the right to learn more about these maps. The software has also been used to construct maps of the cat, the dog, and other vertebrates; the cat and dog maps are described in some of the references. Participants at NCBI include Donna Maglott, Greg Schuler, Edward Stallknecht Rice, and Alejandro Schäffer. David Applegate and William Cook, co-developers of CONCORDE, collaborated on its use for radiation hybrid map construction. William Murphy (Texas A&M) supplied the data for, and collaborated on, constructing the maps of the cat. Christophe Hitte (University of Rennes, France) constructed the maps of the dog and independently compared our software with other competing packages.
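The reduction to the traveling salesman problem can be illustrated with the obligate-chromosome-breaks criterion: each marker is a city, the distance between two markers is the number of hybrids in which one is retained and the other is not, and adding a dummy city at distance 0 from every marker turns the best marker order (a path) into a best tour. The sketch below assumes complete 0/1 retention vectors with no missing data and solves the tiny instance by brute force; that brute-force search is a stand-in for CONCORDE, which is what makes large instances feasible, and the function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

def obligate_breaks(u: str, v: str) -> int:
    """Number of hybrids in which the two markers' retention status (0/1) differs."""
    return sum(1 for x, y in zip(u, v) if x != y)

def min_breaks_order(markers: Dict[str, str]) -> Tuple[List[str], int]:
    """Exhaustive TSP over markers plus a dummy city; only feasible for tiny inputs."""
    names = list(markers)
    dummy = "__dummy__"  # distance 0 to every marker, so the tour becomes a path

    def dist(a: str, b: str) -> int:
        if dummy in (a, b):
            return 0
        return obligate_breaks(markers[a], markers[b])

    best_order: List[str] = []
    best_cost = None
    for perm in permutations(names):  # the dummy is fixed as the tour's first city
        tour = [dummy, *perm]
        cost = sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))
        if best_cost is None or cost < best_cost:
            best_order, best_cost = list(perm), cost
    return best_order, best_cost

# Hypothetical retention vectors over four hybrids (1 = marker retained).
hybrids = {"M1": "1100", "M2": "1110", "M3": "1111", "M4": "1000"}
print(min_breaks_order(hybrids))
```

The brute force grows factorially with the number of markers, which is why a dedicated TSP solver is needed for dense maps.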
Software to analyze microarray data
In collaboration with Javed Khan (NCI), Richard Desper and Alejandro Schäffer have developed a software package to aid classification problems arising from gene expression data. The package, METrics on EXPression data (METREX), calculates any of a variety of metrics on gene expression data. Expression data typically come in the form of a matrix of values for a number of genes, each measured in a number of different tissues, tumors, or cell lines. One common problem is that the number of variables can be enormous and defy simple comprehension. A number of techniques have been developed to classify the genes (or the cell lines or tumors) based on the patterns seen in the data matrix. The main program, metrex, computes metrics on the data matrix that various classification programs can use to classify the rows or columns of the input matrix. The input format is described in the file readme.metrex that comes with the distribution. The program outputs a distance matrix in the popular Phylip format, which can be used as input to most phylogeny-building programs, including Fitch and Neighbor from Joseph Felsenstein's Phylip package, the FastME program of Desper and Gascuel, and Swofford's comprehensive phylogeny program Paup.
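To make the pipeline concrete, the sketch below turns rows of an expression matrix into a square distance matrix in the Phylip layout (number of taxa on the first line, then each name in a 10-character field followed by its distances). It assumes one particular metric, 1 minus the Pearson correlation, purely for illustration; METREX offers a variety of metrics and its own input format is described in readme.metrex, neither of which is reproduced here. The gene names, values, and function names are hypothetical.

```python
import math
from typing import List, Sequence

def correlation_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - Pearson correlation of two equal-length expression profiles, in [0, 2]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("a profile with zero variance has no correlation")
    # Clamp to absorb floating-point rounding just outside the valid range.
    return min(2.0, max(0.0, 1.0 - cov / (sx * sy)))

def to_phylip(names: List[str], profiles: List[Sequence[float]]) -> str:
    """Square Phylip-style distance matrix: taxon count, then name (10 columns) + distances."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, profiles):
        dists = " ".join(f"{correlation_distance(row, other):.6f}" for other in profiles)
        lines.append(f"{name[:10]:<10} {dists}")
    return "\n".join(lines)

genes = ["geneA", "geneB", "geneC"]
matrix = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
print(to_phylip(genes, matrix))
```

Correlation-based distances group genes with similar expression patterns regardless of overall expression level, which is one reason such metrics are popular for clustering; other metrics would slot into the same output routine.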