PedHunter

Download

  • PedHunter (genealogy in Sybase)

  • PedHunter (genealogy in text files)

 

Overview

PedHunter is a software package that facilitates creation and verification of pedigrees within large genealogies.

  • Agarwala R, Biesecker LG, Hopkins KA, Francomano CA, Schäffer AA: Software for Constructing and Verifying Pedigrees Within Large Genealogies and an Application to the Old Order Amish of Lancaster County. Genome Research 8:211-221, 1998. [PubMed] [pedhunter.ps]

Linkage analysis requires describing pedigrees for a set of people exhibiting a specific trait and verifying the relationships in those pedigrees. PedHunter uses methods from graph theory to solve two versions of the pedigree connection problem for genealogies, as well as other pedigree analysis problems. PedHunter outputs pedigrees as files in LINKAGE format, ready for linkage analysis and for drawing with PEDDRAW.

 

Queries

The queries provided in PedHunter fall into three categories:

  • testing relationships: is_ancestor, is_cousin, is_half_sib, is_mother, is_father, is_sibling, is_child


  • finding people satisfying a certain relation: mother, father, children, cousins, uncles_aunts, half_sibs, siblings, descendants, ancestors


  • complex queries: minimal_ancestors, inbreeding, kinship, subset, asp, all_shortest_paths, minimal
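The relationship tests and "find people" queries can be pictured as simple operations on a parent map. The following is a minimal Python sketch, not PedHunter's own code: the toy genealogy, the identifiers, and the exact definitions used (full siblings share both parents, half-sibs share exactly one, cousins are first cousins through full siblings) are assumptions made for illustration.

```python
# Hypothetical toy genealogy: child -> (father, mother). Identifiers are made up.
PARENTS = {
    3: (1, 2),
    4: (1, 2),
    7: (3, 5),
    8: (4, 6),
    9: (4, 10),
}

def parents(person):
    """Return the set of known parents (empty for founders)."""
    return set(PARENTS.get(person, ()))

def children(person):
    """Return everyone who lists this person as a parent."""
    return {c for c, ps in PARENTS.items() if person in ps}

def is_ancestor(a, d):
    """True if a is a proper ancestor of d (walk upward from d)."""
    stack, seen = list(parents(d)), set()
    while stack:
        p = stack.pop()
        if p == a:
            return True
        if p not in seen:
            seen.add(p)
            stack.extend(parents(p))
    return False

def siblings(person):
    """Full siblings: same father and same mother (assumed definition)."""
    if person not in PARENTS:
        return set()
    return {c for c, ps in PARENTS.items()
            if c != person and ps == PARENTS[person]}

def half_sibs(person):
    """Half siblings: exactly one parent in common (assumed definition)."""
    if person not in PARENTS:
        return set()
    f, m = PARENTS[person]
    return {c for c, (cf, cm) in PARENTS.items()
            if c != person and ((cf == f) != (cm == m))}

def cousins(person):
    """First cousins: children of a parent's full siblings (assumed definition)."""
    result = set()
    for p in parents(person):
        for s in siblings(p):
            result |= children(s)
    return result
```

In this toy data, person 1 is an ancestor of 7, persons 8 and 9 are half-sibs through their shared father 4, and both are first cousins of 7.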

The complex queries are briefly described below:

  • minimal_ancestors: Given a list of people, find all persons P such that P is an ancestor of everyone in the list, but none of the children of P are ancestors of everyone in the list.


  • inbreeding: Compute the inbreeding coefficients of a list of people with respect to the entire genealogy.


  • kinship: Compute the kinship coefficients of a list of pairs of people with respect to the entire genealogy.


  • subset: Find a maximal subset of a list of people that has a common ancestor. The subset returned is "maximal" in the sense that it cannot be enlarged, but not necessarily of "maximum" size.


  • asp: Find the "all shortest paths" pedigree for a given list of people, if one exists. The typical use of asp is to find a pedigree connecting several persons with the same phenotype. The program first finds the minimal ancestors (as in the minimal_ancestors query) of all persons in the input file. Then, for each minimal ancestor, it finds all shortest paths (measured in number of generations) to each person listed. The collected set of people in the pedigree is output in LINKAGE format. When there are multiple minimal ancestors, multiple pedigrees are output. The justification for the "all shortest paths" pedigree is given in the paper cited above.


  • all_shortest_paths: Print all shortest paths from an ancestor to a descendant. This function can be used to help understand the output of asp.


  • minimal: Print a minimal tree connecting the given list of people who have the given asp pedigree. This function can be used to find a small pedigree when the asp pedigree is too big for your purpose. When there are multiple minimal pedigrees, one of them is chosen arbitrarily. Unfortunately, the performance of minimal degrades rapidly as the given list of individuals grows. Fortunately, other researchers have developed better software for the general Steiner tree problem in graphs, as described in the following paper:
      Koch T and Martin A: Solving Steiner Tree Problems in Graphs to Optimality. Networks 32:207-232, 1998.
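The minimal_ancestors, all_shortest_paths, and asp descriptions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PedHunter's implementation: the toy genealogy and the helper name asp_members are hypothetical, and the sketch only collects the people on the shortest paths, without writing LINKAGE output.

```python
from collections import deque

# Hypothetical toy genealogy: child -> (father, mother). Identifiers are made up.
PARENTS = {
    3: (1, 2),
    4: (1, 2),
    5: (3, 20),
    6: (4, 21),
    7: (5, 6),
}

def ancestors(person):
    """All proper ancestors of a person."""
    found, stack = set(), list(PARENTS.get(person, ()))
    while stack:
        p = stack.pop()
        if p not in found:
            found.add(p)
            stack.extend(PARENTS.get(p, ()))
    return found

def minimal_ancestors(people):
    """Common ancestors none of whose children is also a common ancestor."""
    common = set.intersection(*(ancestors(p) for p in people))
    return {a for a in common
            if not any(a in PARENTS.get(c, ()) for c in common)}

def all_shortest_paths(ancestor, descendant):
    """All paths ancestor -> descendant with the fewest generations."""
    # Breadth-first search upward records each person's distance in generations.
    dist = {descendant: 0}
    queue = deque([descendant])
    while queue:
        person = queue.popleft()
        for p in PARENTS.get(person, ()):
            if p not in dist:
                dist[p] = dist[person] + 1
                queue.append(p)
    if ancestor not in dist:
        return []
    paths = []

    def extend(path):
        last = path[-1]
        if last == descendant:
            paths.append(path)
            return
        # Only step to children that are exactly one generation closer.
        for child, ps in PARENTS.items():
            if last in ps and dist.get(child) == dist[last] - 1:
                extend(path + [child])

    extend([ancestor])
    return paths

def asp_members(people):
    """For each minimal ancestor, everyone on a shortest path to a listed person."""
    return {a: {x for p in people
                for path in all_shortest_paths(a, p) for x in path}
            for a in minimal_ancestors(people)}
```

For the list [5, 6] in this toy data, the minimal ancestors are 1 and 2, so asp_members returns two member sets, mirroring the statement that multiple minimal ancestors give multiple pedigrees.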

 

Genealogy

The genealogy data used by PedHunter can be stored either as a relational database in Sybase or as column-delimited ASCII text files. Two tables are required (the person table and the relationship table) and two are optional (the id table and the generation table).

  • Person table: This table has information specific to a person; it has fields:
      program identifier (required), name (optional), birth date (optional), death date (optional), address (optional), gender (required for married couples), special status (used to encode twins, adoptions, optional), other information (optional).

  • Relationship table: This table encodes parent-child relationships; it has fields:
      program identifier of father (required), program identifier of mother (required), marriage date (optional), delimited program identifiers for children (with these two parents, required but can be empty).

  • Id table: If you have a system of identifiers for your genealogy that you find convenient and these identifiers are not integers, then an id table with columns for program identifier and your identifier that expresses the 1-to-1 correspondence between them is required.


  • Generation table: This table can be generated automatically using code provided in PedHunter and is needed only if you use the 'inbreeding' or 'kinship' queries. It lists the maximum generation level for each person in the database.
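To make the relationship and generation tables concrete, here is a hedged Python sketch that loads relationship rows, derives a maximum generation level for each person, and uses those levels in the standard kinship recursion. The "|" column delimiter, the "," child delimiter, the sample rows, and the convention that founders have level 0 are all assumptions for illustration; PedHunter's actual file layout and algorithms may differ.

```python
from functools import lru_cache

# Hypothetical relationship rows: father id | mother id | marriage date | children.
# Both delimiters are assumptions made for this sketch.
RELATIONSHIP_ROWS = [
    "1|2|1900-01-01|3,4",
    "3|5||6",
    "4|7||8",
    "6|8||9",
]

def load_parents(rows):
    """Map each child to its (father, mother) pair; the child list may be empty."""
    parents = {}
    for row in rows:
        father, mother, _marriage_date, kids = row.split("|")
        for kid in filter(None, kids.split(",")):
            parents[int(kid)] = (int(father), int(mother))
    return parents

PARENTS = load_parents(RELATIONSHIP_ROWS)

@lru_cache(maxsize=None)
def generation(person):
    """Maximum generation level: founders are 0, others 1 + deepest parent."""
    if person not in PARENTS:
        return 0
    return 1 + max(generation(p) for p in PARENTS[person])

@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient via the standard recursion over parents."""
    if a == b:
        if a not in PARENTS:
            return 0.5
        father, mother = PARENTS[a]
        return 0.5 * (1.0 + kinship(father, mother))
    # Recurse on the person with the larger generation level: an ancestor
    # always has a strictly smaller level, so that person cannot be an
    # ancestor of the other.
    if generation(a) < generation(b):
        a, b = b, a
    if a not in PARENTS:
        return 0.0  # distinct founders are assumed to be unrelated
    father, mother = PARENTS[a]
    return 0.5 * (kinship(father, b) + kinship(mother, b))

def inbreeding(person):
    """Inbreeding coefficient: the kinship coefficient of the two parents."""
    if person not in PARENTS:
        return 0.0
    return kinship(*PARENTS[person])
```

In this toy data, persons 6 and 8 are first cousins, so their child 9 has an inbreeding coefficient of 1/16. The generation levels give a safe order for the recursion, which is one plausible reason a generation table would be needed for these two queries.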

 

Send comments, questions, and suggestions to Richa Agarwala and Alejandro Schäffer.

 

 

 
