PedHunter



 

 Overview

PedHunter is a software package that facilitates creation and verification of pedigrees within large genealogies.

  • Agarwala R, Biesecker LG, Hopkins KA, Francomano CA, Schäffer AA: Software for constructing and verifying pedigrees within large genealogies and an application to the Old Order Amish of Lancaster County. Genome Research 8:211-221, 1998. [PubMed] [pedhunter.ps]

  • Lee W-J, Pollin TI, O'Connell JR, Agarwala R, Schäffer AA: PedHunter 2.0 and its usage to characterize the founder structure of the Old Order Amish of Lancaster County. BMC Medical Genetics 11:68, 2010. [PubMed] [pedhunter2_appeared.pdf]

Linkage analysis requires describing pedigrees for a set of people exhibiting a specific trait and verifying relationships in pedigrees. PedHunter uses methods from graph theory to solve two versions of the pedigree connection problem for genealogies as well as other pedigree analysis problems. The pedigrees are produced by PedHunter as files in LINKAGE format ready for linkage analysis and for drawing with a variety of drawing programs, such as PEDDRAW and cranefoot.

In 2009, we completed version 2.0 of PedHunter, which includes improved engineering of the code and many new queries.

 

 Queries

Queries provided in PedHunter 2.0 can be divided into four categories:

  • testing relationships: is_father, is_mother, is_child, is_sibling, is_half_sibling, is_first_cousin, is_ancestor, is_descendant, is_founder

  • finding people satisfying a certain relation: spouses, spouses_file, father, mother, children, siblings, half_siblings, uncles_aunts, first_cousins, ancestors, ancestors_file, descendants, descendants_file, lca, lca_file, birth_death, age, living, lifespan, founder, founder_birth, founder_descendant, count_descendant

  • printing information: person_info, children_info, children_couple_info, family_info

  • complex queries: family, subset, all_shortest_paths, all_shortest_paths_count, kinship, inbreeding, ancestors_ped, descendants_ped, all_relatives, acp, asp, average_r, calculate_r, minimal

Some queries are briefly described below:

  • ancestors_ped: Finds the ancestors of a given person up to a specified number of generations. Output is a pedigree in pre-makeped LINKAGE format. By using non-zero values of the key, it is possible to restrict attention to male or female ancestors, which is useful for Y chromosome and mitochondrial studies.

  • lca (short for lowest common ancestors): Finds the most recent common ancestors of two persons. For each such ancestor, the program also prints the length of the shortest path from each of the two input persons to that ancestor.

  • lca_file: Given a list of people, find the set P of persons such that each person p in P is an ancestor of everyone in the list, but none of the children of p is an ancestor of everyone in the list.

  • subset: Find a maximal subset of a list of people that has a common ancestor. The subset returned is "maximal" in the sense that it cannot be enlarged, but not necessarily of "maximum" size.

  • acp (short for all common paths pedigree): Find the all common paths pedigree for the persons in an input file. Output is a pedigree in pre-makeped LINKAGE format that includes all paths linking more than one person in the input file.

  • asp (short for all shortest paths pedigree): Find the all shortest paths pedigree for a given list of people, if any. The typical use of asp is to find a pedigree to connect several persons with the same phenotype. The program first finds the minimal ancestors (as in the lca_file query) of all persons in the file. Then, for each minimal ancestor, all shortest (in number of generations) paths are found to each person listed. The collected set of people in the pedigree is output in pre-makeped LINKAGE format. When there are multiple minimal ancestors, multiple pedigrees are output. The justification for the "all shortest paths" pedigree is in the paper cited above.

  • all_shortest_paths: Print all shortest paths from an ancestor to a descendant. This function can be used to help understand the output of asp.

  • lifespan: Find all persons who lived for a given time. In order to deal with missing birth or death dates, there are five pertinent options as follows.
    1. all - If a person has a missing date and has age ≥ queried age, we include that person.
    2. pessimistic - Include only those individuals who have both birth date and death date specified and age ≥ queried age.
    3. optimistic - Include the subset who have a birth date specified; among those with no death date specified, include only those who could not have lived beyond LIMIT (default is 96) years. This throws out people who are definitely missing a death date and are probably not living at present.
    4. optimistic living - Include those who qualify under criterion 3 and do not have a death date specified.
    5. oldest person - Consider all people who have both death dates and birth dates. Print the birth date and death date of the oldest person among all considered.
    The computed "age" is approximate, since only the birth and death years are considered; months and days are ignored.

  • all_relatives: Find all relatives of a person. Output is a pedigree in pre-makeped LINKAGE format. Here "relative" means either the person given as the argument or anyone connected to that person via any combination of parent, spouse or child links.

  • inbreeding: Compute the inbreeding coefficients of a list of people with respect to the entire genealogy.

  • kinship: Compute the kinship coefficients of a list of pairs of people with respect to the entire genealogy.

  • calculate_r: Calculate the relative representation of each founder in each given descendant. We define the relative founder representation by a given founder in a given descendant as the expected proportion of alleles in the descendant that were inherited identical-by-descent (IBD) from the founder.

  • average_r: Calculate the mean relative founder representation for each founder over all study descendants.

  • minimal: Print a minimal tree connecting the given list of people who have the given asp pedigree. This function can be used to find a small pedigree when the asp pedigree is too big for your purpose. When there are multiple minimal pedigrees, one of them is chosen arbitrarily. Unfortunately, the performance of minimal degrades rapidly as the given list of individuals grows. Fortunately, other researchers have developed better software for the general Steiner tree problem in graphs, as described in the following paper:
      Koch T and Martin A: Solving Steiner Tree Problems in Graphs to Optimality. Networks 32:207-232, 1998.
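
The lca_file and all_shortest_paths descriptions above can be illustrated with a small Python sketch. This is not PedHunter's code: the toy genealogy, the child-to-parents dictionary and all function names are assumptions made for illustration. It finds the minimal common ancestors of a list of people exactly as lca_file is defined above, then enumerates all shortest ancestor-to-descendant paths, which are the building blocks of an asp pedigree.

```python
from collections import deque

# Hypothetical toy genealogy: child -> (father, mother). Founders have no entry.
PARENTS = {
    3: (1, 2),
    4: (1, 2),
    7: (3, 5),
    8: (4, 6),
}

def ancestor_depths(person, parents):
    """Map each strict ancestor of `person` to its shortest distance in generations."""
    depths = {}
    queue = deque([(person, 0)])
    while queue:
        current, depth = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in depths:  # breadth-first search: first visit is shortest
                depths[parent] = depth + 1
                queue.append((parent, depth + 1))
    return depths

def children_of(parents):
    """Invert the child -> parents map into parent -> set of children."""
    children = {}
    for child, pair in parents.items():
        for parent in pair:
            children.setdefault(parent, set()).add(child)
    return children

def minimal_common_ancestors(people, parents):
    """Common ancestors of everyone in `people`, none of whose children is also one."""
    depth_maps = [ancestor_depths(p, parents) for p in people]
    common = set(depth_maps[0])
    for depths in depth_maps[1:]:
        common &= set(depths)
    children = children_of(parents)
    return {a for a in common if not (children.get(a, set()) & common)}

def all_shortest_paths(ancestor, descendant, parents):
    """List every shortest path from `ancestor` down to `descendant`."""
    depths = ancestor_depths(descendant, parents)
    if ancestor not in depths:
        return []
    depths[descendant] = 0
    children = children_of(parents)

    def walk_down(node):
        if node == descendant:
            return [[node]]
        paths = []
        for child in children.get(node, ()):
            # Only step to children one generation closer to the descendant.
            if depths.get(child) == depths[node] - 1:
                paths.extend([node] + rest for rest in walk_down(child))
        return paths

    return walk_down(ancestor)
```

In this toy genealogy, persons 7 and 8 are first cousins, so minimal_common_ancestors([7, 8], PARENTS) returns their shared grandparents 1 and 2. Taking the union of all_shortest_paths from one minimal ancestor to every listed person gives the people in one asp-style pedigree.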

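The kinship, inbreeding and calculate_r queries can be illustrated with the standard textbook recursions. The sketch below is an assumption-laden illustration, not PedHunter's implementation: the pedigree is hypothetical, 0 is used to mark an unknown parent, and the code relies on IDs being numbered so that parents have smaller IDs than their children, whereas PedHunter itself uses a generation table for these queries.

```python
from functools import lru_cache

# Hypothetical pedigree: person -> (father, mother), with 0 for an unknown parent.
# Assumption: parent IDs are always smaller than child IDs.
PARENTS = {
    1: (0, 0),
    2: (0, 0),
    3: (1, 2),
    4: (1, 2),
    5: (3, 4),  # child of full siblings
}

@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient of a and b with respect to the whole pedigree."""
    if a == 0 or b == 0:
        return 0.0  # unknown persons are treated as unrelated to everyone
    if a == b:
        father, mother = PARENTS[a]
        return 0.5 * (1.0 + kinship(father, mother))
    if a < b:
        a, b = b, a  # recurse on the higher ID, which cannot be an ancestor of the other
    father, mother = PARENTS[a]
    return 0.5 * (kinship(father, b) + kinship(mother, b))

def inbreeding(person):
    """Inbreeding coefficient: the kinship coefficient of the person's parents."""
    father, mother = PARENTS[person]
    return kinship(father, mother)

def founder_representation(founder, person):
    """Expected proportion of the person's alleles inherited from `founder`."""
    if person == founder:
        return 1.0
    if person == 0:
        return 0.0
    father, mother = PARENTS[person]
    return 0.5 * (founder_representation(founder, father)
                  + founder_representation(founder, mother))
```

For the child of full siblings in this example, inbreeding(5) evaluates to 0.25, and each of the two founders has a relative representation of 0.5 in person 5. Averaging founder_representation over a set of study descendants would give a quantity in the spirit of average_r.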
 

 Utility Programs

Utility programs include linkage2tables, verify_tables, generations, subped, print_pedigree, renumber_pedigree and trim_pedigree.

Some utility programs are briefly described below:

  • renumber_pedigree: Renumber IDs in an input pedigree file so that parent IDs are smaller than child IDs, and/or add missing parents. Adding missing parents is necessary for some packages, such as LINKAGE, that assume each person has either 0 or 2 parents shown. Renumbering to make parent IDs smaller than child IDs is useful for some methods that compute kinship coefficients.

  • trim_pedigree: Trim the parents of a nuclear family that is at the top of the pedigree and has only one child. Trimmed pedigrees are useful for studying some aspects of inheritance, such as the average_r.
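
The two tasks described for renumber_pedigree can be sketched in Python as follows. This is only an illustration under stated assumptions, not the utility's actual code: the pedigree is a hypothetical dictionary from person ID to (father, mother) with 0 for an unknown parent, every referenced parent is assumed to have its own entry, and the function names are invented here.

```python
def add_missing_parents(pedigree):
    """Give every person either 0 or 2 parents by inventing placeholder founders."""
    completed = dict(pedigree)
    next_id = max(pedigree) + 1
    for person, (father, mother) in pedigree.items():
        if (father == 0) != (mother == 0):  # exactly one parent is known
            if father == 0:
                father = next_id
            else:
                mother = next_id
            completed[next_id] = (0, 0)  # the placeholder is a founder
            completed[person] = (father, mother)
            next_id += 1
    return completed

def renumber(pedigree):
    """Return (old -> new ID map, renumbered pedigree) with parents before children."""
    new_id = {}

    def visit(person):
        if person == 0 or person in new_id:
            return
        for parent in pedigree[person]:
            visit(parent)  # number both parents first
        new_id[person] = len(new_id) + 1

    for person in sorted(pedigree):
        visit(person)
    renumbered = {
        new_id[person]: tuple(new_id.get(parent, 0) for parent in pair)
        for person, pair in pedigree.items()
    }
    return new_id, renumbered
```

The recursive visit is a depth-first topological ordering; for a very deep genealogy an iterative version would avoid Python's recursion limit.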

 

 Genealogy

The genealogy data to be used by PedHunter can be stored as a relational database in Sybase or as column-delimited ASCII plaintext files. There are two required tables (the person information table and the parent-child relationship table) and two optional tables (the id table and the generation table).

  • Person table: This table has information specific to a person; it has fields:
      program identifier (required),
      name (optional),
      birth date (optional),
      death date (optional),
      address (optional),
      gender (required for married couples),
      special status (used to encode twins, adoptions, optional),
      other information (optional).

  • Relationship table: This table encodes parent-child relationships; it has fields:
      program identifier of father (required),
      program identifier of mother (required),
      marriage date (optional),
      delimited program identifiers for children (with these two parents, required but can be empty).

  • Id table: If you have a system of identifiers for your genealogy that you find convenient and these identifiers are not integers, then an id table with columns for program identifier and your identifier that expresses the 1-to-1 correspondence between them is required.

  • Generation table: This table can be generated automatically using code provided in PedHunter and is needed only if you are using the inbreeding and kinship queries. This table lists the maximum generation level for each person in the database.
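
To make the table layout concrete, here is a minimal Python sketch that reads a person table and a relationship table from plaintext and derives generation levels. Everything about the file syntax is an assumption made for illustration: the "|" column delimiter, the "," delimiter between children, the sample rows, and the convention that founders are at generation level 0 are not taken from PedHunter's documentation.

```python
from typing import NamedTuple

# Hypothetical sample data; field order follows the table descriptions above.
PERSON_TABLE = """\
1|Adam|1800|1870||M||
2|Beth|1805|1880||F||
3|Carl|1830|||M||
4|Dora|1832|||F||
"""

RELATIONSHIP_TABLE = """\
1|2|1828|3
3|4|1855|
"""

class Person(NamedTuple):
    ident: int
    name: str
    birth: str
    death: str
    address: str
    gender: str
    status: str
    other: str

def load_people(text):
    """Parse the person table into a dict keyed by program identifier."""
    people = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("|")
        person = Person(int(fields[0]), *fields[1:8])
        people[person.ident] = person
    return people

def load_parents(text):
    """Parse the relationship table into a child -> (father, mother) map."""
    parents = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        father, mother, _marriage, children = line.split("|")
        for child in filter(None, children.split(",")):  # children may be empty
            parents[int(child)] = (int(father), int(mother))
    return parents

def generation_levels(people, parents):
    """Maximum generation level per person; founders are level 0 (an assumption)."""
    levels = {}

    def level(ident):
        if ident not in levels:
            pair = parents.get(ident)
            levels[ident] = 0 if pair is None else 1 + max(level(p) for p in pair)
        return levels[ident]

    for ident in people:
        level(ident)
    return levels
```

Storing the relationship table as a child-to-parents map makes ancestor queries a simple upward walk, and the generation level is then the length of the longest chain of parent links above each person.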

 

 Feedback

Send comments, questions, and suggestions to Richa Agarwala and Alejandro Schäffer.

 

 
