Download
PedHunter (genealogy in Sybase)
PedHunter (genealogy in text files)
PedHunter is a software package that facilitates
creation and verification of pedigrees within large genealogies.
- Agarwala R, Biesecker LG, Hopkins KA, Francomano CA,
Schäffer AA:
Software for Constructing and Verifying Pedigrees
Within Large Genealogies and an Application to the
Old Order Amish of Lancaster County. Genome Research
8:211-221, 1998.
[PubMed]
[pedhunter.ps]
Linkage analysis requires
constructing pedigrees that connect a
set of people exhibiting a specific trait and
verifying the relationships in those pedigrees.
PedHunter uses
methods from graph theory to solve two versions of the pedigree
connection problem for genealogies
as well as other pedigree analysis problems.
PedHunter writes the pedigrees as files in LINKAGE format,
ready for linkage analysis and for drawing with PEDDRAW.
The queries provided in PedHunter
fall into three categories:
-
testing relationships:
is_ancestor, is_cousin, is_half_sib, is_mother, is_father,
is_sibling, is_child
-
finding people satisfying a certain relation:
mother, father, children, cousins, uncles_aunts, half_sibs,
siblings, descendants, ancestors
-
complex queries: minimal_ancestors, inbreeding, kinship, subset,
asp, all_shortest_paths, minimal
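To make the first two categories concrete, here is a minimal Python sketch of a few relationship tests over a small in-memory genealogy. It is an illustration only, not PedHunter's code: the `parents` mapping, the sample people, and the exact definitions (for example, that is_sibling means sharing both parents and is_half_sib means sharing exactly one) are assumptions made for this example.

```python
# Hypothetical genealogy: child -> (father, mother). Not PedHunter's data format.
parents = {
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("A", "X"),
    "G": ("C", "Y"),
}

def ancestors(person):
    """Return every ancestor of person by walking parent links upward."""
    found, stack = set(), [person]
    while stack:
        for p in parents.get(stack.pop(), (None, None)):
            if p is not None and p not in found:
                found.add(p)
                stack.append(p)
    return found

def is_ancestor(a, d):
    """True if a is an ancestor of d."""
    return a in ancestors(d)

def is_sibling(x, y):
    """Assumed definition: distinct people with the same father and mother."""
    px, py = parents.get(x), parents.get(y)
    return x != y and px is not None and px == py and None not in px

def is_half_sib(x, y):
    """Assumed definition: distinct people with exactly one parent in common."""
    px, py = parents.get(x), parents.get(y)
    if x == y or px is None or py is None:
        return False
    shared = sum(1 for a, b in zip(px, py) if a is not None and a == b)
    return shared == 1
```

The "finding" queries (ancestors, siblings, and so on) return the set of people for which the corresponding test holds, as `ancestors` does here.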
The complex queries are briefly described below:
-
minimal_ancestors: Given a list of people,
find all persons P such that P is an ancestor of everyone in the
list, but none of the children of P are ancestors of everyone in
the list.
-
inbreeding: Compute the inbreeding coefficients of a list
of people with respect to the entire genealogy.
-
kinship: Compute the kinship coefficients of a list of
pairs of people with respect to the entire genealogy.
-
subset: Find a maximal subset of a list of people that
has a common ancestor. The subset returned is "maximal" in the
sense that it cannot be enlarged, but not necessarily of
"maximum" size.
-
asp: Find the all-shortest-paths pedigree for a given
list of people, if one exists. The typical use of asp is to
find a pedigree that connects several
persons with the same phenotype. The program first finds the
minimal ancestors (as in the 'minimal_ancestors' query) of
all persons listed. Then, for each minimal ancestor, all
shortest paths (in number of generations) to each listed person
are found. The collected set of people in the pedigree is output
in LINKAGE format. When there are multiple minimal
ancestors, multiple pedigrees are output. The justification for
the "all-shortest-paths" pedigree is in the paper cited above.
-
all_shortest_paths:
Print all shortest paths from an ancestor to a descendant.
This function can be used to help understand the output of asp.
-
minimal: Print a minimal tree connecting the given list of
people within their given 'asp' pedigree.
This function can be used to find a small pedigree when the
asp pedigree is too big for your purpose. When there are
multiple minimal pedigrees, one of them is chosen arbitrarily.
Unfortunately, the performance of minimal degrades rapidly as the
given list of individuals grows. Fortunately, other researchers
have developed better software for the general Steiner tree
problem in graphs as described in the following paper:
Koch T and Martin A: Solving Steiner Tree Problems in
Graphs to Optimality. Networks 32:207-232, 1998.
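The minimal_ancestors, all_shortest_paths, and asp descriptions above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not PedHunter's implementation: the `children` mapping and the sample people are hypothetical, the brute-force search would be too slow for a large genealogy, and a real LINKAGE pedigree would also need details such as both parents of each member, which are omitted here.

```python
from collections import deque

# Hypothetical genealogy: person -> list of children. D and E are cousins
# who are both parents of P1, so there are two shortest paths from A to P1.
children = {
    "R": ["A"],
    "A": ["B", "C", "G"],
    "B": ["D"],
    "C": ["E"],
    "D": ["P1"],
    "E": ["P1"],
    "G": ["P2"],
}

def generations_below(ancestor):
    """Breadth-first search: fewest generations from ancestor to each descendant."""
    dist, queue = {ancestor: 0}, deque([ancestor])
    while queue:
        u = queue.popleft()
        for c in children.get(u, []):
            if c not in dist:
                dist[c] = dist[u] + 1
                queue.append(c)
    return dist

def minimal_ancestors(people):
    """Common ancestors of everyone in people, none of whose children is also one."""
    everyone = set(children) | {c for kids in children.values() for c in kids}
    common = set()
    for a in everyone:
        below = generations_below(a)
        if all(p != a and p in below for p in people):
            common.add(a)
    return {a for a in common if not any(c in common for c in children.get(a, []))}

def all_shortest_paths(ancestor, descendant):
    """List every shortest parent-to-child path from ancestor to descendant."""
    dist = generations_below(ancestor)
    if descendant not in dist:
        return []
    paths = []

    def extend(path):
        u = path[-1]
        if u == descendant:
            paths.append(path)
            return
        for c in children.get(u, []):
            # Follow only edges that go exactly one generation deeper.
            if dist[c] == dist[u] + 1 and dist[c] <= dist[descendant]:
                extend(path + [c])

    extend([ancestor])
    return paths

def asp_members(people):
    """For each minimal ancestor, collect everyone on a shortest path to a listed person."""
    return {
        a: {x for p in people for path in all_shortest_paths(a, p) for x in path}
        for a in minimal_ancestors(people)
    }
```

For the listed people P1 and P2 in this sample, the only minimal ancestor is A (R is excluded because its child A is also a common ancestor), and the asp pedigree for A contains both shortest paths to P1 together with the single path to P2.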
The genealogy data to be used by PedHunter
can be stored as a relational database in Sybase or as
column-delimited ASCII text files.
There are two required tables (the person table and the
relationship table) and two optional tables (the id table and
the generation table).
-
Person table: This table has information
specific to a person; it has fields:
program identifier (required),
name (optional),
birth date (optional),
death date (optional),
address (optional),
gender (required for married couples),
special status (optional; used to encode twins and adoptions),
other information (optional).
-
Relationship table: This table encodes parent-child
relationships; it has fields:
program identifier of father (required),
program identifier of mother (required),
marriage date (optional),
delimited list of program identifiers for the children
of these two parents (required, but can be empty).
-
Id table: If your genealogy already has a system of
identifiers that you find convenient and these identifiers are not
integers, then an id table is required. Its two columns, program
identifier and your identifier, express the 1-to-1 correspondence
between them.
-
Generation table: This table can be generated
automatically using code provided in PedHunter and is needed
only if you use the 'inbreeding' or 'kinship' queries.
This table lists the maximum generation level for each
person in the database.
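As an illustration of how the relationship table and the generation table relate, the following Python sketch reads relationship rows from text and derives a generation level for each person. Everything concrete here is an assumption made for the example: the "|" column delimiter, the space-delimited children field, the sample rows, and the definition of maximum generation level (0 for people with no recorded parents, otherwise one more than the higher of the two parents' levels). PedHunter's actual file layout and definition may differ; consult the package documentation.

```python
# Hypothetical rows: father|mother|marriage date|children (space-delimited).
# The marriage date is optional and the children field may be empty.
rows = """\
1|2|1900-01-01|3 4
3|5||6
7|4||
"""

def parse_relationships(text):
    """Return (parents, people): child -> (father, mother), and all ids seen."""
    parents, people = {}, set()
    for line in text.splitlines():
        father, mother, _marriage, kids = line.split("|")
        couple = (int(father), int(mother))
        people.update(couple)
        for child in kids.split():
            parents[int(child)] = couple
            people.add(int(child))
    return parents, people

def generation_levels(parents, people):
    """Assumed definition: 0 without recorded parents, else 1 + the larger parent level."""
    level = {}

    def visit(person):
        if person not in level:
            couple = parents.get(person)
            level[person] = 0 if couple is None else 1 + max(visit(p) for p in couple)
        return level[person]

    for person in people:
        visit(person)
    return level
```

The recursive `visit` is adequate for a small example; a genealogy with many generations would be better served by an iterative traversal.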
Send comments, questions, and suggestions to
Richa Agarwala and
Alejandro Schäffer