- Journal List
- Genetics
- v.170(4); 2005 Aug
- PMC1449756

# Construction of Linkage Maps in Full-Sib Families of Diploid Outbreeding Species by Minimizing the Number of Recombinations in Hidden Inheritance Vectors

^{1}

^{1}

*Address for correspondence:*Biometris, P.O. Box 100, 6700 AC Wageningen, The Netherlands. E-mail: ln.ruw@nesnaj.sennahoj

## Abstract

This article investigates the construction of linkage maps by means of the reconstruction of hidden inheritance vectors. An inheritance vector provides a description of the origin of marker alleles in an individual in terms of a binary code indicating the grandparental origin of the alleles. The practical application that is considered is the full-sib family of a diploid outbreeding species. Essentially, map construction is considered as an optimization problem in which marker data are used as restrictions on hidden inheritance vectors. Simulated annealing, a form of combinatorial optimization, is used to minimize the number of recombinations between adjacent markers. The new algorithm is applied to simulated data as well as to practical data obtained from a cross between two apple cultivars. For the practical data, a detailed procedure for examining the reliability of individual markers and their positions on the map is presented. Extensions of the method for more complicated population structures are discussed.

IN situations where progenies have been derived from pure lines (*e.g.*, F_{2}, backcrosses, (doubled) haploid populations, and populations of recombinant inbred lines) numbers of recombinations between markers can, at least approximately, be obtained by simply counting numbers of recombinant individuals. The marker data provide direct information about the grandparental origin of the alleles, and the linkage phases of the markers are known. The problem of constructing a genetic linkage map consists of finding that marker order for which some criterion, *e.g.*, the likelihood or the total number of recombinations between adjacent markers, is optimal. If all the data are complete the likelihood assumes a simple multiplicative form (Jansen* et al.* 2001). As a consequence, multipoint maximum-likelihood and two-point maximum-likelihood estimates of recombination coefficients are identical.

In full-sib families of outcrossing species the construction of genetic linkage maps is far more complicated than in progenies derived from pure lines. Maliepaard* et al.* (1997) mention the following problems. Markers may be different with regard to number of segregating alleles. The number of alleles of a polymorphic marker may be 2, 3, or 4. Markers may also be different with regard to the number of heterozygous parents. The number of heterozygous parents may be 1 or 2. Even if both parents are heterozygous with regard to a particular marker they may be identical. In this case it is impossible to identify the origin of the alleles of heterozygous offspring. Some markers may have so-called null alleles that make it impossible to distinguish between some of the marker phenotypes. Such markers are usually referred to as dominant markers. The linkage phase of a marker in one or both parents may be unknown.

The consequence of the above is that for a full-sib family of an outcrossing species, a part of the data must be considered as incomplete. This means that it is not possible to observe the occurrence of recombination events unambiguously from the marker data.

In the case of incomplete marker data, maximum-likelihood estimates of recombination frequencies for pairs of markers can be obtained by applying the EM algorithm (Dempster* et al*. 1977). Estimates of pairwise recombination frequencies can then be used to construct genetic linkage maps (Maliepaard* et al.* 1997). It is also possible to obtain multipoint maximum-likelihood estimates of recombination frequencies given a marker order (Lander and Green 1987) and to maximize the multipoint likelihood with regard to marker order using the EM algorithm. This latter approach is extremely time consuming if the number of markers is large. Ridout* et al.* (1998) provide arguments for using three-point linkage analysis rather than two-point linkage analysis. Wu* et al.* (2002) use two-point and three-point linkage analysis for the simultaneous estimation of recombination coefficients and linkage phases.

The practice of constructing genetic linkage maps requires an interactive environment. Jansen* et al.* (2001) consider pre- and postmapping diagnostics, usually graphical output, as essential parts of the mapping process. At several points in the mapping process, the geneticist must be able to go back to the data or even to the lab. This may lead to corrections of data or even to elimination of markers or individuals. An interactive environment requires reasonably fast, but also robust algorithms for map construction. Even in the case of gross errors in the marker data, the algorithm must be able to produce results that can help the geneticist to identify and remove errors.

The aim of this article is to develop an efficient method for the construction of genetic linkage maps in full-sib families of outcrossing species. The methods that are developed in this article are based on the reconstruction of the (hidden) inheritance vectors (Lander and Green 1987) of the individuals. Other terms used for inheritance vector are segregation indicator (Thompson 1994) and meiosis indicator (Thompson 2001). Inheritance vectors provide a description of the origin of the marker alleles in terms of a binary code indicating the grandparental origin of the alleles. In the context of a full-sib family of an outcrossing species many sets of inheritance vectors may be underlying one particular set of marker data.

The basic idea is that given the inheritance vectors of all individuals, it is possible to observe the numbers of recombination events by simple counting. Different sets of inheritance vectors may be evaluated with regard to some optimality criterion. In this article we use the total number of recombinations between adjacent markers as the optimality criterion. The approach developed in this article may be used in its own right, but it may also be used as a starting point for a more formal analysis based on genetical and statistical assumptions, *e.g.*, multipoint maximum likelihood (Lander and Green 1987) or a Bayesian analysis (George* et al.* 1999; Guilherme *et al*. 2002).

## MATERIALS AND METHODS

### Marker data:

#### Problems:

In the present context, a molecular genetic marker (in short, a marker) is considered as a point on a pair of homolog chromosomes of a diploid species. A marker is a short segment of DNA. For a marker, different variants or alleles exist. Different alleles arise from differences in DNA sequence.

We consider a cross between two parents that for a particular marker have ordered genotypes A|B and C|D, respectively. In the following, ordered genotypes are referred to as genotypes. In this notation, A is the allele of the marker on the first homolog of the first parent, B is the allele of the marker on the second homolog of the first parent, C is the allele of the marker on the first homolog of the second parent, and D is the allele of the marker on the second homolog of the second parent. The letters A, B, C, and D represent the grandparental origin of the alleles in the offspring of a cross between parents with ordered genotypes A|B and C|D. Knowledge of the grandparental origin of alleles forms the basis of linkage analysis. Unordered genotypes are referred to as phenotypes.

In practice, two major problems may arise. First, the order of the alleles in one parent or in both parents may be unknown. This happens if the phenotypes of the grandparents are unknown or inconclusive. This problem is known as the phase problem. Second, the number of phenotypes in the offspring of a cross between two parents may be less than four. It is also possible that alleles cannot be observed (null alleles). The alleles of a marker in a cross between two parents determine the segregation type of that marker.

#### Segregation type:

The segregation type of marker contains information about the unordered genotypes of the parents in a cross. The segregation type of a marker determines the phenotypes that may occur in the offspring. Ritter* et al.* (1996), Maliepaard* et al.* (1997), and Wu* et al.* (2002) distinguish seven essentially different segregation types that may occur in a cross between two parents. However, in practical applications usually eight segregation types are used (Table 1). Offspring phenotype uu is used if a phenotype could not be observed. The indicator 0 is used if a particular allele is not observable. Apart from segregation types <ab × cd> and <ab × ac>, all other segregation types give rise to less than four possible offspring phenotypes. Segregation types <ab × aa> and <aa × ab> have only two offspring genotypes. They are completely uninformative with regard to the meiosis of the second and the first parent, respectively.

#### Phase:

The phases of the genotypes of the parents of a cross are provided by means of a two-digit binary code {*f*_{1} *f*_{2}}. If *f*_{1} (*f*_{2}) = 0 the order of the alleles for the first (second) parent in the segregation type has to be left unchanged, whereas if *f*_{1} (*f*_{2}) = 1 the order of the alleles has to be reversed relative to the order given in Table 1. The following phase indicators may be used: {00}, {01}, {10}, {11}, {0u}, {1u}, {u0}, {u1}, and {uu}. Again, the letter u stands for unknown. The letter u is always used if a parent is homozygous.

For example, for a marker with segregation type <ab × cd> with phase indicator {01}, the genotypes of the parents are given by a|b and d|c, respectively. However, if the phase indicator is {u1}, the genotype of the first parent is either a|b or b|a and the genotype of the second parent is again d|c.

#### Inheritance vectors:

The algorithm that is described is based on the conversion of observed, possibly incomplete or unknown, phenotypes into complete, hidden inheritance vectors. The inheritance vector of an individual with regard to a marker is a two-digit binary code, which indicates the grandparental origins of its alleles. The first digit refers to the allele obtained from the first parent and the second digit to the allele obtained from the second parent. A 0 indicates that the corresponding parent obtained the allele from its first parent and a 1 indicates that the corresponding parent obtained the allele from its second parent. As a consequence, the following complete inheritance vectors are used: 00, 01, 10, and 11. By using inheritance vectors rather than phenotypes, we are able to record numbers of recombinations between markers by simply counting recombination events. It should be noted that many sets of complete inheritance vectors may be consistent with a particular set of phenotypes. Table 2 provides, by means of a binary code for each segregation type and its corresponding phenotypes, and given the phase, all inheritance vectors that are permitted.

### Algorithm:

#### Reconstruction of inheritance vectors:

The algorithm is based on the reconstruction of inheritance vectors. This is done by converting phenotypes into hidden inheritance vectors in such a way that the total number of recombinations between adjacent markers is minimal. For example, for a marker with segregation type <ab × ab> an offspring individual has phenotype ab. If the phases of the parents are given by {10}, *i.e*., the genotypes of the parents are respectively b|a and a|b, the possible inheritance vectors of the individual are 00 and 11. In other words, phenotype ab excludes inheritance vectors 01 and 10. Phenotypes are used as restrictions on hidden inheritance vectors.

#### Starting configuration:

The mapping problem consists of finding that marker order, those marker phases, and those inheritance vectors that minimize the number of recombination events between adjacent markers. A more formal description of the optimization problem is obtained by considering an initial configuration for the optimization process.

#### Marker order:

The algorithm that is described is started from a configuration in which the markers are placed in random order. The marker at position *s* is denoted by *m*_{[}_{s}_{]} (*s* = 1, 2, … , *M*); *M* represents the number of markers. Dummy markers 0 and *M* + 1 can be added to simplify the calculations.

#### Phase:

If no information concerning marker phases is available, one marker preferably with segregation type <ab × cd> will be assigned phase {00}. Instead of taking a marker with segregation type <ab × cd>, two markers, one with segregation type <ab × aa> and one with segregation type <aa × ab>, may be used as phase anchors. The other markers are assigned at random one of the four phases {00}, {01}, {10}, or {11}. The possible phases of markers with segregation types <ab × aa> are limited to {00} and {10}, and those of markers with segregation types <aa × ab> are limited to {00} and {01}.

#### Inheritance vectors:

Given the phases assigned to the marker (see previous paragraph), each observed genotype is converted into a single inheritance vector. If the phenotype is complete, then given the phase just one inheritance vector is consistent with the phenotype and this inheritance vector will be used. If the observed genotype is incomplete or missing, then given the phase one of the permitted inheritance vectors (given in Table 2) will be chosen at random. The inheritance vectors obtained in the previous step can be arranged in a three-dimensional structure *H* with dimensions *M* × *N* × 2, in which *M* is the number of markers and *N* is the number of individuals. This matrix is referred to as the inheritance matrix. The elements of *H* are denoted by *h _{mnp}* (

*m*= 1, 2, … ,

*M*;

*n*= 1, 2, … ,

*N*; and

*p*= 1, 2).

#### Number of recombination events:

The inheritance matrix *H* may be converted into a three-dimensional structure *R* with dimensions *M* × *M* × 2, containing the numbers of recombinations between pairs of markers. The elements of *R* are given by

(*m*_{1}, *m*_{2} = 1, 2, … , *M*; *p* = 1, 2). If for individual *n* markers *m*_{1} and *m*_{2} are not recombinant in parent *p*, *i.e*., if or . However, if for individual *n*, markers *m*_{1} and *m*_{2} are recombinant in parent *p*, *i.e*., if and or and , .

#### Optimization criterion:

Given restrictions imposed by the phenotypes observed on the individuals, the optimization problem consists of finding an inheritance matrix *H* for which the total number of recombination events is a minimum with regard to the order of the markers, the phases of the markers, and the inheritance vectors that can be assigned to the observed marker data. The total number of recombination events is given by

in which *m*_{[}_{s}_{]} is the marker on position *s* (= 1, 2, … , *M*). In the following an algorithm is developed to minimize *T* with regard to order, phase, and inheritance vector.

### Map optimization:

#### Simulated annealing:

For map construction a form of combinatorial optimization called simulated annealing is used (Kirkpatrick* et al.* 1983; Van Laarhoven and Aarts 1987). Other local search techniques (Aarts and Lenstra 1997) may work equally well but are not considered in this article. With regard to marker order, Jansen* et al.* (2001) used a simple form of simulated annealing and to improve the value of the optimization criterion they used simple moves, replacing one marker at a time. Mester* et al.* (2003) used an evolutionary strategy algorithm. In this case, three types of moves have to be considered: moves with regard to marker order, marker phases, and inheritance vectors.

The difference between the value of the criterion that would be obtained after making a move and the current value of the criterion is denoted by Δ. Negative values of Δ correspond to improvements to the optimality criterion. However, to avoid local optima, moves that increase the value of the criterion with regard to order will sometimes be accepted. A move will always be accepted if Δ ≤ 0. A move may also be accepted, if Δ > 0, but only if exp(−Δ/γ) ≥ *u*, where *u* is sampled at random from the standard uniform distribution. The parameter γ is used to control the acceptance of moves with a positive value of Δ. The acceptance control parameter γ is set to a fixed value in a series of *S* successive moves, which is called a chain. The algorithm runs through *C* + 1 of such chains for which γ_{0} ≥ γ_{1} ≥ … ≥ γ* _{C}*. If γ becomes close to zero, only steps with negative values of Δ will be accepted. The choice of γ

_{0}is discussed in

*Implementation details*. A simple formula to govern the decrease of γ as optimization progresses is given by γ

_{c}_{+1}= γ

*/(1 + α) (*

_{c}*c*= 1, 2, … ,

*C*− 1), in which α is a positive constant.

#### Marker order:

A new order for the markers is obtained by removing the marker from position *s* (= 1, 2, … , *M*) and putting it between the markers on positions *t* (= 0, 1, … , *M*; *t* ≠ *s* − 1, *s*) and *t* + 1. The positions *s* and *t* are chosen at random. The effect of this simple move on the total number of recombination events can be calculated using the current matrix of numbers of recombinations *R*^{c}. Accepting the new order does not affect the current inheritance matrix *H*^{c} and the corresponding matrix of numbers of recombinations *R*^{c}. The increase of the number of recombination events in parent *p* (= 1, 2) is given by

*s* = 1, 2, … , *M*; *t* = 0, 1, … , *M*; *t* ≠ *s* − 1, *s*. The first part of the above formula represents the effect of removing the marker at position *s*, the second part represents the effect of putting this marker between the markers at positions *t* and *t* + 1. The increase in both parents due to a change in marker order is equal to Δ = Δ_{1} + Δ_{2}. The number of recombination events between markers 0 and *M* + 1 and all other markers is fixed to zero.

#### Phase:

The marker at position *s* (= 1, 2, … , *M*) with current phase π^{c}_{ms} is assigned at random a new phase π^{n}_{ms} from one of the permitted phases, π^{n}_{ms} ≠ π^{c}_{ms}. Accepting the new phase will affect the row of the complete inheritance matrix *H* corresponding with the marker from position *s* and also the numbers of recombinations between the marker from position *s* and all other markers, *r*_{msmu} (*u* = 1, 2, … , *M*; *u* ≠ *s*). The increase of the total number of recombination events by a phase change in parent *p* (= 1, 2) is given by

A phase change concerning the marker at position *s* affects all inheritance vectors of the marker and, as a consequence, the number of recombination events of this marker with all other markers will have to be changed. If the move concerns a phase change in parent *p*, ; *i.e*., if , , and if , . By summing over all individuals the new number of recombinations between the markers at positions *s* and *u* becomes (*u* = 1, 2, … , *M*; *u* ≠ *s*).

#### Inheritance vectors:

Suppose that the marker at position *s* is assigned a new inheritance vector for individual *m*. The new inheritance vector *h*^{n}_{msnp} is chosen at random from the inheritance vectors permitted by the observed phenotype and the current phase of the marker *h*^{n}_{msn1} ≠ *h*^{c}_{msn1} or *h*^{n}_{msn2} ≠ *h*^{c}_{msn2}. The effect on thetotal number of recombination events in parent *p* is equal to

In the above formulas the first part relates to the magnitude of the change, which for markers at positions 1 and *M* is −1, 0, or 1. For markers at intermediate positions the change is −2, 0, or 2. The second part determines whether the change becomes an increase or a decrease of the number of recombinations. It can be observed that Δ* _{p}* is always zero if the flanking markers are recombinant in parent

*p*. If for individual

*n*the inheritance vector of the marker at position

*s*is changed for parent

*p*,

*i.e.*, if

*h*

^{n}

_{msnp}≠

*h*

^{c}

_{msnp}, then if

*h*

^{c}

_{msnp}≠

*h*

^{c}

_{munp}, the number of recombinations between markers

*s*and

*u*is reduced by 1, , and if , the number of recombinations between markers

*s*and

*u*is increased by 1, ,

*u*= 1, … ,

*M*,

*u*≠

*s*.

### Implementation details:

At this stage a simple form of simulated annealing is used. Moves consist of a series of *M*_{1} proposals for changing the marker order, *M*_{2} proposals for changing marker phases, and *M*_{3} proposals for changing inheritance vectors. Proposals are evaluated (accepted/rejected) one at a time. Values that were found suitable in the applications are: *M*_{1} = 10, *M*_{2} = 1, and *M*_{3} = 10. Establishing the marker phases requires a smaller effort than establishing the marker order and the inheritance matrix. At this stage of development, proposals for phase and inheritance vector are accepted only if the value of the criterion is not increased. If the marker phases are known (for example, from a two-point analysis) they may be fixed relative to a marker of segregation type ab × cd or ab × ac, which serves as phase anchor.

Proposals for marker order changes are accepted according to the rule described in *Map optimization*. The initial value of the acceptance control parameter γ is determined in such a way that the probability of accepting proposals that increase the optimality criterion equals ∼0.5. A simple rule for decreasing the acceptance control parameter γ is used (Jansen* et al*. 2001). The algorithm with the above settings works well in situations without gross errors in the data. In other cases it will provide solutions that may assist the user in finding such errors in the data.

Alternative proposals for marker order may be more appropriate close to optimum. An alternative way of changing the marker order is to reverse the order of a number of successive markers on the map, (*t* + 1) markers, say. The value of *t* is chosen at random from the values 1, 2, … , *T*. In the current implementation the value of *T* is set equal to 5. The position of the first of the set of the markers to be reversed in order is chosen at random from the values 1, 2, … , *M* − *T*. The increase of the number of recombination events in parent *p* (= 1, 2) is then given by

At this stage reversion of a number of successive markers is used only if the value of the acceptance control parameter γ becomes <5. Such proposals are interchanged randomly with the simple proposals defined previously at a rate of 50%.

### Plausible marker orders:

Jansen* et al.* (2001) used a Metropolis algorithm to obtain a set of marker orders that are plausible with the data. This method is also used in the current situation. In the current situation, moves involve only changes of marker order and changes of inheritance vectors. With regard to changes of marker order, a Metropolis algorithm is obtained by setting the acceptance control parameter γ equal to unity after the optimum marker order has been found. With regard to changes of inheritance vectors, only proposals that do not lead to an increase of the number of recombinations are accepted. Since plausible marker orders are expected to be close to the best marker order found and if distances between markers are small, proposed changes of marker phases would inevitably lead to large increases of the number of recombination events, and, consequently, such proposals will never be accepted. Therefore, proposals for changing the phase of markers are not considered when constructing plausible marker orders (*M*_{2} = 0).

## RESULTS

### Simulated data:

The first application concerns a set of simulated data. A summary of the data is given in Table 3.

Simulated data are useful to understand what will happen with data with known properties. Marker data for 100 individuals were simulated. The distance between the markers is 5 cM. In this case the segregation of marker m12 deviates somewhat from the expected 0.25:0.5:0.25 distribution (too many heterozygotes). Without missing data and without errors, this set of data must be considered very simple to map.

The total number of recombinations for the optimum configuration is 172: 88 recombinations in the maternal meiosis and 84 in the paternal meiosis. Figure 1 shows a graphical presentation obtained using MapChart (Voorrips 2002) of two linkage maps that both have 172 recombinations but are slightly different. Differences may occur only around markers with segregation type <ab × ab>, because these markers are not completely informative. In Figure 1 markers that are completely uninformative in one of the meioses have been omitted in that meiosis.

In the maternal as well as in the paternal meiosis, only the numbers of recombinations between completely informative markers are always the same. If markers are not completely informative, more than one optimum configuration may be found.

Figure 2 shows a graphical representation of the inheritance matrices obtained at convergence. This figure shows not only that the columns of the inheritance vectors consist mainly of long stretches of grandmaternal or grandpaternal origin, but also that a few so-called singletons are present. This is due to the 5-cM distance between the markers, which will produce “singletons” with a frequency of 1/400.

### Missing data and errors:

In practice, some marker data are missing. Missing data will lead to optimum configurations with a smaller number of recombinations. Figure 3a shows the effect of missing data that occur randomly with different rates, using the marker data summarized in Table 3.

*vs*. percentage of missing data using the data summarized in Table 3. For 5, 10, 20, and 40%

**...**

In practice, marker data are also subject to errors. Errors will lead to optimum configurations with a greater number of recombinations. Figure 3b shows the effect of randomly occurring errors, using the marker data summarized in Table 3. The following errors were possible: for segregation types <ab × aa> and <aa × ab>, aa → ab, ab → aa; for segregation type <ab × ab>, aa → ab, bb → ab, ab → aa or bb; and for segregation type <ab × cd>, ac → ad or bc, ad → ac or bd, bc → bd or ac, bd → bc or ad.

In practice, missing data and errors occur simultaneously. Using the data summarized in Table 3 a new data set was obtained by adding errors at a rate of 2% and missing data at a rate of 10%. A summary of the data is given in Table 4.

The total number of recombinations for the optimum configuration is 242. However, in this case the number of recombinations in the maternal and the paternal meiosis is not always the same. The following combinations were found: (122, 120), (123, 119), … , (129, 113). This illustrates that in this set of data there is more uncertainty about whether recombinations occur in the maternal meiosis or in the paternal meiosis. The optimality criterion also appears to be flatter. This can be observed by considering the effect on the optimality criterion of reversing duplets, triplets, and quadruples of markers relative to the true marker order m1, m2, … , m21 for the “clean” marker data summarized in Table 3 and the marker data containing 2% errors and 10% missing data summarized in Table 4 (see Figure 4). In Figure 4 not a single point exceeds the line of equality. It can also be observed that with 10% missing data and 2% errors the true marker order is not optimal anymore; the optimum value of the optimality criterion is found for marker order m1, m2, … , m10, m12, m11, m13, m14, … , m21.

The inheritance matrix of an optimum configuration show many singletons (Figure 5), and a number of these are due to errors in the data. Calculation of plausible maps using the Metropolis algorithm indicates that changes of the following duplets of markers are acceptable:

Markers in the duplets (m3, m4), (m15, m16), and (m18, m19) have segregation types <ab × aa> and <aa × ab>, respectively, and lie next to each other on the optimum map. Reversing the order of these duplets has no effect on the criterion.

### Apple data:

In the following we use marker data obtained from a cross between two varieties of apple (*Malus domestica* Borkh.). The apple varieties are called Prima (used as mother) and Fiesta (used as father). The data refer to one chromosome or linkage group of the apple genome. The problem of forming linkage groups is not discussed in this article. Table 5 provides a summary of the data. Each line contains information of one marker: its number and name, the system that was used to produce the marker, its segregation type, and a summary of observations on 152 individuals. The marker phases are unknown. It can be observed from Table 5 that all markers segregate according to expectation. Some markers (m12, m14, m3, and m5) have very large numbers of missing observations. Full details can be found in Maliepaard* et al.* (1998).

For this set of data, the construction of a linkage map went not without difficulties. Repeated runs of the program led to different outcomes. The smallest value found for the optimization criterion was 156. However, more frequently values of 161 were found. To show the difference between outcomes, Figure 6 contains a linkage map with 161 recombinations and one with 156 recombinations. In both cases, the numbers of recombinations in the maternal meiosis are equal to 93.

Differences in number of recombinations occur in the paternal meiosis. Differences between outcomes are differences in marker order (in combination with differences in inheritance vector). The phases of the markers are always the same. In Figure 6 the numbers of recombinations in the paternal meiosis are 68 and 63, respectively. This difference is caused mainly by a change in the order of markers (m14, m15, m16, m17, m18, and m21). However, going from one outcome to the other would require not only changing the marker order in a complicated way, but also at the same time adapting inheritance vectors. The phases can be kept fixed.

One way to proceed is to construct maps for the maternal and the paternal meiosis separately. In this case the data can be split up into two data sets, one in which all markers have segregation type <ab × aa> (maternal meiosis) and one in which in which all markers have segregation type <aa × ab> (paternal meiosis). Markers with original segregation type <ab × cd> appear in both data sets. The algorithm can be applied to both data sets. Results are presented in Figure 7. The numbers of recombinations in the maternal and paternal meiosis are 93 and 63, respectively. In the maternal meiosis two recombinations are found between (m17, m21) and m18, whereas in the paternal meiosis 12 recombinations are found between (m17, m18) and m21. It may be concluded that m21 may be the source of the mapping problems. It should be noted that the map obtained in this way is only slightly different from the map with 156 recombinations obtained by simultaneous mapping of the maternal and the paternal meiosis. It looks also very similar to the map obtained by Maliepaard* et al*. (1998), who used JoinMap (Stam 1993; Van Ooijen and Voorrips 2001) to obtain separate maps for the maternal and the paternal meiosis. The aberrant behavior of m21 can also be detected from a visual inspection of the inheritance vectors.

Inheritance vectors of the maternal and the paternal meiosis are shown in Figure 8. Figure 8 shows that marker m21 has 10 singletons in the paternal meiosis. Deleting marker m21 from the data set leads to the map given in Figure 9. The number of recombinations for the map is 141: again 93 in the maternal meiosis but only 48 in the paternal meiosis. In the maternal meiosis m17 and m18 are at the same position, and in the paternal meiosis m17 and m18 are four recombinations apart. However, occasionally the program gives an optimum solution with 144 recombinations: 93 in the maternal meiosis and 48 in the paternal meiosis. This problem is caused by the large number of missing observations of marker m14 in combination with the large distance between the group (m3, m4, m5, and m6) and the group (m14, m15, m16, m17, and m18). Marker m12 also has a large number of missing observations; it does not give problems because it is very close to other markers.

Deleting markers m12 and m14 gives an optimum solution with 138 recombinations: 93 in the maternal meiosis and 45 in the paternal meiosis. These results show that deleting m12 has no effect on the optimality criterion. Figure 10 shows plausible positions for the markers in the maternal and the paternal meiosis plotted against the optimum position of the markers. It can be concluded from Figure 10 that some markers are separate from other markers but that markers also occur in clusters.

## DISCUSSION

The algorithm developed in this article provides a simple means for the construction of linkage maps in full-sib families of outbreeding species. It is based on simple changes of order, phase, and inheritance vector. Jansen* et al.* (2001) develop a method to determine a set of maps that are plausible with the current set of data. In the case of backcross with complete data, this method requires that near the optimum any marker order can be reached from any other marker order and the process does not get stuck in an “absorbing barrier.” However, in the case of a full-sib family of an outbreeding species this method requires that near the optimum any configuration (marker order, inheritance vector) can be reached from any other configuration. This may require an algorithm that is able to take larger steps by testing combinations of changes of marker order and changes of inheritance vector. Since phase changes have a large effect on the optimization criterion, marker phases can be fixed.

For dense marker maps, a two-stage approach involving thinning of markers (Jansen* et al.* 2001) may be required to avoid problems due to errors in the data. One of the problems may be that a number of map fragments are produced that are not joined up correctly. In the first stage a set of markers, determined in such a way that they cover the whole map, is used to construct a framework map. In the second stage the other markers are added to the framework map close to their optimum position. However, the two-stage approach is based on pairwise recombination frequencies, and in the current approach pairwise recombination frequencies become available as the result of the mapping process. Therefore, selection of markers can be performed only after running the procedure described in this artilce. This procedure may have to be carried out a number of times, taking into account the information that markers carry relative to other markers.

The above procedure can also be applied to F_{2} populations with dominant markers, with combinations of dominant and codominant markers, and in situations where markers are scored partly dominant. Knapp* et al.* (1995) showed that maximum-likelihood estimates of pairwise recombination frequencies between repulsion phase dominant markers are very unreliable. The recombination frequency is always estimated by zero when the observed number of double-recessive phenotypes is zero. Mester* et al.* (2003) provide a solution by first separating the data into two complementary subsets. They use common codominant markers to integrate the two maps. The method described in this article provides an efficient way to obtain an integrated map for mixed-phase dominant markers in one go (this will be communicated in a separate article).

The construction of maps of autotetraploids is a much greater problem than that of maps of allogamous diploids. The problem is that a larger number of offspring phenotypes cover an even larger number of possible hidden segregation types, phases, and inheritance vectors. The problem can be solved partly by using multiallelic markers. At present, the construction of linkage maps for autotetraploid species is still based on estimates of pairwise recombination frequencies (Luo* et al.* 2001; Hackett* et al.* 2003). Also in this case methods based on inheritance vector reconstruction may provide a powerful tool for the construction of linkage maps and the evaluation of their reliability.

The methods described in this article may also be useful for the integration of marker information from different populations, where markers may have different segregation types/phases in different populations. A special case would be the combination of information from a series of half-sib families. In this case usually many missing data occur, since markers may not have been scored in all populations or families. The efficiency of the approach then depends very much on the treatment of missing data, since missing data put no restriction on unknown parameters.

## Acknowledgments

I thank Chris Maliepaard, Johan van Ooijen, and Marco Bink for their comments on an earlier version of the article. Chris Maliepaard is also thanked for preparing and discussing the apple data. The constructive comments of two anonymous referees are very much appreciated.

## References

- Aarts, E., and J. K. Lenstra, 1997
*Local Search in Combinatorial Optimization*. Wiley, Chichester, UK. - Dempster, A. P., N. M. Laird and D.B. Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39
**:**1–38. - George, A. W., K. L. Mengersen and G. P. Davis, 1999. A Bayesian approach to ordering gene markers. Biometrics 55
**:**419–429. [PubMed] - Guilherme, J. M. R., B. S. Yandell and D. Gianola, 2002. A Bayesian approach for constructing genetic maps when markers are miscoded. Genet. Sel. Evol. 34
**:**353–369. [PMC free article] [PubMed] - Hackett, C. A., B. Pande and G. J. Bryan, 2003. Constructing linkage maps in autotetraploid species using simulated annealing. Theor. Appl. Genet. 106
**:**1107–1115. [PubMed] - Jansen, J., A. G. De Jong and J. W. Van Ooijen, 2001. Constructing dense genetic linkage maps. Theor. Appl. Genet. 102
**:**1113–1122. - Kirkpatrick, S., C. D. Gelatt and M. P. Vecchi, 1983. Optimization by simulated annealing. Science 220
**:**671–680. [PubMed] - Knapp, S. J., J. L. Holloway, W. C. Bridges and B.-H. Lui, 1995. Mapping dominant markers using F
_{2}mating. Theor. Appl. Genet. 91**:**74–81. [PubMed] - Lander, E. S., and P. Green, 1987. Construction of multilocus genetic maps in humans. Proc. Natl. Acad. Sci. USA 84
**:**2363–2367. [PMC free article] [PubMed] - Luo, Z. W., C. A. Hackett, J. E. Bradshaw, J. W. Mcnicol and D. Milbourne, 2001. Construction of a genetic linkage map in tetraploid species using molecular markers. Genetics 157
**:**1369–1385. [PMC free article] [PubMed] - Maliepaard, C., J. Jansen and J. W. Van Ooijen, 1997. Linkage analysis in a full-sib family of an outbreeding plant species: overview and consequences for applications. Genet. Res. 70
**:**237–250. - Maliepaard, C., F. H. Alston, G. Van Arkel, L. M. Brown, E. Chevreau
*et al.*, 1998. Aligning male and female linkage maps of apple (*Malus pumila*Mill.) using multi-allelic markers. Theor. Appl. Genet. 97**:**60–73. - Mester, D., Y. Ronin, D. Minkov, E. Nevo and A. Korol, 2003. Constructing large-scale genetic maps using an evolutionary strategy algorithm. Genetics 165
**:**2269–2282. [PMC free article] [PubMed] - Ridout, M. S., S. Tong, C. J. Vowden and K. R. Tobutt, 1998. Three-point linkage analysis in crosses of allogamous plant species. Genet. Res. 72
**:**111–121. - Ritter, E., C. Gebhardt and F. Salamini, 1996. The calculation of recombination frequencies in crosses of allogamous plant species with applications to linkage mapping. Genet. Res. 67
**:**55–65. - Stam, P., 1993. Construction of integrated genetic linkage maps by means of a new computer package: JoinMap. Plant J. 3
**:**739–744. - Thompson, E. A., 1994. Monte Carlo likelihood in genetic mapping. Stat. Sci. 9
**:**355–366. - Thompson, E. A., 2001 Monte Carlo methods on genetic structures, pp. 175–218 in
*Complex Stochastic Systems*, edited by O. E. Barndorff-Nielsen, D. R. Cox and C. Klüppenberg. Chapman & Hall/CRC Press, London. - Van Laarhoven, P. J. M., and E. H. L. Aarts, 1987
*Simulated Annealing: Theory and Applications*. Reidel, Dordrecht, The Netherlands. - Van Ooijen, J. W., and R. E. Voorrips, 2001
*JoinMap 3.0 Software for the Calculation of Genetic Linkage Maps*. Plant Research International, Wageningen, The Netherlands. - Voorrips, R. E., 2002. MapChart: software for the graphical presentation of linkage maps and QTLs. J. Hered. 93
**:**77–78. [PubMed] - Wu, R., C. Ma, I. Painter and Z. Zeng, 2002. Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing species. Theor. Popul. Biol. 61
**:**349–363. [PubMed]

**Genetics Society of America**

## Formats:

- Article |
- PubReader |
- ePub (beta) |
- PDF (579K) |
- Citation

- Construction of a genetic linkage map in tetraploid species using molecular markers.[Genetics. 2001]
*Luo ZW, Hackett CA, Bradshaw JE, McNicol JW, Milbourne D.**Genetics. 2001 Mar; 157(3):1369-85.* - Development of an integrated genetic map of a sugarcane (Saccharum spp.) commercial cross, based on a maximum-likelihood approach for estimation of linkage and linkage phases.[Theor Appl Genet. 2006]
*Garcia AA, Kido EA, Meza AN, Souza HM, Pinto LR, Pastina MM, Leite CS, Silva JA, Ulian EC, Figueira A, et al.**Theor Appl Genet. 2006 Jan; 112(2):298-314. Epub 2005 Nov 24.* - Constructing linkage maps in autotetraploid species using simulated annealing.[Theor Appl Genet. 2003]
*Hackett CA, Pande B, Bryan GJ.**Theor Appl Genet. 2003 Apr; 106(6):1107-15. Epub 2003 Feb 8.* - Linkage analysis: progress and problems.[Philos Trans R Soc Lond B Biol Sci. 1994]
*Bishop DT.**Philos Trans R Soc Lond B Biol Sci. 1994 Jun 29; 344(1310):337-43.* - Basic genetics.[Prim Care. 2004]
*Korf BR.**Prim Care. 2004 Sep; 31(3):461-78, vii.*

- High density SNP and SSR-based genetic maps of two independent oil palm hybrids[BMC Genomics. ]
*Ting NC, Jansen J, Mayes S, Massawe F, Sambanthamurthi R, Ooi LC, Chin CW, Arulandoo X, Seng TY, Alwee SS, Ithnin M, Singh R.**BMC Genomics. 15309* - Oil palm genome sequence reveals divergence of interfertile species in old and new worlds[Nature. 2013]
*Singh R, Ong-Abdullah M, Low ET, Manaf MA, Rosli R, Nookiah R, Ooi LC, Ooi S, Chan KL, Halim MA, Azizi N, Nagappan J, Bacher B, Lakey N, Smith SW, He D, Hogan M, Budiman MA, Lee EK, DeSalle R, Kudrna D, Goicoechea JL, Wing R, Wilson RK, Fulton RS, Ordway JM, Martienssen RA, Sambanthamurthi R.**Nature. 2013 Aug 15; 500(7462)335-339* - Identification of QTLs Associated with Callogenesis and Embryogenesis in Oil Palm Using Genetic Linkage Maps Improved with SSR Markers[PLoS ONE. ]
*Ting NC, Jansen J, Nagappan J, Ishak Z, Chin CW, Tan SG, Cheah SC, Singh R.**PLoS ONE. 8(1)e53076* - Development of a dense SNP-based linkage map of an apple rootstock progeny using the Malus Infinium whole genome genotyping array[BMC Genomics. ]
*Antanaviciute L, Fernández-Fernández F, Jansen J, Banchi E, Evans KM, Viola R, Velasco R, Dunwell JM, Troggio M, Sargent DJ.**BMC Genomics. 13203*

- MedGenMedGenRelated information in MedGen
- PubMedPubMedPubMed citations for these articles
- TaxonomyTaxonomyTaxonomy records associated with the current articles through taxonomic information on related molecular database records (Nucleotide, Protein, Gene, SNP, Structure).
- Taxonomy TreeTaxonomy Tree

- Construction of Linkage Maps in Full-Sib Families of Diploid Outbreeding Species...Construction of Linkage Maps in Full-Sib Families of Diploid Outbreeding Species by Minimizing the Number of Recombinations in Hidden Inheritance VectorsGenetics. 2005 Aug; 170(4)2013

Your browsing activity is empty.

Activity recording is turned off.

See more...