Copyright © 2005 by the Genetics Society of America

Insertional Polymorphism and Antiquity of PDR1 Retrotransposon Insertions in Pisum Species

*Plant Research Unit, University of Dundee at SCRI, Invergowrie, Dundee, DD2 5DA, United Kingdom and †Department of Crop Genetics, John Innes Centre, Norwich, NR4 7UH, United Kingdom

1 These authors contributed equally to this work.
2 Present address: Institute of Cytology and Genetics, Novosibirsk 630090, Russia.
3 Corresponding author: Plant Research Unit, Division of Applied and Environmental Biology, School of Life Sciences, University of Dundee at SCRI, Invergowrie, Dundee DD2 5DA, United Kingdom. E-mail: a.j.flavell/at/dundee.ac.uk

Communicating editor: J. A. Birchler

Received May 5, 2005; Accepted June 30, 2005.

Abstract

Sequences flanking 73 insertions of the retrotransposon PDR1 have been characterized, together with an additional 270 flanking regions from one side alone, from a diverse collection of Pisum germ plasm. Most of the identified flanking sequences are repetitious DNAs, but more than expected (7%) lie within nuclear gene protein-coding regions. The approximate age of 52 of the PDR1 insertions has been determined by measuring sequence divergence between the two LTRs of each insertion. These data show that PDR1 transpositions occurred within the last 5 MY, with a peak at 1–2.5 MYA. The insertional polymorphism of 68 insertions has been assessed across 47 selected Pisum accessions, representing the diversity of the genus. None of the insertions are fixed, showing that PDR1 insertions can persist in a polymorphic state for millions of years in Pisum. The insertional polymorphism data have been compared with the age estimations to ask what rules control the proliferation of PDR1 insertions in Pisum.
Relatively recent insertions (<~1.5 MYA) tend to be found in small subsets of the Pisum accessions set, “middle-aged” insertions (between ~1.5 and 2.5 MYA) vary greatly in their occurrence, and older insertions (>~2.5 MYA) are mostly found in small subsets of Pisum. Finally, the average age estimate for PDR1 insertions, together with an existing data set for PDR1 retrotransposon SSAP markers, has been used to derive an estimate of the effective population size for Pisum of ~7.5 × 10⁵.

RETROTRANSPOSONS are mobile genetic elements that transpose replicatively into different loci through reverse transcription of RNA intermediates. Retrotransposons are found in all kingdoms of life and are ubiquitous in the genomes of plants (Flavell et al. 1992a; Voytas et al. 1992; Suoniemi et al. 1998; Noma et al. 1999; Schmidt 1999). Long terminal repeat (LTR) retrotransposons tend to be the dominant retrotransposon class in plants and have been classified into two main groups, the Ty1-copia group and the Ty3-gypsy group, on the basis of conserved sequence features and gene order (Xiong and Eickbush 1990), although more recent findings show this to be an oversimplification (Havecker et al. 2004). Each of the retrotransposon groups typically contains a great variety of different retrotransposons (Konieczny et al. 1991; Flavell et al. 1992b), which are found in widely different numbers of copies per genome, from a few to tens of thousands. Collectively, huge numbers of LTR retrotransposon insertions are found in the genomes of many plant species and can constitute more than half the entire genome in some cases (Sanmiguel et al. 1996; Kumar and Bennetzen 1999).

A variety of PCR-based systems have been developed to detect insertional polymorphism of retrotransposons in plants (Waugh et al. 1997; Ellis et al. 1998; Flavell et al. 1998; Kalendar et al. 1999; Provan et al. 1999; Yu and Wise 2000; Porceddu et al. 2002).
Most of these are multiplex approaches, which display the regions flanking individual retrotransposon insertions as bands on gels. Such methods can generate large amounts of data easily and are very useful for determining the genetic diversity of germ plasm (Ellis et al. 1998). In contrast, retrotransposon-based insertion polymorphisms (RBIPs) detect individual insertions by PCR with flanking host sequence primers and a retrotransposon-specific primer (Flavell et al. 1998). RBIP produces less data per experiment than multiplex approaches do, but it is more accurate for studies of deeper phylogeny in wide germ plasm. This is because RBIP is a codominant method that uses two simple PCRs to detect both the presence and the absence of the insertion, whereas multiplex approaches detect only insertion presence; absence is inferred from the lack of a band, which can also result from mutation in PCR primer sites.

The field pea (Pisum sativum) has a large genome (~4.5 × 10⁹ bp), which is relatively stable in size between species (Greilhuber and Ebert 1994; Baranyi et al. 1996). At present, rather little genomic sequence is available for Pisum, but the repetitious DNAs of the genus are better understood. A variety of retrotransposons, transposons, and other repetitious DNAs have been characterized for Pisum (Lee et al. 1990; Chavanne et al. 1998; Nouzova et al. 2000; Neumann et al. 2001, 2003; Macas et al. 2003). Pea is a predominantly inbreeding Old World legume crop first cultivated ~10,000 years ago (Blixt 1972; Zohary 1996; Mithen 2003). Cultivated Pisum retains a wide gene pool, both phenotypically and genotypically, and wild Pisum species extend this diversity still further. Traditionally, one cultivated species, P. sativum, and three wild taxa, P. elatius, P. humile, and P. fulvum, have been recognized. However, a combination of molecular and other approaches has led to the conclusion that only P. fulvum is a truly distinct species, with the others forming a single-species complex (Vershinin et al. 2003).

Insertional polymorphism for four Ty1-copia group retrotransposons, a Ty3-gypsy group retrotransposon, and a CACTA transposon has been measured in Pisum by the multiplex sequence-specific amplification polymorphism (SSAP) approach (Vershinin et al. 2003). These diverse mobile elements all produce roughly similar pictures of the diversity of Pisum. P. fulvum represents a distinct, though diverse, clade, and P. abyssinicum forms another, far more compact clade. Finally, P. elatius is the most diverse germ plasm set, with P. humile and P. sativum falling within its boundaries. However, individual SSAP markers from any of these species can frequently be found in another one, suggesting that introgression by outbreeding has played a significant role in the genomic evolution of the genus (Vershinin et al. 2003).

PDR1 was the first Ty1-copia group retrotransposon to be isolated from Pisum (Lee et al. 1990) and remains the best understood. It is one of the smallest and simplest transposition-competent LTR retrotransposons known, with 156-bp LTRs and the typical gag-pr-int-rt-rnaseH gene order of the Ty1-copia group. PDR1 is present across the entire Pisum genus in ~200 dispersed copies per haploid genome (Lee et al. 1990; Ellis et al. 1998), and >95% of insertions are polymorphic within the genus (Ellis et al. 1998; Vershinin et al. 2003). Linkage mapping has shown it to be broadly distributed within the Pisum genome (Ellis et al. 1998).

The purpose of this study was, first, to discover the genomic environment of PDR1 insertions by sequencing the surrounding DNA for a large set of insertions; second, to investigate the distribution of these insertions within the genus Pisum by the RBIP approach; third, to determine the antiquity of the insertions; and finally, to compare the age estimations with the occupancy data to determine what rules control the fates of PDR1 insertions in the Pisum genus.
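The codominant logic of RBIP scoring described above can be made explicit with a small sketch. It assumes, for illustration only, that each locus in each accession has already been reduced to two yes/no observations: whether the occupied-site PCR (a flanking host primer plus a PDR1-specific primer) gave a product, and whether the unoccupied-site PCR (the two flanking host primers) gave a product. The `RbipCall` and `call_rbip` names are hypothetical and not part of any published RBIP software.

```python
from enum import Enum


class RbipCall(Enum):
    OCCUPIED = "occupied"        # only the insertion-specific product
    UNOCCUPIED = "unoccupied"    # only the empty-site product
    BOTH = "both"                # heterozygote or spurious product; review by hand
    MISSING = "missing"          # neither PCR gave a product


def call_rbip(occupied_band: bool, unoccupied_band: bool) -> RbipCall:
    """Combine the two RBIP PCR results for one locus in one accession.

    occupied_band: product from a flanking host primer plus a PDR1-specific primer.
    unoccupied_band: product from the two flanking host primers (empty site).
    """
    if occupied_band and unoccupied_band:
        return RbipCall.BOTH
    if occupied_band:
        return RbipCall.OCCUPIED
    if unoccupied_band:
        return RbipCall.UNOCCUPIED
    return RbipCall.MISSING
```

Unlike a multiplex band, absence here is positively detected by the unoccupied-site product, so a failed reaction shows up as `MISSING` rather than being mistaken for absence. The two-product case is flagged rather than forced into a call; in a predominantly selfing species such as pea it may reflect a spurious unoccupied-site product from a repeated target rather than a heterozygote.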
MATERIALS AND METHODS

Plant material and DNA isolation: Pisum accessions from the John Innes Pisum Collection were selected on the basis of previous studies (Vershinin et al. 2003) to represent the diversity of the genus. Genomic DNAs were isolated from young leaf tissue using Qiagen (Valencia, CA) DNeasy 96 plant kits following the manufacturer's instructions.

Isolation of genomic sequences flanking PDR1 retrotransposon insertions: DNAs from a variety of Pisum accessions were digested with TaqI restriction endonuclease, followed by ligation with TaqI adapters (Table 1). SSAP PCRs were then carried out (Ellis et al. 1998) with a PDR1-specific primer (see below) and TaqI adapter primer 9011 (Table 1) to create pools of mixed PCR products, each containing a fragment of a PDR1 LTR together with its flanking host genomic DNA. These were either cloned directly into a vector (see below) or separated by polyacrylamide gel electrophoresis before isolation and cloning (see below).
SSAP reactions used either conventional Taq DNA polymerase in conventional buffer (Ellis et al. 1998) or Qiagen HotStar Taq DNA polymerase in unmodified Qiagen buffer (no extra magnesium or Q buffer), with 0.2 pmol/μl of each primer. Hot-start PCR conditions were 95° for 15 min; then 30 cycles of 94°, 60°, and 72°, each for 1 min; and then 72° for 7 min. All primers were designed for melting temperatures of 60–65° at a 50 mM cation concentration. PCRs amplifying sequences 5′ to the PDR1 insertions (i.e., upstream of the major retrotransposon transcript; 5′ SSAP PCRs) used oligonucleotide 9124 or JIp_101 [Table 1, supplementary Figure S1 (http://www.genetics.org/supplemental/)], and PCRs amplifying sequences 3′ to the insertions (3′ SSAP PCRs) used oligonucleotide 9479 or 15940, or nested PCR with primer 15940 followed by 15941. The latter primer pair introduced a SacI site into the PDR1 end of the SSAP fragment; together with a similarly engineered BamHI site in the TaqI adapter primer (9011; Table 1), this allowed fragments to be cloned directionally into SacI/BamHI double-digested M13mp18 vector DNA. Clones were sequenced using BigDye v2.0 (PE Biosystems). A total of 131 5′ SSAP sequences and 554 3′ SSAP sequences were obtained, representing 67 unique 5′ sequences and 203 unique 3′ sequences, respectively (415 sequences were duplicates).

Isolation of PDR1 RBIP insertions: To develop RBIPs, genome sequence data were needed from both sides of the insertion (Flavell et al. 1998). Three variant methods were used for this (Figure 1).
Method 1: matching target site duplications of 5′- and 3′-flanking sequences: 5′ and 3′ SSAP reactions, oriented outward in both directions from the PDR1 LTR into the flanking host DNA, were carried out with primer oligonucleotides PL (usually 9124) or PR (usually 9479) and the TaqI adapter oligonucleotide [Table 1, supplementary Figure S1 (http://www.genetics.org/supplemental/)]. The two pools of PCR products were treated with Klenow fragment DNA polymerase (New England Biolabs, Beverly, MA) to generate blunt ends and then phosphorylated with T4 polynucleotide kinase (New England Biolabs) before cloning into M13mp18 bacteriophage vector that had been linearized with HincII restriction endonuclease (Roche, Indianapolis) and treated with calf intestinal phosphatase (Roche) to reduce background from insert-lacking clones. Random subclones were sequenced. Sequences derived from 5′ SSAP PCR were then compared with those from 3′ SSAP PCR to identify pairs possessing identical 5-base target site duplications (TSDs) flanking the retrotransposon insertion. Such sequence pairs, representing putative pairs of LTR-host junctions from the same PDR1 insertion, were then tested by PCR with primer pairs derived from 5′- and 3′-flanking genomic DNA in six highly diverse Pisum accessions. Any pair that generated a new band in one or more of these accessions was tested by sequencing to determine whether it represented the host genomic site unoccupied by the PDR1 insertion (Figure 1).

Method 2: matching segregation patterns for 5′ and 3′ SSAPs in mapping populations: Radioactive 5′ and 3′ SSAP reactions (Ellis et al. 1998) were carried out separately using ³³P-labeled PDR1-specific primers 9124 and 15940 [Table 1, supplementary Figure S1 (http://www.genetics.org/supplemental/)], respectively, on DNAs of 20 recombinant inbred lines (RILs) from a mapping population derived from a cross between accessions JI15 and JI399 of the John Innes Pisum Collection.
The products were visualized by polyacrylamide gel electrophoresis followed by autoradiography. Candidate pairs of SSAP bands that cosegregated in the 20 RILs were taken to represent putative pairs of LTR-host junctions from the same PDR1 insertion. These pairs of bands were extracted from the dried gels (Knox 2005), reamplified, and sequenced using BigDye v3.0 (ABI, Columbia, MD) to confirm the identity of the 5-bp TSD; the allelic state of the locus (occupied or unoccupied) was then investigated in the parents of the mapping population by PCR as for method 1. Four RBIPs were obtained by this approach.

Method 3: SSAP from flanking host sequence: Genomic sequences flanking PDR1 insertions derived from 3′ SSAP (see method 1) were used to design nested primers (Figure 1).

Isolation and sequence determination of PDR1 LTRs from RBIP insertions: Matched pairs of complete LTRs from individual PDR1 RBIP insertions were amplified with 5′ and 3′ host genome flanking primers [primers L and R in supplementary Table S1 (http://www.genetics.org/supplemental/)] and PDR1 primer 20762, 9975, or JI_101, respectively [Table 1, supplementary Figure S1 (http://www.genetics.org/supplemental/)], using as template the genomic DNA of the occupied accession originally used for RBIP development. The PCR products were purified with QIAquick spin columns (Qiagen), cloned into pGEM-T Easy vector (Promega, Madison, WI) following the manufacturer's instructions, and sequenced by BigDye automated sequencing (PE Biosystems). LTRs were initially sequenced on one strand, and any polymorphism was confirmed by visual comparison of the sequence trace against the wild-type allele trace. Any remaining ambiguities were resolved by sequencing the complementary strand. The four insertions described in method 2 above were sequenced directly from PCR products as described above.
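The cosegregation criterion of method 2 amounts to matching the presence/absence patterns of 5′ and 3′ SSAP bands across the same RILs. The sketch below assumes, hypothetically, that each band has been scored as a string of '1' (present) and '0' (absent), one character per RIL in a fixed order, with no missing data; band names are placeholders. Bands present in every RIL or absent from every RIL are skipped because they cannot distinguish one insertion from another.

```python
from collections import defaultdict


def is_informative(pattern: str) -> bool:
    """A band present in all RILs or absent from all RILs carries no linkage signal."""
    return "0" in pattern and "1" in pattern


def cosegregating_pairs(five_prime: dict, three_prime: dict) -> list:
    """Return (5' band, 3' band) pairs with identical segregation patterns.

    Both dicts map a band name to a '0'/'1' string, one character per RIL,
    with the RILs in the same order in every string.
    """
    by_pattern = defaultdict(list)
    for band, pattern in three_prime.items():
        if is_informative(pattern):
            by_pattern[pattern].append(band)
    pairs = []
    for band5, pattern in five_prime.items():
        # .get() does not add keys to the defaultdict, so unmatched patterns are ignored.
        for band3 in by_pattern.get(pattern, []):
            pairs.append((band5, band3))
    return pairs
```

With only 20 RILs, unrelated bands can share a pattern by chance, which is why each candidate pair was still confirmed by sequencing the TSD and testing the parents by PCR.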
Bioinformatics analysis: Homology searches for all the isolated flanking sequences were performed with BLASTN and TBLASTX (Altschul et al. 1997) against a local copy of the NCBI nonredundant nucleotide database downloaded in April 2004. Blastall was used to run the searches in batch, and the results were limited to 30 hits with E-values <0.2. BLASTN searches used a word size of 11; TBLASTX searches used the BLOSUM62 matrix with a word size of 3. Database hits were visually checked to verify their identities, and apparent hits within gene-coding regions were carefully checked to ensure that they did not derive from insertions into non-protein-coding regions or into the coding regions of mobile elements.

Estimation of the synonymous nucleotide substitution rate for the Pisum lineage: To calibrate the molecular clock for the Pisum lineage, divergence times for P. sativum vs. Medicago truncatula, Glycine max, and Acacia mangium of 32, 46, and 54 MYA, respectively, were taken from Wojciechowski (2003). Synonymous nucleotide substitution rates were calculated by comparison of exons of two single-copy nuclear genes, namely GdcH from M. truncatula (EST_BF519088) and P. fulvum JI1010 (AJ938069) and Uni from Acacia (AY229890), Pisum (AF035163), and Medicago (AC139708), using DIVERGE [Wisconsin Package Version 10.0; Genetics Computer Group (GCG), Madison, WI]. Ks values calculated for Acacia-pea (Uni), Medicago-pea (Uni), and Medicago-pea (GdcH) were 1.0, 0.48, and 0.27, respectively. Applying the divergence times (T) given above, with rate = Ks/(2T), these Ks values correspond to substitution rates of 9.3 × 10⁻⁹, 7.5 × 10⁻⁹, and 4.2 × 10⁻⁹ substitutions/site/year, respectively. The average of these values, 7.0 × 10⁻⁹ substitutions/site/year (standard deviation 2.6 × 10⁻⁹), was used as the rate of synonymous substitutions (r).
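The rate calibration above can be reproduced from the stated Ks values and divergence times, assuming the usual relation r = Ks/(2T), in which the factor 2 reflects independent substitution along both lineages since their split. This is a minimal check of the arithmetic, not the DIVERGE/GCG calculation used in the study; the dictionary and function names are illustrative.

```python
import statistics

# (Ks, divergence time in millions of years) as given in the text
CALIBRATIONS = {
    "Acacia-pea (Uni)": (1.0, 54.0),
    "Medicago-pea (Uni)": (0.48, 32.0),
    "Medicago-pea (GdcH)": (0.27, 32.0),
}


def rate_per_site_per_year(ks: float, divergence_my: float) -> float:
    """r = Ks / (2T): both lineages accumulate substitutions after the split."""
    return ks / (2.0 * divergence_my * 1e6)


rates = {pair: rate_per_site_per_year(ks, t) for pair, (ks, t) in CALIBRATIONS.items()}
values = list(rates.values())
r_mean = statistics.mean(values)
r_sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)

for pair, rate in rates.items():
    print(f"{pair}: {rate:.2e} substitutions/site/year")
print(f"mean r = {r_mean:.2e}, SD = {r_sd:.2e}")
```

The mean comes out at about 7.0 × 10⁻⁹ and the sample standard deviation at about 2.6 × 10⁻⁹, matching the values given in the text.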
Estimations of the antiquities of PDR1 insertions: For each PDR1 insertion, the two LTR nucleotide sequences were aligned using ClustalW (Thompson et al. 1994) with default options (http://www.ebi.ac.uk/clustalw/index.html). The number of nucleotide substitutions per site was calculated from these alignments using Kimura's two-parameter model (Kimura 1980). Insertion ages for the PDR1 elements were then estimated using the formula T = K/(2r), where T is the time since insertion, K is the number of substitutions per site between the two LTRs, and r is the average substitution rate (taken as the synonymous rate estimated above; Li and Graur 1991).

Estimation of effective population size for Pisum: Allele frequencies of PDR1 insertions were calculated in a set of 259 SSAP markers scored in 52 Pisum accessions (Vershinin et al. 2003), using an Excel spreadsheet. The same spreadsheet was used to obtain the average heterozygosity value He from the corresponding average homozygosity value (the sum of squares of the allele frequencies) and 4Neν (= M) from He, using Equation 2 in Results.

RESULTS

Isolation of sequences flanking PDR1 insertions: The overall goal of this study was to gain understanding of the nature, distribution, and antiquity of PDR1 insertions in the Pisum genus. This required flanking genomic sequence information from both sides of numerous PDR1 insertions, together with corresponding sequence information for both LTRs (see below). Only two cloned PDR1 insertions were available at the start of this study (Lee et al. 1990; Flavell et al. 1998) and, surprisingly, database searches failed to add any more (data not shown). Therefore, an efficient way of cloning multiple PDR1 insertions from a wide variety of genetically diverse individuals was needed. Three different methods for isolating PDR1 insertions were tested in parallel (Figure 1). Method 1 exploits the fact that PDR1 creates a 5-bp TSD of host sequence upon integration (Lee et al. 1990).
Thirty-one different sequences containing host-PDR1 5′ junctions (as defined by the polarity of the PDR1 open reading frame; Lee et al. 1990) were compared with 200 different 3′ junctions from the same plant accession, JI399. Six candidate pairs of junction sequences possessed identical 5-bp duplications. These were tested by PCR in a set of five highly diverse Pisum accessions to search for corresponding loci lacking PDR1 insertions (unoccupied sites). Two putative unoccupied sites were identified, and both were validated by sequence analysis, yielding RBIP insertions 399-14-9 and 399-80-46. Later bioinformatics analysis showed that a third sequence pair, which did not produce a putative unoccupied-site PCR band in the test set of pea DNAs, derived from a third RBIP insertion, 399-3-6, in a gene coding region. The second method for RBIP isolation used cosegregation in a genetic mapping population to identify candidate host-PDR1 junction pairs (Figure 1A).

The third method used for isolating RBIP insertions is based on the genomic walking method (Figure 1, B and C). Initial tests using the genomic walking method on Pisum accession JI1794 were successful, yielding two RBIPs (1794-1 and 1794-2). Subsequently, 973 3′-flanking sequences of PDR1 were obtained from 13 highly diverse Pisum accessions. Cross-comparisons between these sequences showed that many of the same insertions had been cloned repeatedly from different plant samples. The final tally of unique 3′-flanking sequences was 200, 150 of which were long enough to design good nested primers. These 150 primer pairs were subjected to the genomic walking experiment (Figure 1B).

In summary, the three methods for RBIP isolation yielded 73 PDR1 insertions [supplementary Table S1 (http://www.genetics.org/supplemental/)]. Cross-comparison between these revealed two duplicates, giving 71 newly isolated, unique RBIPs together with the 2 already isolated (Lee et al. 1990; Flavell et al. 1998).
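The pairing step of method 1 can be sketched as a lookup on the 5-bp TSD. The input format is an assumption: each junction read is taken to be host sequence only (LTR removed), written on the same strand as the PDR1 element, so that the TSD is the last 5 bases of a 5′-flank read and the first 5 bases of a 3′-flank read. Read names and the function name are hypothetical. As the results above show, such matches are only candidates: of six matching pairs, only a subset were confirmed as RBIPs, so every pair still needs PCR confirmation of an unoccupied site.

```python
from collections import defaultdict

TSD_LENGTH = 5  # PDR1 duplicates 5 bp of host DNA on integration


def candidate_insertion_pairs(five_flanks: dict, three_flanks: dict) -> list:
    """Pair 5' and 3' host junction reads that carry an identical 5-bp TSD.

    Assumed input: host sequence only (LTR removed), written on the PDR1 strand,
    so the TSD is the last 5 bases of a 5' flank and the first 5 of a 3' flank.
    """
    by_tsd = defaultdict(list)
    for name, seq in three_flanks.items():
        if len(seq) >= TSD_LENGTH:
            by_tsd[seq[:TSD_LENGTH].upper()].append(name)
    pairs = []
    for name5, seq in five_flanks.items():
        if len(seq) >= TSD_LENGTH:
            for name3 in by_tsd.get(seq[-TSD_LENGTH:].upper(), []):
                pairs.append((name5, name3))
    return pairs
```

Indexing the 3′ reads by TSD avoids comparing every 5′ read against every 3′ read, although with 31 × 200 reads as in the JI399 comparison a direct double loop would also be fast enough.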
Sequence analysis of PDR1 insertion targets: The studies described above revealed 340 different genomic sequences flanking PDR1 insertion sites in 15 diverse Pisum accessions. To investigate the nature of these sequences, searches against the NCBI genome sequence databases were carried out on 320 of these, omitting those shorter than 30 nucleotides. Table 2 summarizes the results of this analysis, and the complete information is shown in supplementary Table S2 (http://www.genetics.org/supplemental/).
Most of the target sequences (64%) are unknown, with no significant hit in the databases. We believe that this is mainly due to the small sizes of many of these sequences and the incomplete knowledge of the highly diverse repetitious DNAs of Pisum (see discussion). Thirty-nine percent of the identifiable target sequences for PDR1 insertion are themselves transposable elements, and a further 31% are unknown repetitive sequences, which are likely to be mainly composed of unidentified mobile elements or their relics. This is unsurprising, because Pisum has a large genome (4.5 × 10⁹ bp haploid) and, like other similarly sized plant genomes, including the quite closely related Vicia genus (Hill et al. 2005), is known to be composed predominantly of repetitive DNA (Murray and Thompson 1982). RBIP markers derived from such insertions into repeated sequences should have yielded unoccupied-site PCR products in most or all pea samples but, interestingly, this did not happen in most cases (data not shown). We believe that this is due to the antiquity of these repeats, whose sequences have been eroded by mutation. Twenty percent of classified target sequences for PDR1 insertion (23 sequences or 7% of the 320 sequences analyzed) are protein-coding regions of genes, none of which derive from transposable elements. Each of these insertions probably generated a null allele for the gene concerned. This is a higher percentage than would be expected, considering the expected “gene space” and overall genome size of Pisum (see discussion). To determine whether PDR1 shows any nucleotide site specificity for insertion, the 30 nucleotides on either side of all 340 unique insertions were searched for characteristic motifs (Figure 2).
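The text does not state how the 30 nucleotides on either side of each insertion were searched for motifs. One simple way to summarize such target sites is a per-position base count with a majority consensus, sketched below under the assumption that all sites are the same length and aligned on the insertion point. The function names and the default 0.5 threshold are illustrative choices, not the method used in the study.

```python
from collections import Counter


def position_counts(sites: list) -> list:
    """Count bases at each position across aligned, equal-length target sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same aligned length")
    return [Counter(s[i].upper() for s in sites) for i in range(length)]


def consensus(sites: list, min_fraction: float = 0.5) -> str:
    """Most common base per column, or 'N' if it falls below min_fraction."""
    letters = []
    for counts in position_counts(sites):
        base, n = counts.most_common(1)[0]
        letters.append(base if n / len(sites) >= min_fraction else "N")
    return "".join(letters)
```

Raising `min_fraction` makes the consensus more conservative, turning weakly biased columns into `N`.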
The antiquity of PDR1 insertions: It is possible to estimate the age of a retrotransposon insertion from the sequence divergence between its LTRs, because the two LTRs are synthesized from a single LTR RNA template before insertion and are therefore identical at the time of integration (Sanmiguel et al. 1998; Bowen and McDonald 2001; Jiang et al. 2002a,b; Ma et al. 2004). Such estimations require knowledge of the neutral nucleotide substitution rate for the corresponding host nuclear genome. Published estimates for the synonymous nucleotide substitution rates of angiosperm nuclear genes vary widely (between 1.5 and 7.1 × 10⁻⁹/site/year; Wolfe et al. 1987; Gaut et al. 1996; Small et al. 1998). Therefore, to estimate a synonymous substitution rate within the legumes, three synonymous substitution values (Ks) and corresponding substitution rates were obtained for the protein-coding regions of two genes, Unifoliata and GdcH, between P. sativum and the related legume genera Medicago and Acacia (materials and methods). The average of the three synonymous rate values obtained is 7.0 × 10⁻⁹ substitutions/site/year (standard deviation 2.6 × 10⁻⁹), in reasonable agreement with the published estimates above. To estimate the ages of the PDR1 insertions isolated in this study, their LTR pairs were sequenced and K-values (substitutions/site; Kimura 1980) were calculated for each pair. Fifty-two pairs of LTRs in total were isolated from the 73 available RBIPs (71 from this study and 2 isolated previously). For the other 21 RBIP markers, one or both of the LTRs failed to amplify, presumably because of PCR primer site mutation. The results of this analysis are shown in Figure 3.
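The age calculation combines Kimura's two-parameter distance with T = K/(2r). The sketch below assumes an already aligned LTR pair (for example, from ClustalW), skips gapped or ambiguous columns, and uses r = 7.0 × 10⁻⁹ substitutions/site/year from the calibration above; it is an illustration, not the code used in the study, and the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance K between two aligned sequences.

    Columns with a gap or a non-ACGT character in either sequence are skipped.
    """
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # proportion of transitional differences
    q = transversions / sites  # proportion of transversional differences
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the two-parameter model")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_age_years(k: float, r: float = 7.0e-9) -> float:
    """T = K / (2r): both LTRs diverge independently after integration."""
    return k / (2 * r)
```

For example, a single transition between two 156-bp LTRs gives K ≈ 0.0065 and an age of roughly 0.46 MY at this rate, which illustrates how coarse the age resolution is for LTRs this short.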
Insertion site polymorphism for PDR1 within the genus Pisum: To study the distribution of PDR1 RBIP insertions across the Pisum genus, a set of 47 highly diverse Pisum accessions was used; this set is almost identical to one chosen previously to analyze the evolutionary history of Pisum (Vershinin et al. 2003). Sixty-eight of the PDR1 RBIP insertions were scored in the 47 accessions [supplementary Table S4 (http://www.genetics.org/supplemental/)]. To visualize the distribution of the insertions in the Pisum accessions, scores for each PDR1 RBIP insertion were plotted onto a phylogenetic tree previously deduced for these accessions using 892 retrotransposon-based SSAP markers (Vershinin et al. 2003). Representative results from this analysis are shown in Figure 4.
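Tabulating RBIP scores per insertion makes the occupancy pattern and the test for fixation explicit. In this sketch the data layout is hypothetical: scores[insertion][accession] holds 'occupied', 'unoccupied', or any other value for a failed or ambiguous call, which is excluded from the denominator. An insertion counts as fixed only if every scored accession is occupied.

```python
def occupancy_summary(scores: dict) -> dict:
    """Return {insertion: (n_occupied, n_scored, is_fixed)}.

    scores[insertion][accession] is 'occupied', 'unoccupied', or any other
    value (failed or ambiguous call), which is left out of the counts.
    """
    summary = {}
    for insertion, calls in scores.items():
        scored = [c for c in calls.values() if c in ("occupied", "unoccupied")]
        n_occupied = sum(c == "occupied" for c in scored)
        is_fixed = bool(scored) and n_occupied == len(scored)
        summary[insertion] = (n_occupied, len(scored), is_fixed)
    return summary
```

Dividing `n_occupied` by `n_scored` gives the occupancy fraction used when comparing insertions scored in different numbers of accessions.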
Investigation of the relationship between insertional polymorphism and antiquity of PDR1: The availability of both insertional polymorphism data and antiquity estimations for PDR1 insertions provides the opportunity to search for relationships between these two parameters. The results of such an analysis for the 43 PDR1 insertions for which both data sets are available are shown in Figure 5.
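A simple way to relate antiquity to insertional polymorphism is to group insertions by the approximate age classes given in the abstract (<~1.5, ~1.5–2.5, and >~2.5 MY) and summarize the fraction of accessions occupied in each class. The handling of the class boundaries is an arbitrary choice in this sketch, and the input dictionaries (age in MY and occupancy fraction per insertion) are hypothetical.

```python
def age_class(age_my: float) -> str:
    """Approximate age classes from the abstract; boundary handling is arbitrary."""
    if age_my < 1.5:
        return "recent"
    if age_my <= 2.5:
        return "middle-aged"
    return "old"


def occupancy_by_age(ages_my: dict, occupancy: dict) -> dict:
    """Summarize occupancy fractions per age class as (n, min, max, mean).

    Insertions with an age but no occupancy data are skipped.
    """
    groups = {}
    for insertion, age in ages_my.items():
        if insertion in occupancy:
            groups.setdefault(age_class(age), []).append(occupancy[insertion])
    return {
        cls: (len(vals), min(vals), max(vals), sum(vals) / len(vals))
        for cls, vals in groups.items()
    }
```

Reporting the minimum and maximum alongside the mean matters here, because the text describes the middle-aged class as varying greatly in occurrence rather than differing in average occupancy.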
An estimation of effective population size for Pisum: The average K-value obtained above can be used to estimate the effective population size of Pisum. According to the neutral theory (Kimura and Crow 1964; Kimura 1983a,b), the expected frequency distribution of allele abundance Φ(x) is determined by the effective population size Ne and the mutation rate ν:
The 52 RBIP retrotransposon insertion mutations studied here may be considered neutral alleles, but they do not constitute a large enough data set to test the fit to the above equation reliably. However, the larger data set of 259 PDR1 SSAP markers in the highly similar Pisum core set used for the generation of the tree shown in Figure 4A
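The effective population size calculation described in materials and methods can be sketched as follows. Each SSAP marker is treated as a biallelic presence/absence locus, its homozygosity is the sum of squared allele frequencies, and He is one minus the average homozygosity. The step from He to M = 4Neν relies on the article's Equation 2, which is not reproduced in this text; the sketch assumes the standard infinite-alleles relation He = M/(1 + M) (Kimura and Crow 1964), which may differ from the authors' equation. The mutation rate ν must be supplied by the user, and the marker data and ν value below are hypothetical.

```python
def presence_frequency(calls: list) -> float:
    """Fraction of accessions carrying the band (1 = present, 0 = absent)."""
    return sum(calls) / len(calls)


def expected_heterozygosity(frequencies: list) -> float:
    """He = 1 - mean homozygosity over biallelic presence/absence markers."""
    homozygosities = [p ** 2 + (1 - p) ** 2 for p in frequencies]
    return 1 - sum(homozygosities) / len(homozygosities)


def theta_from_he(he: float) -> float:
    """M = 4*Ne*nu, ASSUMING the infinite-alleles relation He = M / (1 + M)."""
    if not 0 <= he < 1:
        raise ValueError("He must lie in [0, 1)")
    return he / (1 - he)


def effective_population_size(m: float, nu: float) -> float:
    """Ne = M / (4*nu); nu is a user-supplied mutation rate."""
    return m / (4 * nu)


# Hypothetical example: three markers scored in four accessions.
markers = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
he = expected_heterozygosity([presence_frequency(calls) for calls in markers])
theta = theta_from_he(he)
ne = effective_population_size(theta, nu=1e-6)  # nu is a placeholder, not from the study
```

Because Ne scales as 1/ν, the final estimate is only as good as the rate assumed for ν; the heterozygosity and M steps do not depend on it.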
DISCUSSION

The goals of this study were to investigate the nature and antiquity of PDR1 insertions in P. sativum, to gain knowledge of the distribution of these insertions across the genus Pisum, and to use these data to investigate the rule(s) controlling the fates of PDR1 insertions in the genus.

The targets for PDR1 insertions: Three hundred forty distinct sequences flanking PDR1 insertions were obtained in this study, allowing us to deduce a consensus target site sequence for insertion of PDR1, which resembles the target specificity of the Tos17 Ty1-copia group retrotransposon of rice (Figure 2). We have also identified the nature of PDR1 insertion sites. Most are other transposons, including retrotransposons and occasionally PDR1 itself [Table 2, supplementary Table S2 (http://www.genetics.org/supplemental/)]. This is unsurprising, as such repetitious DNAs compose a large proportion of pea genomic DNA. More surprisingly, 7% of PDR1 insertion sites are coding regions of nuclear genes that are not derived from transposable elements. This does not take into account PDR1 insertions into introns and gene flanks, because these are difficult to distinguish from nongenic DNA on the basis of short sequence reads, so the actual percentage of insertions within genes is probably higher than this. Assuming that pea has a gene number (~30,000) and average gene size (~2 kb) comparable to Arabidopsis and rice (http://www.ostp.gov/NSTC/html/mpgi2001/sequencing.htm), the gene space of pea is expected to be ~6 × 10⁷ nucleotides, which represents ~1.3% of the pea genome. There thus appear to be at least fivefold more PDR1 insertions within genes than expected. There are two plausible explanations for this discrepancy, and these are not mutually exclusive. First, there may be more genes in Pisum than are found in Arabidopsis or rice; for M. truncatula, gene number has been estimated at 37,000–46,000 (http://catg.ucdavis.edu/m.truncatula.pdf).
Gene duplication may be a factor in this. Genetic redundancy clearly exists in pea, since the PDR1 insertions into coding regions described here were probably all complete knockouts (these are kilobase-sized inserts). Incidentally, the only PDR1 element to be sequenced entirely (Lee et al. 1990) resides in the close vicinity of a duplicated gene. However, considerable gene duplication is known for both rice and Arabidopsis, and it remains unclear whether the level of gene duplication in pea exceeds that seen in these other species. To our knowledge there is no evidence for ancient polyploidy in Pisum. A second explanation for the higher than expected proportion of PDR1 insertion sites in genes is that PDR1 may have shown a preference for inserting into genic regions. Many retrotransposons described in the literature show insertional preferences (e.g., Kim et al. 1998; Presting et al. 1998), and under cell culture conditions both Tos17 in rice (Miyao et al. 2003) and Tnt1 in tobacco (M.-A. Grandbastien, personal communication) insert preferentially in genic regions, presumably as a consequence of an open chromatin configuration. However, in the intact organism every insertion is tested by natural selection, and transposable elements that preferentially target genes would impose a large fitness cost on the host, particularly in diploid, predominantly selfing plants such as pea, where the majority of offspring would be homozygous for the insertions. A preference for insertion into genes may explain why PDR1 copy number appears to be quite strongly constrained to ~200 per genome across the genus Pisum (Lee et al. 1990; Ellis et al. 1998), although the exact relationship between PDR1 copy number and fitness would be critical (Brookfield 2005).
Nevertheless, the allele frequency data for PDR1 insertions (Figure 6)

Ages of PDR1 insertions and their relationship with insertional polymorphism in the genus Pisum: Our results indicate that PDR1 has been transposing within roughly the last 5 MY, with a peak at ~1–2 MYA. We cannot comment on the earlier history of PDR1, because our PCR-based approach becomes progressively less efficient at isolating older insertions as a result of primer site mutation. Nevertheless, our conclusions are quite similar to corresponding sequence-based data from maize and rice (Sanmiguel et al. 1998; Vitte et al. 2004). Surprisingly, none of the 43 insertions studied here has become fixed in the pea genome during this long period. This may be due to the predominantly selfing character of the species, but the distribution of alleles across the Pisum diversity tree (Vershinin et al. 2003) indicates that introgression between highly diverse germ plasm has been an important factor in the evolution of the genus. While this introgression has been sufficient to shuffle many alleles, it has apparently not been widespread enough to drive many of these PDR1 elements to fixation. In a larger study using multiplex SSAP markers derived from several LTR retrotransposons including PDR1, ~2% of SSAP bands were seen to be fixed in a virtually identical set of pea samples (Vershinin et al. 2003). It should be noted that the PDR1 insertions with both LTR sequence and diversity data studied here have successfully produced PCR products in several successive amplifications in diverse germ plasm. It is possible that this requirement reduced the detection frequency for older insertions and so effectively excluded ancient, fixed PDR1 elements from this study. Another possible factor in the distribution of PDR1 insertions is the geography of pea.
Wild Pisum is distributed widely across Southern Asia, North Africa, and Southern Europe (Ambrose and Maxted 2000). Ancient insertions that show restricted distributions may represent geographically isolated plant lineages, or they may be the remnants of ancient, more widespread populations that are now in decline. More work is required to clarify this issue.

A final caveat to our insertional polymorphism analysis is the possibility that some of the scores may be inaccurate, because many of the PDR1 insertions described in this study are in repetitious DNA. In such cases the unoccupied site is present in multiple copies across the genome, which might lead to scoring errors through the production of unoccupied-site PCR products irrespective of whether the PDR1 insertion is present. In an extreme case, such spurious unoccupied-site amplicons might outcompete the occupied-site product, leading to a misscored sample. In practice, the majority of scores obtained in this study are either occupied or unoccupied [supplementary Figure S4 (http://www.genetics.org/supplemental/)], suggesting that this is not a problem for most of the 64 insertions scored; the exceptions are largely confined to a small number of insertions that misbehave in multiple accessions. Indeed, it is surprising that this potential problem has caused so few difficulties in scoring these PDR1 insertions, and we suggest that this may be due to sequence decay in the insertion sites, which allows repetitious unoccupied sites to be amplified, in effect, as pseudo-single-copy sites.

The effective population size of Pisum: The availability of rate data for retrotransposon insertions has allowed us to reexamine an earlier, larger set of retrotransposon polymorphism data and thereby obtain an estimate of the effective population size of Pisum.
The congruence between the observed allele frequency data and the plot obtained using Equation 1 suggests that this approach is valid. Several assumptions made in deriving Equation 1 need to be considered here. Most importantly, these population genetics models assume a random-mating population, whereas Pisum is a predominant inbreeder. Each cross between two different Pisum genotypes would effectively produce a mixed recombinant inbred subpopulation carrying the original parental alleles. For the purposes of analyzing effective population size, such subpopulations approximate to individuals. This consideration suggests that the effective population size is very much smaller than the actual population size and is determined by the harmonic mean of the effective population size per lineage per unit time (Kimura 1983a, pp. 40–43); because a harmonic mean is dominated by its smallest terms, periods with few lineages weigh heavily on it. Effective population size is an important parameter for understanding the domestication of crop plants such as Pisum from their wild progenitors. All of the major food crops were domesticated in this way, and allelic diversity underpins important traits such as pest resistance and abiotic stress tolerance. The wild gene pool for Pisum, as for other crop species, is wider than that of the domesticated samples. The effective population size is a useful measure of this diversity, and the methods shown here offer a way to measure this parameter in Pisum and, by extrapolation, in other crop plants. The value of <1 million individuals obtained here (~7.5 × 10⁵) seems quite low for a reasonably common species with such a broad geographic distribution.

Acknowledgments

We thank David Martin for help with database searches and Pete Isaac and Alan Schulman for many helpful discussions on all aspects of this work.
This work was supported by grants 31502 (TEGERM) and FP6-2002-FOOD-1-506223 (Grain Legumes) from the European Commission under Frameworks V and VI and by Biotechnology and Biological Sciences Research Council grant 94/BEP17084 (Bioinformatics and E-Science program).