Copyright: © 2005 Kooij et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

A Plasmodium Whole-Genome Synteny Map: Indels and Synteny Breakpoints as Foci for Species-Specific Genes

1 Department of Parasitology, Malaria Group, Leiden University Medical Centre, Leiden, The Netherlands
2 The Institute for Genomic Research, Rockville, Maryland, United States of America
3 Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom

John Boothroyd, Editor
Stanford University, United States of America

* To whom correspondence should be addressed. E-mail: Waters/at/lumc.nl
¤ Current address: Nuffield Department of Clinical Laboratory Sciences, University of Oxford and Blood Research Laboratory, National Blood Service, John Radcliffe Hospital, Headington, Oxford, United Kingdom

Received August 8, 2005; Accepted November 16, 2005.

Abstract

Whole-genome comparisons are highly informative regarding genome evolution and can reveal the conservation of genome organization and gene content, gene regulatory elements, and presence of species-specific genes. Initial comparative genome analyses of the human malaria parasite Plasmodium falciparum and rodent malaria parasites (RMPs) revealed a core set of 4,500 Plasmodium orthologs located in the highly syntenic central regions of the chromosomes that sharply defined the boundaries of the variable subtelomeric regions. We used composite RMP contigs, based on partial DNA sequences of three RMPs, to generate a whole-genome synteny map of P. falciparum and the RMPs. The core regions of the 14 chromosomes of P.
falciparum and the RMPs are organized in 36 synteny blocks, representing groups of genes that have been stably inherited since these malaria species diverged, but whose relative organization has altered as a result of a predicted minimum of 15 recombination events. P. falciparum-specific genes and gene families are found in the variable subtelomeric regions (575 genes), at synteny breakpoints (42 genes), and as intrasyntenic indels (126 genes). Of the 168 non-subtelomeric P. falciparum genes, including two newly discovered gene families, 68% are predicted to be exported to the surface of the blood stage parasite or infected erythrocyte. Chromosomal rearrangements are implicated in the generation and dispersal of P. falciparum-specific gene families, including one encoding receptor-associated protein kinases. The data show that both synteny breakpoints and intrasyntenic indels can be foci for species-specific genes with a predicted role in host-parasite interactions and suggest that, besides rearrangements in the subtelomeric regions, chromosomal rearrangements may also be involved in the generation of species-specific gene families. A majority of these genes are expressed in blood stages, suggesting that the vertebrate host exerts a greater selective pressure than the mosquito vector, resulting in the acquisition of diversity.

Synopsis

Malaria, caused by the parasite Plasmodium falciparum, is one of the most devastating infectious diseases. Rodent malaria parasites (RMPs), such as P. berghei, P. chabaudi, and P. yoelii, are used as models for P. falciparum. For the use of these models in studies of human disease, insight into both the similarities and differences in the genomics and biology of these parasites is important. The availability of significant but partial genome data of the RMPs enabled the construction of a virtual composite RMP genome and its comparison with the P. falciparum genome, generating a so-called synteny map.
Analysis of this map provided the desired comparative insights. A high level of conservation exists between roughly 85% of the genes at the level of content and order, but 168 P. falciparum-specific genes that disrupted the conserved genome segments were identified. The majority of these genes were predicted to play a role in host–parasite interactions. This study indicates that determination of the synteny breakpoints may help to rapidly identify the species-specific gene content of future Plasmodium genomes, providing the malaria research community with a powerful investigative tool. The findings may also be of interest to those studying chromosomal evolution.

Introduction

Comparative genomics enables inferences to be drawn concerning the coding potential of related genomes and the evolutionary forces that have influenced genome organization [1]. The resolving power of whole-genome comparisons to a large extent depends upon the proximity of the phylogenetic relationship between the species. Comparative eukaryotic genome studies of several species from a wide range of lineages and different times of divergence have revealed that the level of both the conservation of organization and the recombination rates are relatively variable. Human and mouse, which diverged ~75 million years (My) ago, have a predicted gene content that is 80% orthologous [2] arranged in 281 synteny blocks (SBs) larger than 1 Mb [3]. Three-way alignment of the human genome with that of mouse and rat confirmed the conservation of ~280 SBs between human and each of the rodent genomes, while the more closely related rat and mouse genomes are ~90% orthologous with a reduced number of 105 shared SBs of larger average size [4]. Subsequent publication of the chicken genome, which diverged from the mammalian genomes ~310 My ago, provided the first nonmammalian amniote genome sequence and allowed a four-way whole-genome comparison [5] revealing 586 smaller, conserved SBs.
Here, roughly 50% of the human genes have a chicken ortholog reducing to 35% that have orthologs in both chicken and pufferfish (estimated time of divergence ~450 My). These data show that, in terms of the extent of organization and gene homology, the level of genomic conservation can generally be considered to be relatively proportional to the time of divergence, within these species. However, a more recent comparison of genome sequences from eight mammals demonstrated that the rates of chromosomal rearrangements can vary both between species and in time (about 0.2–2 breaks/My) [6]. In contrast with the relatively slow evolution of mammalian and chicken chromosome structure, gene order and linkage in Diptera species has altered at a much higher rate. Although 50% of the genes are orthologs, little conservation of synteny could be observed in comparisons of the genomes of the fruit fly with two different malaria mosquitoes, which diverged ~250 My ago [7,8]. Even in the more closely related Diptera [8,9], extensive reshuffling and inversion have altered the gene order and organization, although genes were found to be located on the same chromosome arms. Similarly, the genomes of the nematodes Caenorhabditis elegans and C. briggsae, which diverged ~100 My ago, share 60% gene orthology but are arranged as 4,837 microsyntenic clusters [10]. The continuing efforts to sequence a variety of unicellular parasites has resulted in the publication of a comparison of the genome sequences of three human protozoan pathogens, Trypanosoma brucei, T. cruzi, and Leishmania major [11], and two apicomplexan parasites infecting cattle, Theileria annulata and T. parva [12]. The two Theileria species are very closely related, with 81% (T. annulata) and 86% (T. parva) orthologous genes and no interchromosomal rearrangements [12], comparable to the well-conserved genomes of four yeast species that diverged only 5–20 My ago and show relatively few (1–5) translocations [13]. 
The trypanosomatid species T. brucei and L. major share 68% and 75% gene orthology, respectively, organized in 110 SBs, despite having diverged as long as 200–500 My ago (chromosomal recombination rate of ~0.2–0.5 breaks/My) [11]. In conclusion, these comparative genome studies indicate that effective recombination rates and levels of gene orthology can vary greatly between species but are relatively low in protozoa. In both pathogenic bacteria and certain unicellular eukaryotes (e.g., the trypanosomatids listed above), including members of the genus Plasmodium that are the etiological agents of malaria, the organization and gene content of the subtelomeric regions of chromosomes are highly variable and typically contain large gene families encoding proteins that may be involved in host-pathogen interactions and antigenic variation [14]. The subtelomeric regions of P. falciparum, for example, harbor a repertoire of unique gene families, including 59 var [15–17], 149 rif, and 28 stevor [18,19]. The var family encodes the erythrocyte membrane protein 1 (PfEMP1), which is a variant antigen expressed at the erythrocyte surface. PfEMP1 is involved in the binding of parasite-infected erythrocytes to receptors of host endothelial cells, erythrocytes, lymphocytes, and blood platelets [14], is subject to antigenic variation, and is thought to play a role in virulence. Other Plasmodium species lack the P. falciparum-specific var, rif, and stevor families, but the subtelomeric regions of their chromosomes also harbor (species-specific) gene families. For example, the human parasite P. vivax; P. knowlesi, which infects primates; and three rodent malaria parasites (RMPs; P. berghei, P. chabaudi, and P. yoelii) share the pir superfamily [20,21]. Proteins encoded by the pir superfamily are also found on the surface of infected erythrocytes and may be implicated in antigenic variation [21]. 
It is generally believed that the subtelomeric location of gene families confers an enhanced capacity for gene diversification and amplification through mechanisms of ectopic recombination that may be between different chromosomes [22]. Such recombination may be facilitated through the clustering of telomeres at the nuclear periphery [23]. Genome sequence data for Plasmodium species are extensive and include a complete genome sequence for the major human pathogen P. falciparum [24] and 5× coverage of the genome of a RMP, P. yoelii [25]. The P. yoelii contigs, when aligned with the 14 P. falciparum chromosomes, demonstrated extensive similarity over the relatively short length of these contigs. However, similarity was evident only in the core regions of the chromosomes mainly containing conserved genes (4,500) that are present in all characterized Plasmodium species [20] and which are bounded by the variable subtelomeric regions that contain the different gene families. In addition to the genome sequence of P. yoelii, partial genome sequence and analysis have been published for two other RMPs, P. berghei and P. chabaudi, whose core genome sequence and organization are so similar [26–28] that it has proved possible to merge the sequenced DNA contigs of the three RMPs to form composite RMP (cRMP) contigs that cover 90% of the core RMP genomes [20,25]. In this study, the cRMP contigs and 138 sequence tagged site (STS) markers (Table S1) have been used to produce a whole-genome synteny map for the three RMPs that, when compared with the P. falciparum genome, identified 36 SBs describing the core genome. This synteny map shows that species-specific genes—including rapidly evolving P. falciparum gene families—are found not only in the subtelomeric regions but also at synteny breakpoints (SBPs) and as intrasyntenic indels. Our data suggest that chromosomal rearrangements in the core regions might be involved in the generation and subsequent dispersal of one such P. 
falciparum-specific gene family. These results show that not only recombination in the more frequently recombining subtelomeric regions but also chromosome-internal rearrangements may influence diversity and complexity of the Plasmodium genome, increasing the ability of the parasite to successfully interact with its vertebrate host.

Results

A Whole-Genome Synteny Map of Four Plasmodium Species

A total of 7,392 contigs of the three RMPs, aligned with the P. falciparum genome, were used to generate 910 cRMP contigs (see Materials and Methods; Figure 1).
When the alignment of the cRMP contigs with the P. falciparum genome was examined, 19 were identified with MUMmer hits to two different P. falciparum chromosomes, indicating that these contigs covered an SBP between the cRMP and the P. falciparum genomes. In addition, three SBPs were determined by chromosome mapping of STS markers and confirmed by PCR analysis, linking the cRMP contigs on either side of the SBP (unpublished data). In total, we found 22 SBPs in the core regions of the P. falciparum genome when compared to the core cRMP genome. Since the cRMP and P. falciparum genomes comprise 14 chromosomes, these 22 SBPs define a total of 36 SBs. Chromosome mapping of 138 P. berghei and P. yoelii STS markers (see Table S1) confirmed the 22 SBPs and the chromosomal location of the 36 SBs in the RMPs. The majority (23 of 28) of P. falciparum subtelomeric regions coincided with putative locations of cRMP subtelomeric regions, while the remaining five P. falciparum subtelomere-linked SBs were linked to SBPs in the cRMP genome. Conversely, five SBs that are linked to SBPs in P. falciparum were linked to subtelomeric regions in the cRMP genome.

[Figure 2]
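The block count stated above follows from simple bookkeeping: each chromosome starts as one block, and every chromosome-internal breakpoint splits one block into two. The sketch below only restates that arithmetic with the numbers reported in the text; the function name `synteny_block_count` is a hypothetical helper introduced for illustration, not part of any tool used in the study.

```python
def synteny_block_count(n_chromosomes: int, n_breakpoints: int) -> int:
    """Blocks defined by internal breakpoints on linear chromosomes.

    Each chromosome is one block before any break; every internal
    breakpoint splits exactly one existing block into two.
    """
    if n_chromosomes < 1 or n_breakpoints < 0:
        raise ValueError("need at least one chromosome and no negative counts")
    return n_chromosomes + n_breakpoints


# Counts reported in the text.
contig_spanning_sbps = 19   # cRMP contigs with MUMmer hits to two chromosomes
sts_mapped_sbps = 3         # SBPs found by STS mapping and confirmed by PCR
total_sbps = contig_spanning_sbps + sts_mapped_sbps

blocks = synteny_block_count(n_chromosomes=14, n_breakpoints=total_sbps)
print(total_sbps, blocks)
```

With 14 chromosomes and 22 breakpoints this gives the 36 SBs reported above.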
Centrally located AT-rich (CAT) regions of 2–3 kb (average >97% AT) found on all P. falciparum chromosomes (with the exception of P. falciparum Chromosome 13 [Pfchr13]) have been predicted to be centromeres [29], and functional proof for their centromere function is accruing (S. Iwanaga, CJJ, and APW, unpublished data). While no CAT regions had been sequenced in the RMP genomes, genes immediately up- and downstream of 11 of the P. falciparum CAT regions were syntenic and located on 11 different cRMP chromosomes (Figure 2).

Comparison of the organization and location of common orthologous gene families of RMPs and P. falciparum allowed species-specific features of these families to be defined. For example, P. falciparum possesses a cluster of eight genes encoding putative serine proteases known as sera [30,31]. The P. berghei and P. yoelii databases both contain five sera, whose organization in the individual RMP genomes was unresolved, yet could be reconstructed using the cRMP contigs, demonstrating one utility of the cRMP contig construction (see Figure 1).

Inferring the Pathway of Synteny Rearrangements Between the cRMP and P. falciparum Genomes

The organization of the three RMP genomes is highly conserved, and only one or two chromosomal rearrangements were noted when the genomes of the individual RMP species were compared with the cRMP genome (Figure 2).

The P. falciparum genome organization could be generated from the cRMP genome in a minimum of 15 recombination events when the following assumptions were made: (i) that the resulting genome always consists of 14 chromosomes; (ii) that all chromosomes always contain only one of the SBs containing a CAT region; and (iii) that a recombination event generating a subtelomeric from a chromosome-internal region (or vice versa, collectively termed telomere conversions) has happened only once.
These 15 recombination events included eight single crossover events, five telomere conversions, one inversion of an entire SB, and one insertion involving an intersyntenic var cluster (Figure 3).
P. falciparum-Specific Genes Are Found Both at SBPs and in Intrasyntenic Indels

The average size of species-specific DNA regions located between SBs (intersyntenic regions) is significantly smaller in the cRMP genome (~2.5 kb, range 0.4–15 kb) than in the P. falciparum genome (~16 kb, range 0.7–106 kb). Only four of the 19 intersyntenic regions in the cRMP genome for which sequence data are available contain a species-specific open reading frame, but only the nonsyntenic c-rrna gene unit on cRMPchr5 is known to be expressed (Tables 2 and S32). In contrast, eight of the 22 intersyntenic regions in P. falciparum contain clusters of one to 13 genes without RMP orthologs (Tables 2 and S33). These 42 intersyntenic genes include 14 var and six rif genes, as well as five other genes, which all encode proteins containing the Plasmodium export element/vacuolar transport signal motif (PEXEL/VTS) [32,33], e.g., glycophorin-binding protein 130 precursor (GBP130) [34] and two receptor-associated protein kinases, PfTSTK7a and PfTSTK10a (see also below). The PEXEL/VTS motif is one element that is associated with transport of the proteins to the surface of the infected erythrocyte. A further 12 genes encode proteins with a transmembrane domain at the N-terminal end (e.g., MAL7P1.58 of the pfmc-2tm family, which encodes proteins localized to the Maurer's clefts [35]), seven of which also have a signal peptide (e.g., PF10_0164 of the etramp family [36] and five var internal cluster associated repeat [vicar] genes; see also below).

[Figure 4]
In addition to the species-specific genes located at SBPs, P. falciparum-specific genes were also found clustered in small intrasyntenic regions that interrupt the SBs (i.e., indels; Tables 2 and S34). These 82 indels, including four var clusters, range in size from one to nine genes but are generally less gene-rich than the intersyntenic regions (1.5 genes/indel compared to 5.3 genes/SBP). Whereas only two of eight SBPs contain a single P. falciparum-specific gene, 65 of the 82 intrasyntenic indels contain only one gene. The 126 intrasyntenic, P. falciparum-specific genes include nine var and four rif genes as well as an additional six genes with the PEXEL/VTS motif [32,33], including pftstk13 (MAL13P1.109; see also Discussion). Another 59 of these genes encode proteins with an N-terminal transmembrane domain, 40 of which also contain a signal peptide, giving a total of 78 genes encoding potential secreted or surface proteins. For example, a multigenic indel on Pfchr10 (Figure 4) [...]

Evolution of Gene Families Associated with Recombination Events at SBPs

In order to analyze whether recombination events in the core regions that resulted in the loss of synteny are associated with the dispersal and formation of species-specific gene families, all intersyntenic genes of P. falciparum and the RMPs were analyzed for the presence and location of orthologous genes in their respective genomes. In addition to members of the var, rif, and rrna families, one intrasyntenic (pftstk13) and two intersyntenic (pftstk7a and pftstk10a) P. falciparum genes were identified that belong to a gene family encoding 21 transforming growth factor β receptor-like serine/threonine protein kinases (PfTSTK) [41–43]. In addition to these three genes, 17 members are located in the subtelomeric regions of 10 different chromosomes (Table S35), and one member is located adjacent to the Pfchr8 CAT region (M. Berriman, personal communication).
In the RMP genome there is a single member of this family on cRMPchr12 syntenic to the copy near the Pfchr8 CAT region. Phylogenetic analysis groups these syntenic kinases in the same clade as the unique members of all other characterized Plasmodium species, with the exception of the proteins encoded by the multiple tstk genes found in Plasmodium reichenowi, a very close relative of P. falciparum infecting chimpanzees [44]. These findings suggest that the syntenic pftstk on Pfchr8 could be the progenitor gene of this P. falciparum-specific gene family (Figure 5).
Two different recombination pathways that would generate the pftstk family are consistent with the data. (i) A copy of the syntenic, orthologous progenitor pftstk on Pfchr8 relocated to a subtelomeric region, where it underwent extensive gene duplication and redistribution. The centrally located pftstk genes could then have originated from telomere changes. (ii) Combining the information on the location and phylogeny of the pftstk family with the predicted 15 synteny rearrangements suggests that both chromosome-internal rearrangements resulting in the loss of synteny and subtelomeric recombination are associated with the evolution and distribution of this family (Figure 5).

Identification of a New Putative Gene Family Associated with Chromosome-Internal var Clusters

Since repetitive sequences might be associated with recombination events between SBs, the intergenic regions flanking SBPs were examined using the MEME algorithm. This analysis resulted in the identification of a highly conserved P. falciparum-specific gene family consisting of seven putative genes and eight pseudogenes termed var internal cluster associated repeat (vicar) genes. These genes were found to be associated with five of seven chromosome-internal var clusters. Of these seven genes, five have a signal peptide and five genes have one or two transmembrane domains; only one of these genes is identified in the current annotation (MAL7P1.39) and is supported by transcriptome data [38]. The sequences correspond to the previously described GC-rich elements that were suggested to serve as regulatory elements for var-related genetic processes [29]. No other repetitive sequences were identified that, in the light of current knowledge, could be associated with chromosomal recombination events.

Discussion

The generation of composite contigs from three closely related Plasmodium species infecting rodents greatly facilitated the construction of a synteny map between the RMPs and P.
falciparum and significantly reduced the need for experimental data from PCR and STS mapping studies. Current contig assembly algorithms rely upon a minimum of 95% sequence identity between sequence reads [46], a criterion not met by the RMP sequences. The high degree of synteny and similarity of gene content of the core Plasmodium genome enabled the compilation of cRMP contigs using sequences of the three RMPs with a lower sequence identity by aligning them to the assembled P. falciparum sequence. With only 229 gaps remaining and the location of 138 STS markers identified, the synteny map is a comprehensive tool for identifying the location of most genes. Individually, cRMP contigs are not sufficient to build an entire composite genome, since coverage and linkage of the cRMP scaffolds are incomplete. An unknown proportion of small rearrangements such as single gene insertions, inversions, or deletions will have been missed. Thus the need for continued sequencing to completion of at least one RMP genome remains. Approximately 4,500 (85%) of the 5,300 predicted P. falciparum genes have an ortholog in at least one of the RMPs, and these likely represent the core set of Plasmodium genes [20]. A similar level of orthology is seen in the genome organization, since the 36 SBs cover 84% of both genomes. The synteny maps of P. falciparum and cRMP demonstrated that only a minimum of 15 recombination events are needed to generate the P. falciparum genome from the 36 SBs of the RMPs, compared with 245 events needed to convert the human genome organization to that of the mouse [3]. This relatively low number of Plasmodium genome rearrangements suggests either that divergence of P. falciparum and the RMPs might be relatively recent or that chromosomal rearrangements in Plasmodium are infrequent, either as a result of unknown (intrinsic) features of the DNA or due to some higher order organization of the genome [26]. 
Because the evolutionary relationships and the time of divergence between P. falciparum and other Plasmodium species are unclear [44,47–52], it is not yet possible to draw conclusions on the rate of chromosomal rearrangements in Plasmodium. A rough estimate consistent with published data would be that P. falciparum diverged and developed separately between 50 and 200 My ago. Thus the effective chromosomal recombination rate would be between 0.08 and 0.3 breaks/My. In comparison, the recombination rate in yeast species appears to be ~0.2 breaks/My [13]. Both are at the lower end of the range of rates observed for different mammalian species [6]. The genomes of different trypanosomatid species were also suggested to have a low recombination rate [11].

In many species, centromeres have been associated with chromosomal rearrangements and have proven to be positionally dynamic, with transposable elements often found to function in centromere relocation [1]. Plasmodium centromeres have not been functionally characterized, but based on previous predictions, preliminary functional evidence (S. Iwanaga, CJJ, and APW), and the distribution of the CAT regions as demonstrated by the Plasmodium synteny map, it is tempting to suggest that the predicted centromeres of Plasmodium are positionally static. One of the assumptions upon which the initial intuitive derivation of the minimum 15 recombination events was based was that each chromosome at any time contains one and only one CAT region, in keeping with their still-hypothetical function as centromeres. The GRIMM analysis did not include such an assumption, yet it predicted the same number of rearrangements, while maintaining a single SB containing a CAT region in each newly formed chromosome, emphasizing their predicted lack of involvement in the recombination events identified in this study.
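The rate range quoted earlier in this Discussion can be reproduced by dividing the minimum number of rearrangements by the assumed divergence time. The snippet below is only a worked restatement of that arithmetic; `breaks_per_my` is a hypothetical helper name, and the calculation assumes, as the text appears to, that all 15 events are spread evenly over the full divergence interval.

```python
def breaks_per_my(n_events: int, divergence_my: float) -> float:
    """Effective rearrangement rate in breaks per million years (My)."""
    if divergence_my <= 0:
        raise ValueError("divergence time must be positive")
    return n_events / divergence_my


N_EVENTS = 15  # minimum number of recombination events reported in the text

slow = breaks_per_my(N_EVENTS, 200)  # oldest assumed divergence (200 My)
fast = breaks_per_my(N_EVENTS, 50)   # most recent assumed divergence (50 My)

print(f"{slow:.3f} to {fast:.3f} breaks/My")
```

The lower bound evaluates to 0.075 breaks/My, which the text reports rounded to 0.08; the upper bound is 0.3 breaks/My.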
Furthermore, these recombination events are also unlikely to involve transposable elements, since these were not found in a cross-species comparison of the sequences in the vicinity of SBPs, consistent with previous studies [24]. In contrast to the low number of chromosomal rearrangements in the Plasmodium genomes, a relatively large proportion (15%) of the P. falciparum genes have no readily identifiable ortholog in any of the RMPs. These genes (including the well known var, rif, and stevor families) are mainly located in the subtelomeric regions, which appear to have a higher rate of gene evolution in many organisms, including Plasmodium [1,22]. However, this study shows that a significant proportion of P. falciparum-specific genes and members of gene families are not restricted to the subtelomeric region of the chromosomes but can be found as intrasyntenic indels and at SBPs. The majority (115 genes [68%]) of these 168 genes encode predicted or known surface or secreted proteins that are predominantly expressed in asexual blood stage parasites (both infected erythrocytes and merozoites) and thus are involved in parasite interactions with the human host and possibly associated with immune selection/evasion. Interestingly, several of the larger clusters of genes, such as the indel containing msp3 and msp6, appear to be coordinately expressed and may even be transcribed in an operon-like manner [53], despite earlier analyses that did not find evidence for the existence of such clusters [37]. Perhaps surprisingly, indels containing RMP-specific genes were not readily found, and although this may be in part due to the incomplete RMP genome sequence data that are currently available, the depth of coverage of the cRMP genome suggests that RMP indels are not as frequent as in P. falciparum. 
However, indels are not absent from the RMP genomes, and evidence is accumulating for RMP indels that contain members of the pir superfamily normally found in the subtelomeric regions reminiscent of the organization of the var family in the P. falciparum genome (see Tables S3–S30) [20,21]. To test whether SBPs are significantly more associated with chromosome-internal P. falciparum-specific genes than what might be expected based on a random distribution of the SBPs, we used computer simulations to generate randomly distributed SBPs in the genome and compared these with the inter- and intrasyntenic gene content. Using a conservative and a more relaxed approach (see Materials and Methods), we showed that based on a random breakage model, between 1.9 and 3.0 of the 22 SBPs on average could be expected to be associated with P. falciparum-specific gene clusters. This is significantly different (p < 0.001) from the observed association of eight (36%) of the 22 SBPs with P. falciparum-specific genes. This result indicates a nonrandom distribution of P. falciparum-specific genes associating with a higher frequency to SBPs and, therefore, with chromosomal rearrangements that have led to loss of synteny. Interestingly, from comparisons of the human and mouse genomes, evidence has emerged for a similar nonrandom distribution of repeat sequences in the genome and their association with SBPs [54,55]. The presence of members of species-specific gene families at the SBPs suggests that recombination events resulting in loss of synteny helped shape species-specific gene content. SBPs and the intrasyntenic indels might therefore distinguish islands where variations in gene content occur (and then evolve) between the different Plasmodium species. The location and phylogeny of the pftstk family and the chromosomal rearrangements between SBs were consistent with different possible recombination pathways and mechanisms. 
Interestingly, the processes of gene duplication and translocation described for the tstk family could also be associated with the generation of two other gene families in P. falciparum encoding acyl-CoA binding proteins (ACP; four P. falciparum genes and one cRMP gene) and acyl-CoA synthetases (ACS; 11 P. falciparum genes and three cRMP genes). Both families have one syntenic copy in P. falciparum and the RMPs that are located in the P. falciparum genome next to an indel. The syntenic acp is located next to an indel on Pfchr8, and the syntenic acs next to an indel on Pfchr2 (PFB0685c). This latter gene appears to have undergone local gene duplication, followed by relocalization and expansion to seven subtelomeric copies in P. falciparum (unpublished data). In conclusion, our data show that both SBPs and intrasyntenic indels can be foci for species-specific genes with a predicted role in host-parasite interactions and indicate that not only rearrangements in the subtelomeric regions but also chromosomal rearrangements are involved in the generation of species-specific gene families. The majority are expressed in blood stages (complete list in Table S34), suggesting that the vertebrate host exerts a greater selective pressure than the mosquito vector, resulting in the acquisition of diversity. It is already evident that a single recombinational mechanism underlying the origin of the inter- and intrasyntenic gene content or the generation of gene families in P. falciparum cannot be postulated. The 42 SBP-associated genes of P. 
falciparum can be classified into three groups: (i) two single genes that are associated with single crossover events; (ii) three clusters of genes (total 12 genes) that might have their origin in subtelomeric regions that became chromosome-internal after a telomere change (these include the SBPs containing pftstk genes); and (iii) three var clusters, two associated with the insertion of SBs “VIIc:14b” and “VIIb:14c” and one associated with a single crossover event (total 28 genes; see Table S33). Thus it is clear that different recombination mechanisms were involved in shaping the P. falciparum genome. Evidence from both the 15 SBP-associated recombination events and previous var gene classifications [56] cannot be reconciled with an origin of central var clusters associated with telomere recombination changes and subsequent internalization of subtelomeric var genes. Both SBP and intrasyntenic var clusters are associated with the vicar genes identified in this study and previously described as the GC-rich elements [29]. The position of vicar elements is consistent with an as yet unproven role in recombination.

The pairwise whole-genome comparison presented here, while indicating that 15 chromosomal rearrangements can create the P. falciparum genome organization from that of the RMP, does not resolve the organization of the most recent common ancestor, which requires more complete Plasmodium genomes. Genome-wide comparison of the location and distribution of SBPs between different Plasmodium species should provide a reliable dataset enabling construction of a definitive phylogeny of the genus and resolving issues of precise clade topology [45]. In addition, whole-genome comparisons and the identification of SBPs might prove to be an effective means of identifying species-specific genes and members of gene families that are involved in host-parasite interactions and immune evasion, including antigenic variation.

Materials and Methods

Creation of a cRMP genome.
7,215 contigs of three RMP genomes, P. yoelii yoelii (17XNL line) [25], P. berghei (ANKA strain), and P. chabaudi chabaudi (AS strain) [20] were previously aligned with the P. falciparum genome using MUMmer to identify annotation-independent protein similarities [57]. We manually aligned an additional 177 contigs using linkage data from the P. yoelii genome publication and by performing BLASTN analyses with ~500-bp sized sequences from the ends of the RMP contigs, thus closing gaps in the synteny map and “walking” toward the telomeric ends. Linking of these 7,392 contigs through identification of overlapping contigs resulted in the generation of 910 cRMP contigs (see Figure 1).

Analysis of the synteny map of the cRMP and P. falciparum genomes.
Intergenic sequences flanking the SBs at all 22 P. falciparum SBPs as well as the five subtelomere linked ends that are chromosome-internal in the RMPs (92 kb in total) were analyzed for repetitive motifs using MEME [59]. The intergenic sequences of the 20 RMP SBPs for which sequence was available were also analyzed. Nonsyntenic genes were compared with the genome data of the different Plasmodium species by TBLASTN analysis, and the expression profiles and putative functions of these genes were investigated using data available from PlasmoDB [30,31,38,39]. The predicted protein sequences of the tstk family members were analyzed for functional domains by SMART [60]. GRIMM [3] was used to confirm the suggested minimum 15 recombination events.

To test the significance of the association between SBPs and P. falciparum-specific gene content, we used computer simulations to reassign the 22 chromosome-internal SBPs to random positions in the core genome of P. falciparum, thus excluding the subtelomeric regions.
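
As an illustration only, a randomization of this kind could be sketched as below. The sketch assumes that the core genome is flattened into a single linear coordinate, that each SBP is a fixed 5-kb window, and that an "association" means any overlap between a window and a species-specific gene interval. The function names, the toy coordinates, and the add-one empirical p-value are assumptions of this sketch and are not taken from the authors' software.

```python
import random


def overlaps_any(start, end, genes):
    """Return True if the half-open window [start, end) overlaps any gene.

    `genes` is a list of (gene_start, gene_end) half-open intervals.
    """
    return any(start < g_end and g_start < end for g_start, g_end in genes)


def count_associations(positions, window, genes):
    """Count how many windows starting at `positions` overlap a gene."""
    return sum(overlaps_any(p, p + window, genes) for p in positions)


def sbp_permutation_test(core_length, genes, observed,
                         n_sbp=22, window=5_000, n_iter=1_000, seed=0):
    """Empirical p-value for seeing at least `observed` associations.

    Each iteration places `n_sbp` windows uniformly at random within a
    core genome of `core_length` bp (subtelomeres assumed already removed).
    The add-one correction is a common convention, assumed here.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        positions = [rng.randrange(0, core_length - window + 1)
                     for _ in range(n_sbp)]
        if count_associations(positions, window, genes) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    # Hypothetical toy data: three species-specific gene intervals
    # in a 2-Mb core genome, with 2 associations "observed".
    toy_genes = [(100_000, 104_000), (750_000, 760_000), (1_500_000, 1_503_000)]
    print(sbp_permutation_test(2_000_000, toy_genes, observed=2))
```

A variant using the full sizes of the SBP regions would replace the single `window` value with one length per SBP; the counting step stays the same.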
We used two different approaches: The first approach utilized the sizes of the entire SBP regions, including the species-specific gene content, while the second approach utilized fixed SBP sizes (5 kb, slightly larger than the largest noncoding intergenic, intersyntenic regions). For both approaches, we counted the number of associations of the virtual SBPs of 1,000 random distributions with the locations of all inter- and intrasyntenic genes.

Phylogenetic analyses of members of the TSTK and SERA families were performed using manually corrected ClustalW alignments [61]. Protein parsimonies, pairwise distances and maximum likelihood distances were calculated using different regions of alignment with algorithms and matrices from the phylogeny inference package (PHYLIP) [62] and gave comparable results. For the final tree construction, 100 bootstrap trees were generated (each with 10× jumbling) of a manually corrected alignment of roughly 400 amino acids of the C-terminal ends of all TSTKs containing the serine/threonine protein kinase domain using SEQBOOT [63]. Maximum likelihood distances [64] were calculated using default parameter settings and 10× jumbling. The 100 bootstrap trees thus constructed were combined using CONSENSE [65]. The tree was rooted using the clade of non-Plasmodium TSTKs as the outgroup with RETREE, and the final tree was drawn using DRAWTREE, both also available from PHYLIP [62].

Accession Numbers

The GenBank (http://www.ncbi.nlm.nih.gov) accession numbers for the sequences of two putative P. yoelii centromeres (Chromosomes 5 and 13) are DQ054838 and DQ054839, respectively. All datasets will become available through the official Web site of the Plasmodium genome project, PlasmoDB (http://plasmodb.org) [30,31]. The PlasmoDB accession numbers for the P. falciparum cluster of eight genes encoding putative serine proteases known as sera are PFB0325c–PFB0360c.
The PlasmoDB accession numbers for other genes and gene products discussed in this paper are, for P. falciparum: etramp (PF10_0164), gbp130 (PF10_0159), glurp (PF10_0344), H101 (PF10_0347), H103 (PF10_0352), hypothetical protein (PF10_0342), lsa1 (PF10_0356), msp3 (PF10_0345), msp6 (PF10_0346), pftstk1 (PFA0130c), pftstk7a (MAL7P1.144), pftstk10a (PF10_0160), pftstk13 (MAL13P1.109), and S-antigen (PF10_0343); for P. berghei: H103 (PB105993.00.0), lsa1 (PB101910.00.0+PB105996.00.0), and five sera (PB000649.01.0, PB000352.01.0, PB000107.03.0, PB107093.00.0, PB000108.03.0); for P. yoelii: etramp (PY00205), H103 (PY01016), lsa1 (PY01014), and five sera (PY02063, PY02062+PY00294, PY00293, PY00292, PY00291). P. berghei and P. yoelii gene models referred to in the text are available from GeneDB (http://www.genedb.org), and GeneIndices (http://www.tigr.org/tdb/tgi/protist.shtml).

Acknowledgments

We would like to thank Matthew Berriman and The Wellcome Trust Sanger Institute for kindly providing prepublication P. falciparum sequences and Ross Coppel for constructive criticism. TWAK was supported by a Leiden University PhD fellowship. We would like to thank the anonymous reviewers for their constructive criticism that resulted in a significant reshaping of this manuscript.

Abbreviations
Footnotes

Competing interests. The authors have declared that no competing interests exist.

Author contributions. TWAK, JMC, NH, CJJ, and APW conceived and designed the experiments. TWAK, JMC, SLB, JR, and CJJ performed the experiments. TWAK, CJJ, and APW analyzed the data. TWAK, JMC, CJJ, and APW wrote the paper.

Citation: Kooij TWA, Carlton JM, Bidwell SL, Hall N, Ramesar J, et al. (2005) A Plasmodium whole-genome synteny map: Indels and synteny breakpoints as foci for species-specific genes. PLoS Pathog 1(4): e44.

References