Copyright © 2002 Eliáš et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.

Molecular diversity of phospholipase D in angiosperms

1 Department of Plant Physiology, Faculty of Science, Charles University, Viničná 5, Prague 2, Czech Republic
2 Department of Biochemistry and Microbiology, Institute of Chemical Technology, Technická 5, Prague 6, Czech Republic
3 Institute of Experimental Botany, The Academy of Sciences of the Czech Republic, Rozvojová 135, Prague 6, Czech Republic

Corresponding author. Marek Eliáš: melias/at/natur.cuni.cs; Martin Potocký: potockym/at/vscht.cz; Fatima Cvrčková: fatima/at/natur.cuni.cz; Viktor Žárský: zarsky/at/ueb.cas.cz

Received November 30, 2001; Accepted February 1, 2002.

Abstract

Background
The phospholipase D (PLD) family has been identified in plants by recent molecular studies, fostered by the emerging importance of plant PLDs in stress physiology and signal transduction. However, the presence of multiple isoforms limits the power of conventional biochemical and pharmacological approaches and calls for a wider application of genetic methodology.

Results
Taking advantage of sequence data available in public databases, we attempted to provide a prerequisite for such an approach. We compiled a complete inventory of the Arabidopsis thaliana PLD family, which was found to comprise 12 distinct genes. The current nomenclature of Arabidopsis PLDs was refined and expanded to include five newly described genes. To assess the degree of plant PLD diversity beyond Arabidopsis, we explored data from rice (including the genome draft by Monsanto) as well as cDNA and EST sequences from several other plants. Our analysis revealed two major PLD subfamilies in plants.
The first, designated C2-PLD, is characterised by the presence of the C2 domain and comprises previously known plant PLDs as well as new isoforms with possibly unusual features, such as lack of catalytic activity or independence of Ca2+. The second subfamily (denoted PXPH-PLD) is novel in plants but is related to animal and fungal enzymes possessing the PX and PH domains.

Conclusions
The evolutionary dynamics and inter-specific diversity of plant PLDs inferred from our phylogenetic analysis call for more plant species to be employed in PLD research. This will enable us to obtain generally valid conclusions.

Background
Phospholipase D (PLD, EC 3.1.4.4) is a ubiquitous eukaryotic enzyme participating in various cellular processes (for a review see [1,2]). Biochemically distinct types of PLDs have been described, but only two, the mammalian glycosylphosphatidylinositol-specific PLD (GPI-PLD) and a family usually referred to as phosphatidylcholine-specific PLD (PC-PLD), have also been characterised at the molecular level. Two distinct PC-PLD genes have been identified in mammals; they seem to be involved in signal transduction and vesicular trafficking. The yeast Saccharomyces cerevisiae contains only one gene from the PC-PLD family, and its function in sporulation has been recognised.

Plants are a traditional model for PLD research. Indeed, PLD activity was first described from a plant source [3], and the first cloned eukaryotic cDNA coding for a PLD was isolated from the castor bean, Ricinus communis [4]. Studies relying mainly on biochemical and pharmacological approaches have implicated plant PLD in many cellular processes (reviewed in [2]). Besides its roles in membrane degradation and turnover during senescence, seed germination and under stress conditions, plant PLD is emerging as an important component of signal transduction cascades, e.g. in response to wounding, abscisic acid [2] or Nod factors [5].
Earlier pharmacological evidence for the involvement of heterotrimeric G-proteins in plant PLD regulation [6] has recently been strengthened by a report of a direct interaction between a G-protein alpha subunit and PLDα in tobacco [7]. The products of PLD action, i.e. phosphatidic acid (PA), diacylglycerol and N-acylethanolamine, are also potential signalling molecules in plants (reviewed in [8,9]).

To date, about 20 PLDs have been cloned from plants. Multiple isoforms have been found in some species, complicating the study of plant PLD. A reverse genetic approach, combining knowledge of genomic sequences with molecular genetic techniques, holds the greatest promise here. This is illustrated, for example, by the successful inactivation of the AtPLDα1 gene in Arabidopsis thaliana by an antisense strategy, which allowed the identification of a novel PLD activity in plants [10,11]. Thorough characterisation of the gene family concerned is an obvious prerequisite for productive application of the reverse-genetic approach.

Here we present the results of a detailed comparative analysis of the Arabidopsis and rice PLD families, combining data from the complete Arabidopsis genome sequence [12], publicly available rice (Oryza sativa) genomic and cDNA sequences, and the draft rice genome data made available by Monsanto [13]. Extensive EST collections from three plant species, tomato (Lycopersicon esculentum), Medicago truncatula and Sorghum bicolor, were included in the analysis to provide insight into the inter-specific variability of plant PLDs. Our results indicate that the angiosperm PLD family, comprising two major subfamilies (C2- and PXPH-PLDs), is evolutionarily very dynamic, so conclusions based on a single species (such as Arabidopsis) might not be directly applicable to others.

Results and Discussion

A dozen Arabidopsis PLDs
To date, several cDNAs representing six distinct PLD-encoding genes have been reported from Arabidopsis (Table 1).
Using the cloned PLDs from Arabidopsis and other organisms as queries, we conducted exhaustive BLAST searches of the Arabidopsis sequences available from GenBank and found 12 genes from the eukaryotic PC-PLD family, five of them not yet recorded in the literature. All genes found encode proteins containing all the conserved sequence motifs characteristic of eukaryotic PLDs, including two copies of the invariant catalytic HxKxxxxD motif [1], suggesting that all of them possess genuine PLD enzymatic activity (though this remains to be proven experimentally). The catalytic HxKxxxxD motif is also shared by other proteins grouped together with the eukaryotic PLDs into the PLD superfamily [14]; however, we did not identify any other members of the superfamily in Arabidopsis besides the 12 PC-PLDs.
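The statement that each gene encodes two copies of the catalytic HxKxxxxD motif can be checked mechanically. The minimal Python sketch below is our illustration, not the authors' procedure (which relied on BLAST searches and alignments): it scans a protein sequence with a regular expression in which x stands for any residue. The function name and the toy sequence are hypothetical, and a real annotation would also inspect the conserved context around each motif.

```python
import re

# HxKxxxxD: His, any residue, Lys, any four residues, Asp (x = any amino acid).
# The lookahead lets overlapping candidates be reported as well.
HKD_PATTERN = re.compile(r"(?=(H.K.{4}D))")


def find_hkd_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, motif) for every HxKxxxxD match."""
    seq = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in HKD_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Invented toy sequence with two HxKxxxxD copies (not a real PLD).
    toy = "MAEHSKVLIVDGGNNPTRWQHFKPEGSDRLLL"
    hits = find_hkd_motifs(toy)
    for start, motif in hits:
        print(start, motif)
    print("at least two HKD copies:", len(hits) >= 2)
```

For the toy sequence the two copies start at positions 4 and 21.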
Before attempting a detailed phylogenetic analysis of the Arabidopsis PLDs, we used a combination of computational tools, comparison with cDNAs/ESTs and information from protein alignments to verify the exon-intron structures proposed by AGI annotators (see Materials and Methods). In several cases, prediction ambiguities and cloning or sequencing errors were uncovered, and refined gene models and protein sequence predictions were obtained and used in further analysis (see the discussion below and Additional file 1). All Arabidopsis PLDs can be classified into two subfamilies (Fig. 1A).
We suggest the term AtPLDα2 for a newly identified homolog closely related to AtPLDα1 (88% sequence identity at the protein level). Interestingly, the AtPLDα1 and AtPLDα2 genes reside within one of several large-scale intragenomic duplications believed to be remnants of a tetraploidisation event dated to 112 Myr ago [12], which points to the probable evolutionary origin of these two paralogs. The remaining Arabidopsis C2-PLD genes do not correspond to any of the previously established groups, so we propose terming them AtPLDε and AtPLDζ (although a PLDε has already been mentioned in a recent review [17], it is not clear from the text which of the Arabidopsis PLD genes it corresponds to). Members of the PXPH-PLD subfamily typically bear two different phospholipid-binding domains, the PX (phox) and the PH (pleckstrin homology) domains, in the N-terminal region (Fig. 2).

The functionality of those Arabidopsis PLD genes for which full-length cDNAs have been cloned is undisputed. Moreover, proteins encoded by three of these genes, PLDβ1, PLDγ1 and PLDδ, have been characterised biochemically (see [2,16]). Expression of several other isoforms is documented by ESTs in GenBank, but there are currently no cognate ESTs available for the AtPLDα2, AtPLDβ2 and AtPLDζ genes (Table 1). However, the absence of cognate ESTs is not exceptional, since in general only about 60% of predicted Arabidopsis genes are represented in available EST collections [12]. It is therefore likely that most of the genes without ESTs are expressed at very low levels, or only at particular developmental stages or under specific conditions.

Exon-intron organisation of Arabidopsis PLD genes
The limitations of theoretical prediction of exon-intron structures are well known, and cDNA sequencing is often necessary for building accurate gene models. This also holds true for many of the Arabidopsis PLDs. Unfortunately, four reported cDNAs, i.e.
AtPLDα1, AtPLDβ1, AtPLDγ1 and AtPLDγ2 (Table 1), contain mismatches compared with the highly accurate genomic sequences (reported to contain less than 1 error per 10⁴ to 10⁵ bp; [12]). While some of the discrepancies may represent natural polymorphisms, others, particularly those associated with frame shifts, are most likely due to sequencing errors or cloning artefacts. This suspicion is also supported by the available EST sequences, which nearly always match the genomic sequences rather than the cDNAs. For example, within the coding portion of the AtPLDα1 cDNA there are four regions with the reading frame shifted relative to the genomic sequence. As a result, the protein sequence derived from the cDNA (AAC49274.1) diverges strongly from other PLDs in these four regions, whereas that predicted from the genome data (NP_188194.1) matches the PLD consensus well. We found similar discrepancies for the AtPLDβ1, AtPLDγ1 and AtPLDγ2 cDNAs. Moreover, the published AtPLDγ1 and AtPLDγ2 cDNAs appear to be chimeric, perhaps due to cloning artefacts. The last ~180 nucleotides of the AtPLDγ1 cDNA apparently originate from a gene encoding a pseudo-response regulator (AB046955, chromosome 5). Similarly, the 3' third of the cDNA reported as AtPLDγ2 [18] is actually derived from the AtPLDγ3 gene. We therefore believe that the cDNA sequences have to be interpreted very cautiously, and we base our conclusions mainly on the genome project data. In several cases, however, we propose corrections to the AGI annotation of PLD genes. Details and refined coding sequences can be found in the Additional files; the most important aspects are discussed below. Despite sequencing errors, the AtPLDα1 cDNA is in good agreement with the previously suggested gene structure: the gene contains three coding exons and a 5' non-coding one (Fig. 3).
The AtPLDβ1 gene was found to consist of 10 exons [2]. The current database annotation should, however, be corrected in some respects (see Table 1 and the Additional files for details). Most importantly, there is a long region devoid of in-frame stop codons upstream of the first predicted exon, so the ORF could be extended in the 5' direction (Fig. 3).
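The argument for extending the AtPLDβ1 ORF rests on a simple scan: starting from the predicted start codon, read upstream codon by codon in the same frame until an in-frame stop codon (TAA, TAG or TGA) is met; in-frame ATGs passed on the way are candidate alternative starts. The Python sketch below illustrates this under the assumption that the upstream region is unspliced. `upstream_open_extension` and the toy sequence are hypothetical, and this is not the tool the authors used.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_open_extension(genomic: str, start: int) -> tuple[int, list[int]]:
    """Walk upstream from a predicted start codon in the same reading frame.

    genomic -- coding-strand DNA, assumed here to be unspliced upstream of `start`
    start   -- 0-based index of the predicted start codon

    Returns (boundary, atg_positions): `boundary` is the 0-based index of the
    first base after the nearest in-frame upstream stop codon (or of the most
    upstream in-frame codon if no stop is found), and `atg_positions` lists
    in-frame ATGs between that boundary and `start`, nearest first.
    """
    if not 0 <= start <= len(genomic) - 3:
        raise ValueError("start must point at a complete codon")
    seq = genomic.upper()
    atg_positions: list[int] = []
    pos = start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            return pos + 3, atg_positions
        if codon == "ATG":
            atg_positions.append(pos)
        pos -= 3
    # Reached the start of the sequence without meeting an in-frame stop.
    return pos + 3, atg_positions


if __name__ == "__main__":
    # Invented toy sequence: in-frame TAA, an upstream ATG, predicted ATG at index 12.
    toy = "TAAGCCATGCTTATGGCT"
    boundary, alt_starts = upstream_open_extension(toy, 12)
    print(boundary, alt_starts)
```

A long distance between the returned boundary and `start`, as described for AtPLDβ1, marks a stretch that could be coding; whether it actually is still requires cDNA or EST evidence.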
The AtPLDβ2 gene was originally described as PLDδ1, and the 11 exons predicted by AGI were proposed as a unique feature [2]. However, this hypothetical gene structure was not supported by the gene-finding programmes that we employed (see Materials and Methods). Including the first originally predicted intron in the ORF, as supported by these programmes, restores a conserved portion of the C2 domain and adjusts the splicing pattern to the 10-exon scheme exhibited by several other PLD genes. The resulting predicted protein clearly belongs to the beta PLD type (Fig. 1A,B).

The three very similar PLDγ paralogs reside in a tandem triplication (arranged AtPLDγ1-AtPLDγ3-AtPLDγ2) on Arabidopsis chromosome 4 [2], indicating a relatively recent origin of the triplet. The predicted structure of all three genes fits the 10-exon scheme typical of some other PLD types. However, the AtPLDγ2 gene appears to contain an additional, unusual intron of 96 nucleotides, delimited by GT-AG borders and supported by a matching cDNA sequence (see Fig. 3). The AtPLDδ gene had been predicted by AGI annotators to consist of 16 exons, but, as revealed by EST and cDNA sequences, AtPLDδ possesses only the 10 conserved exons shared with beta and gamma PLDs. Interestingly, there is evidence for alternative splicing of the AtPLDδ gene, because one of the independently cloned cDNAs (AB031047, [15]) differs from the others by a 33-nucleotide extension of the second exon at its 3' boundary (Table 1, Fig. 3). The two genes classified into the PXPH-PLD subfamily appear to have the most complex exon-intron structure of all Arabidopsis PLDs. A corresponding full-length cDNA has been reported only for AtPLDp1 (Table 1), so the prediction of the AtPLDp2 gene remains tentative.
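Two checks recur in the gene-model discussion of AtPLDγ2 and AtPLDδ: whether a putative intron has canonical GT-AG borders, and whether a length difference, such as the 96-nt intron in AtPLDγ2 or the 33-nt extension of the second AtPLDδ exon in AB031047, preserves the reading frame (i.e. is a multiple of 3). The hypothetical helpers below sketch both checks; they ignore splice-site strength and do not test whether the extra sequence introduces a stop codon.

```python
def has_canonical_borders(intron: str) -> bool:
    """True if a putative intron begins with GT and ends with AG (GT-AG rule)."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def frame_effect(length_change: int) -> tuple[bool, int]:
    """Return (reading frame preserved?, number of whole codons added/removed)."""
    return length_change % 3 == 0, length_change // 3


if __name__ == "__main__":
    print(frame_effect(96))  # 96-nt intron reported for AtPLDgamma2 -> (True, 32)
    print(frame_effect(33))  # 33-nt exon extension in cDNA AB031047 -> (True, 11)
    print(has_canonical_borders("GTAAGTATTTTTGCAG"))  # invented toy intron -> True
```

Both 96 and 33 are multiples of 3, so, provided no in-frame stop codon is introduced, the extra sequence would add 32 and 11 codons respectively without shifting the downstream frame.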
A minor correction should perhaps be introduced into the current database prediction of the AtPLDp2 gene to restore a highly conserved region (see Table 1 and the Additional files). Despite a difference in the number of exons (20 and 16, respectively), the structures of the AtPLDp1 and AtPLDp2 genes are clearly related, as the difference is due to four introns probably lost from AtPLDp2 (the 20-exon structure seems to be ancestral in the plant PXPH-PLD subfamily; see the rice homologs below and Fig. 1).

PLDs in rice: an alternative view
Arabidopsis is presently the only plant for which the complete PLD set can be catalogued. Nonetheless, other species are emerging as important models for genome-wide studies. Rice genome sequencing is well advanced, with a substantial portion (more than 230 Mbp to date; see http://rgp.dna.affrc.go.jp for updates) already sequenced by the International Rice Genome Sequencing Project (IRGSP). An even greater portion of the genome (about 250 Mbp) has been sequenced by the Monsanto company, which has made its sequences publicly available [13]. Taking the redundancy between the two data resources into account, we could analyse about three quarters of the whole rice genome. We identified at least 16 complete or partial sequences of putative rice PLD genes (Table 2). Five of them have been cloned individually, 13 genes or portions thereof have already been sequenced by the IRGSP, sequences from 13 genes could be found in the Monsanto genome draft, and fragments of at least one PLD gene are available only as EST or GSS sequences. Since a systematic nomenclature of rice PLD genes has not been established, we propose a terminology that reflects phylogenetic and structural relationships with PLDs from other species (Table 2, Fig. 1A,B). Only the five individually cloned PLD genes have been annotated. A complete cDNA has been reported for OsPLDα1 [20]; the gene has an exon-intron structure closely related to that of other alpha-type PLDs [2,21].
The coding region of the OsPLDη1 gene has also been delimited experimentally and found to have a similar organisation [21]. Structures of the other genes could only be predicted theoretically, but comparison with EST sequences and other PLD genes proved helpful, as exon-intron junctions appear to be highly conserved within individual subgroups of the PLD family (see below; predicted or corrected coding sequences are available in the Additional files). In this way, we introduced a minor correction into the previously proposed OsPLDν1 gene structure (see Table 2 and the Additional files). The annotated OsPLDη2 and OsPLDη3 genes appear to have a splicing pattern similar to that of OsPLDη1. Interestingly, the three OsPLDη genes reside adjacently in the genome in the order OsPLDη2-OsPLDη3-OsPLDη1, but, in contrast to the AtPLDγ cluster in Arabidopsis, OsPLDη2 is inverted with respect to the other two genes. The exon-intron structures we propose for the other rice PLD genes reflect their phylogenetic affinity to Arabidopsis orthologs (compare Table 1 and Table 2). The novel OsPLDλ and OsPLDθ genes probably have 3 coding exons, with introns occupying conserved positions shared with the PLDα and PLDη prototypes. The novel OsPLDμ gene also resembles PLDα and PLDη, although comparison with a highly similar barley EST revealed 4 coding exons. The second exon is very short and encodes part of the first non-conserved loop of the C2 domain (Fig. 5). In summary, comparison of Arabidopsis and rice PLD genes revealed that they exhibit generally conserved exon-intron structures (Table 1, Table 2; Fig. 3).

PLD diversity and expression as recorded in the EST collections
Databases of expressed sequence tags (ESTs) are available for a number of plant species and represent an invaluable resource for both functional and evolutionary studies, providing information on both genetic diversity and expression profiles.
To assess these aspects of the angiosperm PLD family, we identified a number of PLD-derived ESTs from Arabidopsis, rice, tomato, Medicago truncatula and Sorghum bicolor (see Table 1, Table 2 and the Additional files). For exploring PLD diversity beyond Arabidopsis and rice, tomato is a suitable starting point, with more than 140,000 ESTs available and five full-length PLD cDNAs cloned, representing three PLDα and two PLDβ genes [19,22,23]. With the exception of LePLDα2, all cloned tomato PLDs are represented among the ESTs, and expression of additional isoforms is documented too, including a PLD similar to alpha types, at least two putative delta isoforms, a PLD most similar to AtPLDε and at least one gene from the PXPH-PLD group. The second species analysed was Medicago truncatula, with more than 137,000 ESTs in GenBank. No PLD has yet been reported from this plant, but the ESTs again indicate the presence of a complex PLD family, comprising at least two indisputable PLDα homologs highly similar to each other, at least two additional genes less similar to alpha types, potentially three PLDβ isoforms, at least one delta ortholog, a PLD most similar to AtPLDε and two members of the PXPH-PLD subfamily. As a monocotyledonous model for EST analysis we chose Sorghum bicolor, for which more than 84,000 ESTs had been sequenced. Multiple homologs could again be found among the ESTs, including at least two obvious alpha PLDs, a gene related to rice OsPLDη1, one PLDβ and a PLD most similar to rice OsPLDμ. In summary, our EST analysis revealed that the PLD types identified in Arabidopsis and rice are widespread in angiosperms, but there may be additional types not yet characterised. With the help of EST clones, full-length genes/cDNAs can easily be isolated and characterised, so deeper insight into PLD diversity in plants may soon become available (the list of ESTs analysed is available in the Additional files).
The relative abundance of ESTs can provide information on the expression of individual genes [24]. Unfortunately, for most PLD genes there are too few ESTs for a statistically significant estimate of their expression in specific tissues, developmental stages or conditions, and only the general level of expression can be inferred. Judging from the total number of cognate ESTs in GenBank, the most highly expressed PLD gene in Arabidopsis is AtPLDα1 (40 EST entries), followed by AtPLDδ (29 entries), while the other genes seem to be expressed at considerably lower levels or are not recorded at all (see Table 1). In rice, expression of OsPLDα1 predominates to a similar extent as in Arabidopsis (~40% of all ESTs from PLD genes), and expression of the other genes is markedly lower as well (Table 2). The EST collection from Medicago provides even stronger evidence for an expression bias, with 63% of the 74 PLD-derived ESTs matching a single one of the several PLDα paralogs. Among tomato ESTs that can be assigned to the cloned cDNAs, 14 come from LePLDα1, 2 from LePLDα3, 4 from LePLDβ1 and 1 from LePLDβ2, suggesting that expression of LePLDα1 might again be prevalent. However, LePLDα2 and LePLDα3 show higher expression levels than LePLDα1 when measured by Northern blots [19], so caution is needed when drawing conclusions on expression profiles from few ESTs. In Sorghum, one of the two alpha-type genes also accounts for ~40% of PLD-derived ESTs, but, in contrast to the other collections, a PLDβ-like isoform appears to be sampled to a similar extent. Interestingly, all the ESTs corresponding to the latter gene are derived from a cDNA library prepared from pathogen-infected plants. It is tempting to speculate that the expression of this PLDβ gene might be induced by a pathogen-derived signal, similarly to LePLDβ1, which is reported to be induced upon treatment with the elicitor xylanase [19].
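Why a handful of ESTs gives only a rough picture of expression can be illustrated with a binomial confidence interval on an EST share. The sketch below uses the Wilson score interval, which is our choice for illustration (the article itself reports only raw counts and percentages), applied to the tomato counts given above: 14 of the 21 ESTs assignable to cloned cDNAs come from LePLDα1. The Medicago count of 47 is our rounding of 63% of 74, not a figure stated in the text.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion k/n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n and n > 0")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Tomato: 14 of 21 assignable ESTs match LePLDalpha1 (counts from the text).
    lo, hi = wilson_interval(14, 21)
    print(f"14/21 = {14 / 21:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
    # Medicago: 63% of 74 ESTs; 47 is an assumed rounding of 0.63 * 74.
    lo, hi = wilson_interval(47, 74)
    print(f"47/74 = {47 / 74:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

For the tomato counts the interval spans roughly 0.45 to 0.83, which is consistent with the caution expressed above about drawing expression conclusions from few ESTs; the larger Medicago sample gives a noticeably narrower interval.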
The predominant expression of alpha-type PLDs inferred from our EST analysis is consistent with biochemical observations, since the enzymatic activity usually ascribed to alpha-type PLDs is much more abundant in plant tissues than the activity of the beta and gamma types [2]. Interestingly, in most plants studied (except for tomato), two PLDα genes could be found, but only one of them was highly expressed (see Table 1, Table 2 and the Additional files). Similarly, two PLDα paralogs have been cloned from the resurrection plant Craterostigma plantagineum, one expressed constitutively and the other induced only upon desiccation stress [25]. Differential expression of two very similar PLDs has also been observed in tomato, where the elicitor xylanase stimulated expression of LePLDβ1 but not of LePLDβ2 [19]. Hence, if the differences in expression reflect differences in physiological function, it could be concluded that there is little functional redundancy within the plant PLD family, even among highly similar isoforms.

Functional aspects of the primary structure of plant PLDs
As already noted, all known eukaryotic PC-PLDs belong to two subfamilies differing in their N-terminal portion (Fig. 2).
C2- and PXPH-PLDs are believed to differ in their dependence on Ca2+. Animal and fungal PLDs are not directly dependent on Ca2+ [1], and the same is likely to hold for plant PXPH-PLDs. On the other hand, most (but perhaps not all, see below) C2-PLDs will exhibit dependence on and regulation by Ca2+, as C2 domains usually bind phospholipids in a Ca2+-dependent manner [26]. Structural characterisation of several C2 domains revealed that three Ca2+-coordinating sites occupied by aspartate or asparagine residues are generally used, while other ligands are specific to individual domains (see [26] and Fig. 5). All characterised PC-PLDs from both major subfamilies are stimulated by, or even dependent on, PIP2 under physiological or near-physiological conditions [1,2,29]. Mammalian and yeast PLDs appear to interact with PIP2 via the PH domain [30] and via a novel highly conserved motif located between the two copies of the catalytic HKD motif (Fig. 2). Two motifs rich in basic residues and allegedly similar to a polyphosphoinositide-binding motif from gelsolin or phospholipase C ([KR]X(3-4)KX[KR][KR]) have been found in plant PLDs flanking the second catalytic HKD domain and have been proposed to mediate PIP2 binding by PLDs [27]. It was claimed that all the basic residues are conserved only in Arabidopsis PLDβ1, whereas some are replaced with non-polar or acidic residues in AtPLDα1 and AtPLDγ1. However, our inspection of the revised AtPLDβ1 sequence shows that the first "motif" of AtPLDβ1 actually also has only three basic residues, since the original motif definition was based on the inaccurate cDNA sequence. A genuine gelsolin/PLC consensus motif is found only in OsPLDβ1, OsPLDν1 and OsPLDν2 (Fig. 4A).

Conclusions
Our analysis of Arabidopsis and rice genomic data, complemented by searches of EST sequences, revealed that plant PLDs are unexpectedly structurally diverse in two respects. First, individual plant genomes harbour various PLD types from both main PLD subfamilies.
This is in sharp contrast to other large eukaryotic lineages. Species with completely or almost completely sequenced genomes, i.e. Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans and Drosophila melanogaster, all possess only one gene from the PC-PLD family, and mammalian diversity is perhaps limited to two thoroughly characterised isoforms (our findings and [1]). All characterised animal and fungal PLDs belong to the PXPH-PLD subfamily. The occurrence of C2-PLDs beyond plants is uncertain; they have been described only from angiosperms, and mosses are the most remote group for which C2-PLD sequences can be reliably found in databases (at least four distinct genes in Physcomitrella patens, see Additional files). It can be inferred from our phylogenetic analysis (see Fig. 1A)

The second aspect of plant PLD diversity relates to inter-specific differences in the repertoire of distinct PLD types. For instance, there are no Arabidopsis orthologs of rice OsPLDη, OsPLDθ or OsPLDκ, while rice may lack counterparts of AtPLDγ or AtPLDζ from Arabidopsis. Similarly, only LePLDα1 from tomato is a true ortholog of other dicotyledonous alpha PLDs, while LePLDα2 and LePLDα3 together form a separate lineage within the PLDα cluster (Fig. 1B).

The diversity of plant PLDs raises the question of the functional specificities of individual isoforms. Although only limited functional predictions can be made from sequence data alone, the principal difference in domain structure between C2- and PXPH-PLDs suggests that their cellular functions also differ. PXPH-PLDs in animals and yeasts appear to be involved in the regulation of vesicular and membrane trafficking (reviewed in [1]), and plant orthologs might function in a similar context [32].
Ca2+-independent PLD activity, which is probably exhibited by all PXPH-PLDs, has not been reported from plant tissues, but this is perhaps due to the overabundant activity of C2-PLDs (especially PLDα) and to the generally low abundance of regulatory enzymes. Moreover, some stimulatory factors might be necessary for measurable activity of plant PXPH-PLDs, as is the case for mammalian PLD1 [1]. On the other hand, C2-PLDs may fulfil plant-specific tasks. The evolutionary dynamics of this subfamily in angiosperms suggest that environmental factors might exert a strong influence on these enzymes. The recognised roles of C2-PLDs in processes such as responses to wounding, pathogen attack and multiple abiotic stresses seem to fit this view, but other processes, including membrane degradation during senescence, also have to be considered [2]. Functioning in signalling cascades may be common to both C2- and PXPH-PLDs, although the distinction between a signalling function and the previously suggested roles need not be clear-cut.

Directions for future research on plant PLDs are straightforward. Besides the routinely used biochemical and pharmacological approaches, methods of reverse genetics (including antisense silencing and screening for insertional mutants) have to be employed. Partial functional redundancy, which can be expected for some plant PLD isoforms, could be addressed by generating multiple mutants, accompanied by monitoring the expression of individual genes under various conditions and by experimental analysis of promoters. For a deeper understanding of PLD regulation and its interconnections within the cellular context, attention must be focused on possible post-translational modifications and interacting partners. Coordination of all these approaches has the potential to answer the question of why plants harbour so many PLDs.
Materials and Methods
For searches of public data we used the BLAST toolkit at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/BLAST; [33,34]). Searches were run in parallel with the low-complexity filter on and off; other parameters were kept at their defaults. All sequence databases containing plant data were exploited, including the non-redundant nucleotide database, the HTGS, GSS and EST databases, and the non-redundant protein database. The final check of these databases was done between January 24 and January 26, 2002. Rice sequence data generated by Monsanto were searched using the BLASTN and TBLAST facilities at the rice-research.org web page (http://www.rice-research.org/). Hits with an E-value above 0.1 in any BLAST search were not considered for further analysis. Multiple alignments were constructed by CLUSTALW (version 1.8) at the BCM Search Launcher (http://searchlauncher.bcm.tmc.edu/multi-align/multi-align.html; [35]) with default parameters. Manual editing of the alignments was done with the assistance of GENEDOC (Free Software Foundation, Inc.). Alternatively, multiple alignments were constructed using MACAW [36], with the PAM 120 matrix used for protein sequences. Exon-intron structures of Arabidopsis PLD genes were predicted using GENSCAN (http://genes.mit.edu/GENSCAN.html; [37]), GRAIL (http://grail.lsd.ornl.gov/Grail-1.3/), NetGene2 (http://www.cbs.dtu.dk/services/NetGene2/), FGENEP (http://dot.imgen.bcm.tmc.edu:9331/gene-finder/gf.html) and SplicePredictor (http://bioinformatics.iastate.edu/cgi-bin/sp.cgi). The models proposed by each programme were compared, and final structures were proposed taking into account information from cognate cDNAs, ESTs and multiple alignments. Rice genes were predicted manually with the assistance of GENSCAN (with the options set for maize). These predictions were again checked by comparison with ESTs, cDNAs and protein sequences of PLDs.
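The E-value cut-off described in Materials and Methods amounts to a simple filter. As a sketch, assuming BLAST results saved in the 12-column tabular format (the `-m 8` option of legacy blastall, `-outfmt 6` in BLAST+), in which the 11th column is the E-value, hits above 0.1 could be discarded as follows. The file layout is an assumption: the article does not state how hits were recorded, and the searches were run through the NCBI and rice-research.org web interfaces. Identifiers in the demo are hypothetical.

```python
E_VALUE_CUTOFF = 0.1  # hits with an E-value above this were not considered


def parse_evalue(field: str) -> float:
    """Convert a BLAST E-value field to float.

    Some BLAST reports may abbreviate very small values (e.g. 'e-100' for
    1e-100); such fields are handled defensively by prefixing a '1'.
    """
    field = field.strip()
    if field.startswith("e"):
        field = "1" + field
    return float(field)


def filter_tabular_hits(lines, cutoff: float = E_VALUE_CUTOFF):
    """Yield (query, subject, evalue) for hits with E-value <= cutoff.

    Assumes the 12-column tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        evalue = parse_evalue(fields[10])
        if evalue <= cutoff:
            yield fields[0], fields[1], evalue


if __name__ == "__main__":
    demo = [  # hypothetical identifiers and values
        "# BLASTP hits",
        "q1\ts1\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-40\t500",
        "q1\ts2\t30.0\t100\t70\t2\t5\t104\t9\t108\t0.5\t35.0",
        "q1\ts3\t40.0\t120\t72\t1\t1\t120\t3\t122\t0.05\t40.1",
    ]
    for hit in filter_tabular_hits(demo):
        print(hit)
```

A hit with an E-value of exactly 0.1 is kept, matching the rule that only hits above 0.1 were excluded.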
Specific domains in PLD protein sequences were searched for with SMART (http://smart.embl-heidelberg.de/; [38]) and the Search Pfam tool (http://www.sanger.ac.uk/Software/Pfam/; [39]). Searches for targeting signals were performed using the TargetP programme (http://www.cbs.dtu.dk/services/TargetP/). Phylogenetic trees were inferred from multiple alignments of protein sequences using appropriate programmes from the PHYLIP package, version 3.57c [40]. Neighbour-joining trees were constructed as described previously [41], the PROTPARS programme was employed for maximum parsimony analyses, and confidence in the tree topology was estimated from 500 bootstrap replicates. When multiple alignments generated by CLUSTALW were used for phylogenetic inference, regions that could not be aligned unambiguously or that contained insertions/deletions were removed beforehand. For phylogenetic inference from alignments generated by MACAW, only the most conserved blocks were used. Levels of sequence identity/similarity mentioned throughout the text refer to values calculated by the BLAST 2 Sequences programme [42] with the low-complexity filter off.

Note added in proof
A reannotation of the Arabidopsis genome released into GenBank after submission of the manuscript removes some inaccuracies in the predicted exon-intron structures of PLD genes that were also independently uncovered by our analysis. An updated list of Arabidopsis PLD genes has been deposited in the TAIR gene families database (http://www.arabidopsis.org/info/genefamily.html).
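The alignment-handling steps described in Materials and Methods (removing columns that contain insertions/deletions before tree building, generating bootstrap pseudo-replicates by resampling alignment columns with replacement, and expressing sequence identity as a percentage of aligned positions) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline, which used CLUSTALW, manual editing, PHYLIP 3.57c and BLAST 2 Sequences. Only gap-containing columns are removed here, because ambiguously aligned regions need human judgement. Identity counts gap columns in the denominator, as BLAST's "Identities" line does. All names and the toy alignment are hypothetical.

```python
import random


def drop_gapped_columns(alignment: dict[str, str]) -> dict[str, str]:
    """Remove every column in which any aligned sequence has a gap ('-')."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(length)
            if all(alignment[name][i] != "-" for name in names)]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


def bootstrap_replicate(alignment: dict[str, str],
                        rng: random.Random) -> dict[str, str]:
    """Draw one pseudo-replicate by sampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


def percent_identity(a: str, b: str) -> float:
    """Identical positions / aligned length * 100 (gap columns count in the length)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty aligned sequences of equal length")
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * identical / len(a)


if __name__ == "__main__":
    aln = {"seqA": "MK-LVHD", "seqB": "MKALV-D", "seqC": "MR-LVHE"}  # toy data
    trimmed = drop_gapped_columns(aln)
    print(trimmed)
    rng = random.Random(42)
    replicates = [bootstrap_replicate(trimmed, rng) for _ in range(500)]
    print(len(replicates), "pseudo-replicates")
    print(round(percent_identity(aln["seqA"], aln["seqB"]), 1))
```

Each pseudo-replicate would then be passed to a tree-building method, and the frequency with which a clade recurs across the 500 resulting trees gives its bootstrap support.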
Additional file 1: Corrected versions of coding sequences of previously annotated plant PLD genes (37K, doc)
Additional file 2: Novel rice PLD genes found in GenBank genomic sequences or the Monsanto rice genome draft (40K, doc)
Additional file 3: Protein sequences of plant PLDs used for structural and phylogenetic analysis (52K, doc)

Acknowledgements
This work was supported by MSMT CR LN00A081, J13/98:113100003, and GACR 206/99/1138 grants. We are highly indebted to Monsanto for allowing us to search their rice genome data and to make the results publicly available. We thank Marta Čadyová for excellent technical assistance.

References
Prog Lipid Res. 2000 Mar; 39(2):109-49.
J Biol Chem. 1994 Aug 12; 269(32):20312-7.
Plant J. 2001 Jan; 25(1):55-65.
Biochim Biophys Acta. 1998 Jan 23; 1389(3):222-72.
Biochim Biophys Acta. 2001 Feb 26; 1530(2-3):172-83.
Plant Cell. 1997 Dec; 9(12):2183-96.
J Biol Chem. 1997 Mar 14; 272(11):7055-61.
Nature. 2000 Dec 14; 408(6814):796-815.
Plant Physiol. 2001 Mar; 125(3):1164-5.
Protein Sci. 1996 May; 5(5):914-22.
Plant J. 2001 Jun; 26(6):595-605.
Plant Physiol. 2001 Nov; 127(3):1102-12.
J Biol Chem. 1997 Nov 7; 272(45):28267-73.
EMBO J. 1999 Nov 1; 18(21):5911-21.
Annu Rev Plant Physiol Plant Mol Biol. 2001 Jun; 52:211-231.
Plant Cell Physiol. 1995 Jul; 36(5):903-14.
Physiol Plant. 2001 May; 112(1):87-94.
Genome Res. 1999 Oct; 9(10):950-9.
Plant Cell. 2000 Jan; 12(1):111-24.
J Biol Chem. 1998 Jun 26; 273(26):15879-82.
J Biol Chem. 2000 Jun 30; 275(26):19700-6.
Arch Biochem Biophys. 1999 Aug 15; 368(2):347-53.
Curr Biol. 2000 Jan 13; 10(1):43-6.
Sci STKE. 2001 Dec 4; 2001(111):pe42.
J Mol Biol. 1990 Oct 5; 215(3):403-10.
Nucleic Acids Res. 1997 Sep 1; 25(17):3389-402.
Nucleic Acids Res. 1994 Nov 11; 22(22):4673-80.
J Mol Biol. 1997 Apr 25; 268(1):78-94.
Nucleic Acids Res. 2000 Jan 1; 28(1):231-4.
Nucleic Acids Res. 2000 Jan 1; 28(1):263-6.
Methods Enzymol. 1996; 266:418-27.
Genome Biol. 2000; 1(2):comment1002.1-1002.2.
FEMS Microbiol Lett. 1999 May 15; 174(2):247-50.