Copyright © 2004, Cold Spring Harbor Laboratory Press
Comparative Genomics of Gene Expression in the Parasitic and Free-Living Nematodes Strongyloides stercoralis and Caenorhabditis elegans
1 Genome Sequencing Center, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
2 Divergence Inc., St. Louis, Missouri 63141, USA
3 Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
4 Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
5 These authors contributed equally to this manuscript.
6 Corresponding author. E-MAIL mmitreva/at/watson.wustl.edu; FAX (314) 286-1810.
Received May 8, 2003; Accepted November 24, 2003.
Abstract
Although developmental timing of gene expression is used to infer potential gene function, studies have yet to correlate this information between species. We analyzed 10,921 ESTs in 3311 clusters from first- and infective third-stage larvae (L1, L3i) of the parasitic nematode Strongyloides stercoralis and compared the results to Caenorhabditis elegans, a species that has an L3i-like dauer stage. In the comparison of S. stercoralis clusters with stage-specific expression to C. elegans homologs expressed in either dauer or nondauer stages, matches between S. stercoralis L1 and C. elegans nondauer-expressed genes dominated, suggesting conservation in the repertoire of genes expressed during growth in nutrient-rich conditions. For example, S. stercoralis collagen transcripts were abundant in L1 but not L3i, a pattern consistent with C. elegans collagens. Although a greater proportion of S. stercoralis L3i than L1 genes have homologs among the C. elegans dauer-specific transcripts, we did not uncover evidence of a robust conserved L3i/dauer “expression signature.” Strikingly, in comparisons of S. stercoralis clusters to C.
elegans homologs with RNAi knockouts, those with significant L1-specific expression were more than twice as likely as L3i-specific clusters to match genes with phenotypes. We also provide functional classifications of S. stercoralis clusters.
Strongyloides Pathogenesis and Biology
The human roundworm Strongyloides stercoralis causes chronic infections of the gastrointestinal tract. In immune-competent hosts, the disease is not life-threatening, but immunodeficiency can lead to dangerous disseminated infections with pulmonary hemorrhage, necrotizing colitis, and 80% mortality if untreated (Igra-Siegman et al. 1981). Strongyloidiasis is difficult to diagnose (Genta 1988), and estimates of worldwide infections range from 70 to 600 million (Chen et al. 1994). Research goals include development of vaccines (Herbert et al. 2002) and diagnostics (Siddiqui and Berk 2001).
Strongyloides has a unique life cycle, with parasitic and free-living generations. Parasitic females in the intestine produce eggs by mitotic parthenogenesis, and first-stage (L1) larvae are excreted in stool. Larvae use environmental and genetic cues to determine their developmental path, becoming free-living adults (heterogonic pathway) or third-stage infective (L3i) larvae (homogonic pathway; Schad 1990; Ashton et al. 1998; Grant and Viney 2001). S. stercoralis free-living worms can complete one life cycle of sexual reproduction outside the host, generating progeny that must re-enter parasitic development (Yamada et al. 1991). Homogonic development resembles the life cycle of other parasitic nematodes (e.g., hookworms), whereas the heterogonic life cycle is much like that of free-living nematodes, including Caenorhabditis elegans, in nutrient-rich conditions.
L3i, derived from either parasitic or free-living parents, are suited for long-term survival and dispersal in the environment and are the only stage capable of infection, entering the host by skin penetration before traveling to the lungs and on to the intestine. L3i of S. stercoralis and many parasitic nematodes are developmentally arrested, nonfeeding, and resistant to extreme temperatures and desiccation. They are morphologically similar to the dauer larvae formed by free-living nematodes under unfavorable environmental conditions, a stage that has been extensively studied in C. elegans (Hawdon and Schad 1991; Lopez et al. 2000). C. elegans dauers (L3d) can arrest for months, molting to L4 when favorable conditions return, and much is known about the molecular genetic control of dauer entry and exit (Riddle and Albert 1997). In S. stercoralis, host factors are likely critical to the exit of L3i from arrest, but little is known about the genes involved.
Nematode Comparative Genomics
The C. elegans genome is complete (The C. elegans Sequencing Consortium 1998), and substantial annotation has been added by gene expression (Hill et al. 2000; Jones et al. 2001; Kim et al. 2001) and RNA interference (RNAi) studies (Kamath et al. 2003). Parasitic nematode genomes are being explored via expressed sequence tags (ESTs); projects on >30 species have generated nearly 300,000 parasitic nematode ESTs (McCarter et al. 2002; Parkinson et al. 2003) including collections from parasites of mammals (Tetteh et al. 1999; Daub et al. 2000; Blaxter et al. 2002) and plants (Popeijus et al. 2000; Dautova et al. 2001; McCarter et al. 2003). Comparative genomic studies that begin to look for correlation in gene expression patterns across species are an important step in understanding the degree of relevance of model species, such as C. elegans, to the biology of species of interest including parasites. Previous characterization of the S.
stercoralis genome was limited to 57 ESTs (Moore et al. 1996) and studies of individual genes of interest (Siddiqui et al. 1997, 2000; Massey et al. 2001). Strongyloididae species (S. stercoralis, S. ratti, Parastrongyloides trichosuri) are useful parasites for comparative studies with C. elegans because they can be maintained outside the host for a generation or more, depending upon the species (Viney 1999; Dorris et al. 2002).
To create an inventory of S. stercoralis genes and to support studies of Strongyloides pathogenesis and biology, we analyzed an estimated 2947 genes expressed during L1 and L3i. Compared to L3i-expressed genes, L1-expressed transcripts from S. stercoralis are more likely to have C. elegans homologs that are expressed and essential during growth in nutrient-rich conditions.
RESULTS AND DISCUSSION
As part of a larger effort to examine the genomes of parasitic nematodes, we submitted to GenBank 5′ ESTs from staged S. stercoralis cDNA libraries including 4473 from L1 and 6435 from L3i. Here we present the first large-scale analysis of S. stercoralis genes, including a comparison to gene expression patterns in C. elegans.
NemaGene Cluster Formation
To reduce data redundancy and determine gene representation, the 10,908 S. stercoralis sequences were grouped by identity into 3479 contigs and further organized into 3311 clusters. ESTs within a contig derive from nearly identical transcripts, whereas contigs within a cluster may represent splice isoforms of a gene. Clusters ranged in size from a single EST (1868 cases) to 1097 ESTs (Fig. 1).
Distribution of BLAST Matches and Homologs in C. elegans
Figure 2
As expected for a clade IVA Strongyloididae nematode with phylogenetic proximity to the clade V Rhabditida (Blaxter et al. 1998), the C. elegans genome provided the best source of information for interpreting S. stercoralis sequences: 89.5% of clusters with matches showed homology to a C. elegans gene product (Fig. 2).
Abundant Transcripts Expressed in L1 Versus L3i
The 25 most highly represented clusters accounted for 27% of ESTs (Table 1A). Representation in a cDNA library generally correlates with abundance in the original biological sample (Audic and Claverie 1997), although artifacts can occur. Among the most abundant clusters, four have homology to known parasite antigens. Two are highly represented in L3i: IGG immunoreactive antigen (SS00012.cl) is observed in patients with chronic strongyloidiasis (S. Ramachandran, W. Thompson, and F.A. Neva, unpubl.), and L3NIEAG.01, represented by three clusters, is a putative member of the Ancylostoma secreted protein (ASP) family (Hawdon et al. 1996). Among transcripts abundant in both stages are SS01569.cl with homology to genes encoding calreticulin-related antigens in Necator americanus (Pritchard et al. 1999) and Onchocerca volvulus (Rokeach et al. 1994), and SS01566.cl with weak homology to the immunodominant hypodermal SXP-1 antigen used as a filarial diagnostic tool (Dissanayake et al. 1992, 1994; Klion et al. 2003).
Highly abundant L1-specific clusters (Tables 1A, 1B) include genes encoding specific collagens (SS00069.cl, SS00001.cl, SS01498.cl, SS01492.cl, and SS01490.cl). Remarkably, 38 S. stercoralis collagen-encoding clusters are detected in L1, with none found in L3i. In C. elegans, the cuticular collagen superfamily consists of about 100 members (Mayne and Brewton 1993), and several dozen are characterized in other nematodes (Selkirk et al. 1989; Selkirk and Blaxter 1990; Cox 1992).
Although sharing conserved sequences, nematode collagens are often developmentally regulated and not functionally redundant (Levy et al. 1993). In C. elegans, collagens are expressed in waves coinciding with the four molts (L1 to L2, etc.; Johnstone 1994; Johnstone and Barry 1996). A survey of genes expressed in C. elegans dauer versus other stages found no collagen genes among 358 dauer-specific transcripts, but numerous collagens among the genes expressed during nutrient-rich conditions where molting worms were present, including six of the 20 most abundant transcripts (Jones et al. 2001). Likewise, in our survey of 5713 ESTs from root-knot nematode Meloidogyne incognita L2 (infective dauer-like stage), only three collagen ESTs were found (0.05% of transcripts; McCarter et al. 2003), though collagens are more common in other stages (M. Mitreva and J. McCarter, unpubl.). Down-regulation of collagen expression may be a general feature of the long-lived nonmolting dauer/infective stage in many nematodes, a possibility that is now being explored.
Abundant L3i-specific clusters (Tables 1A, 1C) encode several novel proteins (SS01581.cl, SS01616.cl) as well as the first sheath protein (SHP3)-encoding transcript described in S. stercoralis (SS01534.cl). Nematode surface proteins (SHP1–SHP5) have been studied in Brugia malayi and Litomosoides sigmodontis (Selkirk et al. 1991; Zahner et al. 1995; Conraths et al. 1997).
Comparative Genomics of Transcription in S. stercoralis L1 versus L3i and C. elegans Nutrient-Rich Conditions versus Dauer
Figure 3
Based on morphology and behavior, C. elegans dauer larvae and infective-stage L3i of many animal parasites are believed to be equivalent, and searches for homologs of C. elegans dauer pathway genes are underway in many parasites, including S. stercoralis (Massey et al. 2001). As a first comprehensive comparative genomics approach to examining conservation of gene function during nematode evolution, we compared transcripts with stage-specific or stage-biased gene expression between S. stercoralis (L1 vs. L3i) and C. elegans (dauer vs. other stages). The aim of this comparison was to determine whether there is any pattern of shared expression of homologs in like-stages between species (L3i with dauer, and L1 with other stages).
In C. elegans, gene expression in dauer versus other stages had previously been compared in our lab by serial analysis of gene expression (SAGE; Jones et al. 2001). In that study, nondauer stages were mixtures of all feeding larval stages (L1–L4) and adults containing embryos growing in nutrient-rich conditions where L1s made up ~5% of the sample volume; for simplicity we refer to these mixed stages as “nutrient-rich-specific”. Of 11,130 detected C. elegans genes, 6496 were common to both groups, 328 were identified as significant dauer-specific, and 489 were nutrient-rich-specific by the Fisher exact test (P < 0.05; Jones et al. 2001).
S. stercoralis L1- and L3i-biased or -specific clusters were compared to the C. elegans dauer-specific and nutrient-rich-specific genes at a variety of BLAST thresholds (1e-30, 1e-15, and 1e-05). In all cases, the overwhelming result was that BLAST matches were dominated by S. stercoralis L1/C. elegans nutrient-rich matches, with L1/dauer, L3i/nutrient-rich, and L3i/dauer matches less common. This was true even though L1s made up only a fraction of the C.
elegans mixed-stage starting material used for SAGE, perhaps because of shared expression between all feeding and growing stages or specifically between L1s and embryos. One example is given in Figure 4.
Next, we restricted consideration to just the larger 178 L1-biased and 83 L3i-biased S. stercoralis clusters with significant biased representation (Suppl. Fig. 2; Audic and Claverie 1997). In this case as well, the comparison to C. elegans showed a strong preference toward L1/nutrient-rich matches, with highly significant χ² values at BLAST cut-offs of 1e-15 and 1e-05. For example, at 1e-15, 36 of 53 matches were in the L1/nutrient-rich quadrant, versus an expectation of 21.6. Sample sizes were inadequate for comparison at 1e-30. In addition to the assumption that the distribution of matches would reflect the relative sizes of the starting data sets, we also considered the null hypothesis that the S. stercoralis L3i data set should show a distribution of matches similar to the C. elegans nutrient-rich versus dauer categories, as is seen with the L1 data set. This null hypothesis was also rejected at P < 0.05; the L3i-specific or L3i-biased data sets were significantly more likely than the L1 data sets to distribute their matches in dauer by measures of 1.3- to 3.9-fold, depending on the data sets and thresholds used.
At least two factors could account for the concentration of S. stercoralis/C. elegans BLAST matches in the L1/nutrient-rich quadrant. First, these matches may reflect actual evolutionary conservation of expression pattern by homologs between the two species; that is, genes excluded from L3i in S. stercoralis tend to be excluded from dauer in C. elegans. Second, genes expressed during L1 and nutrient-rich growth may be more likely to have conserved sequences than genes expressed during L3i/dauer. Evidence suggests that both these factors are involved. Addressing sequence conservation, Jones et al. (2001) noted that 15 of the 20 most abundant dauer-specific transcripts in C.
elegans were of unknown function, whereas fewer of the most abundant nondauer-specific transcripts were unknowns (8 of 20), suggesting the possibility that dauer-specific genes are more rapidly evolving or less likely to be found in other species (Jones et al. 2001). Other C. elegans studies have also found that genes with roles in later or specialized stages of development (i.e., dauer as opposed to embryonic or germ line) tend to be less conserved (Castillo-Davis and Hartl 2002; Kamath et al. 2003). Similarly, S. stercoralis L3i clusters appear to be less conserved than L1 clusters. At a BLAST threshold of 1e-05, 95% of significant L1-biased clusters have C. elegans homologs, compared to only 82% of the L3i-biased clusters. Also at 1e-05, 84% of L1-specific clusters have C. elegans homologs, compared to only 66% for L3i-specific clusters. Recalculating the expected values for the quadrants in the comparisons described above, counting only clusters with BLAST matches for input data set sizes, still results in significant χ² values, though less extreme than those seen before the adjustment. For instance, in the Figure 4 ...
While a greater proportion of S. stercoralis L3i-specific genes than L1-specific genes have homologs among the C. elegans dauer-specific transcripts, we did not uncover evidence of a robust L3i/dauer “expression signature” conserved between the two nematodes. There are a number of challenges that may prevent use of this EST-based comparative genomics approach to identify key genes involved in these stages of presumably shared origin. First, a major limitation of this approach is sample size. By using only stage-biased or stage-specific transcripts in our comparison, we are essentially limiting analysis to 4.2% of C. elegans genes and 1.2% or 13.3% of S. stercoralis genes (assuming 19,500 genes per species). The intersection of these comparisons tends to be rather small (several dozen to several hundred matches).
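The sample-size percentages quoted above can be approximated from counts given elsewhere in the text. The sketch below is one reading that reproduces them, not a statement of how the authors computed them: it assumes that the S. stercoralis cluster counts are discounted by the fragmentation estimate from Methods (2348 matched regions on 2090 proteins) before dividing by the assumed 19,500 genes. The function name is illustrative.

```python
# Counts quoted in the text; GENES_PER_SPECIES is the paper's stated assumption.
GENES_PER_SPECIES = 19_500

# C. elegans stage-specific genes from the SAGE comparison (dauer + nutrient-rich).
ce_stage_specific = 328 + 489

# S. stercoralis clusters: significant stage-biased sets, and all stage-specific sets.
ss_biased = 178 + 83
ss_specific = 1342 + 1573

# Fragmentation estimate from Methods: extra nonoverlapping regions per matched region.
fragmentation = (2348 - 2090) / 2348


def percent_of_genome(count, correction=0.0):
    """Share of the assumed gene complement, after an optional redundancy discount."""
    return 100 * count * (1 - correction) / GENES_PER_SPECIES


ce_pct = percent_of_genome(ce_stage_specific)
# ASSUMPTION: cluster counts are discounted by fragmentation to approximate gene counts.
ss_biased_pct = percent_of_genome(ss_biased, fragmentation)
ss_specific_pct = percent_of_genome(ss_specific, fragmentation)

print(f"C. elegans: {ce_pct:.1f}%")
print(f"S. stercoralis biased: {ss_biased_pct:.1f}%, specific: {ss_specific_pct:.1f}%")
```

Under this assumption the three values round to the 4.2%, 1.2%, and 13.3% given in the text.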
Second, use of stage specificity as a criterion may be inappropriately severe, because expression of a gene in other stages does not prevent it from still playing a critical role in the stage of interest (i.e., L3i/dauer). In addition, we currently have no available information to make comparisons based on organ or tissue of expression; substantial changes in gene expression that occur in the context of specific cells may be lost in our whole-organism analysis. Third, the speed of evolution of genes involved in L3i and dauer may make detection of homologs involved in these stages more difficult than for genes expressed in other more conserved stages. It is also possible that although they are structurally and functionally very similar, the C. elegans dauer stage and S. stercoralis L3i stage may have evolved to be substantially different at the molecular level or could even conceivably have arisen by convergent evolution. Overcoming these challenges would likely involve having the full S. stercoralis genome sequence (for ortholog mapping) as well as full-genome microarray or SAGE data that could be compared to data generated by the equivalent methods in C. elegans.
Homologs of C. elegans Genes Involved in Dauer Determination and Biology
As an additional approach to uncover S. stercoralis genes involved in L3i/dauer biology, clusters were examined for homologs of 37 genes involved in dauer entry or maintenance in C. elegans (Jones et al. 2001). Twenty-five such C. elegans genes identified 36 S. stercoralis homologs, including 12 with high identity matches that may indicate orthology (Table 2). Included among the list are eight homologs of daf (dauer formation defective) genes (Georgi et al. 1990; Estevez et al. 1993; Larsen et al. 1995; Lin et al. 1997), as well as glutathione peroxidase genes (Vanfleteren 1993) and superoxide dismutase. Five S.
stercoralis homologs showed a bias toward expression in L3i versus L1, including homologs of the daf-12 nuclear hormone receptor (SS01351.cl), F26E4.12 glutathione peroxidase (SS01468.cl), F38E11.2 heat shock protein (SS01374.cl), F22F1.1 histone H1 (SS01412.cl), and most strikingly a homolog of T26C11.2 (SS00028.cl) with 136 L3i ESTs and zero L1 ESTs. The availability of these sequences will aid in a more thorough study comparing C. elegans dauer and S. stercoralis L3i.
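Stage bias such as the 136 L3i versus zero L1 ESTs for SS00028.cl is the kind of difference the pairwise test of Audic and Claverie (1997) is designed to evaluate. The sketch below implements the published formula for the probability of seeing y tags in one library given x in another. It assumes the library totals are the 4473 L1 and 6435 L3i ESTs reported in Results; the function names and the choice of an upper-tail sum are illustrative and are not taken from the paper.

```python
from math import exp, lgamma, log


def audic_claverie_p(x, y, n1, n2):
    """P(y | x): probability of y tags in library 2 given x tags in library 1.

    n1 and n2 are the total numbers of tags sampled from each library.
    Computed in log space so that large counts do not overflow.
    """
    ratio = n2 / n1
    log_p = (
        y * log(ratio)
        + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
        - (x + y + 1) * log(1 + ratio)
    )
    return exp(log_p)


def upper_tail(x, y, n1, n2, extra_terms=2000):
    """Approximate P(Y >= y | x) by summing a long run of rapidly decaying terms."""
    return sum(audic_claverie_p(x, k, n1, n2) for k in range(y, y + extra_terms))


# ASSUMPTION: library sizes are the EST totals reported for each stage.
L1_TOTAL, L3I_TOTAL = 4473, 6435

# SS00028.cl: zero L1 ESTs and 136 L3i ESTs.
p_point = audic_claverie_p(0, 136, L1_TOTAL, L3I_TOTAL)
p_tail = upper_tail(0, 136, L1_TOTAL, L3I_TOTAL)
print(f"point probability {p_point:.2e}, upper tail {p_tail:.2e}")
```

Working in log space with `lgamma` avoids computing large factorials directly, which matters for clusters with hundreds of ESTs.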
Comparison to C. elegans Genes with RNAi Phenotypes
RNAi, whereby the introduction of a sequence-specific double-stranded RNA leads to degradation of corresponding mRNAs (Fire et al. 1998), has allowed the surveying of thousands of C. elegans genes for knockout phenotypes (Fraser et al. 2000; Gonczy et al. 2000; Maeda et al. 2001; Kamath et al. 2003). Such information is potentially transferable to understanding which genes play crucial roles in other nematodes, including parasites. RNAi has been demonstrated in three parasitic nematodes (Hussein et al. 2002; Urwin et al. 2002), but is not yet adaptable to rapid screening. We compared a list of 4786 C. elegans genes assayed by RNAi as of June 2002 to the list of the 2528 S. stercoralis clusters with C. elegans homologs. RNAi experimental information was available for the most closely related homolog in 1059 cases, and a phenotype was apparent in 401 cases (38%; Suppl. Tables 2, 3). In contrast, RNAi surveys of all predicted genes in C. elegans resulted in phenotypes in just 10%–14% of cases (Kamath et al. 2003) and 27% for genes with evidence of expression (Maeda et al. 2001). Additionally, C. elegans genes with expressed S. stercoralis homologs were more likely to have severe RNAi phenotypes such as embryonic lethality and sterility (Fig. 5).
To determine whether genes expressed at various stages and levels in S. stercoralis differ in the likelihood that their C. elegans homologs have RNAi phenotypes, we compared phenotypes observed for the best-scoring homologs of L1- and L3i-expressed clusters. C. elegans homologs of S. stercoralis clusters with significant L1-biased (178) or L3i-biased (83) expression show a significant difference, with 69% (62/90) of L1 homologs having phenotypes versus only 30% for L3i (10/33; χ² test, P < 0.05; Snedecor and Cochran 1967). Nearly half of the L1 homologs with phenotypes are ribosomal proteins (20) or structural proteins such as actin and myosin (9), categories not found among the L3i-biased genes. Clusters with lower levels of expression did not show a significant difference between L1 and L3i; using the full sets of S. stercoralis L1-specific (1342) and L3i-specific (1573) clusters resulted in C. elegans homologs with phenotypes for 42% of L1 (230/551) versus 37% of L3i clusters (209/563).
Previous data showed that evidence of expression in C. elegans or other nematodes enriches for genes with RNAi phenotypes (Fraser et al. 2000; Maeda et al. 2001; McCarter et al. 2003). The comparison here between L1 and L3i demonstrates that the particular stage and level of expression in another nematode are also important predictors of phenotype, with C. elegans genes having S. stercoralis homologs highly expressed in L1 being nearly six times as likely to have a phenotype as the average C. elegans gene surveyed by RNAi. High-level expression in L3i does not have quite as dramatic an effect of enriching for genes with phenotypes (2.5-fold vs. sixfold increase) for perhaps two reasons. First, high-throughput RNAi screens in C. elegans have observed nematodes during growth in nutrient-rich conditions when dauer larvae are not present. Screening for RNAi phenotypes in worms induced to enter dauer may detect phenotypes not seen in standard screens.
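The two phenotype comparisons reported above (62/90 versus 10/33 for the stage-biased sets, and 230/551 versus 209/563 for the full stage-specific sets) can be checked with a 2×2 Pearson χ² statistic. This is a minimal sketch; the text does not say whether a continuity correction was applied, so none is used here, and the function name is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Critical value of the chi-square distribution at P = 0.05 with 1 degree of freedom.
CRITICAL_0_05_DF1 = 3.841

# Rows: L1 vs. L3i homologs; columns: with vs. without an RNAi phenotype.
biased = chi_square_2x2(62, 90 - 62, 10, 33 - 10)
specific = chi_square_2x2(230, 551 - 230, 209, 563 - 209)

print(f"stage-biased sets:   chi2 = {biased:.2f}, significant = {biased > CRITICAL_0_05_DF1}")
print(f"stage-specific sets: chi2 = {specific:.2f}, significant = {specific > CRITICAL_0_05_DF1}")
```

With these counts the stage-biased comparison exceeds the critical value and the stage-specific comparison does not, which agrees with the pattern of significance described in the text.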
It is also possible that genes expressed in dauer/L3i are truly less likely to yield phenotypes following knockout than genes active in L1.
Functional Classification Based on Gene Ontology and KEGG Assignments
To categorize transcripts by function, we utilized the Gene Ontology (GO) classification (www.geneontology.org). InterProScan (ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan) was used to match S. stercoralis clusters to InterPro protein domains, which are themselves already mapped into the GO hierarchy. Of 3311 clusters, 1298 (39%) align to InterPro domains, and 870 (26%) map to GO. Among the more highly expressed stage-biased clusters, 49% of L1-biased clusters map to the GO hierarchy, compared to only 36% for L3i. GO representation for S. stercoralis clusters is shown by biological process, cellular component, and molecular function (Fig. 6).
Conclusions
Increasing information on stage of gene expression now makes possible comparisons of gene expression patterns between related species. In one of the first such studies, we examined expression of homologous genes in S. stercoralis and C. elegans, observing conservation of genes expressed during growth in nutrient-rich conditions and exclusion of collagen expression from the dauer-equivalent stage in both species. Information on additional species and stages will help to refine our view of how patterns of gene expression have changed during nematode evolution. However, based on this analysis we anticipate that detecting robust stage-specific “expression signatures” conserved between distant nematodes will be quite challenging. Microarray experiments, which can better detect levels of gene expression than ESTs, will aid in these comparisons.
As recently as February 2000, only 57 ESTs from Strongyloididae nematodes had been deposited in dbEST. As of March 2003, our submissions have brought that number to 30,115, including ESTs from the closely related species S. stercoralis (11,392), S. ratti (10,760), and P. trichosuri (7963; Dorris et al. 2002). Stages represented include L1, L2, L3, and adult, and derive from both the homogonic and heterogonic life cycles. Unlike most parasites, the heterogonic cycle of Strongyloididae species (Viney 1999) allows maintenance of cultures outside the host mammal in lab conditions identical to those used for C. elegans. Strongyloididae species are therefore more amenable to attempts at transferring techniques developed in C. elegans to parasitic nematodes, including transformation (Lok and Massey 2002) and RNAi, as well as mutagenesis and gene mapping. The number of generations for which a Strongyloididae species can be maintained in culture away from its host varies greatly from only one for S. stercoralis to upwards of 50 for P. trichosuri (W. Grant, pers. comm.).
Such technical advantages for study, as well as the medical importance of S. stercoralis, make Strongyloididae species good candidates for the eventual generation of a draft genome sequence.
METHODS
Libraries and EST Generation
The L1 cDNA library was created from 2×10³ S. stercoralis larvae recovered from jirds (Meriones unguiculatus) infected with a strain maintained in dogs. The L3i cDNA library used 4×10⁵ larvae from a strain passed repeatedly in Patas monkeys (Harper et al. 1984). The genetic or environmental propensity of the strains to favor heterogonic versus homogonic development is not known. Unidirectional libraries were constructed in Uni-ZAP XR (Stratagene; McCarrey and Williams 1994). The L1 library had an unamplified titer of 1×10⁵ plaque-forming units per mL (pfu/mL), an average insert size of 675 bp, and ~15% nonrecombinants. The L3i library had an unamplified titer of 1.5×10⁶ pfu/mL, an average insert size of 957 bp, and ~3% nonrecombinants. For sequencing, the phagemid was excised and replicated in XL-1 Blue MRF′ cells.
Sequencing and EST processing were performed as described (Hillier et al. 1996; Marra et al. 1999; McCarter et al. 2000, 2003). Prior to dbEST submission, sequences were processed to assess quality, trim vector, remove contaminants and cloning artifacts, and identify BLAST similarities (Hillier et al. 1996). The Web site www.nematode.net provides information on trace files and clone ordering. From 14,950 attempts, 11,335 sequences (76%) passed filtering and were submitted to dbEST (www.ncbi.nlm.nih.gov/dbEST). The average submitted read length was 435±101 nucleotides (457 for L1, 420 for L3i). The 10,921 ESTs analyzed here include 4473 L1 and 6435 L3i submissions. An additional 414 ESTs submitted later are not included. Reads were rejected for poor trace quality (~19.8% of all reads); missing insert (~3.7%); E. coli contamination (~0.4%); and small insert size (~0.1%).
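The read-filtering figures in this section can be tied together with a few lines of arithmetic. The sketch below uses only the counts and approximate percentages quoted above; the variable names are illustrative.

```python
# Sequencing attempts and reads that passed filtering, as reported above.
attempted = 14_950
submitted = 11_335
later_submissions = 414  # submitted after the analysis set was fixed

pass_rate = submitted / attempted
analyzed = submitted - later_submissions

# Approximate failure categories, each as a percentage of all reads.
failure_pct = {
    "poor trace quality": 19.8,
    "missing insert": 3.7,
    "E. coli contamination": 0.4,
    "small insert size": 0.1,
}
total_failed_pct = sum(failure_pct.values())

print(f"pass rate: {100 * pass_rate:.1f}%")
print(f"ESTs analyzed: {analyzed}")
print(f"failed reads: about {total_failed_pct:.1f}% of attempts")
```

The listed failure categories sum to roughly the complement of the pass rate, so they appear to account for nearly all rejected reads.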
Clustering and Sequence Analysis
Clustering was performed as described (McCarter et al. 2003) using Phred, Phrap, Consed, and BLAST programs (Ewing et al. 1998; Ewing and Green 2000). The completed assembly, NemaGene Strongyloides stercoralis v 2.0, is available at www.nematode.net. Fragmentation, defined as the representation of one gene by multiple nonoverlapping clusters, was estimated by examining S. stercoralis clusters with homology to C. elegans. Best-scoring BLASTX matches against Wormpep found matches to 2348 nonoverlapping regions of 2090 C. elegans proteins, for a fragmentation index of 11%. Two hundred sixteen proteins were represented in two nonoverlapping regions, 16 proteins in three regions, and two proteins in four regions. Overlapping matches by multiple clusters to the same region of a C. elegans gene were not considered fragmentation, as these clusters had already been directly compared to one another by BLASTN.
WU-BLAST sequence comparisons were performed as described (Altschul et al. 1990; http://blast.wustl.edu [Gish 2002]; McCarter et al. 2003) using contig sequences as queries versus multiple databases, including the SWIR v.21 (5/19/2000) protein database, Wormpep v.54 C. elegans protein database (Wellcome Trust Sanger Institute, unpubl.), and internal databases constructed using intersections of GenBank data, such as nematode sequences excluding C. elegans and S. stercoralis. This allows examination of sequences in specific phylogenetic distributions (Wheeler et al. 2001). Homologies were reported for e-value scores of 1e-05 and better. TRANSLATE was used to translate contigs for ORF analysis (S. Eddy, unpubl.).
Comparison of S. stercoralis and C. elegans Stage-Specific Transcripts
To examine shared gene expression patterns between nematodes, all 1342 L1-specific and 1573 L3i-specific S. stercoralis clusters were compared by BLAST to 489 nondauer-specific and 328 dauer-specific C. elegans genes (Jones et al. 2001).
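The expected quadrant counts used in the χ² comparisons in Results (for example, 21.6 expected L1/nutrient-rich matches out of 53 at 1e-15) follow from the relative sizes of the input data sets. The sketch below assumes that matches are distributed independently in proportion to those sizes, using the 178 L1-biased and 83 L3i-biased clusters and the 489 nutrient-rich-specific and 328 dauer-specific C. elegans genes. Only the observed count for one quadrant is given in the text, so only that quadrant's contribution to χ² is computed; the function name is illustrative.

```python
def expected_quadrants(total_matches, row_sizes, col_sizes):
    """Expected matches per quadrant if matches simply track data set sizes."""
    row_total = sum(row_sizes.values())
    col_total = sum(col_sizes.values())
    return {
        (row, col): total_matches * (row_n / row_total) * (col_n / col_total)
        for row, row_n in row_sizes.items()
        for col, col_n in col_sizes.items()
    }


ss_sets = {"L1": 178, "L3i": 83}                  # S. stercoralis stage-biased clusters
ce_sets = {"nutrient-rich": 489, "dauer": 328}    # C. elegans stage-specific genes

expected = expected_quadrants(53, ss_sets, ce_sets)

# Observed count reported in Results for the L1/nutrient-rich quadrant at 1e-15.
observed_l1_nr = 36
expected_l1_nr = expected[("L1", "nutrient-rich")]
contribution = (observed_l1_nr - expected_l1_nr) ** 2 / expected_l1_nr

# Critical value of the chi-square distribution at P = 0.05 with 3 degrees of freedom.
CRITICAL_0_05_DF3 = 7.815

for quadrant, value in expected.items():
    print(quadrant, round(value, 1))
print(f"L1/nutrient-rich contribution to chi2: {contribution:.2f}")
```

Under these assumptions the L1/nutrient-rich quadrant alone contributes more than the three-degree-of-freedom critical value, which fits the highly significant result described in Results.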
Additionally, 178 L1-biased clusters and 83 L3i-biased clusters from S. stercoralis with significant stage-biased expression based on sample size were selected using a pairwise test (Audic and Claverie 1997). Null hypotheses about the distribution of matches between data sets were tested using the χ² statistic with one or three degrees of freedom as appropriate (Steel and Torrie 1960). Comparisons used BLASTX to match cluster nucleotide sequences to C. elegans gene translations in Wormpep v.54 at 1e-05, 1e-15, and 1e-30. SAGE tag sequences were used only as a means of identifying C. elegans genes (Jones et al. 2001), and were not used in sequence comparisons to S. stercoralis.
Functional Assignments
Clusters were assigned putative functional categorization as described (McCarter et al. 2003) using InterProScan v.3.1 (ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan), InterPro domains (11/08/02; InterProScan; Apweiler et al. 2001; Zdobnov and Apweiler 2001), InterPro to GO mappings, and Gene Ontology categorization (go_200211_assocdb.sql; The Gene Ontology Consortium 2000). Mappings are stored in a MySQL database and displayed using AmiGO (11/25/02; www.godatabase.org/cgi-bin/go.cgi). Clusters were assigned by enzyme commission number to metabolic pathways using the KEGG database (IUBMB 1992; Bono et al. 1998; Kanehisa and Goto 2000).
To identify cases where S. stercoralis homologs in C. elegans have been surveyed for knockout phenotype using RNAi, Wormpep BLAST matches were cross-referenced to a list of all 7212 available C. elegans RNAi experiments (6107 genes; 5/5/2002; www.wormbase.org). For each S. stercoralis cluster, only the highest-scoring C. elegans match was considered.
Supplemental Data
The following files are available online: Table 1, most conserved nematode genes between S. stercoralis and C. elegans; Table 2, complete list of C. elegans RNAi phenotypes for genes with S. stercoralis homologs; Table 3, classification of C.
elegans RNAi phenotypes for genes with S. stercoralis homologs; Table 4, S. stercoralis Gene Ontology mappings: (A) biological process, (B) cellular component, (C) molecular function; Table 5, comparison of Gene Ontology mappings among nematode species. Figure 1
Acknowledgments
S. stercoralis EST sequencing at Washington University was supported by NIH-NIAID research grant AI 46593 to R.W. J.M. was supported by a Helen Hay Whitney/Merck Fellowship. We thank all members of the GSC's EST group, Barry Shortt for assistance with statistics, and the reviewers for their improvements to the manuscript. J.P.M. and B.C. are employees and equity holders of Divergence Inc; this research was not company funded.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.
Notes
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1524804.
Footnotes
[Supplemental material is available online at www.genome.org. EST sequences are available from GenBank, EMBL, and DDBJ under the accession numbers AW495499–AW496706, AW587864–AW588186, AW588989–AW589121, BE028808–BE030358, BE223115–BE224723, BE579014–BE582028, BF014868–BF015393, BG224323–BG227958, and BI772815–BI773227. The sequences are also available at www.nematode.net.]
WEB SITE REFERENCES