Copyright © 2002, Cold Spring Harbor Laboratory Press

Annotated Expressed Sequence Tags and cDNA Microarrays for Studies of Brain and Behavior in the Honey Bee

1Department of Entomology and Neuroscience Program, University of Illinois, Urbana, Illinois 61801, USA; 2Departments of Pediatrics and Biochemistry, University of Iowa, Iowa City, Iowa 52242, USA; 3W.M. Keck Center for Comparative and Functional Genomics, University of Illinois, Urbana, Illinois 61801, USA

4Corresponding author.

Received January 2, 2002; Accepted February 14, 2002.

Abstract

To accelerate the molecular analysis of behavior in the honey bee (Apis mellifera), we created expressed sequence tag (EST) and cDNA microarray resources for the bee brain. Over 20,000 cDNA clones were partially sequenced from a normalized (and subsequently subtracted) library generated from adult A. mellifera brains. These sequences were processed to identify 15,311 high-quality ESTs representing 8912 putative transcripts. Putative transcripts were functionally annotated (using the Gene Ontology classification system) based on matching gene sequences in Drosophila melanogaster. The brain ESTs represent a broad range of molecular functions and biological processes, with neurobiological classifications particularly well represented. Roughly half of Drosophila genes currently implicated in synaptic transmission and/or behavior are represented in the Apis EST set. Of Apis sequences with open reading frames of at least 450 bp, 24% are highly diverged with no matches to known protein sequences. Additionally, over 100 Apis transcript sequences conserved with other organisms appear to have been lost from the Drosophila genome. DNA microarrays were fabricated with over 7000 EST cDNA clones putatively representing different transcripts. 
Using probe derived from single bee brain mRNA, microarrays detected gene expression for 90% of Apis cDNAs two standard deviations greater than exogenous control cDNAs.

[The sequence data described in this paper have been submitted to the GenBank data library under accession nos. BI502708–BI517278. The sequences are also available at http://titan.biotec.uiuc.edu/bee/honeybee_project.htm.]

The honey bee (Apis mellifera) is an important model for studies of neural and behavioral plasticity, particularly with respect to social behavior, learning, and memory (Fahrbach and Robinson 1995; Robinson 1998; Menzel 2001; Maleszka et al. 2000). The neuroanatomy, neurophysiology, and neurochemistry of the honey bee brain have been studied extensively, and several functions have been mapped to particular brain regions (e.g., Menzel 2001; Fahrbach and Robinson 1995). Honey bees also have been used extensively to study the genetic underpinnings of behavior (Rothenbuhler 1967; Page and Robinson 1991). In the past few years, these lines of inquiry have been extended to the discovery of quantitative trait loci (Hunt et al. 1995, 1998) and analyses of expression levels of genes in the brain (Kucharski et al. 1998, 2000; Fiala et al. 1999; Toma et al. 2000; Shapira et al. 2001; Kucharski and Maleszka 2002).

One strong advantage of working with honey bees is that it is possible to study behavior under both laboratory and natural conditions. The natural social life of honey bees, though arguably as complex as in many vertebrate societies, can be extensively manipulated with precision. Insights gained from both lab and field studies ultimately will enable information on genes influencing neural and behavioral plasticity to be interpreted from ecological and evolutionary perspectives, contributing to a more comprehensive understanding of genes, brain, and behavior (Robinson 1999).

Molecular analyses in the honey bee have been constrained by the high investment required to identify and clone individual genes and the need to have an a priori hypothesis about each gene. The public databases contained only about 101 complete or near-complete A. mellifera gene sequences (nonredundant entries in SWISS-PROT and TrEMBL, as of December 2001) and, prior to this study, a total of 800 nucleotide sequences, most of them expressed sequence tags (ESTs) from antennae (H.M.R., unpubl.) or larvae (Evans and Wheeler 2001). The value of studying many genes simultaneously in the honey bee was demonstrated by Evans and Wheeler (2001) who identified gene expression profiles that were characteristic for worker/queen caste differentiation. This study involved the initial identification of 158 candidate clones using subtractive methods, and was thus limited by the small number of genes analyzed. Current DNA microarray technologies allow expression studies of many thousands of genes at the same time (Schena et al. 1995; DeRisi et al. 1997). ESTs provide an economical approach to identifying large numbers of genes that can be used in gene expression and other genomic studies (reviewed by Gerhold and Caskey 1996; see also Dimopoulos et al. 2000 and Porcel et al. 2000). Here, we describe a collection of more than 20,000 ESTs generated from the A. mellifera brain, putatively representing 8912 different transcripts after sequence assembly. To facilitate gene identification and functional genomic studies in the honey bee, the brain EST set has been annotated using the structured vocabulary provided by the Gene Ontology Consortium (2001), based on molecular studies of gene function in Drosophila melanogaster. We describe a DNA microarray resource composed of over 7000 EST cDNA clones putatively representing different transcripts. We demonstrate the utility of this resource by reporting on gene expression measured in single honey bee brains. 
Additionally, comparative genomics approaches were used to predict or improve predictions for 122 genes in Drosophila, as well as to identify 126 genes conserved between Apis and other organisms that apparently have been lost from the Drosophila genome.

RESULTS AND DISCUSSION

Generation and Assembly of Brain ESTs

A normalized, unidirectional cDNA library was generated from dissected honey bee brains. An initial 7968 clones were sequenced from the 5′ end. The library was then subtracted, and 12,288 more clones were sequenced (also from the 5′ end). An additional 1152 sequences (3′ and duplicate 5′ ends) were obtained from previously sequenced clones. Thus, the EST set represents 20,256 cDNA clones and 21,408 total sequences. The 21,408 sequences were trimmed of vector and low-quality sequence and filtered for minimum length (200 bp), identifying 15,311 high-quality ESTs of 494 bp average length (Table 1). The estimated number of ESTs per putative transcript was 1.2 when sequencing was initiated and rose to 1.7 by the time sequencing was terminated (based on phrap analyses of high-quality ESTs after each batch of sequences; see below).
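The quality filter described above (end trimming followed by a minimum-length cutoff) can be sketched in a few lines of Python. This is an illustration only, not the Qualtrim or Simpletrim programs named in Methods. The function names are hypothetical, the quality threshold of 20 is taken from Methods, and the strict "more than 200 bp" comparison follows the Methods wording.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Trim bases with quality below min_q from both ends of a read.

    seq   : nucleotide string
    quals : list of per-base quality scores, same length as seq
    Internal low-quality bases are left in place; only the ends are trimmed.
    """
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def is_high_quality(seq, min_len=200):
    """Assumed rule: keep reads longer than min_len after trimming."""
    return len(seq) > min_len
```

Vector trimming would precede this step; it is omitted here because it depends on the vector sequence and the alignment tool used.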
The 15,311 high-quality ESTs were analyzed with the CAP3 assembly program to identify those that represent redundant transcripts (Table 2; see Table 8 for all program references). A total of 9481 ESTs were assembled into 3136 contiguous sequences (contigs). The remaining 5830 ESTs did not assemble into contigs (referred to as singlets). Thus, the combined set of contigs and singlets included 8966 sequences (hereafter referred to as “assembled sequences”), putatively representing different transcripts. Only 40 contig sequences contained more than 10 ESTs, and the largest number of ESTs assembled into one contig was 44.
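The counts reported for the assembly can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers stated in this section (including the 54 sequences later removed as artifacts or contaminants); the function and key names are hypothetical.

```python
def assembly_summary(ests_in_contigs=9481, contigs=3136, singlets=5830, removed=54):
    """Recompute the assembly bookkeeping from the reported counts."""
    high_quality_ests = ests_in_contigs + singlets   # ESTs entering assembly
    assembled = contigs + singlets                   # contigs plus singlets
    retained = assembled - removed                   # after artifact removal
    return {
        "high_quality_ests": high_quality_ests,
        "assembled_sequences": assembled,
        "retained_sequences": retained,
        "ests_per_assembled_sequence": high_quality_ests / assembled,
    }


summary = assembly_summary()
print(summary)
```

The ratio of high-quality ESTs to assembled sequences comes out at about 1.7, in line with the estimate of ESTs per putative transcript at the end of sequencing.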
We separately processed the high-quality ESTs with phrap and CAP3 at different levels of stringency (Table 2). These different assemblies produced very similar results, and we retained the CAP3 results for further analyses. Fifty-four assembled sequences were removed from the database (sequencing artifacts and/or exogenous contaminants; see Methods), leaving 8912 assembled sequences used in subsequent analyses.

EST Quality Analysis and Sequence Survey

Of the 8912 assembled sequences, 3501 (39%) were similar to known protein sequences in the Non-Redundant Protein (nr) database (BLASTX; E ≤ 10⁻⁵). To estimate the proportion of transcript sequences that represent truly novel genes, the assembled sequences were screened to identify only those with clear protein coding capacity. A total of 3449 assembled sequences have an open reading frame (ORF) of at least 450 bp. Of these, 2616 (76%) had matches in the nr database and 833 (24%) had no matches (Fig. 1A).
ESTs were analyzed to identify a variety of other possible artifacts (see Methods). We estimated that 10% of the clones in the library are at least partially unspliced (often resulting from priming of the oligo(dT) primer within an unspliced AT-rich intron). Approximately 18% of the cDNA clones appear to be inserted in a reverse orientation. Finally, a single chimeric clone was identified that contained linker sequence within an EST flanked by back-to-back poly(A)+ sequences. No chimeras were identified by comparing BLASTX matches for 3′ and 5′ ESTs corresponding to the same cDNA clones (68 clones with 3′ and 5′ BLASTX matches were tested).

Separate BLASTX searches of Arthropoda and Chordata protein databases revealed that the majority of assembled sequences with matches (80%) were similar to predicted protein sequences from both Arthropoda and Chordata (Fig. 1C).

The assembled EST database was searched for simple sequence repeats using BLASTN and a database of simple sequence repeats of one to four bases (excluding the (A)n repeat). This search identified simple sequence repeats in 767 of the assembled sequences using a highest scoring pair (HSP) cutoff value of 50, and in 76 sequences using an HSP cutoff value of 100. These HSP cutoff values roughly correspond to 25 and 50 bp of perfect match, respectively (note that identified repeats are not necessarily contiguous because default BLAST parameters allow gaps in alignment). Repeat sequences are likely to reside primarily in EST noncoding sequence (which constitutes a large fraction of the ESTs; see above).

Gene Number

EST assembly is expected to overestimate the actual number of genes represented, as failure of ESTs to assemble can result from nonoverlapping ESTs, alternate splicing, sequence polymorphism, and sequencing errors. 
Assuming approximately one-to-one correspondence between genes in Apis and Drosophila, the level of redundancy can be estimated based on BLASTX searches of Drosophila predicted proteins. A total of 3362 Apis assembled sequences had “best hits” to 2672 different Drosophila sequences, suggesting 19.6% redundancy in the Apis assembled sequence set. Similar levels of redundancy after EST assembly have been estimated in other large EST collections (e.g., roughly 20% in a large mouse cDNA set; see Kawai et al. 2001). Taking 20% as an estimate of redundancy in the 8912 assembled Apis sequences, the EST set may represent a total of 7100 genes expressed in the honey bee brain. If Apis has about the same number of genes as does Drosophila, this would represent roughly 50% of the total number of genes in the Apis genome.

A similar estimate of representation was provided by comparison of the 8912 assembled sequences with a set of 101 full- or near-full-length cDNA sequences obtained from an independent honey bee brain library (sequences kindly provided by R. Maleszka). A total of 55 assembled sequences from the EST set matched 54 different cDNA sequences from the independent brain library (match defined as ≥98% nucleotide identity over 200 bp). This result suggested that (based on this small sample set of 101 brain-expressed cDNA sequences) the chance of finding a gene in the EST set was about 54%.

Functional Annotation of Bee Brain ESTs

We characterized the A. mellifera EST sequences with respect to functionally annotated genes in Drosophila melanogaster, taking advantage of the fact that this insect genome has been sequenced and extensively annotated (Adams et al. 2000). Each Apis assembled sequence was tentatively assigned Gene Ontology (GO) classification based on annotation of the single “best hit” match in BLASTX searches of Drosophila predicted proteins (E ≤ 10⁻⁵). 
Functional assignments of Apis ESTs described here are at the “inferred from electronic annotation” (IEA) level of evidence (see The Gene Ontology Consortium 2001). We take a conservative approach and avoid using Drosophila annotations that are, themselves, assigned at the IEA level of evidence. We do not exclude Drosophila annotations that are assigned at the “inferred from sequence similarity” (ISS) level of evidence (which requires human judgment and is therefore a higher level of evidence than IEA).

Tables 3 and 4 summarize assignments of Apis sequences to major molecular functions and biological processes, respectively. A broad range of functions and processes are represented in the brain ESTs. Table 5 lists Apis sequences that match Drosophila genes implicated in synaptic transmission (GO:0007268). Fifty-four (out of 116) Drosophila genes implicated in synaptic transmission were “best hit” for at least one Apis-assembled sequence.

Table 6 lists Apis sequences that match Drosophila genes implicated in behavior. Note that current GO annotation for Drosophila includes only 42 genes implicated in behavior (as of December 2001). To provide information for comparative analysis, we generated a list of 106 genes directly implicated in behavior based on mutant analysis and/or transgenic experiments in Drosophila (compiled from FlyBase and J. Hall, pers. comm.). Genes were listed if at least one mutant allele or transgene affected a specific aspect of behavior, such as rhythmicity, mating, feeding, or learning and memory. (Global locomotor effects such as paralysis, uncoordinated movement, or shaking were not considered in this analysis, although many of the genes listed do exhibit global locomotor or lethal phenotypes when mutated to the null state.) Using these criteria, 47 (out of 106) Drosophila behavior genes were “best hit” for at least one Apis-assembled sequence. 
Annotation of Apis EST sequences with respect to all GO terms for molecular function, biological process, and cellular component are regularly updated and can be accessed at http://titan.biotec.uiuc.edu/bee/honeybee_project.htm.
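The annotation rule described in this section (transfer GO terms from the single best Drosophila BLASTX hit at E ≤ 10⁻⁵, skip Drosophila annotations that are themselves IEA, and record the transferred term as IEA) can be sketched as follows. This is a minimal illustration under assumed data structures, not the GO browser used in the study; the function name, the dictionary layouts, and the sequence and gene identifiers in the usage example are hypothetical.

```python
E_CUTOFF = 1e-5  # E-value cutoff stated in the text


def transfer_go_terms(best_hits, fly_annotations, e_cutoff=E_CUTOFF,
                      excluded_evidence=("IEA",)):
    """Assign GO terms to Apis sequences from their best Drosophila hit.

    best_hits       : {apis_id: (fly_gene, evalue)} - one best hit per sequence
    fly_annotations : {fly_gene: [(go_term, evidence_code), ...]}
    Returns {apis_id: [(go_term, "IEA"), ...]} for sequences that received
    at least one term.
    """
    transferred = {}
    for apis_id, (fly_gene, evalue) in best_hits.items():
        if evalue > e_cutoff:
            continue  # match too weak to support annotation
        terms = [term for term, evidence in fly_annotations.get(fly_gene, [])
                 if evidence not in excluded_evidence]
        if terms:
            # Anything transferred by sequence match alone is itself IEA.
            transferred[apis_id] = [(term, "IEA") for term in terms]
    return transferred


# Hypothetical example data
hits = {"apis_0001": ("flyGeneA", 1e-30), "apis_0002": ("flyGeneB", 1e-3)}
annotations = {"flyGeneA": [("GO:0007268", "IMP"), ("GO:0005524", "IEA")],
               "flyGeneB": [("GO:0007268", "ISS")]}
print(transfer_go_terms(hits, annotations))
```

A sequence whose best hit has no usable GO terms is simply left unannotated, which matches the drawback noted in the discussion of unannotated Drosophila genes.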
We expect that ongoing improvements in GO annotation for Drosophila, human, mouse, and Caenorhabditis elegans will lead to significant improvements in Apis gene annotation in the near future. The current annotation of Apis sequences, based solely on matches to Drosophila proteins, allowed useful comparative analyses but had several drawbacks. We often found Apis sequences that clearly encoded members of important gene families of known function, but nevertheless were not annotated. In every case examined, this occurred because the “best hit” gene in Drosophila was not yet assigned GO annotation. Conversely, Apis sequences sometimes were assigned function based on fairly weak matches (i.e., close to the E-value cutoff of 10⁻⁵), resulting from the short length of the Apis EST. Annotation also was limited by a high proportion of ESTs in this project that contain transcript noncoding sequence (e.g., 3′ UTR). Additional ESTs, especially from full-length, enriched, normalized, and subtracted libraries (e.g., Carninci et al. 2000), would enhance Apis gene annotation by allowing more ESTs to be assembled into larger contig sequences.

Honey Bee Brain Microarray

To allow functional genomic studies of brain and behavior in the honey bee, we generated cDNA microarrays from the annotated EST set described above. A total of 7329 cDNAs (putatively representing different transcripts) were successfully amplified as “single-band” PCR product and spotted on the microarray. Pilot studies indicated that fluorescent probe derived from single-brain mRNA (amplified by in vitro transcription; see Methods) could be used to label the vast majority of Apis cDNA spots on the microarray. Data obtained from one microarray experiment are presented in Table 7 and Figure 2. […] = 0.9926), indicating that technical variation (from RNA isolation, mRNA amplification by in vitro transcription, and fluorescent labeling of probe) is very low. 
Results from additional microarrays were qualitatively similar using different bee brains as source material (data not shown). These results indicate that genomic scale gene expression profiling is feasible in single honey bee brains using the microarrays and protocols described here.
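The detection criterion stated in the abstract (signal more than two standard deviations above the exogenous control cDNAs) can be expressed as a short calculation on log-transformed spot intensities. The sketch below is illustrative only: the function name is hypothetical, and the choice of log base 2 and of the sample standard deviation are assumptions, since the text specifies neither.

```python
import math
import statistics


def fraction_detected(spot_intensities, control_intensities, n_sd=2.0):
    """Fraction of spots whose log intensity exceeds the control threshold.

    spot_intensities    : median pixel intensities for Apis cDNA spots
    control_intensities : median pixel intensities for exogenous control spots
    Threshold = mean(log2 controls) + n_sd * SD(log2 controls).
    """
    log_controls = [math.log2(x) for x in control_intensities]
    threshold = (statistics.mean(log_controls)
                 + n_sd * statistics.stdev(log_controls))
    detected = sum(1 for x in spot_intensities if math.log2(x) > threshold)
    return detected / len(spot_intensities)
```

Called with the intensities for the Apis spots and for the exogenous control spots from one array, the function returns the proportion of cDNAs scored as detected under these assumptions.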
Microarray hybridization data have been used for the validation of gene sequences (e.g., Andrews et al. 2000; Shoemaker et al. 2001). The results presented above indicate that the vast majority of bee ESTs were derived from legitimate brain-expressed gene transcripts.

Comparative Genomics in Apis and Drosophila

A total of 823 of the assembled sequences (24% of those with matches) were most similar to protein sequence from Chordata (Fig. 1B). […] Of the 701 remaining cases where the best match for the Apis sequence was to Chordata, 574 (16% of Apis-assembled sequences with matches) had likely orthologs in Drosophila, but these Drosophila genes were so diverged that better matches for the Apis sequences were identified in human, mouse and/or other non-Arthropoda. In 126 cases (3.6% of Apis assembled sequences with matches), the Apis sequence had significant and clear matches to proteins from human, mouse and/or other organisms, but no plausible ortholog was identified in searches of Drosophila-predicted protein, genome, or EST databases. These Apis sequences appear to define genes that have been lost from the Drosophila genome. Detailed analysis of these highly diverged genes and gene loss events in Drosophila will be presented in a subsequent manuscript.

Future Prospects

The relationship between genes and behavior is complex and is only beginning to be understood. Honey bees exhibit a wide variety of behavioral phenomena that are not observed in Drosophila, such as kin recognition, complex communication via the dance language, socially regulated division of labor, and a larger variety of forms of learning. The honey bee also is haplodiploid and has the highest known recombination rate of any animal (Hunt and Page 1995), traits that can facilitate genetic analyses of behavior. A wide range of naturally variable behavior traits has been described in honey bees, including defensive behavior (Hunt et al. 1998), foraging preferences (Hunt et al. 
1995), and differences in socially regulated division of labor (Robinson 1992; see also Brillet et al. 2001). A comprehensive, web-based atlas of the bee brain currently in development (see http://www.neurobiologie.fu-berlin.de/Menzel.html) also will be helpful in providing a stronger neurobiological foundation for the study of genes and behavior in the honey bee. Early efforts to develop transgenic bees (Omholt et al. 1995; Ronglin et al. 1997; K. Robinson et al. 2000) suggest that there are no barriers to harnessing this technology. The work described here provides additional resources that should contribute to molecular analyses of honey bee behavior, using candidate gene studies, positional cloning, and functional genomic approaches.

METHODS

Bees

Approximately 600 adult workers were collected from a typical field colony at the University of Illinois Bee Research Facility. The colony had about 40,000 adult bees and was derived from a naturally mated queen. The bees in this area are a mixture of various races of European honey bees, predominantly Apis mellifera ligustica (Pellett 1938). Bees were collected when they were 1, 5, 10, 15, 20, 25, and 30 days old, which spans the typical lifespan during the active season (Winston 1987). This collection scheme ensured a broad representation of behavioral states, because bees specialize on different tasks at different ages (Robinson 1992). To obtain bees of known age, frames of pupae were removed from the colony and placed in an incubator (33°C). About 3500 one-day-old bees were marked with a spot of paint (Testor's Pla) on the thorax and then returned to their natal colony. We supplemented these age-based collections with samples of bees taking preforaging orientation flights (Capaldi et al. 2000) and foragers returning with either pollen or nectar loads. Collections were made both in the early morning and late in the afternoon. Bees were collected directly into liquid nitrogen (Toma et al. 
2000) to minimize the possible effects of collection on gene expression. Brains were dissected on dry ice.

Brain cDNA Libraries

Total RNA was isolated from 400 bee brains (ca. 500 μg) with the RNeasy total RNA isolation kit (Qiagen) followed by treatment with DNase (1 unit RQ1 DNase; Promega). Poly(A)+ RNA was purified and cDNA was synthesized and directionally cloned into NotI and EcoRI digested pT7T3-Pac phagemid vector as in Bonaldo et al. (1996). cDNA inserts are flanked by linker sequences 5′-NotI-GTTGC-3′ (library specific, 3′ linker) and 5′-EcoRI-GGCACGAGG-3′ (5′ linker). The library was normalized and (subsequently) subtracted as in Bonaldo et al. (1996).

Sequencing and Sequence Analysis

Plasmid DNA was extracted and sequenced using ABI 377 and 3700 sequencers. The sequencing primer used was 5′-AGCGGATAACAATTTCACACACAGGA-3′. Base-calling was performed with phred (see Table 8 for all programs and databases used). Vector sequences were trimmed using Cross-match. Low-quality bases (quality score <20) were trimmed from both ends of sequences using Qualtrim and Simpletrim. Those ESTs having a length of more than 200 bp after both vector and quality trimming were considered “high-quality” ESTs. The repeat sequences in these ESTs then were masked with the RepeatMasker program using Drosophila repeat sequences as reference. The masked sequences were further screened for bacterial chromosomal DNA, RNA, insect viral DNA, rRNA, and mitochondrial DNA using BLASTN. Further screens for possible contaminants were conducted by BLASTN searches of the Non-Redundant Nucleotide Sequences (nt), EST_human, EST_mouse, and EST_others databases. Eighty-one ESTs were removed that corresponded to clear contaminants likely derived from other library and/or sequencing projects (from mouse or rat [49], cattle [9], human [6], pig [2], undetermined vertebrate [2], and various non-Escherichia coli bacteria [9]). 
No other ESTs were found to be ≥90% identical (over any 100 bp span) to nucleotide sequence from any non-Apis species, suggesting that the EST set did not include contamination from Drosophila or other sources not identified here. An additional 101 ESTs were removed as informatic artifacts (e.g., sequencing lanes that should not have produced sequence). Some EST screening was conducted after assembly, resulting in 54 contig sequences that were composed of contaminant or artifact ESTs. These 54 sequences were removed from the “assembled sequence” database and did not affect analyses presented here.

ESTs were analyzed to identify chimeric, backward, or unspliced inserts. Chimeric clones could be indicated by back-to-back poly(A)+ tails or vector linker sequences within ESTs. BLASTN searches for these instances identified only one chimera (out of all 21,408 ESTs). In this instance the 3′ linker sequence was found in the middle of an EST, flanked by back-to-back poly(A)+ tails from two different transcripts. Furthermore, in all cases where 3′ ESTs had BLASTX matches (E ≤ 10⁻²⁰) to a Drosophila predicted protein (68 cases), 5′ ESTs from the same cDNA matched the same Drosophila protein. To estimate the total number of backward cDNA inserts, singlet ESTs with BLASTX matches to Drosophila-predicted proteins were analyzed. Out of 1919 singlet EST matches, 364 (19%) had a negative reading frame, indicating a backward cDNA insert. Of 720 individually analyzed ESTs with BLASTX matches to proteins from other organisms, 72 (10%) had clear instances of unspliced intron sequence (based on alignment with putative orthologs, ORF analysis, and identification of putative splice junctions); many of these clones appear to have resulted from priming of the oligo(dT) primer within an unspliced AT-rich intron.

ESTs were assembled using CAP3 and phrap (see Table 2 for settings). ORFs were identified using FLIP with the minimum length set to 150 amino acids (450 bp). 
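The 450-bp ORF screen can be illustrated with a small six-frame scan. The sketch below is not the FLIP program and does not claim to reproduce its output. It assumes that an ORF is any run of codons uninterrupted by a stop codon (a start codon is not required, because ESTs are partial transcripts), and all function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_bp(seq):
    """Length in bp of the longest stop-free codon run in any of six frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOP_CODONS:
                    run = 0          # a stop codon ends the current run
                else:
                    run += 3
                    best = max(best, run)
    return best


def has_coding_capacity(seq, min_bp=450):
    """Apply the 150-amino-acid (450-bp) minimum length from the text."""
    return longest_orf_bp(seq) >= min_bp
```

Because both strands are scanned, this sketch would also flag ORFs in backward inserts, which is relevant given the reverse-orientation clones described above.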
All BLAST searches were conducted on a desktop PC or local server using stand-alone BLAST software and sequence databases indicated in Table 8. All E-value cutoffs were 10⁻⁵, except where indicated otherwise. GO databases were installed on a local server. A GO browser was designed and implemented at the W.M. Keck Center for Comparative and Functional Genomics (University of Illinois at Urbana-Champaign) and used for functional annotation of the assembled EST sequences.

Microarray Fabrication

A single EST cDNA clone was selected to represent each assembled sequence (putatively unique transcript). For contigs with multiple ESTs, the rule followed was to select the 3′-most EST that had at least 300 bp of high-quality sequence. This procedure biases the cDNAs on the microarray toward the 3′ end but ensures that at least 300 bp of cDNA is spotted on the array. A total of 8872 cDNA clones were selected. These clones were picked from the library stock plates (384-well bacteria clones) and rearrayed to a new set of 384-well plates. These clones were grown overnight followed by sequence verification (see Clone Tracking, below).

Creation of the microarrays was essentially as described by Brown and Botstein (1999). Bacteria clones were inoculated to 96-well plates with LB and Amp and grown overnight. Plasmid inserts were amplified by PCR using 1 μL of the overnight bacteria inoculant and modified M13 (5′-CCAGTCACGACGTTGTAAAACGAC-3′) and M13 reverse (5′-GTGTGGAATTGTGAGCGGATAACAA-3′) primers in 50 μL volume reactions. Amplifications were performed in an MJ PTC-200 thermocycler (MJ Research). PCR reaction mixes contained 5 μL 10x reaction buffer (100 mM Tris-HCl, pH 8.3, 500 mM KCl), 2.0 mM MgCl2, 100 μM dNTPs, 0.2 μM each primer, and 1 U Amplitaq Gold (Perkin Elmer). An initial 9-min denaturation was followed by 35 cycles of 40 sec denaturation at 94°C, 40 sec annealing at 65°C, and 3.5 min elongation at 72°C. The reaction ended with an additional incubation of 5 min at 72°C. 
Products were cleaned using Sephadex G-50 columns. Five microliters of each clean PCR product was analyzed on a 1% agarose gel. cDNA amplification products were visually examined and subjectively classified as follows: “strong single band” (86%), “weak or absent band” (13%), or “multiple bands” (1%). Only cDNAs that were amplified as “single strong band” and successfully spotted on the array (see below) were used in subsequent data analysis (7329 total). PCR products were dried and resuspended in 8 μL 3x SSC, 1.5 M betaine. Betaine was used as in Diehl et al. (2001) to improve spot homogeneity and to increase hybridization signal on the microarray. All cDNAs were printed as single spots on Telechem Superamine slides (Arrayit) using a Cartesian Technologies spotter. Exogenous control cDNAs derived from cattle (phosphoglycerate kinase 1 and β-2-microglobulin) and soy (rubisco small chain 1 and chlorophyll ab binding protein) were spotted on the array 16 times each, such that they were represented on each of the 16 subgrids on the microarray (“exogenous controls 1–4”, respectively, in Table 7). An additional 43 vertebrate-derived cDNAs (singly spotted at random positions throughout the microarray) were used as control spots (“exogenous controls 5–48” in Table 7). Spot and printing quality were assessed visually after printing. cDNA spots do not fully evaporate after arraying (as a result of 1.5 M betaine) allowing inspection of spot morphology under a dissecting scope. A few slides (about one in every five) exhibited minor defects (e.g., a single spot missing or several spots damaged by dust or lint particles). The majority of slides exhibited no defects (no spots missing, no spots joined, and all spots uniform in size). DNA was crosslinked to slides by baking at 80°C for 1 h. Slides were blocked in 0.2% SDS for 4 min, followed by two washes in water. Slides were denatured in boiling water for 2 min, spun dry, and stored. 

Microarray Hybridization, Scanning, and Data Analyses

Frozen brains were dissected from bees of known age and behavioral state as above. mRNA was amplified exactly as in Baugh et al. (2001), using only one round of in vitro transcription. Amplified RNA (aRNA) was analyzed by spectrophotometer and gel electrophoresis. Negative control reactions (no template and genomic DNA only) conducted in parallel produced no aRNA.

aRNA was labeled by reverse transcription as follows: 5 μg of aRNA was mixed with 5 μg of random primer (Roche) (10 μL volume), denatured at 70°C for 4 min, and placed on ice. Labeling reaction (6 μL of 5x 1st Strand Buffer [Gibco]; 3 μL of 100 mM DTT; 6 μL of low T dNTPs [2.5 mM each dATP, dCTP, dGTP and 1.0 mM dTTP] [Sigma]; 3 μL of 1 mM Cy3- or Cy5-dUTP [Amersham Pharmacia]; and 2 μL of 200 U/μL SuperScript II [Gibco]) was prepared on ice, mixed with aRNA and primer, then incubated at 42°C for 1 h. One microliter of SuperScript II was added and the reaction was incubated at 42°C for an additional hour. RNA was removed by adding 1 μL of 0.25 mg/mL RNAse A (NEB) and 0.5 μL of 2 U/μL RNAse H (Stratagene) and incubating at 37°C for 30 min. Labeled cDNA was purified using the Qiagen PCR Purification Kit.

Thirty microliters of purified, labeled cDNA was mixed with blocking oligos dT-T7 (20 μg; see Baugh et al. 2001) and dT30 (40 μg), boiled for 3 min, allowed to anneal at 60°C for 10 min and then room temperature for 10 min, mixed with an equal volume of 2x hybridization buffer (50% formamide, 10x SSC, and 0.2% SDS), and then hybridized to microarray at 42°C overnight. Excess probe was removed by a series of 4 min washes in 1x SSC, 0.2% SDS at 42°C; 0.1x SSC, 0.2% SDS at room temperature; and 0.1x SSC at room temperature. Slides were scanned using an Axon 4000B scanner, and images were analyzed with GenePix software. All data analyses were conducted using log-transformed values (median pixel intensities) generated by the GenePix software. 

Clone Tracking

To identify and correct possible errors in clone tracking, 420 cDNA clones (of the initial set of 20,256) were resequenced from the stock bacterial 384-well plates. Two clones were selected from different positions from each 96-well quadrant (there are four quadrants per 384-well plate). These sequences were tested against existing EST sequences in the database. A PERL script was used to identify expected matches, possible lane-tracking errors, quadrant or plate swaps, or errors in quadrant or plate orientation. In the majority of cases, one or two sequences were obtained from each quadrant and matched expected database sequences, thus confirming tracking accuracy. In cases where a sequence was not obtained or did not match the expected sequence, two additional clones were grown and sequenced. Tracking errors affecting whole quadrants were indicated for 16 (of 212 total) quadrants, including quadrant swaps, duplicate sequencing of quadrants, and quadrants in which database sequences were in an upside-down orientation with respect to the actual clones. The exact nature of each quadrant error was determined (in all cases, the initial determination was confirmed by additional sequencing) and corresponding sequence entries in the database were corrected to reflect their true plate positions. Lane-tracking errors (i.e., ABI 377 generated sequences that drift from one lane into a neighboring lane) were not observed.

After rearraying the 8872 clones to be used for the microarray, an additional 192 cDNA clones were regrown and sequenced to verify tracking integrity (two clones were picked from each 96-well quadrant, as above). From these, 136 high-quality sequences were obtained and tested for identity with the expected EST. Only one sequence of the 136 tested did not match the expected EST, suggesting that clone tracking was close to 99% accurate at this stage.

WEB SITE REFERENCES

http://www.fruitfly.org; Berkeley Drosophila Genome Project (BDGP).
http://www.genome.washington.edu/UWGC; University of Washington Genome Center.
http://megasun.bch.umontreal.ca/ogmpproj.html; Organelle Genome Megasequencing Project, University of Montreal.
http://www.ncbi.nlm.nih.gov; National Center for Biotechnology Information (NCBI).

Acknowledgments

We thank L. Hood and D. Smoller for helpful discussions; A.J. Ross, S. O'Brien, and A. Cziko for bee collections; S. O'Brien for bee brain dissections; D. Toma for RNA extraction; M. Rebeiz for assistance with PERL programming; A. Cziko for assistance in microarray fabrication; and R. Hoskins, S. Clough, and members of the Robinson lab for reviewing the manuscript. Special thanks to H.A. Lewin, Director of the Keck Center, for excellent advice throughout the project and his tireless and creative efforts to facilitate genomics research on this campus. This research was supported by an NSF Postdoctoral Fellowship in Bioinformatics (C.W.W.) and grants from the University of Illinois Critical Research Initiatives Program and the Burroughs Wellcome Trust (G.E.R.).

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

Footnotes

E-MAIL generobi/at/life.uiuc.edu; FAX (217) 244-3499.

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.5302.