Genome Res. Jan 2008; 18(1): 172–177.
PMCID: PMC2134766

Gene expression profiling by massively parallel sequencing

Abstract

Massively parallel sequencing holds great promise for expression profiling, as it combines the high throughput of SAGE with the accuracy of EST sequencing. Nevertheless, until now only very limited information had been available on the suitability of the current technology to meet the requirements. Here, we evaluate the potential of 454 sequencing technology for expression profiling using Drosophila melanogaster. We show that short (< ~80 bp) and long (> ~300–400 bp) cDNA fragments are under-represented in 454 sequence reads. Nevertheless, sequencing of 3′ cDNA fragments generated by nebulization could be used to overcome the length bias of the 454 sequencing technology. Gene expression measurements generated by restriction analysis and nebulization for fragments within the 80- to 300-bp range showed correlations similar to those reported for replicated microarray experiments (0.83–0.91); 97% of the cDNA fragments could be unambiguously mapped to the genomic DNA, demonstrating the advantage of longer sequence reads. Our analyses suggest that the 454 technology has a large potential for expression profiling, and the high mapping accuracy indicates that it should be possible to compare expression profiles across species.

Gene expression technologies have greatly matured over the past years, but it has become clear that hybridization-based approaches have obvious limitations in cross-species comparisons (Gilad et al. 2005, 2006). Probably the most prominent problems are mismatches in heterologous probes and probe-specific hybridization kinetics, which complicate the design of species-specific oligonucleotide arrays. Alternatively, sequencing-based approaches could be used to measure gene expression if the sequence reads could be unambiguously mapped to the corresponding transcripts. While the short sequence reads of serial analysis of gene expression (SAGE) (Velculescu et al. 1995) and related techniques are severely limited by the requirement of a reliable genome annotation, the recently developed 454 sequencing technology (Margulies et al. 2005) may provide sufficient sequence information to overcome this limitation at moderate cost.

In this study, we evaluate the potential of 454 sequencing technology to serve as a reliable tool for expression profiling. We show that 454 sequencing technology has a biased representation of cDNA fragments of different lengths. However, in combination with random breakage of the cDNAs by nebulization, 454 sequencing provides an excellent tool for expression profiling. The high accuracy with which we could map the sequenced fragments onto the Drosophila melanogaster genome suggests that 454 sequencing has great potential for interspecific expression profiling.

Results

Conceptual design

Measuring gene expression by sequencing requires only that a proportion of the transcript be analyzed. We sequenced a 3′ region of the cDNA to avoid potential bias due to incomplete reverse transcription of the mRNAs. We used two different approaches to evaluate the potential of 454 sequencing for expression profiling. First, we generated well-defined 3′ cDNA fragments by restriction enzyme treatment (Fig. 1). Within the limitations of the available genome annotation, we could predict the expected 3′ cDNA fragments, as we used a D. melanogaster strain with a fully sequenced genome. This strategy allowed us to evaluate whether fragment size affected 454 sequencing efficiency. As a second strategy, we sequenced 3′ cDNA ends that were generated by random shearing of the cDNA (Fig. 1). Use of the same mRNA for both approaches allowed comparison between the two different strategies and thus a measurement for the reliability of 454 sequencing-based expression profiling.

Figure 1.
Overview of the methods used to generate 3′ cDNA fragments. Double-stranded cDNA was fragmented using one of the two different strategies: restriction enzyme treatment or nebulization (step 3). 3′ Fragments were recovered (step 4) and ...

Sequencing and mapping efficiency

The first prerequisite for a reliable measure of gene expression based on absolute counts is accurate identification of the transcript corresponding to the sequence read. Totals of 11,477 and 14,570 reads with average read lengths of 114 ± 20 and 116 ± 23 bp were collected from digested (DIG) and nebulized (NEB) samples, respectively. After raw data were processed to eliminate low-quality sequences and poly(A) tails, we obtained 11,437 (DIG) and 14,512 (NEB) high-quality short expressed sequence tags (ESTs) (Table 1).

Table 1.
Summary statistics for 454-ESTs sequencing and mapping to D. melanogaster genomes and annotated transcripts (release 5.1)

The 454-ESTs were mapped to transcripts or genomic DNA of D. melanogaster (release 5.1) using BLAST with highly stringent criteria (E < 10⁻¹⁰, >90% identity, >50% of read length included in the high-scoring segment pair [HSP]). About two-thirds of the ESTs could be unambiguously mapped to the database of protein-coding transcripts; 97% were unambiguously mapped to the genome and 90% to annotated genes (Table 1). Thus, 7% of the ESTs were not mapped to annotated gene sequences in the D. melanogaster collection despite having an unambiguous match in the genomic sequence. To test whether this discrepancy could be explained by incomplete annotation of the transcripts in D. melanogaster (i.e., lack of 3′ UTR, or missing isoforms), we performed a BLAST search against portions of 3′ flanking sequences of 500 and 2000 bp. This improved the mapping efficiency to 95%, indicating that information on 3′ UTRs is still missing or incomplete even for the well-characterized D. melanogaster genome. Further support for incomplete 3′ UTR information is provided by the higher mapping efficiency of the TaiI library, which harbored, on average, longer cDNA fragments and had a higher probability of overlapping with the coding part of the transcript. Five percent of the ESTs that were not mapped to the transcript database could be located in intronic regions, suggesting the presence of new isoforms. We also found that 3%–6% of the hits to the transcript database consisted of antisense transcripts from 5%–6% of the genes sampled (Table 1).
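The mapping criteria above can be illustrated as a filter over BLAST hits. The following Python sketch is not the in-house Perl code used in this study; the dictionary field names (read, subject, evalue, pident, align_len, read_len) are hypothetical, and treating a read as unambiguous when exactly one subject sequence passes all three thresholds is an assumption made for illustration.

```python
from collections import defaultdict

E_MAX = 1e-10        # E-value cutoff stated in the text
MIN_IDENTITY = 90.0  # percent identity threshold stated in the text
MIN_COVERAGE = 0.5   # fraction of the read length inside the HSP


def passes(hit):
    """Return True if a single HSP meets all three stringency criteria."""
    coverage = hit["align_len"] / hit["read_len"]
    return (hit["evalue"] < E_MAX
            and hit["pident"] > MIN_IDENTITY
            and coverage > MIN_COVERAGE)


def unambiguous_hits(hits):
    """Map read id -> subject id for reads with exactly one passing subject.

    Assumption: 'unambiguous' means a single subject survives the filter.
    """
    subjects = defaultdict(set)
    for hit in hits:
        if passes(hit):
            subjects[hit["read"]].add(hit["subject"])
    return {read: next(iter(s)) for read, s in subjects.items() if len(s) == 1}
```

Reads with no passing hit and reads with several passing subjects are both dropped, so the returned mapping corresponds to the unambiguously mapped fraction only under the stated assumption.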

Assessment of biases in transcript representation

Accurate measurement of gene expression with 454 sequencing technology requires an unbiased representation of the cDNA molecules, irrespective of length or sequence composition. As the expression intensity is not known, we designed a test that did not depend on the gene expression level. We used the sequenced D. melanogaster strain, hence it was possible to predict the restriction fragment length of every known transcript. We obtained an expected fragment-length distribution by an in silico restriction analysis of all annotated transcripts. To compare this expected distribution to the observed one, we considered every identified transcript only once. This procedure is expected to result in a good approximation of the fragment-length distribution for an unbiased sequencing procedure.
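The in silico restriction analysis can be sketched as follows. This is an illustrative Python example under simplifying assumptions, not the procedure actually implemented for this study: the fragment is measured from the first base of the 3′-most recognition site to the end of the annotated transcript, the exact cut position within the site and the poly(A) tail are ignored, and the function names are hypothetical.

```python
# Recognition sequences of the three enzymes used; all are palindromic,
# so searching one strand is sufficient.
SITES = {"MboI": "GATC", "NlaIII": "CATG", "TaiI": "ACGT"}


def three_prime_fragment_length(transcript, site):
    """Length from the start of the 3'-most site to the transcript end.

    Returns None when the transcript contains no recognition site.
    """
    pos = transcript.upper().rfind(site)
    if pos == -1:
        return None
    return len(transcript) - pos


def expected_distribution(transcripts, enzyme):
    """One predicted 3' fragment length per transcript that carries a site."""
    site = SITES[enzyme]
    lengths = {}
    for name, seq in transcripts.items():
        length = three_prime_fragment_length(seq, site)
        if length is not None:
            lengths[name] = length
    return lengths
```

Because each transcript contributes exactly one length, the resulting distribution is independent of expression level, which is the property the test relies on.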

As 454 sequencing reads are frequently too short to cover the entire 3′ cDNA fragment, we estimated the fragment length by matching the sequence read to the transcripts and determining the number of bases between the first base of the alignment and the 3′ end of the reference transcript. While the predicted 3′ cDNA fragment-length distribution differed among the three restriction enzymes used, we consistently observed a striking difference between the expected and observed distributions. For all enzymes tested, ESTs shorter than ~80 bp or longer than 300 bp were under-represented (Fig. 2). The under-representation of short fragments results in part from the filtering of the 454 sequencing software, which requires a minimum read length. Thus, fragments below this minimum are completely absent. The filtering, however, does not explain why fragments longer than the software threshold are under-represented. It is possible that this bias is produced during library preparation, but it is not entirely clear which step in the 454 sequencing procedure caused this effect. We can only speculate that short fragments are lost during enrichment of DNA capture beads carrying amplification products. Capture beads loaded with small fragments may not be recovered by the magnetic beads as effectively as those with longer fragments. The under-representation of long fragments is probably caused by the inefficiency of the emulsion PCR for long PCR products.
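The fragment-length estimate described above amounts to simple coordinate arithmetic. A minimal Python sketch follows, assuming a 1-based alignment start on the transcript (as in BLAST tabular output), inclusive counting of the first aligned base, and a read aligned in the sense orientation; the class labels and function names are hypothetical.

```python
def inferred_fragment_length(subject_start, transcript_length):
    """Bases from the first aligned base (1-based) through the 3' end."""
    if not 1 <= subject_start <= transcript_length:
        raise ValueError("alignment start lies outside the transcript")
    return transcript_length - subject_start + 1


def size_class(length, short=80, long=300):
    """Bin a fragment length using the ~80 bp and 300 bp boundaries."""
    if length < short:
        return "short"
    if length <= long:
        return "mid"
    return "long"
```

For example, a read whose alignment starts at position 901 of a 1000-bp transcript is assigned an inferred fragment length of 100 bp and falls into the middle class.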

Figure 2.
Under-representation of short and long 3′ cDNA fragments in 454 sequencing reads. The frequency distribution of 3′ cDNA fragment lengths obtained from in silico digestion of all D. melanogaster transcripts (release 5.1.) is shown in gray. ...

Nebulization success

The undesirable effect of this apparent size bias in the 454 sequencing could be overcome if every transcript had a similar distribution of fragment sizes. Thus, randomly breaking cDNA fragments should overcome the size bias, as it affects all transcripts similarly. Shearing of DNA fragments by high-pressure nitrogen (nebulization) is frequently used to produce short DNA fragments for sequencing (Surzycki 2000). For expression profiling, it is essential that this procedure work for different cDNAs with the same efficiency.

We tested for a potential effect of cDNA length on nebulization efficiency. As for the DIG library, we estimated the 3′ cDNA fragment size by extending the aligned 454 sequencing ESTs to the 3′ end of the transcript and compared the distribution of the inferred fragment sizes among different cDNA length classes. Despite covering a wide range of size classes, we found the mean size of the nebulized cDNA fragments to be very similar among cDNAs of different length (Fig. 3). We further scrutinized the nebulization pattern by analyzing highly expressed genes for which at least 30 sequences were available. Genes that are not spliced and that have similar transcript lengths were found to have similar fragment sizes (data not shown). This observation suggests that there is no apparent effect of the DNA sequence on the nebulization procedure, but more data are required to corroborate this. Nevertheless, even if nebulization were to cause some differences between cDNA fragments, they may not translate into a biased measurement of gene expression due to the relatively broad size range for which 454 sequencing quantitatively operates.

Figure 3.
Length distribution of 3′ cDNA fragments after nebulization among different size classes of full-length transcripts (as inferred from the available genome annotation). The bold line indicates the median. The lower hinge gives the 25% quantile, ...

Cross-method consistency

The above analyses suggested that 454 sequencing could be used for expression profiling when cDNAs are nebulized. To further validate this approach, we compared the results of the nebulized cDNAs to those obtained from cDNAs treated with restriction enzymes. When the expression levels of the nebulized library were compared with the different digested libraries, the correlation coefficients ranged from 0.71 to 0.77 (Table 2). A recent study comparing the reproducibility for different microarray platforms reported intra-platform correlation coefficients ranging from 0.68 to 0.95 (Kuo et al. 2006). Thus, despite the apparent under-representation of short and long fragments in the digested libraries, the correlation coefficients fall within the range of correlation coefficients reported for microarrays. The correlation coefficients were markedly lower for the fragments longer than 300 bp (0.52–0.57), reflecting the under-representation of long fragments. Not only were transcripts represented by long fragments missed (Fig. 2), but read counts for long fragments were also extremely low (data not shown). When we limited our correlation analysis to those cDNAs that yield fragments not affected by under-representation (80–300 bp), the correlation coefficients improved markedly (0.83–0.91). Interestingly, we observed similar correlation coefficients for those cDNAs that resulted in fragments smaller than 80 bp (Table 2), suggesting that the purification step affected all cDNA fragments in this size class to a similar extent. Similar trends were observed when the size thresholds were varied by 10 or 20 bp (data not shown). As the nebulized library showed a high correlation coefficient with each of the three different restriction libraries, our results strongly indicate that the nebulization procedure is highly suitable to provide a reliable measurement of gene expression. Furthermore, the high correlation coefficients also suggest that 454 sequencing expression analysis is as reproducible as microarray experiments.
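The size-restricted comparison between libraries can be sketched in Python as below. The text does not state which correlation coefficient was used or whether counts were transformed, so the Pearson coefficient on raw read counts is an assumption here, as are the function names and the per-gene dictionaries.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance")
    return sxy / math.sqrt(sxx * syy)


def correlation_in_range(neb_counts, dig_counts, dig_fragment_length,
                         lo=80, hi=300):
    """Correlate read counts over genes whose predicted restriction
    fragment length lies within [lo, hi] bp."""
    genes = sorted(g for g in neb_counts
                   if g in dig_counts
                   and lo <= dig_fragment_length.get(g, -1) <= hi)
    return pearson([neb_counts[g] for g in genes],
                   [dig_counts[g] for g in genes])
```

Changing `lo` and `hi` by 10 or 20 bp reproduces the kind of threshold sensitivity check mentioned above.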

Table 2.
Consistency between libraries

Discussion

Despite the large potential of 454 sequencing technology for transcriptome analysis, so far only a limited number of approaches using this technique have been published. One approach involved a modification of the paired-end ditagging (PET) technique (Ng et al. 2006). In this technique, 5′ and 3′ signatures of ~20 bp of each full-length transcript are simultaneously extracted and covalently linked into the paired-end ditag. In the second approach, DeepSAGE (Nielsen et al. 2006), 21 bp are sequenced from each transcript. Both approaches greatly benefit from the high throughput of 454 sequencing technology, but they still require the cloning of cDNAs to generate the tags/paired ditags. Given that some cDNA sequences are potentially refractory to cloning, this cloning step could introduce a bias. Furthermore, both techniques require the presence of a NlaIII recognition site. An in silico digestion of the D. melanogaster transcripts indicated that 4% of the sequences did not have a NlaIII recognition site. These transcripts would be entirely missed in both methods. If one considers that cDNA synthesis of long transcripts is less effective, the number of under-represented transcripts increases even more. For example, 6% of the NlaIII recognition sites required for SAGE are found >800 bp away from the poly(A) tail. Even stronger is the effect of incomplete cDNA synthesis for the paired ditag method, as this requires full-length cDNA synthesis. Furthermore, the dependence on restriction enzymes makes both methods sensitive to intraspecific polymorphism, which could generate or destroy restriction sites. At the very extreme, this may result in a loss of the transcript due to the absence of the NlaIII recognition site. Finally, 454 sequencing has a higher error rate than does Sanger sequencing, which results in a reduced tag-to-gene mapping efficiency of short tags.

In our proof-of-principle study using D. melanogaster, we showed that the sequencing of randomly sheared 3′ cDNA provides a very good alternative to the previously suggested approaches for expression profiling using 454 sequencing technology. While it would also be possible to sequence full-length cDNAs (Bainbridge et al. 2006; Emrich et al. 2007; Weber et al. 2007) rather than 3′ ends, we consider our approach more cost-effective, as it requires only a single read per transcript. Furthermore, no adjustment for transcript length needs to be made.

As in a previous report using another technique of massively parallel sequencing (Chen and Rattray 2006), we found a severe bias against long fragments, possibly due to the inefficiency of the PCR amplification of long fragments. Furthermore, we showed that shorter fragments are also strongly under-represented. Hence, it is absolutely mandatory that the length distribution of the cDNA fragments generated by shearing be similar among transcripts. We found that shearing of cDNA molecules using high-pressure nitrogen (nebulization) results in a very similar distribution of sheared fragments among cDNA size classes and concluded that the fragmentation of the cDNAs during nebulization did not introduce a major bias in the representation of transcripts. Interestingly, recent studies also assessed the randomness of nebulization and found more reads in the 5′ end of the transcript (Bainbridge et al. 2006; Emrich et al. 2007; Weber et al. 2007). Although this bias was not very strong, our focus on the 3′ ends of the transcripts has probably resulted in an even higher homogeneity of the size distribution of the cDNA fragments analyzed. Consistent with the absence of a bias introduced by the nebulization process, our comparison of transcription profiles generated by nebulization and restriction fragments resulted in correlation coefficients that are similar to those that have been observed in intraplatform comparisons of microarray performance (Kuo et al. 2006). Thus, we conclude that 454 sequencing-based expression profiling is highly reproducible and that no strong bias is introduced by nebulization.

One difference of the 454 sequencing technology to other massively parallel sequencing techniques is the generation of longer read lengths. We evaluated the effect of read length on the mapping efficiency by truncating the obtained 454 reads to 20, 50, and 100 bp. Short read lengths result in many HSPs with scores very similar to the best one. About 20% of the 20-bp fragments had at least two perfect matches in the D. melanogaster genome (Fig. 4), whereas 50- and 100-bp fragments had substantially increased mapping accuracies, resulting in only 3% and 0.5% ambiguously mapped fragments, respectively. Furthermore, the difference in bit scores between the best and second-best hits is much more pronounced for longer fragments. Hence, as expected, longer fragments result in a higher proportion of unambiguously mapped sequences. In the presence of sequence polymorphism, the mapping of short sequence reads to the corresponding genes becomes an even more challenging task and introduces considerable uncertainty. For well-annotated genomes, restricting the analysis to the transcriptome rather than the genome can compensate to some extent for the low mapping accuracy of shorter sequence reads. This is a widely used approach for SAGE analyses, but for poorly annotated genomes this strategy is not efficient. For example, a recent gene expression study in D. pseudoobscura could map only 27% of the SAGE tags to transcripts (Metta et al. 2006). Hence, while massively parallel sequencing holds enormous potential for cross-species comparison of gene expression, our analyses also showed that sufficient read length is essential to ensure reliable identification of the corresponding transcript. Furthermore, our analyses also showed that, even for a well-studied species, such as D. melanogaster, we could identify new isoforms, UTRs, and antisense transcripts. 
Owing to the ability to identify SNPs in transcripts, we anticipate that this method will also be extremely powerful for measuring allele-specific gene expression.
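The read-length comparison described above rests on the difference in bit score between the best and second-best hit of each truncated read. A minimal Python sketch of that summary follows; it assumes the bit scores per read have already been collected from BLAST output, counts a read as ambiguous when its two best hits tie, and uses hypothetical function names.

```python
def truncate(read, length):
    """Keep the first `length` bases of a read (e.g. 20, 50, or 100)."""
    return read[:length]


def bit_score_gap(scores):
    """Difference between best and second-best bit score.

    Returns None when a read has only a single hit.
    """
    if not scores:
        raise ValueError("read has no hits")
    if len(scores) == 1:
        return None
    top = sorted(scores, reverse=True)[:2]
    return top[0] - top[1]


def fraction_ambiguous(hits_by_read):
    """Fraction of reads whose two best hits have identical bit scores."""
    if not hits_by_read:
        raise ValueError("no reads given")
    ambiguous = sum(1 for scores in hits_by_read.values()
                    if bit_score_gap(scores) == 0)
    return ambiguous / len(hits_by_read)
```

Sorting the non-None gaps and plotting their cumulative frequency would give a curve of the kind summarized in Fig. 4.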

Figure 4.
Cumulative distribution of the difference in BLAST bit scores of the best and second-best hits. The dashed, dotted, and solid lines show the cumulative distribution of 20, 50, and 100 bp, respectively. The plots are based on the 454 sequencing reads that ...

Methods

RNA isolation and cDNA synthesis

The D. melanogaster genome strain (y; cn bw sp) was obtained from the Tucson Stock Center (stock no. 14021-0231.36). Flies were grown at 20°C in standard cornmeal medium. Total RNA was extracted with TRIzol (Invitrogen) from 30 virgin females aged 3–7 d (three replicates of 10 flies each). RNA samples were treated with DNase I (10 units/50 μg of total RNA), and absence of contaminating genomic DNA was confirmed by PCR of two weakly expressed genes, CG11053 and CG13272 (Supplemental Table 1), from total RNA. The primers were chosen such that the amplicon includes an intronic sequence, which permits the identification of genomic DNA contamination.

First-strand cDNA was generated from ~5 μg of total RNA using the RevertAid H Minus First Strand cDNA Synthesis Kit (Fermentas) according to the manufacturer’s instructions. The synthesis was carried out using a biotinylated oligo(dT) fused to the 454 sequencing primer B (5′-biotin-GCCTTGCCAGCCCGCTCAG(T)17V-3′, where V stands for any base but T). Double-stranded cDNA was synthesized by addition of 30 U of Escherichia coli DNA polymerase I and 1 U of E. coli ribonuclease H to the first-strand synthesis reaction, following the manufacturer’s (Fermentas) suggested protocol for second-strand cDNA synthesis.

Enzyme library preparation

Methods for the library preparation were based on previously described SAGE (Velculescu et al. 1995) and GLGI (Generation of Longer cDNA fragments for Gene Identification; Chen et al. 2002) protocols. Double-stranded cDNA was digested separately with the following restriction endonucleases: MboI (Sau3AI, DpnII), NlaIII, and TaiI. 3′ Fragments were recovered using M-270 Streptavidin beads (Dynal). After selection, specific linkers for each enzyme were ligated to the 3′ fragments. The linkers consisted of the double-stranded 454 sequencing primer A (5′-GCCTCCCTCGCGCCATCAG-3′) and a four-base overhang complementary to the enzyme restriction site (Supplemental Table 1). The three enzyme libraries were pooled before sequencing.

Shotgun library preparation

Approximately 5 μg of double-stranded cDNA was nebulized following previously described methods (Margulies et al. 2005) using 3 bar of nitrogen for 1 min. 3′ Nebulized fragments were recovered using M-270 Streptavidin beads (Dynal), blunt-ended with T4 DNA polymerase and ligated to the double-stranded linker A.

454 sequencing

To reduce the technical error, two libraries were produced with each methodology and pooled prior to sequencing. The libraries were purified and analyzed on a BioAnalyzer DNA 1000 LabChip to determine the concentration and quality of the fragmented cDNA. Sequencing was performed on a Genome Sequencer GS20 Instrument (Roche Diagnostics) following standard protocols (Margulies et al. 2005).

Bioinformatics

In-house Perl scripts (available on request) for the automated analysis of the 454-ESTs were used to (1) trim low-quality sequences, (2) remove poly(A) tails, (3) sort reads derived from the enzyme library into three different sample sets based on the 5′-most restriction site, and (4) map the ESTs to annotated transcripts and genome. We used as a reference data set the 5.1 release of transcript annotations and genomic sequence of D. melanogaster. Mapping of the ESTs to the available databases was undertaken using BLAST. The E-value cutoff was set at 1 × 10⁻¹⁰, and only reads with ≥90% identity to a sequence in the database over ≥50% of their length were considered. Statistical analyses were carried out using the statistical programming language R (R Development Core Team 2007).
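Steps (2) and (3) of the preprocessing can be illustrated with a short Python sketch. This is not a translation of the in-house Perl scripts, which are not shown here; the minimum poly(A) run of 8 bases, the 6-base search window at the 5′ end, and the function names are all hypothetical choices made for illustration.

```python
import re

# Recognition sequences of the three enzymes used for the DIG libraries.
SITES = {"MboI": "GATC", "NlaIII": "CATG", "TaiI": "ACGT"}

# Assumed rule: a terminal run of at least 8 A's is treated as poly(A).
POLY_A = re.compile(r"A{8,}$")


def trim_poly_a(read):
    """Remove a terminal poly(A) run from an upper-cased read."""
    return POLY_A.sub("", read.upper())


def assign_enzyme(read, window=6):
    """Return the enzyme whose site starts closest to the 5' end.

    Only sites starting within the first `window` + 1 bases are
    considered; returns None if no site is found there.
    """
    seq = read.upper()
    best = None
    for enzyme, site in SITES.items():
        pos = seq.find(site, 0, window + len(site))
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, enzyme)
    return best[1] if best else None
```

Reads for which `assign_enzyme` returns None would need separate handling, for example exclusion from the enzyme-specific sample sets.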

Acknowledgments

We thank three anonymous reviewers for insightful comments. We also thank the C.S. laboratory members for helpful comments on earlier versions of this manuscript and S. Glinka for 454 sequencing. This work was supported by Fonds zur Förderung der wissenschaftlichen Forschung grants to C.S. and a fellowship of the Brazilian National Council for Scientific and Technological Development (CNPq) to T.T.T.

Footnotes

[Supplemental material is available online at www.genome.org. The EST sequences have been deposited in GenBank under accession nos. EV574767-EV600806.]

Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6984908

References

  • Bainbridge M.N., Warren R.L., Hirst M., Romanuik T., Zeng T., Go A., Delaney A., Griffith M., Hickenbotham M., Magrini V., et al. Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach. BMC Genomics. 2006;7:246. doi: 10.1186/1471-2164-7-246.
  • Becker R.A., Chambers J.M., Wilks A.R. The new S language: A programming environment for data analysis and graphics. Wadsworth & Brooks/Cole; Pacific Grove, CA: 1988.
  • Chen J., Lee S., Zhou G., Wang S.M. High-throughput GLGI procedure for converting a large number of serial analysis of gene expression tag sequences into 3′ complementary DNAs. Genes Chromosomes Cancer. 2002;33:252–261.
  • Chen J., Rattray M. Analysis of tag-position bias in MPSS technology. BMC Genomics. 2006;7:77. doi: 10.1186/1471-2164-7-77.
  • Emrich S.J., Barbazuk W.B., Li L., Schnable P.S. Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res. 2007;17:69–73.
  • Gilad Y., Oshlack A., Smyth G.K., Speed T.P., White K.P. Expression profiling in primates reveals a rapid evolution of human transcription factors. Nature. 2006;440:242–245.
  • Gilad Y., Rifkin S.A., Bertone P., Gerstein M., White K.P. Multi-species microarrays reveal the effect of sequence divergence on gene expression profiles. Genome Res. 2005;15:674–680.
  • Kuo W.P., Liu F., Trimarchi J., Punzo C., Lombardi M., Sarang J., Whipple M.E., Maysuria M., Serikawa K., Lee S.Y., et al. A sequence-oriented comparison of gene expression measurements across different hybridization-based technologies. Nat. Biotechnol. 2006;24:832–840.
  • Margulies M., Egholm M., Altman W.E., Attiya S., Bader J.S., Bemben L.A., Berka J., Braverman M.S., Chen Y.J., Chen Z.T., et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380.
  • Metta M., Gudavalli R., Gibert J.M., Schlotterer C. No accelerated rate of protein evolution in male-biased Drosophila pseudoobscura genes. Genetics. 2006;174:411–420.
  • Ng P., Tan J.J.S., Ooi H.S., Lee Y.L., Chiu K.P., Fullwood M.J., Srinivasan K.G., Perbost C., Du L., Sung W.K., et al. Multiplex sequencing of paired-end ditags (MS-PET): a strategy for the ultra-high-throughput analysis of transcriptomes and genomes. Nucleic Acids Res. 2006;34:e84. doi: 10.1093/nar/gkl444.
  • Nielsen K.L., Hogh A.L., Emmersen J. DeepSAGE—Digital transcriptomics with high sensitivity, simple experimental protocol and multiplexing of samples. Nucleic Acids Res. 2006;34:e133. doi: 10.1093/nar/gkl714.
  • R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing; Vienna, Austria: 2007.
  • Surzycki S.J. Basic methods in molecular biology. Springer-Verlag; New York: 2000.
  • Velculescu V.E., Zhang L., Vogelstein B., Kinzler K.W. Serial analysis of gene expression. Science. 1995;270:484–487.
  • Weber A.P., Weber K.L., Carr K., Wilkerson C., Ohlrogge J.B. Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiol. 2007;144:32–42.

Articles from Genome Research are provided here courtesy of Cold Spring Harbor Laboratory Press