# The Comparison of Gene Expression from Multiple cDNA Libraries

^{1}Oxford Gene Technology, Littlemore Park, Oxford OX4 4SS, UK;

^{2}Statistical Laboratory, University of Cambridge, Cambridge CB2 1SB, UK;

^{3}Lorantis Limited, Babraham, Cambridge CB2 4UL, UK

^{4}Corresponding author.

## Abstract

We describe a method for comparing the abundance of gene transcripts in cDNA libraries. This method allows for the comparison of gene expression in any number of libraries, in a single statistical analysis, to identify differentially expressed genes. Such genes may be of potential biological or pharmaceutical relevance. The formula that we derive is essentially the entropy of a partitioning of genes among cDNA libraries. This work goes beyond previously published analyses, which can either compare only two libraries, or identify a single outlier in a group of libraries. This work also addresses the problem of false positives associated with repeating the test on many thousands of genes. A randomization procedure is described that provides a quantitative measure of the degree of belief in the results; the results are further verified by considering a theoretically derived large deviations rate for the test statistic. As an example, the analysis is applied to four prostate cancer libraries from the Cancer Genome Anatomy Project. The analysis identifies biologically relevant genes that are differentially expressed in the different tumor cell types.

The introduction of high throughput sequencing and robotics technology has transformed the field of molecular biology. In the field of gene expression, the introduction of array technology has made it possible to monitor the expression of thousands of genes in single experiments (Phimister 1999). This approach is playing a fundamental role in the quantitative analysis of gene expression. However, it is limited by the propensity for cDNA clones and oligos to generate hybridization artifacts, especially the cross-hybridization of highly related family members. Complementary approaches use the frequency of a gene in a cDNA library as a measure of its tissue-specific expression. One approach, termed serial analysis of gene expression (SAGE), relies on high throughput sequencing of 14-bp gene-specific sequence tags to enumerate the expression of individual genes in a cell (Velculescu et al. 1995). A different approach uses EST counts to infer the relative level of expression of a gene (Okubo et al. 1992; Lee et al. 1995; Franco et al. 1997). Both methods, with their own advantages and limitations, can identify novel genes differentially expressed in a biological sample. Microarray-based gene expression analysis relies on an existing DNA sequence being present on the array and therefore can detect only expression of a predefined set of genes.

There are a growing number of cDNA library databases available both commercially and in the public domain. These include the BodyMap project (Okubo et al. 1992; http://www.imcb.osaka-u.ac.jp/bodymap/) and Incyte's LifeSeq database (http://www.incyte.com). Recently, the NCBI has launched the Cancer Genome Anatomy Project (O'Brien 1997; http://www.ncbi.nlm.nih.gov/ncicgap/). This project aims to understand the molecular bases of the transformation of specific normal epithelial cells into pre-malignant populations, and their further transformation into invasive and metastatic cancer. To circumvent the problem of tissue heterogeneity, different cell types are first dissected out of the tumor mass by use of a laser-based technology (Emmert-Buck et al. 1996) and then converted into cDNA libraries.

One of the uses of cDNA libraries is to identify genes whose expression differs between the tissue sources of the libraries (Lee et al. 1995; Franco et al. 1997; Bortoluzzi and Danieli 1999). Such genes may be of potential biological or pharmaceutical relevance. Thus, as this type of data is becoming more widely available, analysis techniques are now being developed to identify differentially expressed genes.

The Cancer Genome Anatomy Project uses Fisher's exact test (see, for example, Kanji 1993) to compare the abundance of genes in cDNA libraries in its Digital Differential Display (DDD) tool. Audic and Claverie (1997) raised a number of valid criticisms of the use of Fisher's exact test for this type of data, and developed their own statistical test to compare the expression of a gene in two cDNA libraries. Their test also allows for the construction of confidence intervals about a gene expression level.

However, both Audic and Claverie's test, and Fisher's exact test, can only be used to compare gene expression between precisely two libraries. When comparing more than two libraries, both groups use their test repeatedly to compare all possible pairs of libraries. In particular, Audic and Claverie performed multiple comparisons between libraries to construct Table 3 in their paper — a procedure that is statistically invalid. CGAP's DDD analysis provides a heuristic approximation to compensate for this procedure, by multiplying the *P*-values by the number of comparisons made. However, this is only a first order approximation, which ignores the correlation between all of the *P*-values derived. As a result, the *P*-values generated consistently underestimate the true probabilities of the events.

In addition to the problem of testing the same gene in many libraries, these tests will typically also be used repeatedly on many genes to identify those genes that are most differentially expressed between the libraries. In such situations, some genes would have significant *P*-values, even if the data were truly random. Again, CGAP's analysis multiplies the *P*-values by the number of genes tested.

More recently, Greller and Tobin (1999) developed a technique to compare the expression of a gene in more than two libraries. However, their analysis only identifies genes whose expression in a single library is markedly different from their expression in the others. It does not extend to more general patterns of differing gene expressions.

In this work, a more general test is developed that compares the abundance of a gene in any number of cDNA libraries by use of a single statistical test. The extent to which a gene is differentially expressed between the libraries is described by a log likelihood ratio statistic that we derive; this statistic tends asymptotically to a χ^{2} distribution.

Because the test is to be used repeatedly on many thousands of genes, we deliberately do not ascribe a *P*-value to the test statistic. Instead, two procedures are described that can verify that the genes found with high levels of the test statistic do not represent random noise. The first procedure is to use a randomization procedure that gives a quantitative measure of the degree to which the genes associated with a particular level of the statistic represent true differential expression. The second procedure is to use a theoretically derived large deviations rate.

## RESULTS

This section starts with an informal description of the basis of the statistic used for comparing gene expressions. A formal derivation is given in the Methods section. Consider a gene expressed in a set of cDNA libraries that have been constructed, using the same protocol, from a collection of tissues. The differences in abundance of that gene between the libraries can arise via two factors. First, it might be that the true frequency of the gene is the same in all of the tissues. In this case, the differences in gene transcript abundance between the cDNA libraries are simply sampling errors, arising by chance when the clones are selected. This is referred to as the Null Hypothesis.

Alternatively, the differences in transcript abundance may reflect genuine differences in the gene expressions in the different libraries. These differences may be due to any biological or pharmaceutical mechanism, for example, heterogeneities between tissues, patients, pathologies, or drug treatments. This is referred to as the Alternative Hypothesis.

In most cases, the differences in abundance will arise through a combination of these factors. The aim of the test we develop is to identify the extent to which the differences in expression represent true heterogeneity as opposed to sampling variability. This is possible because the distribution of the sampling errors can be quantified. The test works by considering each of the two situations in turn, and, in each case, calculating the likelihood of seeing the observed data. The two likelihoods are compared by subtracting the logs of the likelihoods, generating a log likelihood ratio. This ratio gives a measure of the extent to which the differences in gene expression correspond to heterogeneity of the libraries as opposed to random sampling variability.

The statistic, denoted *R*_{j} for gene *j*, is derived in the Methods section, and is given by the expression

$$
R_j = \sum_{i=1}^{m} x_{i,j} \log\left(\frac{x_{i,j}}{N_i f_j}\right) \tag{1}
$$

where *m* is the number of cDNA libraries, *x*_{i,j} is the number of transcript copies of gene *j* in the *i*th library, and *N*_{i} is the total number of cDNA clones sequenced in the *i*th library. *f*_{j} is the frequency of gene transcript copies of gene *j* in all of the libraries, given by the formula

$$
f_j = \frac{\sum_{i=1}^{m} x_{i,j}}{\sum_{i=1}^{m} N_i} \tag{2}
$$

In a library in which there are no observed copies of the gene, that is, *x*_{i,j} = 0, its contribution to *R*_{j} is zero.

The formula is only valid if at least 50 ESTs have been sequenced from each library, and no single gene contributes >20% of the ESTs in a library. However, libraries that violate these conditions are unlikely to be encountered in real-life examples.
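The statistic can be evaluated directly from the counts of a single gene. The following is a minimal Python sketch rather than the authors' implementation: the function names `r_statistic` and `violates_validity_conditions` and the example counts are illustrative assumptions. It uses natural logarithms, pools the counts to obtain the common frequency, and treats libraries with no copies of the gene as contributing zero.

```python
import math


def r_statistic(counts, library_sizes):
    """R_j = sum_i x_ij * log(x_ij / (N_i * f_j)), using natural logs.

    counts[i] is x_ij, the copies of the gene seen in library i;
    library_sizes[i] is N_i, the clones sequenced from library i.
    """
    if len(counts) != len(library_sizes):
        raise ValueError("need exactly one count per library")
    f_j = sum(counts) / sum(library_sizes)  # pooled frequency of the gene
    # Libraries with x_ij == 0 are skipped: their contribution is zero.
    return sum(
        x * math.log(x / (n * f_j))
        for x, n in zip(counts, library_sizes)
        if x > 0
    )


def violates_validity_conditions(counts, library_sizes, min_ests=50, max_share=0.20):
    """True if any library has fewer than min_ests clones, or the gene
    makes up more than max_share of the clones in any library."""
    return any(
        n < min_ests or x / n > max_share
        for x, n in zip(counts, library_sizes)
    )


# Hypothetical counts for one gene in four libraries of differing size.
example_counts = [12, 3, 0, 7]
example_sizes = [5000, 4000, 6000, 3000]
example_r = r_statistic(example_counts, example_sizes)
```

A gene whose counts are exactly proportional to the library sizes gives a value of zero, and the value grows as the counts depart from that proportionality.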

### Example Analysis

As an example, the analysis is performed on four prostate cancer libraries from the Cancer Genome Anatomy Project database. The four libraries are derived from the same patient. They have been constructed by use of the same protocol, from populations of micro-dissected cells representing different levels of pathology, varying from normal epithelium to invasive prostatic tumor. Details of the libraries used are shown in Table 1.

The top hits, with *R* > 8, are shown in Table 2. The table shows the UniGene Hs cluster ID, a brief description of the protein, the value of the test statistic *R* and the abundance of the gene in each of the four prostate cancer libraries.

There are 21 genes with *R* > 8. The majority of these clusters are annotated; four clusters are unclassified ESTs. Among the annotated clusters are a number of genes whose products are associated with the prostate, inflammation or proliferation.

Two genes belonging to the kallikrein family, *kallikrein 2* and *prostate-specific antigen* (*PSA*), are differentially expressed in the micro-dissected tumor cell types. Both are known markers for prostate cancer (Daher and Beaini 1998; Nelson et al. 1998). Interestingly, these genes appear to be over-expressed in low-grade prostatic intraepithelial neoplasia (PIN) compared with normal, high-grade PIN and invasive tumor cells. This finding is in accordance with in-situ hybridization studies in which it was found that the level of *PSA* expression in the prostate tumor mass is inversely proportional to the tumor grade (Qiu et al. 1990). *α-1-antichymotrypsin*, a protein known to bind PSA (Borchert et al. 1999), shows a similar expression pattern.

The analysis also identifies four genes up-regulated only in invasive tumor cells. Among these genes, human *150-kD oxygen-regulated protein* is involved in the mechanisms that protect cells from hypoxia damage (Ikeda et al. 1997), and may play a role in the development of tumor metastasis.

The ribosomal genes *S4, S15a, L31*, and *L37a* are all found to be differentially regulated between the four tissue types. However, these genes do not behave consistently. The genes for *S4, L37*, and *L37a* are under-expressed in tumor tissue compared with the normal or hyperplastic cells. This appears to be contrary to the findings of Vaarala et al. (1998), who have found that a number of ribosomal mRNAs, including *L37*, are over-expressed in prostatic cancer cell lines and tumor samples. We do not have any explanation for these discrepancies.

Inflammatory genes, as well as a number of novel genes, were also identified as differentially regulated within the four cell types. One of the unannotated EST clusters, Hs.172603, consists of ESTs almost entirely derived from prostatic cDNA libraries. These results, far from being conclusive, would need to be confirmed by further experimental research.

### Verification

In these analyses, many thousands of genes are separately tested to identify those genes that are most differentially expressed. Intrinsic to this type of analysis is the problem that even with totally random data, it is likely that some genes would achieve significant levels of the test statistic *R*. This is the reason that we have not associated *P*-values with the likelihood ratio statistic, and only used it to rank the genes.

Therefore, two verifications of these results are provided. The first is to generate random data sets conforming to the null hypothesis and identify the number of genes achieving each level of *R*, as described in the Methods section. The second is to assess the results in the context of the theoretical considerations of the large deviations rate associated with the test statistic *R*.

The results of the randomization are detailed in Table 3. As the test statistic *R* increases, becoming more significant, the proportion of true positives among the real data increases.

For the threshold selected for Table 2, *R* > 8, the mean number of false positives is 0.4, compared with 21 real genes found at this threshold. This corresponds to a true positive rate of ∼98%. Therefore, according to this analysis, it is likely that all of the 21 genes listed in Table 2 are genuine results. As the threshold value of *R* is decreased, both the number and the proportion of false positives increase. For example, of the 74 genes with *R* > 6, there may be 6 false positives. Only 90% of these genes are likely to be true positives and correspond to genuine biological effect.

It is important to note that the results of these simulations are entirely data dependent. With different data, simulations would need to be repeated, and the numbers and thresholds derived would be different.

The second verification uses the theory of large deviations described in the Methods section. If the data were truly random, then the number of genes achieving levels of the statistic *R* should fall exponentially as a function of *R*. If there are more genes than predicted by this exponential decline, then this would be an indication that these genes represent true effect. In Figure 1, the number of genes at each level of the test statistic *R* is plotted as a function of *R*. It can be seen that there are two distinct regions of behavior. For 1 ⩽ *R* ⩽ 9, the number of genes decreases exponentially. The gradient in this region is −0.9, with standard error 0.7. This is not significantly different from the theoretically derived value of −1 for random data. Thus, according to this analysis, the number of genes achieving values of *R* in this region is not distinguishable from the number that would be expected when comparing a large number of genes. However, for *R* > 9, the number of genes is much above the exponential curve. This indicates that for *R* > 9, the number of genes observed is much greater than would be expected from random data. Therefore, we can be confident that these genes represent true variation, and are not false positive results.
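The gradient of the exponential region can be estimated by fitting a straight line to the logarithm of the histogram counts. The sketch below is one way this could be done and is not taken from the original analysis: the bin width of 1, the use of natural logarithms (for which the theoretical gradient is −1), and the names `log_histogram` and `fit_line` are all assumptions.

```python
import math
from collections import Counter


def log_histogram(r_values, bin_width=1.0):
    """Return sorted (bin_centre, ln(count)) pairs for the non-empty bins of R."""
    bins = Counter(int(r // bin_width) for r in r_values)
    return sorted(((b + 0.5) * bin_width, math.log(c)) for b, c in bins.items())


def fit_line(points):
    """Ordinary least squares fit; returns (slope, intercept) for (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

To restrict the fit to the region 1 ⩽ *R* ⩽ 9, only the histogram points whose bin centre falls in that range would be passed to `fit_line`; bins lying well above the fitted line at larger *R* are then the candidates for true differential expression.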

## DISCUSSION

This work has described a likelihood ratio method for comparing the abundance of a gene in any number of cDNA libraries. The statistic can be used to identify those genes whose expression most varies across a set of cDNA libraries. The analysis method was tested on example prostate library data. It identified a number of genes that appear to be biologically relevant, as well as a number of unannotated EST clusters. That many of the top hits are known to be important in the prostate and associated pathology provides confidence that the analysis produces meaningful results. It also gives confidence that the unannotated EST clusters identified by the test warrant further investigation.

Because this method is used for comparing expression data for large numbers of genes, it is essential to quantify the number of false positives associated with an analysis. A method was described for randomizing the data, which assesses the extent to which results can be believed. The randomization was used to demonstrate that ∼98% of the genes identified from the example libraries, at the threshold level chosen, are likely to constitute genuine biological effect. The results were further verified by considering the large deviations rate for the test statistic. The number of highly differentially expressed genes was shown to be much greater than predicted by this rate.

In a sense, Figure 1 is incomplete in that it does not include those genes that are expressed, but which have not been sampled in any of the libraries. As a gene becomes more differentially expressed, we expect to find more copies of the gene in the tissue, and thus have a higher chance of capturing it in one of the libraries. Consequently, the smaller the value of *R*, the more genes are missing from the analysis. Examining Figure 1, we see this effect when 0 < *R* < 1, where the histogram drops below the linear fit. The intercept of the linear regression could be used as an estimate of the total number of genes that are expressed in the tissue. However, this would only be true if the libraries were prepared from identical tissue; in our case, the estimate would be invalid.

Both cDNA sequencing and hybridization-array-based methods are now being increasingly used to quantify gene expressions in tissues and cell lines, and to make comparisons between healthy, pathological, and drug-treated states. The study of gene expression alone, however, does not give the complete picture of cellular activity. Studies comparing gene expression with protein abundance (Anderson and Seilhamer 1997; Gygi et al. 1999) have shown little correlation between the two. There are several reasons why this might be the case, including differences in translational control and RNA and protein turnover rates (Hargrove and Schmidt 1989; Rivett 1990). Thus, this type of analysis can only give an indication of genes whose products may be of biological or pharmaceutical relevance. Any results of this type of analysis would have to be confirmed by further research.

## METHODS

### Derivation of the Test Statistic

Consider the expression of gene *j* in all of the cDNA libraries. Denote the number of clones sampled for each library *i* as *N*_{i}, and the observed number of copies of the gene as *x*_{i,j}. Let *m* be the number of cDNA libraries. We will compare two hypotheses relating to the frequency of this gene using a likelihood ratio. Under the null hypothesis, the gene is not differentially expressed, so the frequency of the gene is the same in all libraries. Under the alternative hypothesis, the gene is differentially expressed, so the frequency of the gene in each of the libraries is different.

In both cases, as long as the abundance of the gene is small relative to the total mRNA content of the cell (20% is the usual heuristic, Hays 1994), the distribution of the gene, denoted *X*_{i,j}, will be well approximated by a Poisson distribution, with

$$
P(X_{i,j} = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} \tag{3}
$$

The parameter λ will be determined below and will have a different value for the two hypotheses. The same Poisson approximation is also made by Audic and Claverie (1997).

The null hypothesis is that the frequency of the gene is the same in each library. For a gene with transcript frequency *f*, the number of transcripts in library *i* is approximately distributed as a Poisson variable with parameter *fN*_{i}. Therefore, the likelihood of the observed data, under the null hypothesis, is given by

$$
L(f) = \prod_{i=1}^{m} \frac{e^{-fN_i}\,(fN_i)^{x_{i,j}}}{x_{i,j}!} \tag{4}
$$

The maximum likelihood estimate of the common gene frequency is the solution to the equation

$$
\frac{\partial \log L}{\partial f} = \sum_{i=1}^{m}\left(\frac{x_{i,j}}{f} - N_i\right) = 0 \tag{5}
$$

The solution, *f*_{j}, is given by

$$
f_j = \frac{\sum_{i=1}^{m} x_{i,j}}{\sum_{i=1}^{m} N_i} \tag{6}
$$

This is just the proportion of the mRNA for the gene of interest among all mRNA transcripts in all of the libraries. Observe that this is also the general solution that maximizes the likelihood of the entire data set under the null hypothesis. Thus, the maximum estimate of the likelihood of the observed data under the null hypothesis, *L*_{j}^{0}, is given by

$$
L^{0}_{j} = \prod_{i=1}^{m} \frac{e^{-f_j N_i}\,(f_j N_i)^{x_{i,j}}}{x_{i,j}!} \tag{7}
$$

Under the alternative hypothesis, the frequency of gene transcripts in each library is different. The maximum likelihood estimate of gene frequency in each library *i* is *x*_{i,j}/*N*_{i}. Therefore, the gene abundance in library *i* is approximately distributed as a Poisson variable with parameter *x*_{i,j}. (When *x*_{i,j} = 0, the Poisson distribution is well defined, with the event *x* = 0 having probability 1, and events *x* > 0 having probability 0). Note that for the Poisson approximation to hold, each library must have at least 50 ESTs sequenced (Hays 1994). Thus, the maximum estimate of the likelihood of the observed data under the alternative hypothesis is given by

$$
L^{1}_{j} = \prod_{i=1}^{m} \frac{e^{-x_{i,j}}\,x_{i,j}^{\,x_{i,j}}}{x_{i,j}!} \tag{8}
$$

The null hypothesis is compared with the alternative hypothesis by taking the log of the ratio of the two likelihoods, that is, log (*L*^{1}_{j}/*L*^{0}_{j}). This gives the test statistic *R*_{j}

$$
R_j = \sum_{i=1}^{m} x_{i,j} \log\left(\frac{x_{i,j}}{N_i f_j}\right) \tag{9}
$$
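The step from the two likelihoods to the final statistic can be written out explicitly. The following worked derivation is a sketch that assumes the Poisson likelihoods described above, with parameter $f_j N_i$ under the null hypothesis and $x_{i,j}$ under the alternative; the factorial terms are common to both likelihoods and cancel in the ratio.

```latex
\begin{align*}
\log\frac{L^{1}_{j}}{L^{0}_{j}}
  &= \sum_{i=1}^{m}\Bigl[\bigl(x_{i,j}\log x_{i,j} - x_{i,j}\bigr)
     - \bigl(x_{i,j}\log(f_j N_i) - f_j N_i\bigr)\Bigr] \\
  &= \sum_{i=1}^{m} x_{i,j}\log\frac{x_{i,j}}{N_i f_j}
     + \Bigl(f_j\sum_{i=1}^{m} N_i - \sum_{i=1}^{m} x_{i,j}\Bigr) \\
  &= \sum_{i=1}^{m} x_{i,j}\log\frac{x_{i,j}}{N_i f_j}
\end{align*}
```

The bracketed term in the second line vanishes because the pooled frequency satisfies $f_j \sum_i N_i = \sum_i x_{i,j}$. Terms with $x_{i,j} = 0$ are taken as zero, following the usual convention $0 \log 0 = 0$.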

### Application of Method to CGAP Data

For each sequence in the CGAP libraries, we identified the UniGene cluster to which the sequence has been allocated; this was achieved by searching for the accession number of the EST in the UniGene database (Hs Build 96; Boguski and Schuler 1995; http://www.ncbi.nlm.nih.gov/ncicgap/). The number of ESTs from each library that belong to each UniGene cluster was used as the input into the statistical test. The test was applied to each gene in turn. The genes were then ordered according to their value of the test statistic *R*.
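The tabulation and ranking steps described above could be sketched as follows. This is an illustration only: the input format (one `(library_id, cluster_id)` pair per EST, with `None` for an EST that has no cluster), the function names, and the cluster identifiers in the usage note are hypothetical, and the scoring function is passed in as a parameter rather than fixed.

```python
from collections import Counter, defaultdict


def tabulate_est_counts(est_records, libraries):
    """Count ESTs per cluster per library, and clones per library.

    est_records: iterable of (library_id, cluster_id) pairs, one per EST.
    Returns (table, library_sizes), where table maps each cluster to a
    list of counts ordered as in `libraries`.
    """
    per_cluster = defaultdict(Counter)
    sizes = Counter()
    for library_id, cluster_id in est_records:
        sizes[library_id] += 1  # every sequenced clone counts towards N_i
        if cluster_id is not None:
            per_cluster[cluster_id][library_id] += 1
    table = {
        cluster: [per_lib[lib] for lib in libraries]
        for cluster, per_lib in per_cluster.items()
    }
    return table, [sizes[lib] for lib in libraries]


def rank_clusters(table, library_sizes, statistic):
    """Order clusters from the highest to the lowest value of the statistic."""
    return sorted(
        table,
        key=lambda cluster: statistic(table[cluster], library_sizes),
        reverse=True,
    )
```

For example, the records `[("A", "Hs.1"), ("A", "Hs.1"), ("B", "Hs.2"), ("B", None)]` with libraries `["A", "B"]` give the table `{"Hs.1": [2, 0], "Hs.2": [0, 1]}` and library sizes `[2, 2]`.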

### Verification

For the first verification, the number of false positives is assessed by generating random data sets satisfying the null hypothesis, and performing the analysis on these data. This is used to provide a quantitative measure of the extent to which the results of the original analysis can be believed. For each gene, the common gene transcript frequency, *f*_{j} (equation 2), is calculated. Then, for each library, a random gene abundance is generated from a Poisson distribution whose parameter is equal to the expected number of gene transcripts for that library (equal to *N*_{i}*f*_{j} for library *i*).

One thousand random data sets were generated in this way. The analysis was performed on each data set. For each threshold level of the test statistic *R*, the mean number of genes across the 1000 analyses with at least that value of *R* was calculated. This was compared with the number of genes from the true data set with at least the same level of *R*. For each threshold, the proportion of the genes from the true data set likely not to be false positives was calculated.
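The randomization can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the code used for the published analysis: the function names are invented here, the Poisson sampler is Knuth's simple multiplication method (adequate only for modest means), and the strict inequality *R* > threshold is assumed for counting genes.

```python
import math
import random


def r_statistic(counts, library_sizes):
    """Log likelihood ratio statistic R_j (natural logs); zero counts contribute nothing."""
    f_j = sum(counts) / sum(library_sizes)
    return sum(
        x * math.log(x / (n * f_j))
        for x, n in zip(counts, library_sizes)
        if x > 0
    )


def poisson_sample(mean, rng):
    """Draw one Poisson variate by Knuth's multiplication method.

    Suitable only for modest means (roughly below 500), because
    exp(-mean) underflows for very large values.
    """
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def mean_null_exceedances(count_table, library_sizes, threshold, n_sets=1000, seed=0):
    """Mean number of genes with R > threshold across random null data sets."""
    rng = random.Random(seed)
    total_clones = sum(library_sizes)
    # Pooled frequency f_j of every gene, estimated from the real data.
    freqs = [sum(row) / total_clones for row in count_table]
    exceedances = 0
    for _ in range(n_sets):
        for f_j in freqs:
            simulated = [poisson_sample(f_j * n, rng) for n in library_sizes]
            if r_statistic(simulated, library_sizes) > threshold:
                exceedances += 1
    return exceedances / n_sets


def proportion_true_positives(count_table, library_sizes, threshold, **kwargs):
    """1 - (mean null exceedances / observed exceedances), floored at zero."""
    observed = sum(r_statistic(row, library_sizes) > threshold for row in count_table)
    if observed == 0:
        return None  # nothing passes the threshold, so the proportion is undefined
    expected_false = mean_null_exceedances(count_table, library_sizes, threshold, **kwargs)
    return max(0.0, 1.0 - expected_false / observed)
```

Here `count_table` holds one row of per-library counts for each gene. As the text notes, the resulting numbers depend entirely on the data supplied, so the simulation would need to be rerun for any new set of libraries.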

### Theoretical Considerations of the Test Statistic

When the null hypothesis is correct, there are a number of theoretical considerations that can be made about the test statistic. Under the null hypothesis, the true frequency of each gene *j*, in every library, is *f _{j}*, as given in equation 6.

Firstly, under Wilks' theorem (De Groot 1986), as all of the *N*_{i} → ∞, the distribution of 2*R*_{j} for each *R*_{j} tends to a χ^{2} distribution with *m* – 1 degrees of freedom.

Secondly, for each library, the probabilities of the observed gene frequencies, {*x*_{i,j}/*N*_{i} : 1 ⩽ *j* ⩽ *k*}, deviating from the actual frequencies {*f*_{j} : 1 ⩽ *j* ⩽ *k*} can be determined. These are denoted *P*_{i}. When *N*_{i} is large, the theory of large deviations (Ellis 1985) estimates that *P*_{i} decays exponentially, so that

$$
P_i \approx e^{-N_i I_i} \tag{10}
$$

*I*_{i} is the large deviations rate function for a multinomial distribution and is given by

$$
I_i = \sum_{j=1}^{k} \frac{x_{i,j}}{N_i} \log\left(\frac{x_{i,j}/N_i}{f_j}\right) \tag{11}
$$

This function is also known as the Kullback-Leibler distance between the two frequencies and measures the relative entropy between them. Because there are *m* (independent) libraries, the joint probability of observing {*x*_{i,j} : 1 ≤ *i* ≤ *m*, 1 ≤ *j* ≤ *k*} is Π_{i} *P*_{i}, which is proportional to

$$
\exp\left(-\sum_{i=1}^{m} N_i I_i\right) = \exp\left(-\sum_{j=1}^{k} R_j\right) \tag{12}
$$

Equation 12 gives the duality under which we may think of the probability of observing the expression of gene *j* in each of the libraries as proportional to *e*^{−R_{j}}.
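The duality between the library-wise rate functions and the gene-wise statistics rests on the identity Σ_{i} *N*_{i}*I*_{i} = Σ_{j} *R*_{j}, which can be checked numerically. The sketch below uses an invented toy table in which the last row pools all remaining clones, so that each column sums to its library size; the table and the function names are assumptions made purely for this arithmetic check and carry no biological meaning.

```python
import math


def library_rate(column, freqs):
    """I_i = sum_j (x_ij / N_i) * log((x_ij / N_i) / f_j) for one library."""
    n_i = sum(column)
    return sum(
        (x / n_i) * math.log((x / n_i) / f)
        for x, f in zip(column, freqs)
        if x > 0
    )


def gene_statistic(row, library_sizes, f_j):
    """R_j = sum_i x_ij * log(x_ij / (N_i * f_j)) for one gene."""
    return sum(
        x * math.log(x / (n * f_j))
        for x, n in zip(row, library_sizes)
        if x > 0
    )


# Toy table: rows are genes, columns are libraries.
table = [[30, 10, 20], [5, 25, 10], [65, 65, 170]]
columns = list(zip(*table))
library_sizes = [sum(col) for col in columns]
total = sum(library_sizes)
freqs = [sum(row) / total for row in table]  # pooled frequency of each gene

# Sum over libraries of N_i * I_i, and sum over genes of R_j.
lhs = sum(n * library_rate(col, freqs) for n, col in zip(library_sizes, columns))
rhs = sum(gene_statistic(row, library_sizes, f) for row, f in zip(table, freqs))
```

The two totals agree to floating-point precision because both are the same double sum over libraries and genes, grouped in a different order.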

There is an alternative way to view *R*_{j} as an appropriate test statistic. Consider the distribution of the number of mRNAs for gene *j*, in all libraries, {*x*_{i,j} : 1 ≤ *i* ≤ *m*}, conditioned on the total number of mRNAs for gene *j* seen in all of the libraries. We denote this total as *x*_{j} = Σ_{i} *x*_{i,j}. If each of the *x*_{i,j} is drawn from a Poisson random variable, then, according to the divisibility property of the Poisson distribution, the variables {*x*_{i,j} | *x*_{j} : 1 ≤ *i* ≤ *m*} are drawn from a multinomial distribution, with *x*_{j} events, and *m* outcomes, with probabilities *N*_{1}/*N*, … , *N*_{m}/*N*. *N* is the total number of observed mRNAs in all of the libraries, equal to Σ_{i} *N*_{i}. When *x*_{j} is large, the large deviations rate function for the multinomial distribution (equation 11) can be used directly to deduce that the probability of observing {*x*_{i,j} : 1 ≤ *i* ≤ *m* | *x*_{j}} is proportional to

$$
e^{-R_j} \tag{13}
$$

Therefore, with *k* genes, the expected number of genes for which the test statistic *R* is approximately *r* will decrease exponentially as a function of *r*, with gradient −1. Thus, a logarithmic plot of the number of genes with *R* approximately *r*, as a function of *r*, can be used to determine the extent to which the number of observed genes with a given value of *R* is greater than one would expect by random chance.
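That the conditional multinomial argument leads to the same statistic can be verified in two lines. The following worked step is a sketch assuming the multinomial described above, with $x_j$ events, observed proportions $x_{i,j}/x_j$ and outcome probabilities $N_i/N$; the large deviations exponent is $x_j$ times the relative entropy between the two sets of proportions.

```latex
\begin{align*}
x_j \sum_{i=1}^{m} \frac{x_{i,j}}{x_j}
      \log\frac{x_{i,j}/x_j}{N_i/N}
  &= \sum_{i=1}^{m} x_{i,j}\log\frac{x_{i,j}\,N}{N_i\,x_j} \\
  &= \sum_{i=1}^{m} x_{i,j}\log\frac{x_{i,j}}{N_i f_j}
   = R_j, \qquad \text{since } f_j = \frac{x_j}{N}.
\end{align*}
```

The exponent is therefore exactly the statistic $R_j$, which is why the expected number of genes near a given value $r$ falls off as $e^{-r}$ under the null hypothesis.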

## Acknowledgments

We thank Liz Proudfoot for help with UniGene, and Gillian Amphlett, Anna Git, Simon Dear, Philippe Sanseau, and Mike Trower for helpful discussion and comments. Y.G. holds a Research Fellowship at Emmanuel College, Cambridge, UK.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

## Footnotes

E-MAIL dov.stekel@ogt.co.uk; FAX 44 0 1865 405120.

Article published online before print: *Genome Res.,* 10.1101/gr.132500.

Article and publication are at www.genome.org/cgi/doi/10.1101/gr.132500.

## REFERENCES

- Anderson L, Seilhamer J. A comparison of selected mRNA and protein abundances in the human liver. Electrophoresis. 1997;18:533–537.
- Audic S, Claverie J-M. The significance of digital gene expression profiles. Genome Res. 1997;7:986–995.
- Boguski MS, Schuler GD. ESTablishing a human transcript map. Nat Genet. 1995;10:369–371.
- Borchert GH, Yu H, Tomlinson G, Giai M, Roagna R, Ponzone R, Sgro L, Diamandis EP. Prostate specific antigen molecular forms in breast cyst fluid and serum of women with fibrocystic breast disease. J Clin Lab Anal. 1999;13:75–81.
- Bortoluzzi S, Danieli GA. Towards an in silico analysis of transcription patterns. Trends Genet. 1999;15:118–119.
- Daher R, Beaini M. Prostate-specific antigen and new related markers for prostate cancer. Clin Chem & Lab Med. 1998;36:671–681.
- De Groot MH. Probability and statistics. Reading, MA: Addison-Wesley; 1986.
- Ellis RS. Entropy, large deviations and statistical mechanics. Heidelberg, Germany: Springer-Verlag; 1985.
- Emmert-Buck MR, Bonner RF, Smith PD, Chuaqui RF, Zhuang Z, Goldstein SR, Weiss RA, Liotta LA. Laser capture microdissection. Science. 1996;274:998–1000.
- Franco GR, Rabelo EML, Azevedo V, Pena HB, Ortega JM, Santos TM, Meira WSF, Rodrigues NA, Dias CMM, Harrop R, et al. Evaluation of cDNA libraries from different developmental stages of Schistosoma mansoni for production of expressed sequence tags (ESTs). DNA Res. 1997;4:231–240.
- Greller LD, Tobin FL. Detecting selective expression of genes and proteins. Genome Res. 1999;9:282–296.
- Gygi SP, Rochon Y, Franza BR, Aebersold R. Correlation between protein and mRNA abundance in yeast. Mol Cell Biol. 1999;19:1720–1730.
- Hargrove JL, Schmidt FR. The role of mRNA and protein stability in gene expression. FASEB J. 1989;3:2360–2370.
- Hays WL. Statistics. London, UK: Holt, Rinehart and Winston; 1994.
- Ikeda J, Kaneda S, Kuwabara K, Ogawa S, Kobayashi T, Matsumoto M, Yura T, Yanagi H. Cloning and expression of cDNA encoding the human 150kDa oxygen-regulated protein, ORP150. Biochem Biophys Res Commun. 1997;230:94–99.
- Kanji GK. 100 Statistical Tests. London, UK: Sage Publications; 1993.
- Krizman DB, Chuaqui RF, Meltzer PS, Trent JM, Duray PH, Linehan WM, Liotta LA, Emmert-Buck MR. Construction of a representative cDNA library from prostatic intraepithelial neoplasia. Cancer Res. 1996;56:5380–5383.
- Lee NH, Weinstock KG, Kirkness EF, Earle-Hughes JA, Fuldner RA, Marmaros S, Glodek A, Gocayne JD, Adams MD, Kerlavage AR, et al. Comparative expressed-sequence-tag analysis of differential gene expression profiles in PC-12 cells before and after nerve growth factor treatment. Proc Natl Acad Sci. 1995;92:8303–8307.
- Nelson PS, Ng W-L, Schummer M, True LD, Liu AY, Bumgarner RE, Ferguson C, Dimak A, Hood L. An expressed-sequence-tag database of the human prostate: Sequence analysis of 1168 cDNA clones. Genomics. 1998;47:12–25.
- O'Brien C. Cancer genome anatomy project launched. Mol Med Today. 1997;3:94.
- Okubo K, Hori N, Matoba R, Niiyama T, Fukushima A, Kojima Y, Matsubara K. Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene expression. Nat Genet. 1992;2:173–179.
- Phimister B. Chipping forecast. Nat Genet. 1999;21:1–60.
- Qiu S-D, Young C Y-F, Bilharz DL, Prescott JL, Farrow GM, He W-W, Tindall DJ. In situ hybridisation of prostate specific antigen mRNA in human prostate. J Urol. 1990;144:1550–1556.
- Rivett AJ. Eukaryotic protein degradation. Curr Opin Cell Biol. 1990;2:1143–1149.
- Vaarala MH, Porvari KS, Kyllönen AP, Mustonen MVJ, Lukkarinen O, Vihko PT. Several genes encoding ribosomal proteins are over-expressed in prostate cancer cell lines: Confirmation of L7a and L37 over-expression in prostate cancer tissue samples. Int J Cancer. 1998;78:27–32.
- Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. Serial analysis of gene expression. Science. 1995;270:484–487.

**Cold Spring Harbor Laboratory Press**


The Comparison of Gene Expression from Multiple cDNA Libraries. Genome Research. 2000 Dec; 10(12):2055.
