Copyright © American Society for Investigative Pathology

Quantitative Gene Expression Profiling in Formalin-Fixed, Paraffin-Embedded Tissues Using Universal Bead Arrays

From Illumina, Inc.,* San Diego; and Veridex, LLC,† a Johnson & Johnson Company, San Diego, California

Accepted July 6, 2004.

Abstract

We recently developed a sensitive and flexible gene expression profiling system that is not dependent on an intact poly-A tail and showed that it could be used to analyze degraded RNA samples. We hypothesized that the DASL (cDNA-mediated annealing, selection, extension and ligation) assay might be suitable for the analysis of formalin-fixed, paraffin-embedded tissues, an important source of archival tissue material. We now show that, using the DASL assay system, highly reproducible tissue- and cancer-specific gene expression profiles can be obtained with as little as 50 ng of total RNA isolated from formalin-fixed tissues that had been stored from 1 to over 10 years. Further, tissue- and cancer-specific markers derived from previous genome-wide expression profiling studies of fresh-frozen samples were validated in the formalin-fixed samples. The DASL assay system should prove useful for high-throughput expression profiling of archived clinical samples.

The recent development of high-throughput microarray technologies provides a powerful tool for genome-wide gene expression analysis.1 For example, microarray-based tumor classification,2–4 as well as treatment response and clinical outcome prediction,4–7 have been demonstrated in many cancer types. However, these technologies typically require substantial quantities of fresh or frozen tissue. Although many institutions are now maintaining frozen tissue banks, which should facilitate gene expression analysis in the future, few of these now have sufficient clinical follow-up data.
On the other hand, there is a vast supply of formalin-fixed, paraffin-embedded (FFPE) tissues for which the clinical outcome is already known.8 The ability to analyze gene expression patterns in these archived tissues would greatly facilitate retrospective studies to correlate gene expression patterns with given disease states, or histological and clinical phenotypes. This approach could be used to discover biomarkers for therapeutic decision making and also to develop clinical tests, as FFPE sample collection and storage is a routine practice in pathology laboratories. A barrier to the analysis of FFPE samples is that RNA extracted from FFPE tissues is often significantly degraded. Previous studies show that only about 3% or less of the RNA isolated from paraffin samples is accessible to cDNA synthesis, compared to fresh-frozen samples.9 In particular, this has impeded progress in microarray-based gene expression quantitation from FFPE specimens.10 As a result, most gene expression analysis of FFPE tissues has so far been done using immunohistochemical staining (IHC) and quantitative RT-PCR (qPCR), which allow only a few genes to be analyzed at a time.9,11–16 Although sufficient RNA can be isolated from a few 10-μm slide-mounted paraffin sections to quantitate up to 30 genes by qPCR,17 there is clearly a bottleneck in scaling up the number of genes that can be measured by this approach. Also, qPCR does not reliably measure RNA fragments shorter than 100 bp.17 We have recently developed a flexible, sensitive, and reproducible gene expression profiling assay, DASL (cDNA-mediated annealing, selection, extension and ligation), for parallel analysis of hundreds of genes with as little as 25 ng of total RNA.18 We hypothesized that the DASL assay might be able to overcome the technical limitations to microarray-based analysis of FFPE samples. 
While most array technologies use an in vitro transcription (IVT)-mediated sample labeling procedure,19 DASL uses random priming in the cDNA synthesis, and therefore does not depend on an intact poly-A tail for oligo-d(T) priming. In addition, the assay requires a relatively short target sequence of about 50 nucleotides for query oligonucleotide annealing. In this study, we characterized the sensitivity and quantitative performance of the assay system on FFPE tissues and demonstrated its utility for marker validation as well as new marker identification. The results show that the DASL assay is effective for an important and extensive source of archival clinical material that was hitherto largely inaccessible to microarray technology. This opens up new avenues to the large-scale discovery, validation, and clinical application of mRNA biomarkers of disease.

Materials and Methods

Tissue Specimens

The sample set consisted of 11 matched pairs of FFPE colon cancer and adjacent normal tissues, and 11 matched pairs of FFPE breast cancer and adjacent normal tissues. Colon cancer tissue specimens included 2 Dukes B1 (both well differentiated adenocarcinomas), 5 Dukes B2 (4 moderately and 1 well differentiated adenocarcinoma) and 4 Dukes C2 (2 well, 1 moderately differentiated, and 1 mucinous adenocarcinoma). Breast cancer tissue specimens included one Stage 0, two Stage I, six Stage IIA, one Stage IIB, and one Stage IIIC. There were nine infiltrative ductal carcinomas, one mucinous carcinoma and one ductal carcinoma in situ. Colon cancer was staged according to the Modified Astler-Coller classification and breast cancer was staged according to the AJCC Cancer Staging Manual (Sixth Edition, Springer, 2003). All samples were obtained from Asterand, Inc. (Detroit, MI) according to an Institutional Review Board approved protocol. Patient demographic and pathology information was also collected.
Among the eleven sample pairs of each tissue type, four pairs were collected in a period within 1 year, four pairs in a period of 2 years, and three pairs in a period of 9 to 11 years before the current study (Table 1). Along with the FFPE samples, two matched pairs of fresh-frozen colon cancer and adjacent normal tissue and two matched pairs of fresh-frozen breast cancer and adjacent normal tissue were collected from the patients included in the FFPE sample set. The histopathological features of each sample were reviewed to confirm diagnosis and tumor content.
RNA Isolation

For total RNA isolation from FFPE tissues, three 20-μm-thick sections were cut from each tissue block. The High Pure RNA Paraffin Kit (Roche) was used. Proteinase K digestion time was 12 hours for each sample. All purification, DNase treatment, and other steps were performed according to the manufacturer’s protocol. After total RNA isolation, samples were stored at −80°C until use. Total RNA from fresh-frozen tissue samples was isolated by a standard Trizol/chloroform method. Tissue was homogenized in Trizol reagent (Invitrogen). Total RNA was isolated from Trizol and precipitated at −20°C with isopropyl alcohol. RNA pellets were washed with 75% ethanol, dissolved in water, and stored at −80°C until use. RNA integrity was examined with the Agilent 2100 Bioanalyzer RNA 6000 Nano Assay (Agilent Technologies).

Real-Time Quantitative RT-PCR (qPCR)

qPCR analyses were performed on the ABI Prism 7900HT sequence detection system (Applied Biosystems) as described previously.18 Most PCR primers were designed to amplify approximately 90-bp fragments. Primers for the RPL13A transcript were designed to amplify 90-bp and 155-bp fragments.

BeadArray Manufacture

Microarrays were assembled by loading pools of glass beads (3 μm in diameter) derivatized with oligonucleotides onto the etched ends of fiber-optic bundles.20 About 50,000 optical fibers are hexagonally packed to form a ~1.4 mm diameter bundle. The fiber optic bundles are assembled into an array matrix (Sentrix array), comprising 96 bundles arranged in an 8 × 12 matrix that matches the dimensions of standard microtiter plates.21 This arrangement allows simultaneous processing of 96 samples using standard robotics. Because the beads are positioned randomly, a decoding process is carried out to determine the location and identity of each bead in every array location.22 Decoding is an automated part of array manufacture.
Assay Probe Design

For array analysis, two probe oligonucleotides were designed to interrogate each target site on the cDNA as described previously,18 with 2 to 10 target sites per gene (average 6 sites). The first oligo consists of two parts: the gene-specific sequence and a universal PCR primer sequence (P1, 5′-ACTTCGTCAGTAACGGAC-3′) at the 5′-end. The second oligo consists of three parts: the gene-specific sequence, a unique address sequence which is complementary to one of 1520 capture sequences on the array, and a universal PCR primer sequence (P2, 5′-GTCTGCCTATAGTGAGTC-3′) at the 3′-end. A single address sequence is uniquely associated with a single target site. This address sequence allows the PCR-amplified products (see below) to hybridize to a universal microarray bearing the complementary probe sequences.21 The gene-specific sequence is designed with Tm ranging from 57°C to 62°C.

Array Analysis

cDNA synthesis, DASL process, array image processing, and signal extraction were as described previously.18 First, a 20-μl reverse transcription reaction containing a reaction mix (MMC; Illumina, San Diego, CA), biotinylated random hexamers and oligo-d(T)18, and total RNA (up to 1 μg), was incubated at room temperature for 10 minutes and then at 42°C for 1 hour. The oligo-d(T) priming helps improve assay sensitivity for fresh-frozen samples with intact RNA. Pooled assay oligos were annealed to their sequence-specific targets on the cDNA under a controlled hybridization program.21 The cDNA was immobilized on paramagnetic beads and washed to remove any excess or mis-hybridized oligos. Hybridized oligos were then extended and ligated to generate amplifiable templates, using Illumina-supplied reagents and conditions (BeadStation User’s Manual, Illumina). A PCR reaction was performed with Cy3-labeled universal PCR primers.
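The two-oligo layout described under Assay Probe Design can be illustrated with a minimal sketch that simply joins the parts in the order the text gives them. The P1 and P2 sequences are quoted from the text. The gene-specific and address sequences, the `QueryOligos` type, and the `build_query_oligos` function are hypothetical placeholders, and strand orientation (which parts a real design would reverse-complement) is deliberately ignored.

```python
from dataclasses import dataclass

# Universal PCR primer sequences quoted in the text (written 5'->3').
P1 = "ACTTCGTCAGTAACGGAC"
P2 = "GTCTGCCTATAGTGAGTC"


@dataclass(frozen=True)
class QueryOligos:
    upstream: str    # P1 at the 5' end, then the gene-specific sequence
    downstream: str  # gene-specific sequence, address, then P2 at the 3' end


def build_query_oligos(upstream_specific: str,
                       downstream_specific: str,
                       address: str) -> QueryOligos:
    """Join the oligo parts in the order stated in the text (orientation ignored)."""
    for seq in (upstream_specific, downstream_specific, address):
        if not seq or set(seq) - set("ACGT"):
            raise ValueError(f"not a plain DNA sequence: {seq!r}")
    return QueryOligos(
        upstream=P1 + upstream_specific,
        downstream=downstream_specific + address + P2,
    )


# Hypothetical placeholder sequences, not real probe designs.
oligos = build_query_oligos(
    upstream_specific="ACGTACGTACGTACGTACGTAC",
    downstream_specific="TGCATGCATGCATGCATGCATG",
    address="GATTACAGATTACAGATTACAGA",
)
print(oligos.upstream)
print(oligos.downstream)
```

Because each address is tied to exactly one target site, a real probe pool would carry one such pair per site, with every address drawn from the 1520 capture sequences on the array.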
Single-stranded PCR products were prepared by denaturation, and were then hybridized to Sentrix arrays under a temperature gradient program.21 The arrays were imaged using a BeadArray Reader scanner (Illumina).20 Image processing and intensity data extraction software were as described previously.23 The DASL assay was performed three times independently, and samples were hybridized to three different array matrices. The sample and array coordinate information is shown in Table 2. All of the array data are represented in Supplementary Tables 1–3 at http://ajp.amjpathol.org.
Array Data Normalization

Our method normalizes given array data with respect to reference data such as an average of multiple replicate arrays. We used cubic spline normalization that makes distributions of gene intensities on a given array and reference array similar. The normalization uses quantiles of sequence type signals to fit smoothing B-splines similar to what was proposed by Workman et al.24

Expression Data Analysis and Clustering Algorithm

To identify disease- and tissue-specific markers, we performed two separate analyses. 1) FFPE samples on Array Matrix 2 were distributed into the following group pairs: colon normal versus colon cancer, breast normal versus breast cancer, and normal breast versus normal colon. We applied the Mann-Whitney test with a P value cutoff of 0.01 and a twofold change requirement to identify marker genes using FFPE samples. 2) We divided fresh-frozen samples from Array Matrix 3 into colon cancer versus colon normal and breast cancer versus breast normal (two samples per group) and ran the algorithm using negative controls in combination with a rank-invariant set of probes for construction of an error model, as described by Fan et al.18 Based on the array signals of selected genes, we computed the correlation coefficient matrix for the FFPE samples and clustered them using the agnes function in the R cluster package with Ward’s method. The markers identified from Array Matrix 2 were used to cluster FFPE samples on Array Matrix 1 while markers identified on Array Matrix 3 were applied to clustering FFPE samples from the same matrix.

Results

Quality of RNAs Isolated from FFPE Tissues with Different Durations of Storage

We used 8 fresh-frozen and 44 FFPE tissues (Table 1) with time of storage ranging from 1 year to over 10 years for this study. Total RNA was extracted from fresh-frozen and FFPE tissues and converted to cDNA (see Materials and Methods). Aliquots of the cDNA reactions were taken for real-time PCR analysis.
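The FFPE marker-selection rule given under Expression Data Analysis (Mann-Whitney test, P value cutoff of 0.01, at least twofold change) can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' implementation: the P value uses the normal approximation without tie or continuity correction, fold change is taken as the ratio of group means (the text does not say how it was computed), and the gene names and intensities are made up.

```python
import math
from statistics import mean


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney P value (normal approximation, no tie or continuity correction)."""
    pooled = sorted((value, group) for group, values in enumerate((x, y)) for value in values)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j shared by tied values
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = abs(u - mu) / sigma
    return math.erfc(z / math.sqrt(2))


def select_markers(group_a, group_b, p_cutoff=0.01, min_fold=2.0):
    """Return genes passing both filters; inputs map gene -> positive intensities."""
    markers = []
    for gene in group_a:
        a, b = group_a[gene], group_b[gene]
        # Assumption: fold change is the ratio of group means, larger over smaller.
        fold = max(mean(a), mean(b)) / min(mean(a), mean(b))
        if fold >= min_fold and mann_whitney_p(a, b) < p_cutoff:
            markers.append(gene)
    return markers


# Made-up intensities for 11 normal and 11 cancer samples per gene.
normal = {
    "GENE_UP": list(range(100, 111)),
    "GENE_FLAT": list(range(100, 111)),
    "GENE_SMALL": list(range(100, 111)),
}
cancer = {
    "GENE_UP": list(range(300, 311)),     # well separated and about threefold higher
    "GENE_FLAT": list(range(101, 112)),   # essentially unchanged
    "GENE_SMALL": list(range(150, 161)),  # separated, but below twofold
}
markers = select_markers(normal, cancer)
print(markers)
```

Requiring both filters means a gene that is cleanly separated between groups but changes by less than twofold (like the hypothetical `GENE_SMALL`) is not reported as a marker.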
To assess the integrity of RNA isolated from these FFPE tissues, we measured the amplification efficiency of two fragments (90 bp and 155 bp) from a highly expressed gene (RPL13A). As shown in Figure 1
To obtain reproducible gene expression results, we used an RT-PCR test to pre-qualify the RNA samples before array analysis. RT-PCR primers were designed to target ~90-bp fragments in each of three housekeeping genes: UBC, HPRT, and PBGD. Of the 44 samples tested, only one sample (FS3-CN2) showed no amplification in RT-PCR even for the highly expressed ubiquitin C (UBC) gene. This sample also failed to produce any gene expression data on the array.

DASL Assay Performance and Reproducibility

We examined the impact of input RNA quantity on assay performance. Various amounts of total RNA (1000, 500, 250, and 100 ng) isolated from FFPE tissues were converted into cDNA. Each cDNA sample was split to perform two independent DASL assays. Highly reproducible results were obtained with as little as 50 ng of total RNA (R2 = 0.97). More importantly, as shown in Figure 2
We also compared the number of genes detectable by the DASL assay in 16 RNA samples extracted from paired fresh-frozen and FFPE colon and breast tissues, both cancerous and normal. More than 90% of the genes that were detected in the fresh-frozen samples were also detected in their matching FFPE samples, when 200 ng of total RNA was assayed. However, we observed that the gene expression profile of the paraffin-embedded samples had weaker correlation with the profile generated from the corresponding frozen samples (R2 = 0.69), possibly due to sequence-dependent differences in mRNA degradation during tissue fixation and storage. Lists of differentially expressed genes generated from fresh-frozen and FFPE samples had highly significant overlap (with the FFPE list containing ~50% fewer genes). For example, at a 0.01 confidence level, 64 of 231 genes were identified as differentially expressed in matching fresh-frozen samples (FS1_CC2: colon cancer versus FS1_CN2: colon normal), and 38 were differentially expressed in the corresponding FFPE samples. Twenty-eight of these genes were in common, which gives a significance of overlap of 1.0e-09, according to Fisher’s exact test,25 applied to contingency tables formed from differential expression calls. For another matching pair, FS1_BC3: breast cancer versus FS1_BN3: breast normal, 61 genes were identified as differentially expressed using fresh-frozen and 33 using FFPE samples, with an overlap of 20 genes. The significance of this overlap is 3.8e-05. Together, these results suggest that sets of differentially expressed genes identified in FFPE samples resemble those identified from fresh-frozen samples. All of the assays were done at a 1212-plex level, corresponding to 231 genes with 2 to 10 targeted sites per gene. This experimental design allowed assessment of the effect of the probe number on assay quantitation.
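The overlap test described above can be sketched as a one-sided Fisher's exact test, computed as a hypergeometric tail over the 231 assayed genes. This is an assumption-laden illustration: the text does not state whether the test was one- or two-sided or exactly how the contingency tables were formed, so the values this sketch produces need not match the reported 1.0e-09 and 3.8e-05. The function name `overlap_p_value` is hypothetical.

```python
from math import comb


def overlap_p_value(total, n_a, n_b, n_both):
    """P(overlap >= n_both) when lists of size n_a and n_b are drawn from `total` genes.

    Equivalent to a one-sided Fisher's exact test on the 2x2 table of
    differential expression calls (called in A or not, called in B or not).
    """
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, k) * comb(total - n_a, n_b - k) for k in range(n_both, upper + 1))
    return tail / comb(total, n_b)


# Counts quoted in the text: fresh-frozen list, FFPE list, genes in common.
p_colon = overlap_p_value(total=231, n_a=64, n_b=38, n_both=28)
p_breast = overlap_p_value(total=231, n_a=61, n_b=33, n_both=20)
print(f"colon pair:  {p_colon:.2e}")
print(f"breast pair: {p_breast:.2e}")
```

By chance alone, the colon lists would be expected to share about 38 × 64 / 231 ≈ 10.5 genes, so an observed overlap of 28 lies far in the upper tail.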
Our subsampling analysis showed that three optimally designed probes performed comparably to four or more probes with regard to their ability to detect expressed genes as well as differential expression in RNA samples extracted from both fresh-frozen and FFPE tissues. Further lowering the probe number negatively impacted assay reproducibility. Probes optimized for fresh-frozen sample RNAs performed equally well with RNAs extracted from FFPE samples. Since DASL uses random priming in the cDNA synthesis, the probes can be designed to target any unique regions of the gene. There is no need to limit the selection of optimal probes to the 3′-end of the transcripts.

Cluster Analysis of Gene Expression Patterns in FFPE Samples

To further test the strategy of using archival tissues for cancer marker discovery, we generated expression profiles with paired (ie, “cancer versus normal” of same individual) fresh-frozen samples (N = 4 for each tissue type), and identified a subset of genes that distinguished cancer from normal tissues with a significant differential expression score (P < 0.001). From a set of 212 cancer-related genes, 40 and 37 differentially expressed genes were identified for colon and breast tissue, respectively. Since we had a limited number of fresh-frozen samples (two for each class), our list could contain genes which simply reflect individual differences unrelated to cancer status. Expression profiles of these genes from FFPE samples (N = 21 for each tissue type) were then analyzed using an agglomerative nesting clustering method. The cancer and normal samples were separated into two distinct clusters in both of the tissue types, and cancer samples with the same clinical stage were clustered together (Figure 3).
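The clustering step (a correlation matrix over marker-gene signals, then agglomerative clustering with Ward's method) can be sketched in plain Python as follows. This is not the R agnes implementation the authors used. It is a small stand-in that assumes the dissimilarity is 1 minus the Pearson correlation (the conversion is not stated in the text) and applies the Lance-Williams update for Ward's criterion to squared dissimilarities. The sample profiles are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ward_clusters(profiles, n_clusters):
    """Agglomerative clustering of samples; returns one cluster label per sample."""
    n = len(profiles)
    clusters = {i: [i] for i in range(n)}
    # Squared dissimilarities, keyed by (smaller index, larger index).
    d2 = {(i, j): (1.0 - pearson(profiles[i], profiles[j])) ** 2
          for i in range(n) for j in range(i + 1, n)}

    def key(a, b):
        return (a, b) if a < b else (b, a)

    while len(clusters) > n_clusters:
        # Merge the closest pair of current clusters.
        i, j = min(((a, b) for a in clusters for b in clusters if a < b),
                   key=lambda pair: d2[pair])
        ni, nj = len(clusters[i]), len(clusters[j])
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            # Lance-Williams update for Ward's method.
            d2[key(i, k)] = ((ni + nk) * d2[key(i, k)] + (nj + nk) * d2[key(j, k)]
                             - nk * d2[(i, j)]) / (ni + nj + nk)
        clusters[i].extend(clusters.pop(j))

    labels = [0] * n
    for label, members in enumerate(clusters.values()):
        for m in members:
            labels[m] = label
    return labels


# Made-up marker profiles: three "cancer-like" and three "normal-like" samples.
profiles = [
    [10, 21, 30, 41, 50],
    [11, 20, 31, 40, 52],
    [9, 19, 30, 42, 49],
    [50, 41, 30, 19, 10],
    [52, 40, 29, 20, 11],
    [49, 39, 31, 21, 9],
]
labels = ward_clusters(profiles, n_clusters=2)
print(labels)
```

On this toy input the first three samples fall into one cluster and the last three into the other, which mirrors the cancer versus normal separation reported for the FFPE samples.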
We also performed an alternate cluster analysis, in which genes selected by differential expression analysis from a set of 24 FFPE samples were used to cluster another set of 25 FFPE samples assayed independently in another experiment. The cluster analysis was done in two steps: first, genes distinguishing colon from breast tissue were selected. Based on these genes, we were able to separate samples from the second group into colon and breast tissue types with 100% accuracy. Second, genes specific for colon cancer and breast cancer were selected (similar to the analysis of the fresh-frozen samples). Based on these genes, colon cancer samples were separated from colon normal samples without mistake, while breast cancer samples were separated from breast normal samples with one mistake (FS3-BC3). Based on cluster analyses of both fresh-frozen and FFPE samples assayed on Array Matrix 1 and 2, we generated a list of differentially expressed genes that can distinguish colon cancer from colon normal tissue (SIM2, HAR, MMP7, FGFR2, TMEPAI, CLU, PLAB, and human skin collagenase) and a list of genes that can distinguish breast cancer from breast normal tissue (HAR, FGF2, calmegin, IGF-1a, MET, EGFR, ITGA6, IGF2, and BMPR1B), each at a P value <0.01. We plotted the array data for some of these genes—four for each tissue type (Figure 4)
Furthermore, seven colon cancer and four breast cancer-specific markers identified from the array analysis were tested by qPCR with 46 individual samples (4 fresh-frozen and 20 FFPE colon tissues, and 4 fresh-frozen and 18 FFPE breast tissues). Good correlations between the threshold cycle (Ct) number and the array intensity for the 11 markers were obtained with the fresh-frozen samples (R2 = 0.88). However, poor correlations were observed with the FFPE samples (R2 = 0.41), mainly because the qPCR assay was less reproducible and less sensitive in these samples. Individual FFPE samples are known to have different degrees of RNA degradation,17 which in turn dramatically affect the qPCR results (Figure 1).

Discussion

RNA from FFPE specimens can be difficult to extract, since the RNA becomes cross-linked and degraded during the fixation and storage process; in addition, the amount of tumor tissue in the FFPE specimen is often very small. Therefore, it is essential to have a robust method to retrieve high quality RNA from FFPE tissue efficiently. There are various commercially available RNA extraction kits for this purpose, but their comparison was not a goal of this study. With our current protocol, similar expression profiles were obtained with RNAs extracted independently from the same paraffin tissue blocks (R2 = 0.93). To prequalify the RNA samples before array analysis, we used a real-time PCR-based method to assess the intactness of the RNA samples (Figure 1).

The DASL assay combines the advantages of array-based gene expression analysis with those of multiplexed qPCR,18 thereby offering much higher multiplexing capacity and huge throughput and cost-saving advantages. It uses as little as 50 ng of total RNA to analyze 300 to 400 genes in FFPE samples, ~100-fold less than what is required by qPCR, which usually uses 20 to 50 ng per reaction (per gene).
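The RNA-input comparison above can be checked with a few lines of arithmetic. The sketch assumes one qPCR reaction per gene with no replicates, which the text implies but does not spell out. The function name is hypothetical.

```python
DASL_INPUT_NG = 50  # total RNA per DASL assay, from the text


def qpcr_total_input_ng(n_genes, ng_per_reaction):
    """Total RNA needed if each gene gets its own qPCR reaction (assumed, no replicates)."""
    return n_genes * ng_per_reaction


# Low end: 300 genes at 20 ng each; high end: 400 genes at 50 ng each.
low_fold = qpcr_total_input_ng(300, 20) / DASL_INPUT_NG
high_fold = qpcr_total_input_ng(400, 50) / DASL_INPUT_NG
print(low_fold, high_fold)  # 120.0 400.0
```

Under these assumptions the saving ranges from about 120-fold to 400-fold, so the ~100-fold figure in the text corresponds to the conservative end of the range.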
The assay is highly reproducible (see Figure 2).

Our results show that we can obtain reproducible gene expression profiles with FFPE samples older than 10 years. Ninety percent of the genes detected in fresh-frozen sample RNA were detected with RNA from matching FFPE samples. Gene expression profiles of the FFPE samples do not exactly correlate with those from the fresh-frozen samples (R2 = 0.69), presumably because of different rates of RNA degradation occurring during the fixation and paraffin embedding process and during storage.9 However, gene expression analysis within FFPE samples should provide a powerful approach to discover molecular signatures associated with a given disease state, or histological or clinical phenotypes. This technology is especially useful for determining cancer prognosis or therapy response, because it allows not only prospective analysis but also retrospective analysis. Using DASL, gene expression analysis can now be performed on routinely stored tumor specimens from patients with known outcomes. Our results showed that characteristic gene expression patterns can be identified in FFPE samples for a particular cancer type (Figure 3).

We also demonstrated the utility of this strategy by validating eight tissue and cancer-specific markers identified previously from fresh-frozen samples using Affymetrix GeneChip microarrays. The eight genes were assayed along with 212 other cancer-related genes in 51 fresh-frozen and FFPE samples including 26 breast and 25 colon tissues (Table 1). All four tissue-specific markers were able to correctly identify the tissue of origin with a typical tissue-specific expression pattern, and the cancer specific markers were highly expressed in the tumor samples and had significantly lower levels of expression in the matching normal tissues (data not shown). Furthermore, the marker sensitivity and specificity measured by the array analysis were compared to those determined for qPCR with a subset (N = 36) of the FFPE samples.
Overall, the array analysis outperformed qPCR. The DASL assay is a powerful technology for high-throughput expression profiling of hundreds of genes in hundreds to thousands of samples.18 We have now shown that the DASL assay can be applied to clinical FFPE samples, an important source of material that has not been amenable to conventional microarray-based assays. This opens up the possibility of a new generation of microarray-based gene expression assays being applicable not only to routine clinical care but also to the retrospective analysis of paraffin-embedded sample collections obtained during clinical trials or from large population-based cohorts. Supplemental Material
Acknowledgments

We thank Philippe Rigault, Lixin Zhou, and Ivan Mikoulitch for assistance with assay probe design and data analysis.

Footnotes

Address reprint requests to Jian-Bing Fan, Genetic Analysis, Illumina, Inc., 9885 Towne Center Drive, San Diego, CA 92121. E-mail: jfan/at/illumina.com.

Supported in part by grant R33 CA88351 from the National Institutes of Health.

Supplemental information can be found at http://ajp.amjpathol.org.