Br J Haematol. 2008 Sep; 142(5): 802–807.
PMCID: PMC2654477

An international standardization programme towards the application of gene expression profiling in routine leukaemia diagnostics: the Microarray Innovations in LEukemia study prephase


Gene expression profiling has the potential to enhance current methods for the diagnosis of haematological malignancies. Here, we present data on 204 analyses from an international standardization programme that was conducted in 11 laboratories as a prephase to the Microarray Innovations in LEukemia (MILE) study. Each laboratory prepared two cell line samples, together with three replicate leukaemia patient lysates in two distinct stages: (i) a 5-d course of protocol training, and (ii) independent proficiency testing. Unsupervised, supervised, and r2 correlation analyses demonstrated that microarray analysis can be performed with remarkably high intra-laboratory reproducibility and with comparable quality and reliability.

Keywords: microarray, gene expression profiling, leukaemia, standardization, diagnostics

Several microarray studies have already identified differentially expressed genes associated with distinct clinically and therapeutically relevant classes of leukaemia (Golub et al, 1999; Armstrong et al, 2002; Schoch et al, 2002; Yeoh et al, 2002). Given that microarray assays analyse the expression of multiple genes in parallel, they appear to be a robust test method for diagnostic usage (Kohlmann et al, 2003, 2005; Haferlach et al, 2005). However, to date, studies aimed at subclassifying leukaemias through gene expression profiling have mainly been monocentric, have included only a limited number of patients, or have relied mostly on RNA specimens analysed retrospectively from archived samples.

Here we report data from an international study group formed around the European Leukemia Network (ELN), comprising 11 laboratories: seven from the ELN, three in the United States, and one in Singapore. The so-called Microarray Innovations in LEukemia (MILE) study programme will prospectively assess the clinical accuracy of gene expression profiles of 16 acute and chronic leukaemia subclasses, of myelodysplastic syndromes (MDS), and a “none of the target classes” control group, as compared to current routine diagnostic workup in over 3000 patients. As a first step representing a major effort to standardize the microarray analysis workflow in the participating centres, a prephase of the MILE study was performed. This report presents the results of the prephase, i.e., a standardization programme of the microarray procedure in the participating laboratories in order to ensure a robust gene expression profiling test performance before patient samples were analysed.

Materials and methods

There were two stages in the MILE prephase study: protocol training and proficiency testing. As part of the initial protocol training each participating laboratory was provided with identical equipment, including reagent kits, enzymes, spectrophotometer, and heat block instruments, and eight microarray experiments were performed at each centre with an on-site trainer present in the laboratory being trained. The eight samples analysed during the training course were represented by MCF-7 (breast adenocarcinoma) and HepG2 (liver carcinoma) cell line total RNA (Ambion, Austin, TX, USA) with 1·0 μg and 5·0 μg input of total RNA, respectively, and four leukaemia patient sample lysates prepared from mononuclear cells obtained after Ficoll density purification. Patient lysates comprised cells of one chronic myeloid leukaemia (CML), one chronic lymphocytic leukaemia (CLL), and two replicate lysates of an AML patient sample (containing a translocation t(8;21), French-American-British (FAB) type M2). The total RNA from the patient lysates was extracted at each centre as part of the training programme (RNeasy kit, Qiagen, Hilden, Germany), making these samples a test of the entire microarray process workflow post sample acquisition. Subsequently, after the training phase and for operator proficiency testing, each laboratory independently performed four microarray experiments each for MCF-7 and HepG2 cell lines with inputs of 1·5 μg, 3·0 μg, 5·0 μg, and 8·0 μg total RNA. In total, 204 microarray profiles were included in the analysis (for details see Appendix SI and SII). The three anonymous replicate patient lysates were provided by the Laboratory for Leukaemia Diagnostics in Munich, Germany. All patients gave their informed consent for participation after having been advised of the purpose and investigational nature of the study.
The study design adhered to the tenets of the Declaration of Helsinki and was approved by the ethics committees of the participating institutions before its initiation. Details on the microarray analysis workflow, image analysis, quality reports, as well as statistical methods are given in Appendix SI.


Intra-laboratory reproducibility of gene expression analyses

As shown in an unsupervised Principal Component Analysis (PCA), the individual gene expression profiles grouped closely together with their corresponding biological sample types based on the underlying similarity, but not according to the centre where the microarray experiments were performed (Fig 1). The arrows in Fig 1 indicate that the four leukaemia sample preparations from Centre 9 (N17-20), as well as one HepG2 preparation from Centre 3 (N18), were outliers in the PCA. Large differences in gene expression profiles were also observed with respect to the manufacturing batches for MCF-7 total RNA, but overall, a high level of reproducibility between laboratories was seen when a standardized protocol for microarray analysis was followed by trained operators. According to the unsupervised PCA plots, replicated gene expression profiles of the HepG2 cell line were more biologically homogeneous and not as influenced by manufacturing batch numbers as was seen for MCF-7 cell line replicates. Therefore, replicated profiles of the HepG2 cell line were chosen to further investigate the intra- and inter-laboratory correlations. All centres generated highly reproducible gene expression profiles for this cell line, as shown in the box plot analysis of r2 values from all pairwise comparisons within each centre for the sample type HepG2 (Fig 2A), where mean r2 values ranged from 0·973 to 0·988. The slightly higher variability at Centre 11 might be explained by a higher number of operators and replicate analyses than in other centres. Figure 2B shows the intra-site repeatability of microarray data based on quantitative signal values and qualitative detection calls. The number of generally detected genes for each sample type at each centre ranged from 24 627 to 27 075 for HepG2 and from 25 841 to 28 953 for MCF-7.
The coefficient of variation (CV) of the quantitative signal values between the intra-site replicates was calculated using the generally detected subset of genes for each sample type HepG2 and MCF-7 at each laboratory. The distribution of the replicate CV measures across the set of detected genes is displayed in a series of box plots. The different laboratories demonstrated similar replicate CV median values of 1·962–3·234% for HepG2 and 1·869–2·864% for MCF-7.
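
The replicate CV described above can be illustrated with a minimal sketch. It is not the study's own implementation (the statistical methods are detailed in Appendix SI); it assumes the CV is the sample standard deviation divided by the mean, expressed as a percentage, and that each replicate profile is available as a mapping from gene identifier to signal value. The function names, gene identifiers, and toy values are hypothetical.

```python
from statistics import mean, median, stdev


def replicate_cv_percent(signals):
    """CV (%) of one gene's signal across intra-site replicates.

    Assumption: sample standard deviation divided by the mean.
    """
    m = mean(signals)
    if m == 0:
        raise ValueError("mean signal is zero; CV is undefined")
    return 100.0 * stdev(signals) / m


def median_replicate_cv(profiles, detected_genes):
    """Median CV (%) over the detected genes of one sample type at one site.

    profiles: list of replicate profiles, each a dict gene -> signal.
    detected_genes: gene identifiers detected in the replicates.
    """
    cvs = [
        replicate_cv_percent([profile[gene] for profile in profiles])
        for gene in detected_genes
    ]
    return median(cvs)


# Hypothetical toy data: three replicates of one sample type at one centre.
replicates = [
    {"geneA": 10.0, "geneB": 8.0, "geneC": 12.0},
    {"geneA": 10.2, "geneB": 8.0, "geneC": 11.7},
    {"geneA": 9.8, "geneB": 8.0, "geneC": 12.3},
]
print(median_replicate_cv(replicates, ["geneA", "geneB", "geneC"]))
```

Restricting the calculation to detected genes, as in the text, keeps probe sets with near-zero signal from inflating the CV distribution.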

Fig 1
Unsupervised principal component analysis (PCA). A total of 204 experiments are included in the three-dimensional PCA and each sphere represents the gene expression profile for a cell line or leukaemia sample. The signal used is DQN1. The first three ...
Fig 2
Analysis of intra- and inter-laboratory reproducibility. (A) Box-and-whisker plots display, for each laboratory, the intra-laboratory squared correlation coefficients (r2) of all probe sets represented on the HG-U133 Plus 2.0 microarray for the HepG2 ...

Inter-laboratory reproducibility of gene expression analyses

As an example of inter-laboratory reproducibility of gene expression analyses, correlations between Centre 3 and the other ten laboratories are given (Fig 2C and D). The degree of correlation was only slightly different from the intra-laboratory reproducibility (Fig 2C). The minimum and maximum mean values were 0·959 and 0·985, respectively. This again demonstrated a high inter-laboratory correlation of HepG2 gene expression profiles and confirmed the outstanding performance of microarray analysis in the 11 centres. This high inter-laboratory consistency can also be shown in pairwise scatter plot analyses. The 5·0 μg HepG2 replicate analysis between Centre 3 and other laboratories is shown as an example (Fig 2D). A very tight distribution of gene expression data can be observed along the diagonal line for every paired HepG2 sample. Additional analyses of inter-site correlations for HepG2 subsets across all laboratories, along with hierarchical cluster and principal component analyses, are given in Appendix SI. Furthermore, the online section also contains an analysis of the relative contribution of different sources of both technical and biological variability in gene expression measurements.
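
The comparison of one reference centre against the others can be sketched as follows. This is an illustration only, assuming that r2 is the squared Pearson correlation coefficient computed over matched probe-set signals and that the per-centre summary is the mean over all reference-versus-other profile pairs; the exact procedure used in the study is described in Appendix SI. All names and the toy data are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length signal profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


def mean_r2_against_reference(reference_profiles, other_centres):
    """Mean r2 between a reference centre's profiles and each other centre.

    reference_profiles: list of profiles (lists of signals) from the reference.
    other_centres: dict centre name -> list of profiles from that centre.
    Returns a dict centre name -> mean r2 over all cross-centre pairs.
    """
    summary = {}
    for centre, profiles in other_centres.items():
        values = [
            r_squared(ref, profile)
            for ref in reference_profiles
            for profile in profiles
        ]
        summary[centre] = sum(values) / len(values)
    return summary


# Hypothetical toy data: one reference profile, two other centres.
reference = [[1.0, 2.0, 3.0, 4.0]]
others = {
    "centre_A": [[2.0, 4.0, 6.0, 8.0]],
    "centre_B": [[1.0, 2.0, 3.0, 5.0]],
}
print(mean_r2_against_reference(reference, others))
```

Applying the same `r_squared` function to all pairs of profiles within a single centre would give the intra-laboratory counterpart of this summary.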


Taken together, this study demonstrated that standardizing experimental protocols for microarray analysis and performing a thorough operator training resulted in excellent comparability with respect to both data sets generated within a participating laboratory and across 11 different laboratories on three continents. This extends the observations of a recent across-platform comparison study from the Toxicogenomics Research Consortium (Bammler et al, 2005). In particular, and also noted by Bammler et al (2005), the standardization of RNA labelling protocols using common procedures was recognized as an important contributor to signal intensity correlations across different laboratories. Our results are also consistent with the intra-platform precision demonstrated by three different centres in the recent MicroArray Quality Control (MAQC) project data (Shi et al, 2006).

In conclusion, this standardization effort represented the prerequisite foundation of the first phase of the MILE study, wherein 1889 patients have, thus far, been analysed by whole genome expression microarrays (Haferlach et al, 2006). The protocol devised for sample preparation takes only one working day from cDNA synthesis to cocktail hybridization and is easily applicable in a daily routine setting. The standardization of gene expression profiling testing in this way has the potential to offer identical objective diagnostic results in any trained laboratory throughout the world. Thus, gene expression profiling by microarray is moving substantially closer to routine application in the diagnosis of leukaemias in clinical practice.

Authors’ contributions

AK, LW, TH: design of the study and drafting the article; RL, WML, PMW: statistical analysis and interpretation of data; TJK, LZR, JRD, SAS, KIM, AFG, WKH, GB, MCDO, RF, SC, JDV, SR, PRP, JMH, EL, AEY, ESK: data acquisition, interpretation of data, and article revision. All authors approved the final version submitted for publication.

Acknowledgements


We would like to acknowledge the technical assistance of Traci Lyn Toy, W. Kent Williams, Letha Phillips, Verena Serbent, Simona Tavolaro, Monica Messina, Julie Tsai, Matt Eaton, Véronique Pantesco, William Overman, Ted Farr, Cecilia S. N. Kwok, Pei Tee Hwan, and Dr. Lu Yi. We further thank Dr. Geertruy te Kronnie, Prof. Marie Christine Béné, Prof. Claude Preudhomme, and Prof. Elizabeth Macintyre for support throughout the conduct of the prephase of the MILE study.


This study is part of the MILE Study (Microarray Innovations In LEukemia) programme, an ongoing collaborative effort headed by the European Leukaemia Network (ELN) and sponsored by Roche Molecular Systems, Inc., addressing gene expression signatures in acute and chronic leukaemias. This work is further partly supported by AIRC (Associazione Italiana per la Ricerca sul Cancro), Milan, Ministero dell’Università e della Ricerca, Fondo per gli Investimenti della Ricerca di Base (FIRB) and COFIN, Rome, Italy.

Conflict of interest

AK, RL, WML, PMW, and LW are employed by Roche Molecular Systems, Inc. and are involved in the AmpliChip Leukaemia Test research programme, a gene expression microarray for the subclassification of leukaemia. TH is a consultant for F. Hoffmann-La Roche Ltd, Basel, Switzerland. The other authors report no potential conflicts of interest.

Supplementary material

The following supplementary material is available for this article online:

Appendix SI. Details on microarray analysis, manufacturing lot numbers of cell lines, and additional information on interlaboratory reproducibility.

Appendix SII. Information on microarray quality parameters.

Appendix SIII. r2 correlation data for MCF-7 cell line data.

Appendix SIV. r2 correlation data for HepG2 cell line data.

Appendix SV. r2 correlation data for leukaemia samples data.


Please note: Blackwell Publishing are not responsible for the content or functionality of any supplementary materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

References


  • Armstrong SA, Staunton JE, Silverman LB, Pieters R, Den Boer ML, Minden MD, Sallan SE, Lander ES, Golub TR, Korsmeyer SJ. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics. 2002;30:41–47. [PubMed]
  • Bammler T, Beyer RP, Bhattacharya S, Boorman GA, Boyles A, Bradford BU, Bumgarner RE, Bushel PR, Chaturvedi K, Choi D, Cunningham ML, Deng S, Dressman HK, Fannin RD, Farin FM, Freedman JH, Fry RC, Harper A, Humble MC, Hurban P, Kavanagh TJ, Kaufmann WK, Kerr KF, Jing L, Lapidus JA, Lasarev MR, Li J, Li YJ, Lobenhofer EK, Lu X, Malek RL, Milton S, Nagalla SR, O’malley JP, Palmer VS, Pattee P, Paules RS, Perou CM, Phillips K, Qin LX, Qiu Y, Quigley SD, Rodland M, Rusyn I, Samson LD, Schwartz DA, Shi Y, Shin JL, Sieber SO, Slifer S, Speer MC, Spencer PS, Sproles DI, Swenberg JA, Suk WA, Sullivan RC, Tian R, Tennant RW, Todd SA, Tucker CJ, Van Houten B, Weis BK, Xuan S, Zarbl H. Standardizing global gene expression analysis between laboratories and across platforms. Nature Methods. 2005;2:351–356. [PubMed]
  • Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286:531–537. [PubMed]
  • Haferlach T, Kohlmann A, Schnittger S, Dugas M, Hiddemann W, Kern W, Schoch C. Global approach to the diagnosis of leukemia using gene expression profiling. Blood. 2005;106:1189–1198. [PubMed]
  • Haferlach T, Kohlmann A, Basso G, Bene MC, Downing J, Shurtleff S, Hernandez JM, Hofmann WK, Kipps TJ, Kronnie TT, Liu WM, Li R, Macintyre E, Preudhomme C, Chiaretti S, Rassenti L, de Vos J, Yeoh A, Brown C, Williams M, Mills K, Wieczorek L, Foa R. An international multi-center study to define the application of microarrays in the diagnosis and subclassification of leukemia (MILE study): interim analysis based on 1,889 patients achieves 95.4% prediction accuracy. Blood. 2006;108:34A–35A.
  • Kohlmann A, Schoch C, Schnittger S, Dugas M, Hiddemann W, Kern W, Haferlach T. Molecular characterization of acute leukemias by use of microarray technology. Genes, Chromosomes & Cancer. 2003;37:396–405. [PubMed]
  • Kohlmann A, Schoch C, Dugas M, Rauhut S, Weninger F, Schnittger S, Kern W, Haferlach T. Pattern robustness of diagnostic gene expression signatures in leukemia. Genes, Chromosomes & Cancer. 2005;42:299–307. [PubMed]
  • Schoch C, Kohlmann A, Schnittger S, Brors B, Dugas M, Mergenthaler S, Kern W, Hiddemann W, Eils R, Haferlach T. Acute myeloid leukemias with reciprocal rearrangements can be distinguished by specific gene expression profiles. Proceedings of the National Academy of Sciences of the United States of America. 2002;99:10008–10013. [PMC free article] [PubMed]
  • Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Scherf U, Thierry-Mieg J, Wang C, Wilson M, Wolber PK, Zhang L, Amur S, Bao W, Barbacioru CC, Lucas AB, Bertholet V, Boysen C, Bromley B, Brown D, Brunner A, Canales R, Cao XM, Cebula TA, Chen JJ, Cheng J, Chu TM, Chudin E, Corson J, Corton JC, Croner LJ, Davies C, Davison TS, Delenstarr G, Deng X, Dorris D, Eklund AC, Fan XH, Fang H, Fulmer-Smentek S, Fuscoe JC, Gallagher K, Ge W, Guo L, Guo X, Hager J, Haje PK, Han J, Han T, Harbottle HC, Harris SC, Hatchwell E, Hauser CA, Hester S, Hong H, Hurban P, Jackson SA, Ji H, Knight CR, Kuo WP, Leclerc JE, Levy S, Li QZ, Liu C, Liu Y, Lombardi MJ, Ma Y, Magnuson SR, Maqsodi B, McDaniel T, Mei N, Myklebost O, Ning B, Novoradovskaya N, Orr MS, Osborn TW, Papallo A, Patterson TA, Perkins RG, Peters EH, Peterson R, Philips KL, Pine PS, Pusztai L, Qian F, Ren H, Rosen M, Rosenzweig BA, Samaha RR, Schena M, Schroth GP, Shchegrova S, Smith DD, Staedtler F, Su Z, Sun H, Szallasi Z, Tezak Z, Thierry-Mieg D, Thompson KL, Tikhonova I, Turpaz Y, Vallanat B, Van C, Walker SJ, Wang SJ, Wang Y, Wolfinger R, Wong A, Wu J, Xiao C, Xie Q, Xu J, Yang W, Zhang L, Zhong S, Zong Y, Slikker W., Jr The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology. 2006;24:1151–1161. [PMC free article] [PubMed]
  • Yeoh EJ, Ross ME, Shurtleff SA, Williams WK, Patel D, Mahfouz R, Behm FG, Raimondi SC, Relling MV, Patel A, Cheng C, Campana D, Wilkins D, Zhou X, Li J, Liu H, Pui CH, Evans WE, Naeve C, Wong L, Downing JR. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell. 2002;1:133–143. [PubMed]

Articles from British Journal of Haematology are provided here courtesy of Wiley-Blackwell, John Wiley & Sons