Proc Natl Acad Sci U S A. 2009 May 5; 106(18): 7559–7564.
Published online 2009 Apr 17. doi:  10.1073/pnas.0811829106
PMCID: PMC2670243
Statistics, Microbiology

Statistical estimation of cell-cycle progression and lineage commitment in Plasmodium falciparum reveals a homogeneous pattern of transcription in ex vivo culture


We have cultured Plasmodium falciparum directly from the blood of infected individuals to examine patterns of mature-stage gene expression in patient isolates. Analysis of the transcriptome of P. falciparum is complicated by the highly periodic nature of gene expression because small variations in the stage of parasite development between samples can lead to an apparent difference in gene expression values. To address this issue, we have developed statistical likelihood-based methods to estimate cell cycle progression and commitment to asexual or sexual development lineages in our samples based on microscopy and gene expression patterns. In cases subsequently matched for temporal development, we find that transcriptional patterns in ex vivo culture display little variation across patients with diverse clinical profiles and closely resemble transcriptional profiles that occur in vitro. These statistical methods, available to the research community, assist in the design and interpretation of P. falciparum expression profiling experiments where it is difficult to separate true differential expression from cell-cycle dependent expression. We reanalyze an existing dataset of in vivo patient expression profiles and conclude that previously observed discrete variation is consistent with the commitment of a varying proportion of the parasite population to the sexual development lineage.

Keywords: malaria, microarray

Plasmodium falciparum is the most virulent of the human malaria parasites and is responsible for the vast majority of malaria-specific mortality. Infection with this organism results in a wide range of outcomes from asymptomatic carriage through mild disease to life-threatening illness. This spectrum of response is due in part to the preexisting level of clinical immunity induced by repeated previous infection, to human genetic and environmental factors, and to differences in parasite virulence. One way to approach the analysis of parasite-specific factors in disease severity is to compare the transcriptional profiles of parasites taken directly from patients with different clinical presentations. Several such analyses have already been carried out but with conflicting conclusions (1–4).

Some studies have analyzed the global transcriptional profile of P. falciparum in synchronized in vitro culture (5–7), revealing a highly unusual pattern of gene expression in which >80% of genes are transcribed in a wavelike pattern, with a single maximum and a single minimum within the cell cycle. This pattern of periodic expression is conserved among clones of diverse geographic origin, and relatively few genes (<50) show substantial phase shifts between isolates (7). When only a single time point is observed in a microarray experiment, asynchrony between different samples can therefore introduce a systematic difference in the relative gene expression levels, decreasing the statistical power of detecting differential gene expression. This challenge can be addressed experimentally by using artificial synchronization methods such as sorbitol, thermocycling, density separation, or magnetic methods (8–11), but such methods are limited to in vitro cultures.

The present analysis addresses these issues in 3 complementary ways. First, we develop and validate a likelihood-based statistical framework for estimating parasite developmental age [hours post invasion (HPI)] in the cell cycle, using gene expression values and, where available, morphological data. Second, we apply this method to mature stage parasites cultured directly from patient isolates, to compare their gene expression profiles. Finally, we extend this framework to include estimates of proportions of sexually committed parasites in the samples. We find that in cases matched for temporal development, transcriptional patterns display little variation across a set of patients with diverse symptoms of malaria. The relationship between our findings and those of other groups is discussed.


Statistical Method.

We develop a statistical method based on maximum likelihood to estimate the most probable age (HPI) for a sample of unknown cell-cycle progression. The highly periodic nature of gene expression in P. falciparum facilitates this procedure because the vast majority of genes show strong coexpression. In this way, as the number of genes measured increases, the log-likelihood concentrates around the time that best describes the coexpression of genes, and the uncertainty of the estimate decreases accordingly (for a complete discussion, see Materials and Methods). Conceptually, the log-likelihood can be thought of as a weighted distance measure between the test sample and some reference set, with the weights determined by variation between biological samples. The likelihood-based method offers considerable advantages over other methods to match up microarray experiments, such as the correlation-based approach used by PlasmoDB (www.plasmodb.org) and the neural-network approach of Scholz and Fraunholz (12). In addition to formal confidence intervals and a probabilistic interpretation of results, the likelihood framework allows for more complex modeling extensions, including the mixture model discussed below and a more general Bayesian analysis where prior information on temporal development, such as microscopy data, can be included.
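The weighted-distance reading of the log-likelihood can be made explicit. The following is a sketch under the Gaussian model of Materials and Methods, where y_g is the test expression value for gene g, \hat{\mu}_g(\theta) is the smoothed reference at candidate age \theta, and G is the number of genes. The shorthand \sigma^2 for the total variance (between-sample variance plus smoothing-residual variance) and the symbol G are introduced here for illustration only.

```latex
\ell(\theta)
  = -\frac{G}{2}\,\log\!\left(2\pi\sigma^{2}\right)
    - \frac{1}{2\sigma^{2}} \sum_{g=1}^{G} \left( y_g - \hat{\mu}_g(\theta) \right)^{2},
\qquad
\sigma^{2} = \sigma_{\varepsilon}^{2} + \sigma_{\eta}^{2}.
```

Only the second term depends on \theta, so maximizing \ell(\theta) amounts to minimizing a squared distance between the test sample and the reference profile at age \theta, scaled by the biological and technical variance. If the variances were estimated gene by gene (an assumption; the text does not say whether they are), each squared difference would carry its own weight 1/\sigma_g^2.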

Validation of Statistical Method.

To validate this approach, we computed maximum likelihood estimates for the samples in published datasets, whose true developmental age was known (Fig. S1). This approach accurately computes the HPI of 2 additional clones of P. falciparum, the sequenced reference clone 3D7 and the chloroquine-resistant clone Dd2 (7), on glass slide arrays and the stage-specific expression profiling studies conducted by Le Roch et al. (6), using Affymetrix arrays. Furthermore, likelihood estimates for an entire transcriptome are robust to changes in expression levels of small numbers of genes. Analysis of tetracycline-treated samples, which specifically under-express apicoplast genes at later time points (13), yielded HPI estimates similar to those of the untreated controls.

The estimates for the cultures from Le Roch et al. showed a small but reproducible difference between the sample ages of cultures synchronized by sorbitol and those synchronized by thermocycling. Parasites in the thermocycling group appeared slightly younger in our analysis, which was supported by small differences in size in the blood smears [see supplementary figures in Le Roch et al. (6)].

Analysis of Ex Vivo Cultured Patient Isolates.

We performed microarray analysis on P. falciparum sampled from patients with diverse symptoms of malaria and grown in culture until maturity. The aim of this study was to pilot the analysis of gene expression data in parasites taken from patients with a range of clinical presentations as a means of identifying parasite specific virulence factors. Initial inspection of the array data for 23 patient isolates revealed that 5 of the samples had hybridization intensities that were too low for reliable normalization (Fig. S2), and 1 showed a ghost image of fluorescence on the array. These were therefore excluded from further analysis.

Examination of the remaining dataset revealed variability in gene expression patterns, but we were unable to identify any significant differences in the expression patterns of individual genes relating to any of the clinical parameters measured in the study (Table S1), using conventional microarray statistical methods (14). We grouped the samples in the dataset, using hierarchical clustering (15) and nonnegative matrix factorization (NMF)-based clustering (2, 16), and again did not find any significant associations between clustering and clinical parameters (Fig. S3). To determine whether the variability we were observing within the dataset was due to subtle differences in sample age, we estimated the age of each sample by an extensive morphometric analysis of the thin films and applied the approach outlined above to compute log-likelihood curves and maximum likelihood estimates for each sample (Fig. 1 and Fig. S4). For the morphometric analysis we first obtained Giemsa-stained thin blood films of a highly synchronous 3D7 culture taken at intervals over the 48-h period after invasion as a standard. Using image analysis software, we then measured parasite area as a proxy for age and observed that the relationship between parasite area and time was best fitted by an exponential relationship beginning at 20 HPI (Fig. S5). We also took photomicrographs of each of the 3D7 time course samples and used these as a reference to make an independent visual estimate of the age of the field samples.
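The area-based age estimate described above can be illustrated with a small calibration sketch. It assumes the exponential relationship is fitted by least squares on the log scale from 20 HPI onward and then inverted to convert a measured area into an age; the fitting procedure is not specified in the text, and the calibration values, units, and function names below are hypothetical.

```python
import math

def fit_exponential(times, areas):
    """Least-squares fit of log(area) = a + b * t; returns (a, b)."""
    logs = [math.log(v) for v in areas]
    n = len(times)
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    b = (sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
         / sum((t - t_mean) ** 2 for t in times))
    a = l_mean - b * t_mean
    return a, b

def age_from_area(area, a, b):
    """Invert the fitted curve to turn a measured area into an age (HPI)."""
    return (math.log(area) - a) / b

# Hypothetical calibration series: (HPI, mean parasite area in arbitrary units),
# generated from an exact exponential purely to exercise the code.
times = [20, 23, 26, 29, 32, 35, 38, 41, 44]
areas = [2.0 * math.exp(0.05 * (t - 20)) for t in times]

a, b = fit_exponential(times, areas)
estimated_age = age_from_area(3.3, a, b)  # age implied by a measured area of 3.3
```

Because the fit is only meaningful from 20 HPI onward, areas measured on ring-stage parasites should not be converted with this curve.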

Fig. 1.
Estimating parasite age. (A) Estimates of parasite age from gene expression based on a reference set correlate with measurements of parasite area. (B) The maximum likelihood estimates and 95% confidence intervals for the samples in this study are shown. ...

All methods were in good agreement (Fig. 1 and Fig. S4). These data showed that the ages of ex vivo samples were distributed over the interval 20–44 HPI (Fig. 1B). The log-likelihood curves for the set of ex vivo samples are shown as overlaid curves and as a heatmap (Fig. 1 C and D). In the heatmap, each row represents the log-likelihood function for a sample, and the ordering of log-likelihood curves by maximum HPI reveals a continuous distribution of sample ages in the set. When we mapped the estimates of sample age onto the major branchings from hierarchical clustering (Fig. S3C), or the cluster membership for k = 2, using NMF clusters (membership was identical in the 2 methods) we found a strong association between sample age and cluster membership (P < 0.001, 2-sided t test).

In Fig. S3D, we show the pairwise correlations between the expression profiles measured ex vivo and previously published (17) late schizont profiles from the 3D7 clone grown in culture and measured on the same array. The pairwise correlation matrix revealed that samples in our study correlated highly with one another (0.86–0.97) and with late schizont in vitro profiles (0.74–0.94) ordered by maximum likelihood estimate of HPI. In the pairwise correlations with late schizont in vitro samples there is a continuous gradient toward increasing correlation with late schizonts as sample age increases. This analysis suggests that the transcriptional differences that we observe are due to the difficulty of obtaining a highly synchronous culture of mature stages from patients. Our results indicate an absence of large-scale variation between the expression profiles of mature parasites grown directly from patients with diverse clinical presentations and those of laboratory clones adapted for growth in vitro.

This pilot study was designed to address the feasibility of identifying genes differentially expressed in parasites from mild and severe disease. However, with the available sample size remaining after matching samples for parasite age, no such differences could be detected. As an alternative, we sought to identify genes that were differentially expressed in all field isolates, when compared with time-matched published in vitro profiles (17). Between these 2 groups, we identified 278 genes, representing ≈5% of the genome, as differentially expressed [P ≤ 0.001, using the Benjamini-Yekutieli multiple test correction (18)] (Fig. S6A and Table S2). As expected, the list of down-regulated genes was enriched in highly polymorphic genes [14/132 had SNP density >4/KB (Fisher's exact test P value <0.01)]. We have therefore marked these genes with an asterisk in Table S2. To assay whether any functionally related groupings of genes were up- or down-regulated, we searched this list for enrichment of gene ontology (GO) terms (Table S2). The list of up-regulated genes was enriched for terms involved in DNA replication, protein catabolism and ubiquitin-dependent degradation, whereas no enrichment was evident in the list of down-regulated genes. After a separate analysis of the variant surface protein families, we noted the significant up-regulation of a single rif gene (19) in 9 of the 17 patient samples, PFL2585c. This result was validated for 5 of the patient samples by real time PCR (Fig. S6B). PFL2585c must be relatively conserved in at least some patient isolates in our study and it appears to be consistently over-expressed compared with other members of the family. We plotted the raw fluorescence data for each of the probes in the probe set, which span the length of the transcript from the 5′ to the 3′ end (Fig. S6 C and D). 
The strong expression signal is nonuniformly distributed across the transcript, suggesting that some regions of PFL2585c are more conserved than others. Overall, the expression differences we did detect between asexual growth in field isolates and laboratory isolates are subtle and the majority of variation within our sample was due to slight differences in isolate maturity rather than large-scale shifts in global transcriptional profile.
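For readers unfamiliar with the Benjamini-Yekutieli correction used in the comparison above, the adjustment can be sketched as follows. This is a generic standard-library implementation, not the R/Bioconductor code used in the study, and the function name and example p-values are illustrative.

```python
def benjamini_yekutieli(pvalues):
    """Return Benjamini-Yekutieli adjusted p-values in the original order."""
    m = len(pvalues)
    # Harmonic-sum factor that makes the procedure valid under dependence.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        value = pvalues[idx] * m * c_m / rank
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted

# Illustrative p-values only; not taken from the study.
adjusted = benjamini_yekutieli([0.001, 0.01, 0.02, 0.5])
```

A gene would then be called differentially expressed when its adjusted p-value falls at or below the chosen threshold.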

Analysis of Published In Vivo Expression Profiles.

In a recent study of parasite gene expression from patients with malaria, Daily et al. (2) have identified major alterations in transcriptional patterns in vivo such that they were able to identify 3 different transcriptional states, 2 of which were markedly different from published in vitro data. Because our data did not reveal any such changes it was important to conduct a detailed comparison between the studies. Since asynchronous cell-cycle progression had confounded the analysis of our ex vivo samples, we computed maximum likelihood estimates of sample age for their dataset and found all of the samples to be 8–12 HPI, consistent with their report that only ring forms were present in the blood smears (2) (Fig. 2A). The maximum likelihood estimates were also distributed across this range in all 3 clusters, suggesting that their clustering was not associated with differences in cell-cycle progression.

Fig. 2.
Maximum likelihood estimates (A) and log-likelihood curves (B) for samples from Daily et al. (2). (A) The samples all have maxima within the ring portion of the parasite lifecycle. The clusters do not appear to be associated with different sample age. ...

The variation identified by Daily et al. (2) was, however, immediately apparent from differences in the shapes of the likelihood curves (Fig. 2B). The shape of the entire likelihood curve is itself informative and can be useful for studying the biological properties of the test samples. In our analysis, the variation in the Daily et al. (2) samples presents as a gradient between strong and weak peaks, with stronger peaks associated with a greater similarity with the in vitro reference set. The secondary peak at ≈30 HPI is striking because Daily et al. (2) report that only ring forms were present in blood smears from their samples and parasites older than 18–20 HPI are sequestered in the deep vasculature and do not circulate. One potential explanation is that mature forms indeed circulate at very low levels and mix with the observed ring stages in varying proportions. Were this to be the case, however, one would expect to see a continuous distribution of secondary peaks overlaid across the development cycle. A second possibility is that differences in synchrony between circulating populations of similar age could account for the observed differences. We excluded this by simulating mixture samples of varying synchrony and found that, consistent with theory, asynchrony between samples of similar age increases the variance of the prediction, but does not lead to bias. Daily et al. (2) suggest a third explanation when they note the weak similarity of cluster 1 to profiles from the sexual life cycle. To address this possibility, we applied our temporal estimation method (using the asexual reference set) to the in vitro gametocyte development data to establish the apparent asexual age of developing gametocytes. This shows a strong peak at 30 HPI, which could not have been due to contaminating trophozoites as it persisted over 13 consecutive days (Fig. S1E). 
We therefore hypothesized that the peak observed at 30 HPI might be due to the presence of a proportion of parasites committed to sexual development, but at an early point where they would be morphologically indistinguishable from ring stages.

To investigate this, we used 2 statistical techniques, principal components analysis (PCA) and mixture modeling. We projected both sets of samples into the first 2 principal components of the gametocyte development space, observing a grouping of points (i.e., samples) for the ex vivo data that centered around late schizont samples computed on the same array (Fig. 2C). Conversely, we observed a continuous progression of samples for the in vivo data from Daily et al. (2) (Fig. 2D). Unlike the ex vivo cultured samples, the distribution of these points did not overlap with the distribution of points from time- and array-matched in vitro ring-stage samples. Instead, the distribution of points mapped out a continuous projection further into the gametocyte development space. Cluster 3 samples were approximately distributed at the center of this progression while cluster 1 samples projected furthest into the space.
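The projection step can be sketched as follows: principal axes are computed from a reference time course (here standing in for the gametocyte development series), and other samples are then expressed in the coordinates of the first 2 axes. This is a minimal illustration using power iteration in plain Python; the function names and toy data are hypothetical, and a numerical library would normally be used for genome-scale data.

```python
import math
import random

def center(rows):
    """Subtract per-gene means; return the centered rows and the means."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows], means

def top_components(rows, k=2, n_iter=500, seed=0):
    """First k principal axes of centered rows (power iteration with deflation)."""
    rng = random.Random(seed)
    p = len(rows[0])
    axes = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(p)]
        for _ in range(n_iter):
            # Remove any part of v that lies along axes already found.
            for a in axes:
                dot = sum(vi * ai for vi, ai in zip(v, a))
                v = [vi - dot * ai for vi, ai in zip(v, a)]
            # Multiply by X^T X without forming the covariance matrix.
            scores = [sum(r[j] * v[j] for j in range(p)) for r in rows]
            v = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(p)]
            norm = math.sqrt(sum(vi * vi for vi in v))
            v = [vi / norm for vi in v]
        axes.append(v)
    return axes

def project(sample, means, axes):
    """Coordinates of one sample in the space spanned by the axes."""
    centered = [s - m for s, m in zip(sample, means)]
    return [sum(c * a for c, a in zip(centered, axis)) for axis in axes]

# Toy reference "time course": 10 samples measured on 3 hypothetical genes.
reference = [[float(t), 0.5 * ((t % 2) * 2 - 1), 0.0] for t in range(10)]
centered, means = center(reference)
axes = top_components(centered, k=2)
coords = project([9.0, 0.0, 0.0], means, axes)  # a new sample in reference space
```

The sign of each axis is arbitrary, so only relative positions of samples in the projected space are interpretable.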

As an alternative analytical approach, we extended the likelihood framework to include parasites undergoing sexual development. This yielded a 2-parameter estimate for each sample: the estimated hours after invasion along the asexual development cycle and the fraction of mRNA attributed to gametocyte expression patterns, α. We note that because the relative mRNA expression of sexual and asexual cells is unknown, the estimate of α corresponds to the total fraction of mRNA rather than a true proportion of sexual cells. To validate this model, we applied it to 3 reference time courses from Young et al. (20), which represent a mixture of sexual and asexual stages in proportions subsequently measured by light and fluorescence microscopy. In all 3 cases, the model yielded estimates of gametocyte mRNA fraction, α, which were highly correlated with the measured percentage of gametocytes in the culture (Fig. S7). We then compared the 2-parameter maximum likelihood estimates from our study to those obtained for the samples of Daily et al. (2) (Fig. 2E). Samples hybridized in our study appear as late stage asexual parasites with small, but nonzero, contributions from sexual stages (mean = 0.13, SD = 0.07, range = 0.03–0.23). Samples from the Daily et al. (2) study appear as mixtures of early stage asexual parasites with continuously varying proportions of gametocyte mRNA (mean = 0.29, SD = 0.20, range = 0.03–0.58). In each case, the expression of known and putative gametocyte marker genes was associated with increasing values of α (Fig. S7 D–G). As a control, in vitro asexual stages (6, 17) and in vitro gametocytes (20) are plotted in the same space. The estimated fraction of the culture developing sexually is strongly associated with cluster membership from the Daily et al. (2) study (F test, P < 0.0001). In both studies, the samples present as mixtures of asexual and sexual parasites; however, in the study from Daily et al. (2), there appear to be much higher contributions from sexual-stage mRNA. Taken together, the results of principal components analysis, 2-parameter maximum likelihood estimation, and the expression of gametocyte marker genes provide strong evidence of continuous variation in the level of sexual development in the samples from Daily et al. (2) and from our study.


The periodic pattern of most of the genes in the P. falciparum transcriptome can introduce artificial variability in asynchronous samples, leading to potentially spurious conclusions about differential gene expression. We have come across this issue in attempting to analyze ex vivo isolates taken from the blood of infected individuals. As a result, we have developed straightforward but powerful statistical methods to estimate the cell-cycle progression given expression data from a single time point. When applied to published datasets, these methods provide accurate estimates of sample age, which are robust to differences in array type and specific changes in the expression of a subset of genes. In this study, sample ages of ex vivo parasites were accurately measured by our method as evidenced by their close agreement with independent morphological data.

In analyzing our ex vivo parasite data, application of this method clarified that much of the observed variation is due to subtle changes in parasite development, rather than large-scale transcriptional shifts. In time-matched comparisons, the data appear to have a transcriptional pattern that resembles the gene expression profiles of corresponding trophozoite- and schizont-stage parasites cultured in vitro. We were not able to detect any transcriptional differences between severe and mild malaria, but given the small sample size that is further reduced by the necessity to match parasites for developmental age, this is not surprising. When comparing all of the ex vivo data of appropriate age with age-matched in vitro culture data, we were, however, able to identify a significant number of genes that were up- or down-regulated in field isolates. Determining whether this reflects genuine variation in the field, or whether it reflects adaptation of parasites to in vitro culture, will require further experiments.

We have also identified a single member of the rif gene family that shows a strong signal relative to the other members of variant surface proteins in our sample, due either to overexpression, conservation in primary sequence between isolates, or both. The possible nonuniform patterns of conservation along the length of the gene present the intriguing possibility that portions of this protein are functionally constrained.

When applied to a dataset of in vivo patient isolates, our method for estimating cell-cycle progression confirms that all samples have similar developmental ages, suggesting that temporal asynchrony is not the cause of the observed transcriptional variation. After investigating unusual results in the likelihood curves, we noted, as did the authors (2), patterns that resemble gametocyte expression. We explored this further, using principal components analysis, the results of which suggested that sexual development may have been an important component of gene expression for both groups of samples. We therefore extended the likelihood framework to consider each patient infection as a mixture of varying proportions of parasites committed to sexual and asexual developmental lineages. This analysis revealed that both groups of samples displayed a continuous gradient of sexual gene expression signals, although the overall signal in the in vivo samples was much stronger than that of the ex vivo cultured samples. We can see 3 possible explanations for this difference. First, there were genetic, geographic and environmental differences between the studies. Second, it is possible that the culture conditions used in our study suppressed the maturation of sexual stages. Finally, it may be that the studies indeed measure similar underlying proportions of cellular populations, but that the relative signal of sexual mRNA is overshadowed by the dramatic increase in mRNA production by trophozoite and schizont stages. An additional reason why we may observe continuous variation whereas Daily et al. (2) observe discrete clusters is that samples in their cluster 3 are positively associated (P < 0.0001, Fisher's exact test) with hybridization intensity distributions of lower variance (Fig. S8).

We and Daily et al. (2) both note, however, that the pattern of gene expression observed is not completely consistent with the published gametocyte development expression profile. One problem with looking for signatures of early gametocyte expression is that in the gametocyte developmental series, early gametocytes are contaminated by remaining asexual parasites. Similarly, at least in the 3D7 asexual development dataset, the high rate at which this parasite commits to gametocytes means that contaminating early stages of gametocytes are likely to be present.

The similarity in transcription pattern between 30 HPI asexual parasites and gametocytes offers an interesting view into the mechanisms of transcriptional regulation and control in P. falciparum. It has recently been suggested that periodic gene expression in P. falciparum is controlled by a series of AP2-like transcription factors (21) that each regulate a subset of genes at succeeding stages within the cell-cycle. The asexual genes also expressed in gametocytes are likely to be expressed under the same transcriptional control mechanisms as those expressed at or around 30 HPI. Our results might therefore imply that during gametocytogenesis, a single asexual AP2-like transcription factor is used. The reuse of the late-trophozoite/early-schizont biological machinery by sexually developing parasites suggests an economical strategy in which the same biological equipment is deployed in multiple contexts.

Overall, we find that analyzing gene expression from P. falciparum in the context of progression through the cell-cycle and commitment to diverging sexual and asexual lineages is a useful paradigm for thinking about clinical and experimental malaria. We have shown that the information contained in a large number of genes can provide reliable and robust estimates of asexual stage and sexual commitment, although the information from a single gene would be insufficient. Estimates of cell-cycle progression and lineage commitment can provide an important measure of quality control for technical and biological variability and offer a unique perspective on the interpretation of expression profiles. Such a view can clarify experimental results by removing the confounding effects of asynchronous progression, simplify the identification of differentially expressed genes and provide valuable insight into the architecture of transcriptional control in the malaria parasite.

Materials and Methods


Ex vivo culture and RNA preparation.

Approximately 3 mL of blood was collected from patients who presented with symptoms of mild or severe malaria to the Royal Victoria Teaching Hospital and the MRC Fajara clinical facility in the Greater Banjul Area of The Gambia. The study was approved by the joint Gambian Government and Medical Research Council Ethics Committee, and written informed consent was obtained from each child's parent or guardian before enrollment. Nineteen samples were collected in the 2005 annual malaria season and 4 were collected in the 2006 annual malaria season. Red cells were placed directly into culture for between 24 and 48 h in a candle jar at 37 °C until Giemsa-stained smears showed that they had matured to the schizont stage. Red cells were then harvested and resuspended at 50% haematocrit and immediately mixed with 4 volumes of TRIzol Reagent™ (Invitrogen). Aliquots were stored at −80 °C for subsequent RNA extraction using RNeasy Micro™ (Qiagen).

RT-PCR and microarray analysis.

RNA was reverse-transcribed with oligo(dT), using TaqMan reverse transcription reagents (Applied Biosystems). For real-time PCR-based transcript quantification, 1/10 of the cDNA was used in a fluorogenic 5′ nuclease assay (TaqMan chemistry) on a Rotor-Gene 3000 (Corbett Life Sciences). The primers and probe for PFL2585c were as follows: forward, 5′-ACTGTTGGATTTTTCAGCACAATTGTT-3′; reverse, 5′-TCTGAATACCAGCTTCTACAGAAACTTTT-3′; and probe, 5′-CTGCTGCAAAACAAG-3′. As a positive control, ama1 was also evaluated and the primers were as follows: forward, 5′-GGATTATGGGTCGATGGAAATTGTG-3′; reverse, 5′-CATAATCTGTTAAATGTTGTTCATATTGTTTAGGTTGAT-3′; and probe, 5′-CCGAAGCACTCAATTCA-3′. Each run included 3D7 genomic DNA standards and each sample was run in duplicate.

For microarray analysis, cRNA was synthesized, purified, and labeled using the GeneChip IVT labeling kit as recommended by Affymetrix. Hybridization to Affymetrix PFSANGER arrays [www.affymetrix.com, as per Cortes et al. (17)] took place at 45 °C for 16 h under rotation at 60 rpm. Arrays were washed on an Affymetrix FS450 and scanned using an Affymetrix GeneChip Scanner 3000 7G. Fluorescence intensities were background adjusted, quantile normalized, and converted to expression values, using the robust multiarray averaging algorithm [RMA (22)] in R/Bioconductor (ref. 22; software packages: Affy, multtest, limma). Quality control measures for array preparation can be found in Fig. S2. Processed and raw data were deposited in the ArrayExpress repository with accession no. E-TABM-591.
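The quantile normalization step named above can be illustrated in isolation. This sketch is not the RMA implementation from the Bioconductor affy package, which also performs background adjustment and probe-set summarization; it only shows how arrays are forced onto a common intensity distribution, and it ignores ties. The function name and example values are hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity lists (one list per array)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        # Each probe receives the reference value matching its rank on its array.
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Two toy arrays with three probes each (illustrative values only).
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After this step every array has the same set of values, so remaining differences between arrays reflect only the rank order of probes.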

In vitro culture.

Parasites were cultured by using the method of Trager and Jensen (23). Highly synchronous populations were obtained by 3 rounds of successive sorbitol synchronization (8). In developing the reference set for parasite size during the erythrocyte development cycle, cultures were monitored for more than 1 complete development cycle (for 48 h) and Giemsa-stained thin films were made every 3 h.


The image acquisition and processing were done on a digital imaging station from the Microimaging Applications Group, and the analysis was done in Image Pro Plus version 6.1. The area of >500 parasites per slide was measured. For each Giemsa-stained thin blood smear, >100 digital images were acquired at 100X magnification and analyzed using an automated method that detected positively staining parasites by setting separate intensity thresholds in individual RGB color channels, until positively staining parasites were identified relative to background. This was done in a calibration step for each batch of slides, until the software definition of positive staining agreed visually with the operator's definition. Calibration was done separately for each batch of Giemsa-stained slides because intensity values varied slightly depending on the darkness of each batch of stained slides. The automatic counting method was validated with manual measurements of area, demonstrating a highly reproducible measurement of total area stained by Giemsa for parasites that have grown beyond ring stages (details in Figs. S4 and S5).

Statistical Methods.

Temporal estimation.

We develop a maximum likelihood method for estimating the cell cycle progression of P. falciparum parasites of unknown HPI, which we refer to as the test set, y. For each sample, the probability of parasite age given the transcriptome data for a single gene is computed with respect to the hourly gene expression data for the HB3 strain collected by Bozdech et al. (5), which we refer to as the reference set, xg(t), for gene g at time t. The observed reference set is assumed to be a noisy realization of a smooth and periodic true reference set, μg(t), such that xg(t) = μg(t) + ηg(t), where ηg(t) ≈ N(0,ση2). This true reference, μg(t), is estimated using smoothing splines (24, 25), yielding [mu]g(t), the residuals of which are used to estimate [sigma with hat]η2. It is further assumed that the test set, y, is at some stage, t*, in the HPI such that yg = xg(t*) + εg where εgN(0,σε2). The assumption of Gaussian noise closely conforms to previous biological findings (26), with σε2 estimated, using differences across the 3 clones measured by Llinás et al. (7).

Because yg = xg(t*) + εg and xg(t*) = μg(t*) + ηg(t*), then in general, yg[mu]g(θ) ≈ εg + ηg(θ). Therefore,

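$$P(y_g \mid \theta) = \frac{1}{\sqrt{\hat{\sigma}_\eta^2 + \hat{\sigma}_\varepsilon^2}}\,\phi\!\left(\frac{y_g - \hat{\mu}_g(\theta)}{\sqrt{\hat{\sigma}_\eta^2 + \hat{\sigma}_\varepsilon^2}}\right),$$

where ϕ denotes the density function for the standard Normal. The sample likelihood is the product of the densities over all genes,

$$\ell(\theta) = \log \prod_g P(y_g \mid \theta) = \sum_g \log P(y_g \mid \theta),$$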

which is represented here as the log-likelihood, $\ell(\theta)$. The maximum likelihood estimate (MLE), $\hat{\theta}$, for sample age is used to estimate $t^*$. The MLE is the value of $\theta$ for which $\ell(\theta)$ is maximal. Confidence intervals for $t^*$ are computed using subsampling to provide robust estimates (27, 28).
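As a minimal sketch of the estimation step, the code below evaluates a Gaussian log-likelihood over an hourly grid of candidate ages and returns the maximizing hour. It assumes a single combined variance (the sum of the reference and test noise variances) shared by all genes; the three toy genes, the 48-hour grid, and the function names are hypothetical, and the subsampling confidence intervals are not shown.

```python
import math


def log_likelihood(y, mu_hat, theta, sigma_sq):
    """Sum of Gaussian log-densities of y_g - mu_hat_g(theta) over genes.

    y: dict gene -> observed expression in the test sample.
    mu_hat: dict gene -> list of smoothed reference values indexed by hour.
    sigma_sq: combined variance sigma_eta^2 + sigma_eps^2 (assumed shared).
    """
    total = 0.0
    for gene, value in y.items():
        resid = value - mu_hat[gene][theta]
        total += -0.5 * math.log(2 * math.pi * sigma_sq) - resid ** 2 / (2 * sigma_sq)
    return total


def estimate_age(y, mu_hat, sigma_sq):
    """Return the hour theta that maximizes the log-likelihood (grid search)."""
    n_hours = len(next(iter(mu_hat.values())))
    return max(range(n_hours), key=lambda th: log_likelihood(y, mu_hat, th, sigma_sq))


# Toy reference: three genes peaking at different phases of a 48-hour cycle.
N_HOURS = 48
peaks = {"geneA": 6, "geneB": 22, "geneC": 38}
mu_hat = {
    g: [math.cos(2 * math.pi * (t - p) / N_HOURS) for t in range(N_HOURS)]
    for g, p in peaks.items()
}

# Test sample taken at hour 30, with small fixed offsets standing in for noise.
offsets = {"geneA": 0.05, "geneB": -0.03, "geneC": 0.02}
y = {g: mu_hat[g][30] + offsets[g] for g in peaks}

theta_hat = estimate_age(y, mu_hat, sigma_sq=0.04)
```

Because genes peak at different phases, their residuals are jointly small only near the true age, which is why pooling many periodic genes can sharpen the estimate.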

This approach can, in theory, be extended to detect transcriptional differences between a test sample and a reference set for a given gene. Briefly, the observed value is compared with the expected value from the time-matched reference to obtain a residual, with larger residuals suggesting meaningful differential expression. Although such an approach is useful for selecting candidate genes for further analysis, it is generally insufficient on its own to identify true transcriptional differences, because the distribution of observed expression values for an individual gene depends on both biological variability and array noise. Members of our research team are currently exploring this and related statistical methods for the robust and reliable identification of differentially expressed single genes.
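The residual comparison can be sketched as a simple candidate screen. This is an illustration only, under the assumption of a single shared variance; the cutoff of 3, the toy values, and the function names are hypothetical, and, as the text cautions, a large residual marks a gene for follow-up rather than establishing differential expression.

```python
import math


def standardized_residuals(y, mu_hat, theta_hat, sigma_sq):
    """Return gene -> (y_g - mu_hat_g(theta_hat)) / sqrt(sigma_sq)."""
    sd = math.sqrt(sigma_sq)
    return {g: (v - mu_hat[g][theta_hat]) / sd for g, v in y.items()}


def flag_candidates(residuals, cutoff=3.0):
    """List genes whose absolute standardized residual exceeds the cutoff."""
    return sorted(g for g, r in residuals.items() if abs(r) > cutoff)


# Toy time-matched reference values at the estimated age (index 0 only).
mu_hat = {"geneA": [1.0], "geneB": [-0.5], "geneC": [0.2]}
y = {"geneA": 1.1, "geneB": 0.6, "geneC": 0.1}

z = standardized_residuals(y, mu_hat, theta_hat=0, sigma_sq=0.04)
candidates = flag_candidates(z, cutoff=3.0)
```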

Mixture Model.

We construct a mixture model in which each patient infection is considered as a mixture of some proportion of parasites undergoing sexual development ($\alpha \in [0,1]$) and another proportion of parasites ($1 - \alpha$) undergoing asexual development. Formally, this is represented as $y_g = (1 - \alpha)x_g(t^*) + \alpha z_g + \varepsilon_g'$, where $z_g$ is the gene expression value for a sexual stage gene, $\alpha$ is the coefficient of mixing, and $\varepsilon_g'$ is the associated error term. Because there are no available measurements for the unidentified form of sexually committed rings (29), we use the expression profiles of early stage gametocytes as a proxy for committed gametocytes, $z_g$. The most complete expression profile of early gametocyte stages without contaminants from asexual stages is day 4 of the gametocyte time course obtained by Young et al. (20). We note that the days of the Young et al. time course consisting of 100% gametocytes (days 4–12) are highly correlated, and results vary little using different sexual reference strains. Because the reference set used above was measured using a different array type, we use the sorbitol-synchronized data from Le Roch et al. (6) to estimate progression through the asexual cycle, as outlined above. The error term, $\hat{\varepsilon}_g'$, is estimated using differences between expression profiles of the sorbitol- and thermocycling-synchronized datasets from Le Roch et al. (6) and the residuals from smoothing, as above. The log-likelihood was subsequently evaluated over a grid of mixtures for varying fraction of gametocyte expression, $\alpha$, and time estimate, $t^*$, yielding a 2-parameter estimate for each sample: $\langle \hat{\alpha}, \hat{\theta} \rangle$. Confidence regions were computed using subsampling (Fig. S8).
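The grid evaluation of the mixture model can be sketched as below. It assumes Gaussian errors with one shared variance and an integer time grid; the four toy genes, the 48-point asexual reference, the sexual-stage values, and the function names are hypothetical and do not correspond to the Le Roch or Young datasets.

```python
import math


def mixture_log_likelihood(y, x_ref, z_ref, alpha, t, sigma_sq):
    """Gaussian log-likelihood of y_g = (1 - alpha) x_g(t) + alpha z_g + error."""
    total = 0.0
    for gene, value in y.items():
        expected = (1 - alpha) * x_ref[gene][t] + alpha * z_ref[gene]
        resid = value - expected
        total += -0.5 * math.log(2 * math.pi * sigma_sq) - resid ** 2 / (2 * sigma_sq)
    return total


def fit_mixture(y, x_ref, z_ref, sigma_sq, n_alpha=21):
    """Grid search over alpha in [0, 1] and integer time t; returns (alpha, t)."""
    n_times = len(next(iter(x_ref.values())))
    grid = [(i / (n_alpha - 1), t) for i in range(n_alpha) for t in range(n_times)]
    return max(
        grid,
        key=lambda p: mixture_log_likelihood(y, x_ref, z_ref, p[0], p[1], sigma_sq),
    )


# Toy asexual reference: four genes with different peak times on a 48-point grid.
N_TIMES = 48
peaks = {"geneA": 6, "geneB": 22, "geneC": 38, "geneD": 14}
x_ref = {
    g: [math.cos(2 * math.pi * (t - p) / N_TIMES) for t in range(N_TIMES)]
    for g, p in peaks.items()
}
# Toy sexual-stage proxy profile (one value per gene).
z_ref = {"geneA": 0.8, "geneB": -0.2, "geneC": 0.5, "geneD": 1.5}

# Noise-free test sample: 30% sexual-stage signal at time index 30.
true_alpha, true_t = 0.3, 30
y = {g: (1 - true_alpha) * x_ref[g][true_t] + true_alpha * z_ref[g] for g in peaks}

alpha_hat, t_hat = fit_mixture(y, x_ref, z_ref, sigma_sq=0.04)
```

Keeping the full grid of log-likelihood values, rather than only the maximum, would be one way to visualize the joint surface over the two parameters.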


We thank the malaria patients and their parents for participation in this study; clinical, laboratory, and field staff at each of the health facilities who supported the research; Drs. David Baker and David Warhurst for their expert advice on gametocyte morphology; and the Computational Biological Research Group and Dr. Simon McGowan for database support. This work was supported by a Wellcome Trust grant (to C.I.N.), a Medical Research Council grant (to D.C.), the Rhodes Trust (to J.E.L. and A.F.), and a Wellcome Trust Vacation Scholarship 2008 (to F.D.).


The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The data reported in this paper have been deposited in the ArrayExpress repository, www.ebi.ac.uk/microarray-as/ae (accession no. E-TABM-591).

This article contains supporting information online at www.pnas.org/cgi/content/full/0811829106/DCSupplemental.


1. Daily JP, et al. In vivo transcriptome of Plasmodium falciparum reveals overexpression of transcripts that encode surface proteins. J Infect Dis. 2005;191:1196–1203.
2. Daily JP, et al. Distinct physiological states of Plasmodium falciparum in malaria-infected patients. Nature. 2007;450:1091–1095.
3. Siau A, et al. Whole-transcriptome analysis of Plasmodium falciparum field isolates: Identification of new pathogenicity factors. J Infect Dis. 2007;196:1603–1612.
4. Bozdech Z, et al. The transcriptome of Plasmodium vivax reveals divergence and diversity of transcriptional regulation in malaria parasites. Proc Natl Acad Sci USA. 2008;105:16290–16295.
5. Bozdech Z, et al. The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol. 2003;1:E5.
6. Le Roch KG, et al. Discovery of gene function by expression profiling of the malaria parasite life cycle. Science. 2003;301:1503–1508.
7. Llinás M, Bozdech Z, Wong ED, Adai AT, DeRisi JL. Comparative whole genome transcriptome analysis of three Plasmodium falciparum strains. Nucleic Acids Res. 2006;34:1166–1173.
8. Lambros C, Vanderberg JP. Synchronization of Plasmodium falciparum erythrocytic stages in culture. J Parasitol. 1979;65:418–420.
9. Kwiatkowski D. Febrile temperatures can synchronize the growth of Plasmodium falciparum in vitro. J Exp Med. 1989;169:357–361.
10. Pasvol G, Wilson RJ, Smalley ME, Brown J. Separation of viable schizont-infected red cells of Plasmodium falciparum from human blood. Ann Trop Med Parasitol. 1978;72:87–88.
11. Paul F, Roath S, Melville D, Warhurst DC, Osisanya JO. Separation of malaria-infected erythrocytes from whole blood: Use of a selective high-gradient magnetic separation technique. Lancet. 1981;2:70–71.
12. Scholz M, Fraunholz MJ. A computational model of gene expression reveals early transcriptional events at the subtelomeric regions of the malaria parasite, Plasmodium falciparum. Genome Biol. 2008;9:R88.
13. Dahl EL, et al. Tetracyclines specifically target the apicoplast of the malaria parasite Plasmodium falciparum. Antimicrob Agents Chemother. 2006;50:3124–3131.
14. Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:3.
15. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998;95:14863–14868.
16. Brunet JP, Tamayo P, Golub TR, Mesirov JP. Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci USA. 2004;101:4164–4169.
17. Cortes A, et al. Epigenetic silencing of Plasmodium falciparum genes linked to erythrocyte invasion. PLoS Pathog. 2007;3:e107.
18. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Stat. 2001;29:1165–1188.
19. Kyes SA, Rowe JA, Kriek N, Newbold CI. Rifins: A second family of clonally variant proteins expressed on the surface of red cells infected with Plasmodium falciparum. Proc Natl Acad Sci USA. 1999;96:9333–9338.
20. Young JA, et al. The Plasmodium falciparum sexual development transcriptome: A microarray analysis using ontology-based pattern identification. Mol Biochem Parasitol. 2005;143:67–79.
21. De Silva EK, et al. Specific DNA-binding by apicomplexan AP2 transcription factors. Proc Natl Acad Sci USA. 2008;105:8393–8398.
22. Gentleman RC, et al. Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80.
23. Trager W, Jensen JB. Human malaria parasites in continuous culture. Science. 1976;193:673–675.
24. Heard NA, Holmes CC, Stephens DA, Hand DJ, Dimopoulos G. Bayesian coclustering of Anopheles gene expression time series: Study of immune defense response to multiple experimental challenges. Proc Natl Acad Sci USA. 2005;102:16939–16944.
25. Lu X, Zhang W, Qin ZS, Kwast KE, Liu JS. Statistical resynchronization and Bayesian detection of periodically expressed genes. Nucleic Acids Res. 2004;32:447–455.
26. Simpson JA, Aarons L, Collins WE, Jeffery GM, White NJ. Population dynamics of untreated Plasmodium falciparum malaria within the adult human host during the expansion phase of the infection. Parasitology. 2002;124:247–263.
27. Politis DN, Romano JP. Large sample confidence intervals based on subsamples under minimal assumptions. Ann Stat. 1994;22:2031–2050.
28. Politis DN, Romano JP, Wolf M. Subsampling. Berlin: Springer; 1999.
29. Bruce MC, Alano P, Duthie S, Carter R. Commitment of the malaria parasite Plasmodium falciparum to sexual and asexual development. Parasitology. 1990;100(Pt 2):191–200.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences