• We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Logo of geneticsGeneticsCurrent IssueInformation for AuthorsEditorial BoardSubscribeSubmit a Manuscript
Genetics. Nov 2005; 171(3): 1437–1439.
PMCID: PMC1456841

A Statistical Multiprobe Model for Analyzing cis and trans Genes in Genetical Genomics Experiments With Short-Oligonucleotide Arrays

Abstract

Short-oligonucleotide arrays typically contain multiple probes per gene. In genetical genomics applications a statistical model for the individual probe signals can help in separating “true” differential mRNA expression from “ghost” effects caused by polymorphisms, misdesigned probes, and batch effects. It can also help in detecting alternative splicing, start, or termination.

IN a genetical genomics experiment, a panel of 30 genetically different recombinant inbred mice was derived from a cross between parental strains C57BL/6 (B6) and DBA/2 (D2) (Jansen and Nap 2001; Bystrykh et al. 2005). These 30 mice were profiled with Affymetrix MG-U74Av2 arrays, using RNA isolated from hematopoietic stem cells and 12,422 probe sets. The observed array data were background corrected and quantile normalized (Bolstad et al. 2003; Gautier et al. 2004). Although various methods have been developed to compute a single expression value per probe set for further data analysis (e.g., Zhang et al. 2003; Wu and Irizarry 2004; Manly et al. 2005), we here develop an alternative statistical method to more fully exploit the information contained in the individual probe signals.

Differential expression for a given gene can result from trans-regulation by other genes or from cis-regulation due to variation in the region of the gene itself (altering functional motifs in the promoter region, changing the stability of the mRNA, or modifying the gene product in such a way that the feedback loop is shifted; Jansen and Nap 2004). In either case signal differences are supposed to be rather stable across probes (Figure 1A). The differences may, however, also change from one probe to another, due to known or unknown single-nucleotide polymorphisms (SNPs) or microdeletions between mRNA transcripts of different samples (Figure 1B; see also Doss et al. 2005), due to misdesigned probes (the majority of probe sets in Figure 1 are sequence verified; see also Mecham et al. 2004) or due to other known or unknown factors (e.g., alternative transcription). In such cases computing a single expression value per probe set, as in the current methods, leads to a loss of biologically relevant information. This will also hold for future alternative transcription/splicing arrays with probes located in different exons and not in the last exon or 3′-untranslated region only, as in the MG-U74Av2 array used in our experiment (Sharov et al. 2005). When the differences in signal between probes match with information in alternative splicing databases, they indicate that alternative transcription/splicing is the cause, and not an SNP.

Figure 1.
Model parameterization and fitting: two examples and whole-genome results. (A) Expression profiles of 30 mice, adjusted for the effects of probe, batch, and probe-specific batch effects, are shown for probe set 103036_at. The mice with solid (shaded) ...

In a genetical genomics analysis we search with the aid of molecular markers for a genome position where the difference in expression between mice carrying the B6 marker allele and mice carrying the D2 marker allele is (most) significant; this genome position is commonly denoted “expression quantitative trait locus” (eQTL or just QTL). In our study the 30 mice have been expression profiled by using probe sets of size 16. The 30 × 16 signals in a given probe set are decomposed into

equation M1

where yij is the signal of the jth probe of the ith mouse, m is the average signal, Bi is the average effect of the batch to which the ith mouse belongs (three batches), Pj is the average effect of the jth probe, PBij is the interaction effect between probe and batch, Ai is the average effect of the allele (B6 or D2) carried by the ith mouse at a given genome position, PAij is the interaction effect between probe and allele type, ei is an error term per mouse, and, finally, eij is a probe-specific error term per mouse. The model is computed at each of 705 marker positions to find the position (QTL) with the most significant allele effect Ai; probe-by-QTL interaction is derived from the corresponding PAij. Standard F-values are computed: F1,25 at the mouse stratum for the QTL and F15,390 at the lowest stratum for the probe-by-QTL interaction.

Genes colocalizing within 20 Mb of their QTL are termed here cis genes, and genes mapping elsewhere are termed trans genes. The cis genes show many more probe-specific QTL effects than the trans genes do (Figure 1, C and D). It is expected that probe sets carrying the more influential polymorphisms between probe and transcript will be picked up as cis-acting with probe-specific QTL effects. Indeed 10 cis-acting genes carry currently known SNPs in one or more of their probes and the directions of the probe-specific QTL effects are in agreement with the SNPs; i.e., the mouse allele perfectly matching the probes on the array has the higher signal (allowing us to use the array for genotyping as well; see also Jansen and Nap 2001; Rostoks et al. 2005). Further lab research (e.g., sequencing of the D2 strain) will make clear how many of the cis-acting genes are caused by SNPs, alternative transcription, or other (hidden) factors.

Our analysis separates probe sets that are “consistently” cis across probes from those that are more “probe-specific” cis and should be investigated in more detail in silico or in the lab. Figure 1C shows that P-values for probe-by-QTL interaction can be very extreme. At the one hand probe-specific QTL effects can be large relative to eij (and thus statistically significant), at the other hand they can still be small relative to the average QTL effect (and thus biologically not really relevant). The biologist can best inspect figures (Figure 1B and alike) to see how many probes underlie probe-by-QTL interaction and to decide which genes merit closer scrutiny in silico or in the lab.

Some genomic regions showed up as “master” regulators in trans of many genes, but only if the factor batch was excluded from the model (particularly chromosome 4, see Figure 1E). The 30 samples have been profiled in three batches and samples in the same batch generally showed a very similar profile across the 16 probes, whereas samples in different batches showed different profiles. Spurious linkage between batch and these genome regions leads to “ghost” regulators, and, since batch and probe-specific batch effects are likely to occur in experiments with multiple arrays, it underpins the need for careful statistical modeling. It is important to note that standard permutation tests in QTL analysis without batch effects will not protect against this batch artifact (permutation should be carried out within batch-probe combinations).

We have shown that current methods fail to consider various influential variations (technical, molecular, and sequence) in genetical genomics studies, with worse fit, loss of relevant information, and possibly wrong conclusions as a result. Our statistical analyses on the entire probe data set are therefore at the moment the methods of choice in genetical genomics applications with short-oligonucleotide arrays to help in separating “true” cis and trans genes from “ghost” ones. Data are available at www.genenetwork.org (Chesler et al. 2004).

Acknowledgments

We thank J. P. Nap, D. J. de Koning, and C. S. Haley for stimulating discussions. R.A. was supported by Netherlands Organization for Scientific Research-Biomolecular Informatics grant 050-50-203.

References

  • Bolstad, B. M., R. A. Irizarry, M. Astrand and T. P. Speed, 2003. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19: 185–193. [PubMed]
  • Bystrykh, L., E. Weersing, B. Dontje, S. Sutton, M. T. Pletcher et al., 2005. Uncovering regulatory pathways that affect hematopoietic stem cell function using ‘genetical genomics’. Nat. Genet. 37: 225–232. [PubMed]
  • Chesler, E. J., L. Lu, J. Wang, R. W. Williams and K. F. Manly, 2004. WebQTL: rapid exploratory analysis of gene expression and genetic networks for brain and behavior. Nat. Neurosci. 7: 485–486. [PubMed]
  • Doss, S., E. E. Schadt, T. A. Drake and A. J. Lusis, 2005. Cis-acting expression quantitative trait loci in mice. Genome Res. 15: 681–691. [PMC free article] [PubMed]
  • Gautier, L., L. Cope, B. M. Bolstad and R. A. Irizarry, 2004. affy—analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20: 307–315. [PubMed]
  • Jansen, R. C., and J. P. Nap, 2001. Genetical genomics: the added value from segregation. Trends Genet. 17: 388–391. [PubMed]
  • Jansen, R. C., and J. P. Nap, 2004. Regulating gene expression: surprises still in store. Trends Genet. 20: 223–225. [PubMed]
  • Manly, K. F., J. Wang and R. W. Williams, 2005. Weighting by heritability for detection of quantitative trait loci with microarray estimates of gene expression. Genome Biol. 6: R27. [PMC free article] [PubMed]
  • Mecham, B. H., D. Z. Wetmore, Z. Szallasi, Y. Sadovsky, I. Kohane et al., 2004. Increased measurement accuracy for sequence-verified microarray probes. Physiol. Genomics 18: 308–315. [PubMed]
  • Rostoks, N., J. O. Borevitz, P. E. Hedley, J. Russsell, S. Mudie et al., 2005. Single-feature polymorphism discovery in the barley transcriptome. Genome Biol. 6: R54. [PMC free article] [PubMed]
  • Sharov, A. A., D. B. Dudekula and M. S. Ko, 2005. Genome-wide assembly and analysis of alternative transcripts in mouse. Genome Res. 15: 748–754. [PMC free article] [PubMed]
  • Wu, Z., and R. A. Irizarry, 2004. Preprocessing of oligonucleotide array data. Nat. Biotechnol. 22: 656–658. [PubMed]
  • Zhang, L., M. F. Miles and K. D. Aldape, 2003. A model of molecular interactions on short oligonucleotide microarrays. Nat. Biotechnol. 21: 818–821. [PubMed]

Articles from Genetics are provided here courtesy of Genetics Society of America
PubReader format: click here to try

Formats:

Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...

Links

  • PubMed
    PubMed
    PubMed citations for these articles

Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...