Proc Natl Acad Sci U S A. Feb 13, 2007; 104(7): 2313–2318.
Published online Feb 2, 2007. doi:  10.1073/pnas.0610880104
PMCID: PMC1790866

Transcriptomic analysis of growth heterosis in larval Pacific oysters (Crassostrea gigas)


Compared with understanding of biological shape and form, knowledge is sparse regarding what regulates growth and body size of a species. For example, the genetic and physiological causes of heterosis (hybrid vigor) have remained elusive for nearly a century. Here, we investigate gene-expression patterns underlying growth heterosis in the Pacific oyster (Crassostrea gigas) in two partially inbred (f = 0.375) and two hybrid larval populations produced by a reciprocal cross between the two inbred families. We cloned cDNA and generated 4.5 M sequence tags with massively parallel signature sequencing. The sequences contain 23,274 distinct signatures that are expressed at statistically nonzero levels and show a highly positively skewed distribution with median and modal counts of 9.25 and 3 transcripts per million, respectively. For nearly half of these signatures, expression level depends on genotype and is predominantly nonadditive (hybrids deviate from the inbred average). Statistical contrasts suggest ≈350 candidate genes for growth heterosis that exhibit concordant nonadditive expression in reciprocal hybrids; this represents only ≈1.5% of the >20,000 transcripts. Patterns of gene expression, which include dominance for low expression and even underdominance of expression, are more complex than predicted from classical dominant or overdominant explanations of heterosis. Preliminary identification of ribosomal proteins among candidate genes supports the suggestion from previous studies that efficiency of protein metabolism plays a role in growth heterosis.

Keywords: MPSS, nonadditive expression, factorial cross of partially inbred lines, potence, protein metabolism

Whole-genome sequencing has yielded rich insights into the number of genes required to make complex eukaryotic animals (1–4). Surprisingly few genes, however, can effect major changes in body design and shape. For example, elaboration and mutation of a fairly small number of developmentally important regulatory genes, such as the Hox gene cluster, appear to have driven the evolution of the major metazoan body plans (5). Profound changes in shape or morphology, underlying adaptive differences among closely related species, also can be caused by mutations in a very few genes (6–8).

Compared with our growing understanding of the evolution of biological shape and form, our understanding of what regulates body size and the rates and efficiencies of physiological functions is much less developed. For example, the genetic and physiological causes of heterosis (hybrid vigor) and its converse, inbreeding depression, have remained elusive for nearly a century, despite the economic and scientific significance of these phenomena (9–12). Major genetic explanations for heterosis are overdominance, the superiority of heterozygotes at genes affecting fitness or economic traits; dominance, the masking, in hybrids, of deleterious recessive mutations by dominant alleles inherited from one or the other inbred parent; and epistasis, the interaction of alleles at different loci. Under the first two hypotheses, inbreeding depression results from the increased frequency of less fit homozygous genotypes.

Plants, particularly crops and conifers, and bivalve molluscs exhibit high levels of inbreeding depression and heterosis. Initially, the evidence for heterosis in bivalves was indirect, a widely observed correlation of marker heterozygosity and fitness-related traits, such as growth (13, 14). Direct evidence for growth heterosis was demonstrated in the Pacific oyster by means of controlled crosses between partially inbred families (15). Further study of F2 hybrid populations revealed a large load of highly deleterious recessive alleles, consistent with the dominance hypothesis (16). Physiological causes of heterosis have received less attention than genetic causes (10), but classical methods for quantifying energy and nutrient budgets are advancing understanding of the physiological underpinnings of growth heterosis in bivalves. Both larval and adult hybrid oysters have higher feeding rates and efficiencies than their inbred counterparts (17, 18). A large proportion of growth efficiency is achieved by lower rates of protein turnover in hybrids or more highly heterozygous individuals (19).

Genomic approaches to understanding the genetic basis of heterosis have included indirect mapping of quantitative trait loci (20–22) and statistical analysis of the gene action implied (23). More recently, attention has turned to gene expression in intraspecific hybrids (24, 25), although large-scale comparisons of patterns of gene expression in maize have just begun (26). Here, we use megacloning and massively parallel signature sequencing (MPSS; 27, 28), coupled technologies that generate short sequence tags from complex RNA samples, to analyze gene expression patterns in inbred and hybrid Pacific oysters produced by controlled reciprocal crosses between two inbred families. MPSS offers several advantages for organisms with poorly characterized genomes, such as the oyster: It requires no previous sequence information (28), it is comprehensive and quantitative (29–33), and it detects rare messages (down to ≈3 transcripts per million) and small differences in expression among genotypes (32).


Two partially inbred families of Pacific oyster were crossed to produce inbred (named 33 and 55; inbreeding coefficient f = 0.375) and reciprocal hybrid (35 and 53, male parent listed first) larvae. Heterosis for growth is evident from differences between shell lengths of 5-d-old inbred and hybrid veliger larvae, as well as from the greater growth rate, from day 2 to day 5, of hybrids compared with inbreds (Fig. 1). Heterosis for larval shell length at day 5 is defined by the potence, hp = Q/L > 1.0, where Q is twice the deviation of a hybrid from the mean of the inbred shell lengths (mid-parent) and L is the absolute difference between the mean trait-values of the two parental inbred lines, i.e., Q is the quadratic and L, the linear contrast estimated from analysis of variance (ANOVA; 10, 15). Potence values for the two reciprocal hybrids are much greater than 1.0 (h35 = 4.56, P < 0.0001; h53 = 7.93, P < 0.0001); hybrid 53 is significantly larger than hybrid 35 (P = 0.0012) and inbred family 33 is larger, although not significantly so, than inbred family 55 (P = 0.054). Clearly, our experimental material exhibits classical hybrid vigor or growth heterosis, as observed in previous crosses among inbred lines (15, 17, 18).
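The potence calculation defined above can be sketched in a few lines. This is only an illustration, assuming family means are already in hand; the function name `potence` and the shell lengths in the usage line are hypothetical and are not data from this study.

```python
def potence(inbred_a, inbred_b, hybrid):
    """Return h_p = Q / L for one hybrid, given mean trait values.

    Q is twice the deviation of the hybrid from the mid-parent value;
    L is the absolute difference between the two inbred means.
    """
    mid_parent = (inbred_a + inbred_b) / 2.0
    q = 2.0 * (hybrid - mid_parent)
    l = abs(inbred_a - inbred_b)
    if l == 0:
        raise ValueError("inbred means are equal; potence is undefined")
    return q / l


# Hypothetical mean shell lengths (micrometers), for illustration only.
example_hp = potence(inbred_a=100.0, inbred_b=96.0, hybrid=108.0)
```

A value above 1.0 means the hybrid exceeds the larger inbred parent, which is the condition for heterosis used in the text. The significance tests reported with the potence values come from ANOVA and are not reproduced by this sketch.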

Fig. 1.
Shell lengths on day 5 (in micrometers; bars) and growth rates (micrometers per day, between days 2 and 5; lines) for two inbred (33, 55) and two hybrid (35, 53) families produced by a factorial cross between partially inbred lines 35 and 51 of Pacific ...

RNA was extracted from 6-d-old larvae of each larval population and cloned via Megaclone methods (27). Sequences of 666 inserts from these libraries (average length 238 bp) show that 89% begin with the expected DpnII restriction enzyme site, GATC. Of these, 25% match GenBank sequences for the Crassostrea gigas mitochondrial genome (AF177226), as expected in rapidly growing larvae. Of the 446 nonmitochondrial DNA sequences, 84% are unique but only 40% of these (n = 151) have matches in GenBank.

From the Megaclone libraries, we obtain 17-bp sequences by MPSS for each of the four genotypes in replicates of two different sequencing frames [two-stepper (TS) and four-stepper (FS)]. Sequences deriving from the cloning vector, having poor signal-to-noise characteristics, or present in only one of the 15 MPSS runs (a TS replicate for 33 is missing) are removed from the data. The weighted mean relative frequency of a signature within a family is calculated as the sum, over replicate runs, of beads with that signature (the signature count) divided by the total number of sequences obtained in that reading frame for that family. Relative frequency is multiplied by 10^6 to yield transcripts per million (tpm). The reading frame with the highest normalized count across families is selected to represent the relative transcript frequency or expression of each signature. These filtering and selection steps leave 4.5 million sequences, comprising 53,771 distinct 17-bp signature sequences (Table 1). For 283 signatures, we observe matches of 14 or more base pairs to the Pacific oyster mitochondrial genome. One signature, GATCATAGGAGAAGTTA, which corresponds to the large mitochondrial ribosomal RNA subunit, has an observed count of 552,607 or 22.8% of the 2.4 million clones sequenced in the FS reading frame, similar to the 25% of 666 Megaclone library sequences checked. Signatures with matches to oyster mitochondrial DNA are left in the analysis of gene-expression patterns to follow and will be treated in more detail (J. Curole, E.M., D.T.M., and D.H., unpublished data).

Table 1.
Clones sequenced and unique signatures detected by MPSS in two inbred (33 and 55) and two hybrid (35 and 53) genotypes

Sequence pairs read in the same frame, differing from each other by a single base pair, and absent in one or the other inbred parent line (n = 943) are interpreted as expressed single-nucleotide polymorphisms (SNPs). For subsequent analyses, counts for these pairs are pooled arbitrarily under the signature for which expression in inbred family 55 is zero. We detected a more complex SNP family, composed of two SNP signature pairs, which were sequenced in different frames, together with two additional singleton signatures, all differing from each other by a single base pair. The SNP signature pair with the highest expression level is retained and the other signatures are omitted, leaving 52,825 signatures (i.e., 53,771 − 943 − 3).

Given these expression data, the problem is to identify those signatures that show significant nonadditive expression on the assumption that nonadditive rather than additive gene expression must underlie the observed heterosis for growth (Fig. 1). This evokes a statistical comparison of expression levels among families that is identical to the analysis of larval growth. The MPSS expression data for each of the 52,825 signatures are fit to a general linear model, p = g, where p is the proportion of clones having a particular signature sequence and g is the genotype (family). We find that expression levels of 10,623 signatures depend on genotype at a nominal significance threshold of P ≤ 0.0001; this number is lowered to 6,297 if the significance threshold is corrected to P ≤ 0.05 for multiple tests. Mean signature counts for significant linear models (i.e., Np̄, where N is the total number of sequences, 2,422,669 for the FS data and 2,066,153 for the TS data, and p̄ is the mean relative frequency over the four genotypes) are all >7, which is equivalent to ≈3 tpm. Of the 42,202 signatures for which the linear model is not significant, 12,651 signatures have mean counts more than ≈3 tpm but no significant variation among genotypes. The remaining 29,551 signatures are very rare, accounting for only 1.8% of the 4.5 million sequences and likely represent artifacts of the MPSS process. Averaged over the four families, the median and modal counts of 23,274 signatures with statistically nonzero expression are 9.25 and ≈3 tpm, respectively, which suggests that, even with low-frequency artifacts removed, most genes are expressed at low levels (Fig. 2).

Fig. 2.
Transcript abundance (in tpm), averaged over the four genotypes, for 23,273 MPSS signatures with Np ≥ 7, where N is the number of clones sequenced and p is the mean proportion of clones having a specific signature. (Inset) Shown are 471 signatures ...

Mode of gene expression may be categorized as additive or nonadditive, using linear and quadratic contrasts among genotypic values, just as in the analysis of larval shell length (i.e., L is the difference between the mean expression levels of the two parental inbred lines and Q is twice the deviation of a hybrid from mid-parent expression levels). Expression is additive if L > 0 and Q is zero, i.e., hybrid expression is not different from the mid-parent level. Signatures with nonadditive expression are categorized into six classical patterns of gene action: (i) overdominance (OD), i.e., overexpressed in hybrids relative to the higher expressing parent, hp > 1.0; (ii) dominance (D) “+”, i.e., hybrids expressing like the higher expressing parent, hp = 1.0; (iii) partial dominance (PD) +, i.e., hybrids expressing above mid-parent but below the higher expressing parent, 0 < hp < 1.0; (iv) D “−”, i.e., hybrids like the lower expressing parent, hp = −1.0; (v) PD−, i.e., hybrids below mid-parent but above the lower expressing parent, 0 > hp > −1.0; and (vi) underdominance (UD), i.e., underexpressed in hybrids relative to the lower expressing parent, hp < −1.0. The criteria for these patterns of gene expression are mutually exclusive and yield nonoverlapping sets of signatures for each of the hybrids.
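The category boundaries above can be sketched as a small lookup on potence. This is a simplified illustration, not the procedure used in the study: the function name `classify_mode` and the tolerance `tol` are assumptions that stand in for the contrast significance tests, which decide in practice whether a hybrid is indistinguishable from a parent or from the mid-parent.

```python
def classify_mode(hp, tol=0.05):
    """Assign a mode of expression from potence (h_p) alone.

    `tol` is a hypothetical tolerance replacing the significance tests
    that determine when h_p is treated as equal to -1, 0, or 1.
    """
    if abs(hp) <= tol:
        return "ADD"   # hybrid at mid-parent level
    if abs(hp - 1.0) <= tol:
        return "D+"    # hybrid like the higher expressing parent
    if abs(hp + 1.0) <= tol:
        return "D-"    # hybrid like the lower expressing parent
    if hp > 1.0:
        return "OD"    # above the higher expressing parent
    if hp < -1.0:
        return "UD"    # below the lower expressing parent
    return "PD+" if hp > 0 else "PD-"
```

Because each value of potence falls into exactly one branch, the resulting sets of signatures do not overlap within a hybrid, which matches the mutually exclusive criteria described in the text.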

Of the 3,762 signatures that have significant linear models for hybrid 35, 97.4% have nonadditive expression and only 2.6% have additive expression. Of 3,705 significant genotype-dependent signatures for hybrid 53, 96.2% are nonadditive and only 3.7% are additive. Equally noteworthy is the uneven distribution of signatures across the categories of nonadditive expression (Fig. 3). Approximately half of the signatures show overexpression in hybrids (OD), with the next highest category being dominance for low expression (D−), followed by dominance for high expression (D+), with underexpression (UD) being slightly more common than additive expression. Less than 1% is in the partially dominant categories.

Fig. 3.
Proportions of MPSS signatures with modes of expression categorized as overdominant (OD), dominant for high expression (D+), partially dominant for high expression (PD+), additive (ADD), partially dominant for low expression (PD−), dominant for ...

We next consider signatures, for which expression levels in hybrids resemble those in the paternal or maternal lines. To infer parental effects, we require that the linear model be significant (at P ≤ 0.0001), that the parental inbred lines differ in expression (L significant at P ≤ 0.001), and that hybrids resemble paternal or maternal parents within a P > 0.01 threshold of nonsignificance. The criteria for parental patterns of expression do not yield sets of signatures that are mutually exclusive to those in Fig. 3. For example, we find 531 signatures with paternal patterns of expression; of these, 169 are already counted in the four categories of dominance (D-35, D+53, D+35, and D-53) and 362 are newly categorized. Likewise, of the 552 signatures showing maternal effects on expression at the P > 0.01 level of similarity, 170 overlap with the dominance categories and 382 are newly categorized. Removing overlap between the parental-effect and dominance categories, we find that ≈70% of the 10,623 signatures with expression levels that depend significantly on genotype can be assigned to additive, nonadditive, or parental-effect categories of expression in one or both hybrids. Overdominance or overexpression in one or more hybrids is by far the most common pattern of expression.

In the classical dominance and overdominance models of heterosis, there should be no genotypic difference for nuclear genes between reciprocal hybrids. This is strictly true only if parent lines are completely inbred, i.e., AA and BB, such that AB = BA. Our lines are only three-eighths inbred, so residual heterozygosity within parent lines will make reciprocal hybrids nonequivalent at many loci. Nevertheless, growth heterosis is observed in both hybrids (Fig. 1). To produce a tractable number of candidate heterosis genes for functional analysis, we focus on those signatures showing the same mode of expression in both reciprocal hybrids. The intersections of the sets of signatures categorized into modes of expression also show predominance of nonadditive expression, 875 of 906 signatures (96.6%), but the distribution of signatures across categories is now dominated by signatures in the D− category (71.7%), followed by the OD (14.0%), UD (6.2%), D+ (3.0%), additive (ADD) (3.4%), PD+ (0.1%), and PD− (1.5%) categories. Nearly half of the 650 signatures with D− expression patterns have zero counts for both hybrids and one inbred family. Limiting D− patterns to those signatures with nonzero expression in all four larval populations reduces the number of D− candidates to 138 and the total number of candidates in nonadditive categories (OD, 127; D+, 27; D−, 138; and UD, 56) to 348 (Fig. 3).
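The intersection step can be sketched as follows, assuming each hybrid already has a mapping from signature to assigned mode. The function name `concordant_modes` and the signature identifiers are hypothetical placeholders, not signatures from the data set.

```python
from collections import Counter


def concordant_modes(modes_35, modes_53):
    """Keep signatures assigned the same mode in both reciprocal hybrids."""
    shared = modes_35.keys() & modes_53.keys()
    return {s: modes_35[s] for s in shared if modes_35[s] == modes_53[s]}


# Hypothetical assignments for a handful of made-up signatures.
modes_35 = {"sigA": "OD", "sigB": "D-", "sigC": "UD"}
modes_53 = {"sigA": "OD", "sigB": "D+", "sigD": "OD"}

concordant = concordant_modes(modes_35, modes_53)
tally = Counter(concordant.values())  # signatures per concordant mode
```

Only `sigA` survives in this toy example, because it is the only signature present in both hybrids with the same mode. The additional restriction to nonzero expression in all four populations would be a further filter on the result.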


Despite nearly a century of work on heterosis, little consensus has emerged on the genetic or physiological causes of this important phenomenon. Functional genomics offers a novel approach to unraveling the molecular and physiological causes of heterosis. In this study, we categorize 70% of genotype-dependent patterns of gene expression into classical modes of gene action. We find few additive patterns of gene expression (2%), in which hybrid expression is the average of inbred expression levels, a result differing markedly from that of Swanson-Wagner et al. (26), who report 78% additive patterns in maize, but consistent with levels of whole genome nonadditive patterns of expression observed in crosses of isogenic lines of Drosophila melanogaster (34). The pervasive nonadditivity of whole-genome expression patterns in Drosophila and now in our study is surprising in light of the classical view that additive gene action underlies variation in most quantitative traits. For growth heterosis in oyster larvae, a surprisingly small number of genes, ≈350, only ≈1.5% of the >20,000 transcripts, appears to control the striking variation in body size and physiological function. Although larger than the number of genes that can effect major changes in morphology, the number of genes controlling growth heterosis is on the same scale as the number of genes known to regulate the physiology of life span and lipid accumulation in Caenorhabditis elegans (35).

MPSS has been successfully applied and tested in model species with sequenced genomes (28–33). These studies show that MPSS provides excellent coverage of the transcriptome, to the point of revealing previously uncharacterized transcripts (30, 32), and that the expression of most transcripts can be accurately and precisely quantified (33). Oyster MPSS signatures matching the mitochondrial genome sequence (GenBank accession NC_001276) correspond to 26 of 27 DpnII restriction sites in the genome, the majority being on the coding strand with a substantial fraction of unknown function on the noncoding strand (J. Curole, E.M., D.T.M., and D.H., unpublished data). In model species, the major sources of error and bias with MPSS have been identified and quantified (30, 33). Imperfectly matched signatures from the oyster mitochondrial genome yield a sequencing error estimate of 2.4%, compared with the 3.2% estimated for Arabidopsis (30). Furthermore, we observe a significant negative correlation between expression and Megaclone insert length, previously identified as a bias with “classical” Megaclone libraries (32). These observations strongly imply that MPSS data for the oyster have the same coverage, accuracy, precision, sequencing error, and bias as those characterized in model systems.

With a genome sequence, one can immediately identify >90% of the 17-bp signature sequences from MPSS (27, 29, 31). Our study, on the other hand, illustrates how MPSS can be usefully applied to an organism without a sequenced genome, to address a significant biological problem that has been under study for nearly a century. Key to the success of MPSS for a nonmodel species, however, is the availability of clear phenotypic contrasts, which are afforded by the classical expression of growth heterosis in hybrids produced by crossing of inbred lines of oysters. In this experimental context, phenotypic contrasts among genotypes reveal a set of candidate genes for further functional and genetic analyses of growth heterosis.

To obtain candidate heterosis genes, we fit the expression profile for each of nearly 53,000 signature sequences for inbred and hybrid oysters to a general linear model. Just over half (56%) of the signatures are expressed at insignificant levels and are not associated with genotype. Twenty percent (10,623) of all signatures are expressed at levels that depend on genotype (P < 0.0001), and another 24% are expressed at adequate levels (mean expression in excess of ≈3 tpm) but independently of genotype. At the 5% level of significance adjusted for multiple testing, we expect 2,650 of the nearly 53,000 linear models to be significant by chance, so the number of signatures with significant genotype-dependent expression, 6,297, is 2.4 times greater than the number expected by chance.
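The proportions in this paragraph can be recomputed from the signature counts reported in the Results. The sketch below is a bookkeeping check only; the variable names are ours, and the counts are those given in the text.

```python
total = 52_825                # signatures remaining after SNP pooling
genotype_dependent = 10_623   # linear model significant at P <= 0.0001
expressed_no_effect = 12_651  # mean above ~3 tpm, no genotype effect
rare = total - genotype_dependent - expressed_no_effect


def pct(part, whole):
    """Percentage of `whole` rounded to the nearest integer."""
    return round(100.0 * part / whole)


shares = {
    "rare": pct(rare, total),
    "genotype_dependent": pct(genotype_dependent, total),
    "expressed_no_effect": pct(expressed_no_effect, total),
}

# Ratio of the 6,297 signatures significant after correction to the
# 2,650 quoted in the text as expected by chance.
fold_over_chance = 6_297 / 2_650
```

The three classes partition the full set of signatures, so the rounded shares should sum to 100.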

Among the signatures with identifiable patterns of genotype-dependent expression, we, like Swanson-Wagner et al. (26), observe patterns expected under the classical hypotheses of heterosis (i.e., D+, dominance for high expression and OD, overexpression of genes in hybrids), as well as patterns of expression not anticipated by the classical hypotheses (D−, dominance for low expression, and UD, underexpression of genes in hybrids). Considering each of the hybrids separately, overexpression (OD) of genes is the most prevalent pattern (47% and 53% of categorized signatures in hybrids 35 and 53, respectively), followed by the D−, D+, and UD patterns. We note that the hybrid family with the greater degree of growth heterosis, 53, shows a greater proportion of OD patterns. The prevalence of over- and underdominant expression for oyster larvae differs from what has been reported for either maize (26) or Drosophila (34) hybrid-inbred comparisons, in which only 3.2% and 5% of expression patterns, respectively, show over- or underexpression in hybrids.

The genetic basis of OD or UD phenotypes may be interaction between cis-acting regulatory elements or differences in levels of trans-acting factors. It should be possible to distinguish between cis and trans regulation of expression levels by contingency tests of the association between expression and genotype in the next generation. With cis regulation, the association between heterozygosity at the candidate locus and expression level should remain unchanged; with trans regulation, recombination between the candidate locus and trans-acting elements should eliminate or reduce over- or underexpression in heterozygotes. The D+ and D− patterns of expression, on the other hand, may result from allelic dosage effects (12, 25), monoallelic expression, or possibly trans-acting factors. The likely importance of trans-acting elements in regulating many nonadditive patterns of gene expression suggests that growth heterosis, in general, may result largely from epistatic gene interaction. Untangling the networks of gene regulation and interaction that underlie growth heterosis will require simultaneous application of quantitative genetic and physiological genomic approaches (34, 36, 37).

The functional genomic approach to understanding growth heterosis afforded by MPSS reveals patterns of gene expression, UD and D−, which are not anticipated by classical quantitative genetic theory. These patterns of expression may be related to physiological differences that have been observed between fast- and slow-growing oyster larvae with different genotypes. Pace et al. (18) showed that the physiological bases of growth heterosis in larvae were enhanced feeding rate and metabolic efficiency likely realized through mechanisms of protein synthesis and degradation (turnover). For adult bivalves, Hawkins et al. (19) showed that much of the difference in resting metabolic rates between fast- and slow-growing mussels could be ascribed to differences in whole-body protein turnover. Thus, low expression (UD or D−) of genes involved in protein degradation may well be consistent with the faster growth of hybrid relative to inbred larvae. That three of the nine candidates identified in the Megaclone library insert sequencing have BLAST hits to ribosomal proteins further suggests an important role for protein metabolism in growth heterosis (Table 2). Meyer (38) has extended this result, finding that many of the annotated Megaclone sequences generated from MPSS candidate signatures are involved in protein metabolism. Advances in the physiological genomics of growth heterosis promise to move this century-old debate in a refreshingly new direction.

Table 2.
BLAST hits for nine Megaclone inserts containing signatures subsequently found to have nonadditive patterns of expression in hybrid vs. inbred larval Pacific oysters


Biological Material.

Inbred lines 35 and 51 were derived from families founded in 1996 by pair crosses of Pacific oysters from a natural population (39). The first inbred generation was produced in 1998 by full-sib crosses. In 2000, inbred adults from the two lines were crossed in a replicated 2 × 2 factorial design, using standard hatchery and larval rearing methods (15). Shell lengths of 50 larvae were measured daily under a dissecting microscope with ocular micrometer. Cultures from replicate B were taken at day 6 and sampled for RNA analysis.

Samples of 1–2 million larvae were suspended in 0.2 μm (pore-size) filtered seawater (FSW), enumerated, and centrifuged at high speed. Seawater was aspirated; the larval pellets were rinsed with FSW to remove algal detritus and recentrifuged. Larvae were resuspended in denaturing solution (4 M guanidinium isothiocyanate/25 mM sodium citrate/0.5% wt/vol sodium sarkosyl/0.1 M 2-mercaptoethanol) and immediately ground in liquid nitrogen with a nuclease-free ceramic mortar and pestle. Homogenized samples were stored at −80°C. These crude extracts were later thawed on ice and centrifuged briefly to remove cell debris and other insoluble material. Supernatants were transferred into tubes containing cesium chloride solution and spun at 150,000 × g in an ultracentrifuge for 24 h. The resulting pellets, containing total RNA, were dried under vacuum, resuspended in TES buffer (50 mM Tris/100 mM EDTA/0.5% wt/vol SDS), and extracted with phenol-chloroform-isoamyl alcohol. RNA was precipitated with ethanol, dried under vacuum, and resuspended in 30 μl of nuclease-free water.


Protocols for MPSS have been reported (27).

cDNA signature/tag conjugate library construction.

Poly(A)+ mRNA was extracted from 6-d-old larval oysters. cDNA was synthesized by using an oligo(dT) primer modified to contain a BsmBI restriction sequence and a 5′ biotin molecule. After second-strand synthesis, cDNA was digested with DpnII and the 3′ most fragment collected with streptavidin beads. DNA fragments were digested off of the beads with BsmBI and ligated directionally into a collection of 8^8 distinct vectors, adding a unique 32-bp identification tag.

Microbead loading.

Six pools of ≈160,000 cDNA-tag conjugates each were PCR amplified (≈1% of tag complexity) and, after single-strand rendering of the identification tags, hybridized with anti-tags present on a library of 8^8 possible microbeads. Microbeads “loaded” each with ≈100,000 copies of a single cDNA were isolated by using a fluorescence-activated cell sorter. Approximately 1 million loaded microbeads were assembled in a flow cell, and 17 bp of the cDNA fragments on each bead were determined by MPSS.


Sequencing of four bases at a time is done by hybridization to fluorescently labeled encoders; that set of four bases is removed by type IIs endonuclease digestion, exposing the next four bases, and the process is repeated. The imaged fluorescent reactions are processed to yield the number of beads that have a given signature sequence. To reduce signature losses from self-ligation of exposed palindromic overhangs, two types of initiating adaptors, with type IIs restriction sites offset by two bases, are ligated to two identical replicates of the same signature library. These two alternative sequencing reactions are referred to as TS and FS sequencing frames. For each sample, loaded beads were taken in fixed aliquots and independently sequenced twice with the TS and FS protocols.

Data Preprocessing.

The net result of an MPSS run is a list of unique 17-bp signatures and the count of beads having each signature. Data for each genotype are compiled by signature, from replicate sequencing runs, for each of the TS and FS reading frames. During this compilation, the following filters are applied: (i) to remove signatures with single-strand sequences; (ii) to remove signatures from the cloning vector; (iii) to remove signatures with low confidence base calls (one or more lowercase bases) that have no corresponding, high confidence signatures (all uppercase bases) in other runs of the same reading frame; and (iv) to remove signatures observed in only one run, as long as there is a replicate run (same reading frame), and the observed count is >50. After compilation, a final filter removes signatures for which the average value for all runs is zero, owing to averaging of low counts that round down to zero.

The weighted average and normalized expression levels of each signature are calculated for each of the reading-frame sequences. The weighted average for a signature count, p, is the sum of the number of clones (beads) having the signature, x, divided by the total number of clones sequenced for that reading frame, n. Average expression level is normalized to tpm by multiplying by 10^6. Finally, the normalized expression levels for a signature are summed across genotypes and compared between the two sequence-reading frames. The reading frame with the highest normalized count is chosen for further analysis, because various artifacts, such as palindromic sequences, are likely to bias counts downward.
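The normalization and frame-selection steps can be sketched as below. This is a minimal illustration under our own assumptions about data layout; the function names `tpm` and `pick_frame` and the dictionary structure are hypothetical, not part of the MPSS software.

```python
def tpm(counts, totals):
    """Weighted mean relative frequency, scaled to transcripts per million.

    counts: bead counts for one signature in each replicate run
    totals: total sequences obtained in each of those runs
    """
    return 1e6 * sum(counts) / sum(totals)


def pick_frame(tpm_by_frame):
    """Choose the reading frame with the highest tpm summed across genotypes.

    tpm_by_frame maps a frame name ("TS" or "FS") to per-genotype tpm values.
    """
    return max(tpm_by_frame, key=lambda frame: sum(tpm_by_frame[frame]))


# The most abundant mitochondrial signature: 552,607 of the 2,422,669
# sequences read in the FS frame, pooled here over all FS runs.
mito_tpm = tpm([552_607], [2_422_669])
```

Summing counts and totals before dividing weights each replicate run by its size, which is what distinguishes the weighted mean from a simple average of per-run frequencies.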

The algorithm used to find expressed SNPs starts with signatures, for which expression in inbred family 55 is zero, listed in descending order of abundance in inbred family 33. The algorithm then searches for a SNP partner from a list of signatures sequenced in the same reading frame, for which expression in inbred family 33 is zero, listed in descending order of abundance in inbred family 55. Counts for both signatures were combined under the signature for which inbred family 55 had zero expression, and the pooled data were subjected to statistical analysis.
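A sketch of this pairing search follows. It assumes a greedy first match in which each partner signature is used at most once, a detail the description above does not specify; the function names and the input layout are likewise hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def pair_snps(expr):
    """Pair signatures that differ by one base and are private to one inbred.

    expr maps signature -> {"frame": str, "33": tpm, "55": tpm}.
    Returns (signature absent in 55, signature absent in 33) tuples.
    """
    only_33 = sorted(
        (s for s, e in expr.items() if e["55"] == 0 and e["33"] > 0),
        key=lambda s: expr[s]["33"], reverse=True)
    only_55 = sorted(
        (s for s, e in expr.items() if e["33"] == 0 and e["55"] > 0),
        key=lambda s: expr[s]["55"], reverse=True)

    used, pairs = set(), []
    for a in only_33:
        for b in only_55:
            # Partners must be unused and read in the same frame.
            if b in used or expr[a]["frame"] != expr[b]["frame"]:
                continue
            if len(a) == len(b) and hamming(a, b) == 1:
                pairs.append((a, b))  # counts are pooled under `a`
                used.add(b)
                break
    return pairs
```

Searching both lists in descending order of abundance means that the most highly expressed candidates are paired first, which follows the ordering described in the text.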

Statistical Tests.

MPSS data are categorical and binomial. However, the normal distribution provides a good approximation of the binomial distribution if the interval [μ ± 3σ], where μ = np and σ = √[np(1 − p)], lies between 0 and n. This interval will not include zero if n > ≈9/p, which means that, for sample sizes in this study (2,066,153 for TS, 2,422,669 for FS), p across genotypes has to be greater than ≈4 tpm.
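The lower bound on p can be worked out directly from the binomial mean and standard deviation, μ = np and σ = sqrt(np(1 − p)). The sketch below solves μ − 3σ > 0 for p; the function name is ours.

```python
def min_tpm_for_normal_approx(n):
    """Smallest p, in tpm, for which mu - 3*sigma stays above zero.

    With mu = n*p and sigma = sqrt(n*p*(1 - p)), the condition
    n*p > 3*sigma reduces to n*p > 9*(1 - p), i.e. p > 9 / (n + 9),
    which is approximately 9/n for large n.
    """
    return 1e6 * 9.0 / (n + 9.0)


ts_bound = min_tpm_for_normal_approx(2_066_153)  # TS reading frame
fs_bound = min_tpm_for_normal_approx(2_422_669)  # FS reading frame
```

Both bounds fall close to 4 tpm, in line with the threshold quoted in the text.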

We compare simultaneously the expression profiles of two inbred and two hybrid genotypes, signature by signature, under the simple model, p = g, by using the general linear model procedure (Proc GLM) of SAS (SAS Institute, Cary, NC). The raw data for each signature are the numbers of ones (presence of the signature) and zeros (absence of the signature), which sum to the grand total of sequences in that reading frame. Type III sums of squares are used to determine model significance, because sample size varies by genotype. We estimate the linear and quadratic contrasts for genotypes 33, 35, 53 and 55 with design vectors L = [−1 0 0 1], Q35 = [−1 2 0 −1], and Q53 = [−1 0 2 −1]. We also test the significance of the following post hoc contrasts: 35 = 33 [−1 1 0 0], 53 = 33 [−1 0 1 0], 35 = 55 [0 1 0 −1], 53 = 55 [0 0 1 −1], 35 = MP [−0.5 1 0 −0.5], 53 = MP [−0.5 0 1 −0.5], and 35 = 53 [0 1 −1 0], where MP is the average of expression levels in the two inbred parental genotypes. Calculated potence (hp = Q/|L|), the linear contrast, and the significance of pairwise contrasts are used to categorize expression patterns, when the linear model is significant at the α = 0.0001 level. This threshold is less stringent than the α = 0.05 level adjusted for simultaneous testing of 52,825 signatures, which requires a threshold probability of 1 − (1 − 0.05)^(1/52825) = 9.71 × 10^−7, but we focus on patterns of expression rather than significance of any particular signature. Classifying a pattern as OD, for example, requires hp > 1.0 and the contrast between hybrid and highest expressing parent to be significant at α = 0.001. Categories for which parent-hybrid similarity is required, such as D+ or D−, are assigned when the appropriate contrast is nonsignificant at P > 0.01.
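The design vectors and the multiple-testing threshold can be illustrated with a short sketch. It computes point estimates only and does not reproduce the SAS Proc GLM tests; the expression values in `means` are hypothetical, and the function names are ours.

```python
import math

# Design vectors in genotype order 33, 35, 53, 55, as given in the text.
CONTRASTS = {
    "L": (-1, 0, 0, 1),
    "Q35": (-1, 2, 0, -1),
    "Q53": (-1, 0, 2, -1),
}


def contrast(means, weights):
    """Dot product of genotype means with a design vector."""
    return sum(w * m for w, m in zip(weights, means))


def sidak_threshold(alpha, n_tests):
    """Per-test threshold 1 - (1 - alpha)**(1/n_tests), computed stably."""
    return -math.expm1(math.log1p(-alpha) / n_tests)


# Hypothetical expression levels (tpm) for genotypes 33, 35, 53, 55.
means = (10.0, 30.0, 40.0, 20.0)
linear = contrast(means, CONTRASTS["L"])
hp_35 = contrast(means, CONTRASTS["Q35"]) / abs(linear)
hp_53 = contrast(means, CONTRASTS["Q53"]) / abs(linear)

threshold = sidak_threshold(0.05, 52_825)
```

`math.expm1` and `math.log1p` are used because the direct expression subtracts two numbers that are nearly equal, which loses precision when the number of tests is large.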


This research was made possible by funding from the U.S. Department of Agriculture National Research Initiative Competitive Grants Program (to D.H.), the USDA Western Regional Aquaculture Center (to D.H. and D.T.M.), the W. M. Keck Foundation (to D.T.M.), and the National Science Foundation (Gen-En; to D.H. and D.T.M.).


massively parallel signature sequencing
partial dominance
single-nucleotide polymorphism
transcripts per million


Conflict of interest statement: J.-Z.L., S.D., C.D.H., and B.B. were employees of Lynx Therapeutics, Inc., at the time this work was done. C.D.H. remains an employee of Solexa, Inc., which merged with Lynx Therapeutics in 2005.

Data deposition: The data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (series GSE3596, platform GPL3062, and samples GSM83048–GSM83051). The sequences from the Megaclone libraries have been deposited in the GenBank database (accession nos. DV736295–DV736964).



Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences