Genetics. Nov 2004; 168(3): 1457–1465.
PMCID: PMC1448773

Evolutionary Expressed Sequence Tag Analysis of Drosophila Female Reproductive Tracts Identifies Genes Subjected to Positive Selection


Genes whose products are involved in reproduction include some of the fastest-evolving genes found within the genomes of several organisms. Drosophila has long been used to study the function and evolutionary dynamics of genes thought to be involved in sperm competition and sexual conflict, two processes that have been hypothesized to drive the adaptive evolution of reproductive molecules. Several seminal fluid proteins (Acps) made in the Drosophila male reproductive tract show evidence of rapid adaptive evolution. To identify candidate genes in the female reproductive tract that may be involved in female–male interactions and that may thus have been subjected to adaptive evolution, we used an evolutionary bioinformatics approach to analyze sequences from a cDNA library that we have generated from Drosophila female reproductive tracts. We further demonstrate that several of these genes have been subjected to positive selection. Their expression in female reproductive tracts, presence of signal sequences/transmembrane domains, and rapid adaptive evolution indicate that they are prime candidates to encode female reproductive molecules that interact with rapidly evolving male Acps.

GENES whose products participate in reproduction often show signs of adaptive evolution (Swanson and Vacquier 2002). For example, two-dimensional gel electrophoresis has shown that proteins from Drosophila male and female reproductive organs are, on average, twice as diverse between species as those from nonreproductive tissues (Civetta and Singh 1995). A similar pattern has been found at the nucleotide level for Drosophila male accessory gland proteins (Acps) (Aguadé et al. 1992; Tsaur and Wu 1997; Aguadé 1998, 1999; Tsaur et al. 1998; Begun et al. 2000; Swanson et al. 2001a; Kern et al. 2004). Acps are important components of the seminal fluid of the male ejaculate and have been shown to have a variety of effects on the mated female (Wolfner 2002). Acps have been shown to increase the female's egg-laying rate (Herndon and Wolfner 1995; Soller et al. 1997, 1999; Heifetz et al. 2000, 2001; Chapman et al. 2003; Liu and Kubli 2003), reduce her receptivity to remating (Chen et al. 1988; Chapman et al. 2003; Liu and Kubli 2003), decrease the female's lifespan (Chapman et al. 1995), and be involved in sperm storage and utilization (Neubaum and Wolfner 1999; Tram and Wolfner 1999; Xue and Noll 2000). An analysis of expressed sequence tags (ESTs) derived from Drosophila simulans male accessory glands and compared to the completed D. melanogaster genome demonstrated that the genes encoding Acps are on average twice as divergent as non-Acp genes (Swanson et al. 2001a). Although no statistically significant departures from neutrality were observed in the tests applied in their study, 11% of the ESTs identified by Swanson et al. (2001a) showed a signature consistent with adaptive evolution by virtue of having a dN/dS ratio greater than one.

Although the nature and evolution of several reproductive molecules contributed by the male have been studied in detail, relatively little is known about the evolution of female reproductive molecules. The few cases studied so far suggest that adaptive evolution may also occur in female reproductive molecules. Positive selection on female reproductive molecules has been detected in mammals (Swanson et al. 2001b, 2003; Jansa et al. 2003) and abalone (Galindo et al. 2003). Here we present the first systematic attempt to identify genes encoding female reproductive proteins in Drosophila and to initiate evolutionary analyses of several such genes.

To this end, we have undertaken an evolutionary EST screen of the reproductive tract of female Drosophila. Proteins produced in the female reproductive tract carry out a variety of important physiological functions. Processes such as sperm storage, control of oogenesis and ovulation, and control over remating rate are likely to involve interactions between female molecules and molecules transferred from the male to the female. For example, the male seminal fluid proteins Acp36DE and Acp62F localize to the sperm storage organs following mating (Neubaum and Wolfner 1999; Lung et al. 2002), Acp26Aa localizes to the base of the ovary (Heifetz et al. 2000), and sex peptide (Acp70A) binds to receptors in the female genital tract (Ottiger et al. 2000). Thus, we expect that some proteins expressed in the female reproductive tract will interact molecularly with Acps, sperm, or other components of the male ejaculate. Molecules secreted into the female reproductive tract may also carry out a variety of functions, such as egg activation, lubrication, or defense against pathogens, that do not necessitate any molecular contribution from the male (Wolfner et al. 2004).

Our first goal in carrying out this EST screen was to identify a suite of genes whose products can be considered candidate female reproductive molecules. Since a recurring observation about reproductive proteins is that many show adaptive divergence (Swanson and Vacquier 2002), we also incorporate evolutionary information into our screen by deriving ESTs from D. simulans (a close relative of D. melanogaster) and aligning them to their putative orthologs in the completed D. melanogaster genome (Adams et al. 2000). We identified 526 genes that show enriched expression in the female reproductive tract, 169 of which encode predicted extracellular or cell surface molecules that could interact with male proteins during reproduction.

Our second goal, given the interspecific amino acid sequence diversity that has been observed for Drosophila male accessory gland genes (Tsaur and Wu 1997; Aguadé 1998, 1999; Tsaur et al. 1998; Begun et al. 2000; Swanson et al. 2001a; Kern et al. 2004), was to determine if there is a similar level of diversity among female Drosophila reproductive molecules. Analysis of nucleotide sequence polymorphism within and/or divergence among Drosophila species reveals statistically robust evidence that at least six genes expressed in the female reproductive tract show signs consistent with having been subjected to positive selection and identifies 25 additional candidates that may also show adaptive evolution upon further analysis. The identification of genes involved in male-female interactions during reproduction should provide important molecular insight into sperm precedence (Parker 1970), sexual conflict (Rice 1996; Gavrilets 2000), or cryptic female choice (Eberhard 1996), processes that have been proposed to account for the adaptive evolution of reproductive proteins.


cDNA library preparation:

Total RNA was purified by the guanidinium isothiocyanate/CsCl method (MacDonald et al. 1987) from 600 female reproductive tracts minus ovaries (oviducts, uterus, parovaria, spermathecae, and seminal receptacle) that had been dissected from mixed-age adult D. simulans flies from a bottle culture. mRNA was purified using QIAGEN (Valencia, CA) oligotex spin columns. Oligo(dT)-primed cDNA was synthesized using superscript reverse transcriptase and cloned into the pCMV-Sport6 vector (Invitrogen, San Diego). We did not perform in-solution subtractive hybridization or normalize the cDNA library because these methods typically result in truncated cDNAs, and we desired full-length cDNA for our evolutionary comparisons. The resulting library contained 130,000 CFUs, of which 99% were recombinant. The average insert size was 1.2 kb. Two sets of probes were utilized for differential hybridization. First, oligo(dT)-primed first-strand male cDNA was prepared from whole adult male D. simulans flies of mixed age and mating status using Bethesda Research Laboratories (Gaithersburg, MD) superscript II reverse transcriptase incorporating 32P-labeled dCTP and then denatured at 65° for 30 min in 0.3 M NaOH. Second, a random-primed probe was generated from a mixture of RT-PCR products from the three female yolk protein genes from D. melanogaster: YP1, YP2, and YP3 (Barnett et al. 1980). These genes were screened out of the library since yolk protein RNAs are abundantly expressed in the fat body, which is associated with the reproductive tract (Barnett et al. 1980) (they are also expressed in the ovary, which was removed). Hybridization was for 18 hr at 65° in 5× SSPE, 5× Denhardt's, 0.5% SDS, 0.2 mg/ml salmon sperm DNA. Final washes were at 65°, 0.1× SSPE for 10 min. Sequencing was from QIAGEN purified plasmid DNA using ABI big dye terminator sequencing chemistry analyzed on an ABI 3100 automated sequencer. EST sequences are deposited in GenBank under accession nos. CO391819, CO392724, CO408479, and CO408480.

Polymorphism survey:

DNA was extracted using the PureGene DNA isolation kit from isofemale lines of D. melanogaster and D. simulans previously collected by C. Aquadro in Beltsville, Maryland. To maximize the power of our statistical tests, we focused our analyses on intron regions, which should maximize variation within and between species under neutrality. PCR primers and conditions are available as online supplementary material at http://www.genetics.org/supplemental/. PCR products were diluted eightfold with water and sequenced directly using ABI big dye terminator sequencing chemistry and analyzed on an ABI 3100 automated sequencer. Sequences are deposited in GenBank under accession nos. AY665365, AY665366, AY665367, AY665368, AY665369, AY665370, AY665371, AY665372, AY665373, AY665374, AY665375, AY665376, AY665377, AY665378, AY665379, AY665380, AY665381, AY665382, AY665383, AY665384, AY665385, AY665386, AY665387, AY665388, AY665389, AY665390, AY665391, AY665392, AY665393, AY665394, AY665395, AY665396.

Divergence study:

We assessed DNA sequence divergence among five to eight increasingly divergent species of Drosophila for five genes. For each we used either all or overlapping subsets of the following species: D. erecta, D. eugracilis, D. lutescens, D. melanogaster, D. pseudoobscura, D. simulans, D. teissieri, and D. yakuba (detailed in results). We used two tree topologies [differing only in the placement of D. erecta (Ko et al. 2003)] and the results were consistent. The two topologies were: (pseudoobscura, lutescens, (eugracilis, (erecta, ((teissieri, yakuba), (melanogaster, simulans))))) and (pseudoobscura, lutescens, (eugracilis, ((erecta, (teissieri, yakuba)), (melanogaster, simulans)))). Sequences for D. melanogaster and D. pseudoobscura were obtained from public databases (http://genome.ucsc.edu/). Stocks for the other species (except our own D. simulans) were obtained from the Drosophila Species Stock Center in Tucson, Arizona. Since the analyses are based upon coding regions, we amplified the coding sequence from cDNA. Total RNA was extracted from mixed-age females using Trizol Reagent (Invitrogen). Random decamer primed cDNA was synthesized using MMLV-Reverse Transcriptase (Ambion, Austin, TX). Primers were designed in conserved regions of the genes of interest, which were identified by aligning the D. melanogaster gene sequences with their tblastn best hits in the genome of D. pseudoobscura. PCR primers and conditions are available as online supplementary material at http://www.genetics.org/supplemental/. PCR products were purified using the QIAquick PCR purification kit (QIAGEN) and sequenced using an ABI 3700 sequencer (Macrogen). Sequences are deposited in GenBank under accession nos. AY665365, AY665366, AY665367, AY665368, AY665369, AY665370, AY665371, AY665372, AY665373, AY665374, AY665375, AY665376, AY665377, AY665378, AY665379, AY665380, AY665381, AY665382, AY665383, AY665384, AY665385, AY665386, AY665387, AY665388, AY665389, AY665390, AY665391, AY665392, AY665393, AY665394, AY665395, AY665396.
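The two topologies quoted above differ only in where D. erecta attaches. As an illustrative sketch (not part of the original analysis), they can be written as Newick-style strings and compared clade by clade. The function names `taxa` and `clades` are hypothetical helpers introduced here, and species names are abbreviated exactly as in the text.

```python
import re

# The two topologies from the text, written as Newick-style strings.
TOPOLOGY_A = ("(pseudoobscura,lutescens,(eugracilis,(erecta,"
              "((teissieri,yakuba),(melanogaster,simulans)))));")
TOPOLOGY_B = ("(pseudoobscura,lutescens,(eugracilis,((erecta,"
              "(teissieri,yakuba)),(melanogaster,simulans))));")


def taxa(newick):
    """Return the set of leaf names in a Newick-style string."""
    return set(re.findall(r"[A-Za-z_]+", newick))


def clades(newick):
    """Return every clade (one per parenthesis pair) as a frozenset of leaves."""
    stack, found = [], set()
    for token in re.findall(r"[(),;]|[A-Za-z_]+", newick):
        if token == "(":
            stack.append(set())          # open a new clade
        elif token == ")":
            done = stack.pop()           # close the current clade
            found.add(frozenset(done))
            if stack:
                stack[-1] |= done        # its leaves also belong to the parent
        elif token in (",", ";"):
            continue
        else:
            stack[-1].add(token)         # a leaf name
    return found


only_a = clades(TOPOLOGY_A) - clades(TOPOLOGY_B)
only_b = clades(TOPOLOGY_B) - clades(TOPOLOGY_A)
print(sorted(sorted(c) for c in only_a))
print(sorted(sorted(c) for c in only_b))
```

Under these assumptions, the only clade unique to the first topology groups (teissieri, yakuba) with (melanogaster, simulans), and the only clade unique to the second groups erecta with (teissieri, yakuba). This matches the statement that the trees differ only in the placement of D. erecta.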

Evolutionary and bioinformatic analyses:

The D. simulans EST sequences were aligned against the D. melanogaster predicted coding sequences, and the alignment was used to calculate dN/dS ratios using the maximum-likelihood methods (Goldman and Yang 1994) implemented in the program PAML (Yang 2000). The significance of an excess of dN over dS was assessed as follows. dN and dS were estimated as two free parameters by maximum likelihood (L1). The likelihood was also calculated for the null model having dN equal to dS (L0). The negative of twice the difference in the log-likelihood obtained from these two models (−2[log(L0) − log(L1)]) was compared to the chi-square distribution with 1 d.f. For the polymorphism survey, Tajima's D (Tajima 1989), Fu and Li's D (Fu and Li 1993), and Fay and Wu's H (Fay and Wu 2000) were calculated using DnaSP4.0 (Rozas and Rozas 1999). Significance was determined by coalescent simulations with R (recombination) estimated from the data by the method of Hudson (1987). These three statistics for polymorphism data analyze the frequency of alleles (frequency spectrum) within the sample. The departures from neutrality include an excess of rare alleles (Tajima 1989; Fu and Li 1993) or an excess of high-frequency-derived alleles (Fay and Wu 2000). These specific departures are expected to be associated with recent selection acting at or near a locus. During a selective sweep, in the presence of recombination, linked variation is dragged toward fixation, resulting in an excess of high-frequency-derived mutations in regions flanking the target of selection. The fixation of the favored variant results in the elimination of polymorphism at sites immediately surrounding the selected site (size of region is dependent upon recombination and the strength of selection). As new mutations occur in this region after the sweep and drift upward in frequency, there is an initial excess of rare alleles since every new mutation produces a new allele. The time to return to an equilibrium frequency distribution is a function of the population size and can be quite slow for large populations.
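To make the frequency-spectrum logic concrete, the following is a minimal sketch of Tajima's D computed from an ungapped alignment, using the standard formulas of Tajima (1989). It is not the DnaSP implementation used in the study, and it does not perform the coalescent simulations used there to assess significance. The function name `tajimas_d` and the toy alignments are assumptions made for illustration.

```python
import math
from collections import Counter


def tajimas_d(seqs):
    """Tajima's D for equal-length, ungapped sequences (needs n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance is zero for n <= 3)")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    n_pairs = n * (n - 1) / 2
    seg_sites = 0   # S: number of segregating sites
    pi = 0.0        # mean number of pairwise differences
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            differing = (n * n - sum(c * c for c in counts.values())) / 2
            pi += differing / n_pairs
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Toy alignments (hypothetical data): all variants rare vs. all at 50%.
rare = ["TAA", "ATA", "AAT", "AAA"]
intermediate = ["AAA", "AAA", "TTT", "TTT"]
print(tajimas_d(rare), tajimas_d(intermediate))
```

In this sketch an alignment in which every variant is a singleton gives a negative D (an excess of rare alleles, as expected after a sweep), whereas variants at intermediate frequency give a positive D.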

For the divergence analyses, we used PAML (Yang 2000) to calculate the likelihood of a neutral model where no codons could have a dN/dS ratio > 1 (L0) and compared it to the likelihood of a model in which a subset of sites could have a dN/dS ratio > 1 (L1) (Yang and Bielawski 2000). The negative of twice the difference in the log-likelihood obtained from these two models (−2[log(L0) − log(L1)]) was compared to the chi-square distribution with degrees of freedom equal to the difference in number of estimated parameters. Variation in the dN/dS ratio between sites was modeled using both discrete (PAML models M0 and M3) and β (PAML models M7 and M8) distributions. We consider the comparison of model M0 and M3 to be a test for variation in the dN/dS ratio between sites and not a robust test of adaptive evolution. The comparison of M7 and M8 is a robust test of adaptive evolution. To determine if the dN/dS ratio significantly exceeds 1, we compared the M8 model to the likelihood of a model (M8A) with the additional proportion of sites fixed at a dN/dS ratio of 1 (Swanson et al. 2003). Details of the distributions and test statistics can be found in Yang et al. (2000). Signal sequences were predicted using the program SignalP (http://www.cbs.dtu.dk/services/SignalP-2.0/; Nielsen et al. 1997). Transmembrane regions were predicted using the TMHMM methods (Sonnhammer et al. 1998), using the TMHMM server (http://www.cbs.dtu.dk/services/TMHMM-2.0/).
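The likelihood-ratio tests described above can be sketched in a few lines. The example below is illustrative only: the log-likelihood values are hypothetical, not results from this study, and the helper names `lrt_statistic` and `chi2_sf` are introduced here. The degrees of freedom shown (2 for M7 vs. M8, 1 for M8A vs. M8) assume the usual PAML parameterization, in which M8 adds a proportion and a dN/dS ratio to M7, and M8A fixes that ratio at 1. The chi-square tail probability uses closed forms that hold only for 1 d.f. and for even d.f.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """-2[log(L0) - log(L1)], given the two maximized log-likelihoods."""
    return -2.0 * (lnl_null - lnl_alt)


def chi2_sf(x, df):
    """Upper-tail chi-square probability for df == 1 or any even df."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    raise ValueError("this sketch supports only df == 1 or even df")


# Hypothetical log-likelihoods for one gene (not values from the paper).
lnl_m7, lnl_m8, lnl_m8a = -2510.4, -2502.1, -2505.0

stat_m7_m8 = lrt_statistic(lnl_m7, lnl_m8)     # assumed 2 d.f.
stat_m8a_m8 = lrt_statistic(lnl_m8a, lnl_m8)   # assumed 1 d.f.
print(stat_m7_m8, chi2_sf(stat_m7_m8, 2))
print(stat_m8a_m8, chi2_sf(stat_m8a_m8, 1))
```

The same function covers the EST-level test of dN = dS against free dN and dS, which the text compares to a chi-square distribution with 1 d.f.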


Evolutionary EST screen identifies candidate female reproductive genes:

We constructed a cDNA library from dissected D. simulans female reproductive tracts minus ovaries. Ovaries were excluded because they express a diverse array of transcripts important for embryonic development, and we wished to enrich our cDNA library for candidate molecules expressed in, or secreted from, reproductive epithelia. We performed a differential hybridization screen of our cDNA library with 32P-labeled cDNA made from whole adult male D. simulans. Low and nonhybridizing clones were selected for further analysis to enrich the collection of ESTs to be analyzed for those with predominant expression in female reproductive tracts (although transcripts expressed at low levels in both sexes are still present). It is important to note the possibility that not all proteins important in reproduction are female specific or enriched. As such, our approach may have screened out some non-sex-specific genes whose products, in females, interact with male proteins. However, the need to screen out abundant general molecules like actin, tubulin, etc., made it critical to include this step in our screen. We selected 960 clones for sequencing. Of these, we were able to obtain sequence reads of >100 bp for 908 clones. These were used for further analyses.

The 908 ESTs corresponded to 526 independent genes. We focused on genes predicted to encode extracellular or cell surface molecules, since they could potentially be receptors or binding partners for Acps or sperm or be involved in male-independent extracellular processes. We used a bioinformatics approach to identify genes encoding proteins with a predicted secretory signal sequence and/or transmembrane domains. The identification of a signal sequence relies on a correct prediction of the first coding exon. Since initial exons are notoriously difficult to predict (Davuluri et al. 2001) and some proteins have internal secretory signals, we also included genes containing one or more predicted transmembrane regions. Thirty-five encoded proteins with a predicted signal sequence and transmembrane domain, 75 had just a predicted signal sequence, and 59 had predicted transmembrane domains but no predicted signal sequence.
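The three candidate classes above follow from two predictions per gene: presence of a signal sequence and number of transmembrane regions. A minimal sketch of that bookkeeping is given below. The function `localization_class` and the record layout are hypothetical, the counts for the three candidate classes are taken from the text, and the "neither" count is simply derived here as 526 minus the 169 candidates.

```python
from collections import Counter


def localization_class(has_signal, n_tm):
    """Classify a gene from its (assumed) SignalP and TMHMM predictions."""
    if has_signal and n_tm > 0:
        return "signal+TM"
    if has_signal:
        return "signal only"
    if n_tm > 0:
        return "TM only"
    return "neither"


# Synthetic records (has_signal, n_tm) rebuilt from the reported counts.
records = (
    [(True, 1)] * 35      # signal sequence and transmembrane domain
    + [(True, 0)] * 75    # signal sequence only
    + [(False, 1)] * 59   # transmembrane domain(s) only
    + [(False, 0)] * 357  # derived: 526 genes minus 169 candidates
)

tally = Counter(localization_class(sig, tm) for sig, tm in records)
n_candidates = sum(v for k, v in tally.items() if k != "neither")
print(tally, n_candidates)
```

The three candidate classes sum to 169, the number of predicted extracellular or cell surface molecules reported for the screen.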

Several male reproductive proteins show the molecular signature of adaptive evolution, and several hypotheses to account for that rapid evolution would predict a similar pattern for the female proteins with which they interact. We thus incorporated evolutionary information into our screen by deriving our ESTs from D. simulans, which allows comparison of them to their putative orthologs in the completed D. melanogaster genome. We then calculated the rate of synonymous (silent, dS) and nonsynonymous substitution (amino acid replacement changes, dN) using maximum-likelihood methods (Goldman and Yang 1994). The average dN/dS ratio of the 461 protein-coding ESTs is 0.15 ± 0.25 (with the average dN being 0.013 and dS being 0.091).

The signature of adaptive evolution is a dN/dS ratio significantly exceeding 1, as equal numbers of nonsynonymous and synonymous substitutions, normalized to the number of possible nonsynonymous and synonymous changes in the gene, are expected under strict neutrality. Our goal here, however, is to use a genomic screen to identify candidate genes that have been subjected to positive selection, possibly at only a small subset of their codons. For example, mammalian egg coat proteins (ZP proteins) have an overall dN/dS ratio of ~0.5, but upon detailed analysis incorporating variation in the dN/dS ratio between sites using maximum likelihood (Yang et al. 2000) it can be demonstrated that these genes are subjected to positive selection (Swanson et al. 2001b) with a class of codons having a dN/dS ratio > 1. We therefore surveyed the literature for articles utilizing the method of Yang et al. (2000) for detecting adaptive evolution through analysis for variation in the dN/dS ratio between sites. We have plotted the proportion of genes with evidence of positive selection in relation to their overall dN/dS ratio in Figure 1. At a dN/dS ratio >0.5, 19 of 20 genes analyzed showed statistical evidence for adaptive evolution, suggesting this may be a reasonable value to identify candidate genes that may have been subjected to adaptive evolution. The genes in Figure 1 with a dN/dS ratio between 0.3 and 0.5 also include a high proportion that show statistical evidence for adaptive evolution upon closer examination; however, these may be overrepresented in our analyses due to the lack of reports detailing negative results (and thus they are not included in our analysis). The genes, references, and summary information are available as online supplementary material at http://www.genetics.org/supplemental/ for the 70 genes analyzed in Figure 1. Although only 25% of the 70 genes reported failed to show statistical evidence for adaptive evolution in subsequent PAML analysis, the proportion of genes under positive selection is surely overestimated because failures to detect adaptive evolution are rarely reported. Nonetheless, genes with an overall dN/dS ratio >0.5 are more likely to have been subjected to adaptive evolution and are thus good candidates for further study. In our EST screen, 27 out of the total of 461 protein-coding genes have dN/dS ratios >0.5 (Figure 2), including eight of the candidate receptor proteins (containing signal sequences and/or transmembrane regions; Table 1).
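The screening step described above amounts to flagging ESTs whose pairwise dN/dS exceeds 0.5. The sketch below illustrates that filter on made-up values. The gene names and dN/dS estimates are hypothetical, and returning `None` when dS is zero is a design choice of this example rather than anything stated in the text.

```python
from dataclasses import dataclass

CUTOFF = 0.5  # dN/dS threshold used in the text to flag candidates


@dataclass
class Est:
    gene: str   # hypothetical identifier
    dn: float   # nonsynonymous divergence
    ds: float   # synonymous divergence


def dn_ds_ratio(est):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if est.ds == 0:
        return None
    return est.dn / est.ds


def flag_candidates(ests, cutoff=CUTOFF):
    """Names of ESTs whose dN/dS ratio exceeds the cutoff."""
    flagged = []
    for est in ests:
        ratio = dn_ds_ratio(est)
        if ratio is not None and ratio > cutoff:
            flagged.append(est.gene)
    return flagged


# Made-up example values, not data from the screen.
ESTS = [
    Est("geneA", 0.013, 0.091),
    Est("geneB", 0.060, 0.100),
    Est("geneC", 0.020, 0.0),
    Est("geneD", 0.120, 0.100),
]
print(flag_candidates(ESTS))

# Share of flagged ESTs reported in the text: 27 of 461.
print(round(100 * 27 / 461, 1))
```

The last line reproduces the roughly 6% of female reproductive tract ESTs that exceed the cutoff.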

Figure 1.
Analysis of 70 genes, from published research articles on detecting adaptive evolution by analysis for variation in the dN/dS ratio between sites by the method of Yang et al. (2000). Additional information and references can be found as online supplementary ...
Figure 2.
Plot of dN vs. dS for the 461 D. simulans ESTs that matched protein-coding regions of D. melanogaster genes. The solid line is the neutral expectation of dN/dS = 1. The dashed line is the cutoff of dN/dS = 0.5 used to identify candidate genes that may ...
Classification of ESTs based upon evolutionary and bioinformatics analyses

Some of the genes identified by this female reproductive tract evolutionary EST approach have predicted ORF sequences consistent with likely functions for Drosophila reproductive proteins. Sixteen predicted peptidases and eight predicted protease inhibitors were found. At least two Drosophila male seminal fluid proteins that are transferred to females undergo proteolytic cleavage (Monsma et al. 1990; Bertram et al. 1996), and in at least one case this cleavage is dependent on contributions from the female as well as the male (Park and Wolfner 1995). Although the nature of the female contribution is unknown, it could involve proteases (to cleave) and protease inhibitors (to confine cleavage to appropriate sites in the protein) such as the predicted ones identified here. Additionally, there are 47 different proteins with putative transporter activity and 11 different putative signal transducer genes that could be involved in regulating the mated female's physiology (Table 2). For example, it has been hypothesized that a transporter moves Acp70A (sex peptide) from the reproductive tract to the hemolymph, where it binds receptors in the nervous system of the female (Ding et al. 2003). Finally, there are several genes predicted to be involved in defense or immunity. These candidates are all prime targets for functional analyses. A summary of the molecular functions based upon the gene ontology classification (Ashburner et al. 2000) is provided in Table 2. Details of all genes identified in our screen can be found as online supplementary material at http://www.genetics.org/supplemental/.

Gene Ontology Functions

Divergence and polymorphism studies demonstrate adaptive evolution:

The evolutionary EST approach utilized here (isolating ESTs from one organism and comparing to the completed genome of a close relative; Swanson et al. 2001a) is aimed at identifying candidate genes for further tests for adaptive evolution. Each individual prediction of adaptive evolution needs to be independently verified. To test if any of the candidate genes identified herein have actually been subjected to positive selection, we performed a polymorphism survey of nine of the genes from D. melanogaster and D. simulans isofemale lines isolated from Maryland (Table 3) and divergence analyses on five of the same genes in five to eight Drosophila species (Table 4). Genes were chosen on the basis of predicted extracellular localization of the protein they encode and/or overall dN/dS ratio >0.5. For the polymorphism survey, we analyzed the frequency spectrum (i.e., analysis of proportion of alleles at high vs. low frequencies) of the polymorphisms for departures from equilibrium neutral expectations (Aquadro 1997). In particular, we analyzed for an excess of rare alleles (i.e., singletons; Tajima 1989; Fu and Li 1993) or an excess of high-frequency-derived polymorphisms (Fay and Wu 2000). Either pattern could have resulted from a recent selective sweep or a population bottleneck. To maximize the power of our statistical tests, we focused our analyses on intron regions, which should maximize variation within and between species under neutrality. We ruled out any genome-wide confounding effects, such as demographics (e.g., population bottleneck), on these statistics, since three loci (Table 3) and additional unpublished studies of these samples (C. F. Aquadro, unpublished results) conform to equilibrium neutral expectations. We performed polymorphism surveys for nine loci and found evidence for selective sweeps at six of these loci (Table 3), suggesting the recent action of positive selection at or near these genes. Our results are bolstered by finding evidence for recent selective events using multiple statistics that utilize different regions of the frequency spectrum (i.e., high and low frequency). The genes under positive selection by this analysis include two putative proteases, a predicted transmembrane receptor, and three genes with unknown function.

Polymorphism survey identifies positive selection in several candidate genes
Detection of positive selection by maximum-likelihood analysis

For the divergence studies, we sequenced from several additional Drosophila species five of the genes identified from our polymorphism analysis as having evidence for a recent selective sweep in D. melanogaster and/or D. simulans. We then analyzed the sequence data using maximum-likelihood methods (Nielsen and Yang 1998; Yang et al. 2000) to detect variation in the dN/dS ratio between sites. Divergence analyses were not performed on CG17108 due to the biased amino acid and codon usage seen in this gene, which may induce errors in parameter estimations using codon models. Whereas the polymorphism-based tests are capable of detecting recent selection in a single species, the divergence analyses can detect repeated episodes of positive selection on the same codons in several species. A significant result using these latter methods suggests that a subset of codons in a gene has been subjected to positive selection in several species. We find evidence of variation in the dN/dS ratio for all five genes using the discrete model M3. Four of these genes have a class of sites with dN/dS > 1. These four genes are still considered as only candidates for adaptive evolution since using a discrete model with three classes of dN/dS ratios compared to a single overall average dN/dS ratio is not a robust test of adaptive evolution (Swanson et al. 2001b) and should be considered as only a test for variable dN/dS ratios between sites. Using a more refined test with a beta distribution of dN/dS for “neutral” or functionally constrained codons that covers the interval 0–1, we find evidence of positive selection acting upon a subset of codons for two of the five genes studied (Table 4). In both cases the sites in this extra class have dN/dS ratios significantly >1, since a model (M8) with a freely estimated extra class is significantly better than a model where the extra class has a dN/dS ratio fixed at 1 (M8A; Table 4). One gene (CG3066) is a predicted trypsin-like serine protease. Several of the codons inferred to be under positive selection in this gene lie within the predicted trypsin catalytic domain. Furthermore, several putatively selected codons lie in the predicted clip domain, which may be involved in protein-protein interactions (Jiang and Kanost 2000). The second gene (CG16707) does not belong to any predicted functional class.


Adaptive evolution is becoming an increasingly common observation in the study of reproductive proteins. The vast majority of studies have focused on male-derived factors (Swanson and Vacquier 2002), perhaps in part because these are easier to characterize and more have been identified. However, it is clear that female genotype plays an important role during reproduction. In mammals, genes encoding the egg coat proteins ZP2 and ZP3 have been demonstrated to undergo adaptive evolution. Several of the sites predicted to be subjected to adaptive evolution are in regions implicated in sperm-egg binding (Swanson et al. 2001b; Jansa et al. 2003), indicating the selective pressure may relate to sperm-egg interaction. In Drosophila, it has been demonstrated that females play an important role in sperm competition (Price 1997; Clark and Begun 1998; Clark et al. 1999). The class of genes studied in this article includes several genes expressed in the female reproductive tract and subjected to adaptive evolution. The identification of candidate genes encoding Drosophila female reproductive proteins is a crucial step toward understanding, at the molecular level, the male and female interactions that occur during reproduction.

Proteins transferred with sperm to the female during copulation significantly influence the mated female's behavior and physiology in some animals, such as insects, as well as the reproductive success of the participant gametes (Wolfner 1997, 2002). The rapid amino acid sequence divergence of some of these male accessory gland proteins (Tsaur and Wu 1997; Aguadé 1998, 1999; Tsaur et al. 1998; Begun et al. 2000; Swanson et al. 2001a) begs explanation, and several competing (though not mutually exclusive) evolutionary hypotheses have been proposed to explain the rapid evolution (Parker 1970; Eberhard 1996; Rice 1996; Gavrilets et al. 2000; Swanson and Vacquier 2002). Evaluation of these hypotheses, and of the mechanism of action of these proteins, requires knowledge of the proteins in the female with which the male accessory gland proteins interact. To identify genes that could encode such Acp-interacting or -regulated proteins, we carried out an evolutionary EST screen of the female reproductive tract in D. simulans and D. melanogaster and identified 908 ESTs corresponding to 526 independent genes. These genes encode proteins predicted to mediate diverse biological functions and include a number of candidates for proteins in position to interact with Acps (by virtue of being secreted or having transmembrane domains). This screen complements a previous evolutionary EST screen of the male accessory gland (Swanson et al. 2001a). Together these screens provide two sets of genes that likely include partners in molecular interactions that modulate reproductive success in these species.

Of the genes we identified here from the female reproductive tract, 461 contained sufficient protein-coding sequence in the D. simulans EST to make a comparison of nonsynonymous (dN) and synonymous (dS) substitutions between species. Twenty-seven proteins had a ratio of nonsynonymous to synonymous substitutions >0.5, a level that we argue is a useful cutoff to identify genes likely to show evidence of positive selection on further more detailed analysis. Nine candidate proteins with signal sequences and/or transmembrane domains, including two with elevated levels of dN/dS substitution ratios, were further examined for evidence of recent positive selection by analysis of DNA sequence polymorphism in population samples of both D. melanogaster and D. simulans. Six of the nine genes showed evidence of a recent adaptive fixation at or near the candidate locus (Table 3). Subsequent analysis of sequence divergence at five of these genes among five to eight species of Drosophila revealed significant evidence for positive selection accelerating amino acid sequence divergence at between 3 and 9% of the codons in two genes. One of these two genes encodes a serine protease, while the other encodes a protein of unknown function (Table 4).

It is worth noting that the two types of evolutionary analyses used here (polymorphism and divergence) are most powerful at detecting different kinds of adaptive evolution. The polymorphism surveys are most powerful at detecting recent selective events (Simonsen et al. 1995). The divergence analyses are most powerful when recurrent selection acts upon a subset of codons over most lineages studied (Anisimova et al. 2001). Importantly, detection of nonneutral patterns by either method should be considered evidence of adaptive evolution. The selective pressures driving the divergence of these genes remain unknown; determining the functions of the encoded molecules should shed light on them.
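To make the contrast between the two kinds of test concrete, the sketch below computes one statistic of each type. It is an illustration under stated assumptions, not the analysis performed here: Tajima's D (Tajima 1989) stands in for the frequency-spectrum tests applied to polymorphism data, and a likelihood ratio test with 2 degrees of freedom stands in for a comparison of nested codon-substitution models. The choice of 2 degrees of freedom, the function names, and all input values are assumptions made for the example.

```python
import math


def tajimas_d(n: int, seg_sites: int, pi: float) -> float:
    """Tajima's D from sample size n, number of segregating sites,
    and mean number of pairwise differences pi."""
    if n < 4 or seg_sites < 1:
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Difference between two estimators of theta, scaled by its standard deviation.
    return (pi - seg_sites / a1) / math.sqrt(variance)


def lrt_pvalue_df2(lnl_null: float, lnl_alt: float) -> float:
    """P-value for 2 * (lnL_alt - lnL_null) against a chi-square with 2 d.f.
    For 2 d.f. the chi-square survival function is exactly exp(-x / 2)."""
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return 1.0
    return math.exp(-stat / 2.0)


# Illustrative inputs only; none of these values come from the study.
d_value = tajimas_d(n=10, seg_sites=16, pi=3.888889)    # about -1.446
p_value = lrt_pvalue_df2(lnl_null=-1234.5, lnl_alt=-1228.0)
print(round(d_value, 3), round(p_value, 4))
```

A negative D indicates an excess of rare variants relative to the neutral expectation, the kind of skew a recent selective sweep can leave in polymorphism data. The likelihood ratio test instead asks whether a model that allows a class of codons with dN/dS > 1 fits the between-species divergence significantly better than one that does not.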

As a group, the genes chosen as candidates on the basis of the presence of a signal sequence and/or transmembrane region have a level of nonsynonymous sequence divergence (dN) that is 50% greater than that of the noncandidate reproductive genes. While the level of synonymous divergence (dS) of the candidate genes is consistent with the value expected for this species pair (~0.10; Bauer and Aquadro 1997), it is greater than that of the noncandidate loci by 25%. This difference in dS likely reflects the positive correlation between protein sequence conservation and synonymous site divergence reported by Akashi et al. (1996). The increased dN is similar to that observed for the male accessory gland genes (Swanson et al. 2001a). Although a lower proportion of ESTs with dN/dS ratios >0.5 was observed among the female reproductive tract ESTs reported here than among the male accessory gland ESTs analyzed by Swanson et al. (2001a) (6% female vs. 19% male Acp), the total number of genes with dN/dS ratios >0.5 was similar (27 female vs. 33 male Acp). The polymorphism surveys of genes expressed in the female reproductive tract were consistent with positive selection for six out of nine genes analyzed (Table 3). This is a higher proportion than that for surveys of male accessory glands, in which statistical departures from neutrality based largely on frequency distributions were observed for three out of nine comparisons in D. melanogaster (Begun et al. 2000) and zero out of seven comparisons in D. simulans (Kern et al. 2004). Variance in rates across lineages is consistent with positive selection in some Acp genes in the latter study. The discrepancy may be due to differences in selective pressures, sample sizes (the female polymorphism survey had more individuals), or populations analyzed [the female sample was cosmopolitan (this study) and the male sample was African].
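The figures quoted in this paragraph can be checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative. It assumes that the reported 6% is 27 genes out of the 461 with sufficient coding sequence, and that the 50% and 25% differences refer to group averages.

```python
# Counts stated in the text.
female_genes_compared = 461  # genes with enough coding sequence for dN/dS
female_above_cutoff = 27     # genes with dN/dS > 0.5

female_percent = 100.0 * female_above_cutoff / female_genes_compared
print(f"{female_percent:.1f}%")  # close to the 6% reported

# Relative divergence of candidate vs. noncandidate genes stated in the text.
dn_fold = 1.50  # candidate dN is 50% greater
ds_fold = 1.25  # candidate dS is 25% greater

# Implied fold difference in dN/dS if both figures are ratios of group means.
ratio_fold = dn_fold / ds_fold
print(f"{ratio_fold:.2f}")
```

Under these assumptions the candidate group's dN/dS is only about 1.2 times that of the noncandidate group, because the elevated dS offsets part of the elevated dN. A ratio of group means is not the same as a mean of per-gene ratios, so this is a rough summary rather than a per-gene result.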

The genes identified here are strong candidates to encode molecules made in the female reproductive tract that may interact with male-derived factors, and they should be targets of future functional analyses. Prediction of candidate genes encoding reproductive proteins will facilitate their functional characterization through allelic association studies and biochemical and genetic characterization. It is likely that some of the genes identified in this screen are involved in female-specific functions, such as egg activation, lubrication, or immunity. Moreover, some of the genes identified here as having been subjected to positive selection may prove to be binding partners of male reproductive proteins, including the Acps and sperm surface proteins. It will be of particular interest to determine whether a ligand and its receptor show similar evolutionary dynamics. These studies will help provide molecular insights into sperm precedence (Parker 1970), sexual conflict (Rice 1996; Gavrilets 2000), and female choice (Eberhard 1996).


We thank J. Rozas for a beta release of DnaSP 4.0, Jennifer Calkins for help with Figure 1, Lawrence Harshman, Andy Clark, members of the Swanson, Aquadro, and Wolfner labs, and reviewers for thoughtful suggestions. Support was provided by National Institutes of Health (NIH) grant HD42563, National Science Foundation grant DEB-0111613, and NIH National Research Service Award postdoctoral fellowship GM20889 (to W.J.S.); and NIH grants GM036431 (to C.F.A.) and HD38921 (to M.F.W.). A.W. is a Howard Hughes Medical Institute predoctoral fellow.


  • Adams, M. D., S. E. Celniker, R. A. Holt, C. A. Evans, J. D. Gocayne et al., 2000. The genome sequence of Drosophila melanogaster. Science 287: 2185–2195. [PubMed]
  • Aguadé, M., 1998. Different forces drive the evolution of the Acp26Aa and Acp26Ab accessory gland genes in the Drosophila melanogaster species complex. Genetics 150: 1079–1089. [PMC free article] [PubMed]
  • Aguadé, M., 1999. Positive selection drives the evolution of the Acp29AB accessory gland protein in Drosophila. Genetics 152: 543–551. [PMC free article] [PubMed]
  • Aguadé, M., N. Miyashita and C. H. Langley, 1992. Polymorphism and divergence in the Mst26A male accessory gland gene region in Drosophila. Genetics 132: 755–770. [PMC free article] [PubMed]
  • Akashi, A., S. Ono, K. Kuwano and S. Arai, 1996. Proteins of 30 and 36 kilodaltons, membrane constituents of the Staphylococcus aureus L form, induce production of tumor necrosis factor alpha and activate the human immunodeficiency virus type 1 long terminal repeat. Infect. Immun. 64: 3267–3272. [PMC free article] [PubMed]
  • Anisimova, M., J. P. Bielawski and Z. Yang, 2001. Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol. Biol. Evol. 18: 1585–1592. [PubMed]
  • Aquadro, C. F., 1997. Insights into the evolutionary process from patterns of DNA sequence variability. Curr. Opin. Genet. Dev. 7: 835–840. [PubMed]
  • Ashburner, M., C. A. Ball, J. A. Blake, D. Botstein, H. Butler et al., 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25: 25–29. [PMC free article] [PubMed]
  • Barnett, T., C. Pachl, J. P. Gergen and P. C. Wensink, 1980. The isolation and characterization of Drosophila yolk protein genes. Cell 21: 729–738. [PubMed]
  • Bauer, V. L., and C. F. Aquadro, 1997. Rates of DNA sequence evolution are not sex-biased in Drosophila melanogaster and D. simulans. Mol. Biol. Evol. 14: 1252–1257. [PubMed]
  • Begun, D. J., P. Whitley, B. L. Todd, H. M. Waldrip-Dail and A. G. Clark, 2000. Molecular population genetics of male accessory gland proteins in Drosophila. Genetics 156: 1879–1888. [PMC free article] [PubMed]
  • Bertram, M. J., D. M. Neubaum and M. F. Wolfner, 1996. Localization of the Drosophila male accessory gland protein Acp36DE in the mated female suggests a role in sperm storage. Insect Biochem. Mol. Biol. 26: 971–980. [PubMed]
  • Chapman, T., L. F. Liddle, J. M. Kalb, M. F. Wolfner and L. Partridge, 1995. Cost of mating in Drosophila melanogaster females is mediated by male accessory gland products. Nature 373: 241–244. [PubMed]
  • Chapman, T., J. Bangham, G. Vinti, B. Seifried, O. Lung et al., 2003. The sex peptide of Drosophila melanogaster: female post-mating responses analyzed by using RNA interference. Proc. Natl. Acad. Sci. USA 100: 9923–9928. [PMC free article] [PubMed]
  • Chen, P. S., E. Stumm-Zollinger, T. Aigaki, J. Balmer, M. Bienz et al., 1988. A male accessory gland peptide that regulates reproductive behavior of female Drosophila melanogaster. Cell 54: 291–298. [PubMed]
  • Civetta, A., and R. S. Singh, 1995. High divergence of reproductive tract proteins and their association with postzygotic reproductive isolation in Drosophila melanogaster and Drosophila virilis group species. J. Mol. Evol. 41: 1085–1095. [PubMed]
  • Clark, A. G., and D. J. Begun, 1998. Female genotypes affect sperm displacement in Drosophila. Genetics 149: 1487–1493. [PMC free article] [PubMed]
  • Clark, A. G., D. J. Begun and T. Prout, 1999. Female x male interactions in Drosophila sperm competition. Science 283: 217–220. [PubMed]
  • Davuluri, R. V., I. Grosse and M. Q. Zhang, 2001. Computational identification of promoters and first exons in the human genome. Nat. Genet. 29: 412–417. [PubMed]
  • Ding, Z., I. Haussmann, M. Ottiger and E. Kubli, 2003. Sex-peptides bind to two molecularly different targets in Drosophila melanogaster females. J. Neurobiol. 55: 372–384. [PubMed]
  • Eberhard, W. G., 1996. Female Control: Sexual Selection by Cryptic Female Choice. Princeton University Press, Princeton, NJ.
  • Fay, J. C., and C.-I Wu, 2000. Hitchhiking under positive Darwinian selection. Genetics 155: 1405–1413. [PMC free article] [PubMed]
  • Fu, Y. X., and W. H. Li, 1993. Statistical tests of neutrality of mutations. Genetics 133: 693–709. [PMC free article] [PubMed]
  • Galindo, B. E., V. D. Vacquier and W. J. Swanson, 2003. Positive selection in the egg receptor for abalone sperm lysin. Proc. Natl. Acad. Sci. USA 100: 4639–4643. [PMC free article] [PubMed]
  • Gavrilets, S., 2000. Rapid evolution of reproductive barriers driven by sexual conflict. Nature 403: 886–889. [PubMed]
  • Gavrilets, S., R. Acton and J. Gravner, 2000. Dynamics of speciation and diversification in a metapopulation. Evolution 54: 1493–1501. [PubMed]
  • Goldman, N., and Z. Yang, 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11: 725–736. [PubMed]
  • Heifetz, Y., O. Lung, E. A. Frongillo, Jr. and M. F. Wolfner, 2000. The Drosophila seminal fluid protein Acp26Aa stimulates release of oocytes by the ovary. Curr. Biol. 10: 99–102. [PubMed]
  • Heifetz, Y., U. Tram and M. F. Wolfner, 2001. Male contributions to egg production: the role of accessory gland products and sperm in Drosophila melanogaster. Proc. R. Soc. Lond. Ser. B Biol. Sci. 268: 175–180. [PMC free article] [PubMed]
  • Herndon, L. A., and M. F. Wolfner, 1995. A Drosophila seminal fluid protein, Acp26Aa, stimulates egg laying in females for 1 day after mating. Proc. Natl. Acad. Sci. USA 92: 10114–10118. [PMC free article] [PubMed]
  • Hudson, R. R., 1987. Estimating the recombination parameter of a finite population model without selection. Genet. Res. 50: 245–250. [PubMed]
  • Jansa, S. A., B. L. Lundrigan and P. K. Tucker, 2003. Tests for positive selection on immune and reproductive genes in closely related species of the murine genus Mus. J. Mol. Evol. 56: 294–307. [PubMed]
  • Jiang, H., and M. R. Kanost, 2000. The clip-domain family of serine proteinases in arthropods. Insect Biochem. Mol. Biol. 30: 95–105. [PubMed]
  • Kern, A. D., C. D. Jones and D. J. Begun, 2004. Molecular population genetics of male accessory gland proteins in the Drosophila simulans complex. Genetics 167: 725–735. [PMC free article] [PubMed]
  • Ko, W. Y., R. M. David and H. Akashi, 2003. Molecular phylogeny of the Drosophila melanogaster species subgroup. J. Mol. Evol. 57: 562–573. [PubMed]
  • Liu, H., and E. Kubli, 2003. Sex-peptide is the molecular basis of the sperm effect in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 100: 9929–9933. [PMC free article] [PubMed]
  • Lung, O., U. Tram, C. M. Finnerty, M. A. Eipper-Mains, J. M. Kalb et al., 2002. The Drosophila melanogaster seminal fluid protein Acp62F is a protease inhibitor that is toxic upon ectopic expression. Genetics 160: 211–224. [PMC free article] [PubMed]
  • MacDonald, R. J., G. H. Swift, A. E. Przybyla and J. M. Chirgwin, 1987. Isolation of RNA using guanidinium salts. Methods Enzymol. 152: 219–227. [PubMed]
  • Monsma, S. A., H. A. Harada and M. F. Wolfner, 1990. Synthesis of two Drosophila male accessory gland proteins and their fate after transfer to the female during mating. Dev. Biol. 142: 465–475. [PubMed]
  • Neubaum, D. M., and M. F. Wolfner, 1999. Mated Drosophila melanogaster females require a seminal fluid protein, Acp36DE, to store sperm efficiently. Genetics 153: 845–857. [PMC free article] [PubMed]
  • Nielsen, H., J. Engelbrecht, S. Brunak and G. von Heijne, 1997. A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Int. J. Neural Syst. 8: 581–599. [PubMed]
  • Nielsen, R., and Z. Yang, 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148: 929–936. [PMC free article] [PubMed]
  • Ottiger, M., M. Soller, R. F. Stocker and E. Kubli, 2000. Binding sites of Drosophila melanogaster sex peptide pheromones. J. Neurobiol. 44: 57–71. [PubMed]
  • Park, M., and M. F. Wolfner, 1995. Male and female cooperate in the prohormone-like processing of a Drosophila melanogaster seminal fluid protein. Dev. Biol. 171: 694–702. [PubMed]
  • Parker, G. A., 1970. Sperm competition and its evolutionary consequences in the insects. Biol. Rev. 45: 525–567.
  • Price, C. S., 1997. Conspecific sperm precedence in Drosophila. Nature 388: 663–666. [PubMed]
  • Rice, W. R., 1996. Sexually antagonistic male adaptation triggered by experimental arrest of female evolution. Nature 381: 232–234. [PubMed]
  • Rozas, J., and R. Rozas, 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15: 174–175. [PubMed]
  • Simonsen, K. L., G. A. Churchill and C. F. Aquadro, 1995. Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141: 413–429. [PMC free article] [PubMed]
  • Soller, M., M. Bownes and E. Kubli, 1997. Mating and sex peptide stimulate the accumulation of yolk in oocytes of Drosophila melanogaster. Eur. J. Biochem. 243: 732–738. [PubMed]
  • Soller, M., M. Bownes and E. Kubli, 1999. Control of oocyte maturation in sexually mature Drosophila females. Dev. Biol. 208: 337–351. [PubMed]
  • Sonnhammer, E. L., G. von Heijne and A. Krogh, 1998. A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 6: 175–182. [PubMed]
  • Swanson, W. J., and V. D. Vacquier, 2002. Rapid evolution of reproductive proteins. Nat. Rev. Genet. 3: 137–144. [PubMed]
  • Swanson, W. J., A. G. Clark, H. M. Waldrip-Dail, M. F. Wolfner and C. F. Aquadro, 2001a. Evolutionary EST analysis identifies rapidly evolving male reproductive proteins in Drosophila. Proc. Natl. Acad. Sci. USA 98: 7375–7379. [PMC free article] [PubMed]
  • Swanson, W. J., Z. Yang, M. F. Wolfner and C. F. Aquadro, 2001b. Positive Darwinian selection drives the evolution of several female reproductive proteins in mammals. Proc. Natl. Acad. Sci. USA 98: 2509–2514. [PMC free article] [PubMed]
  • Swanson, W. J., R. Nielsen and Q. Yang, 2003. Pervasive adaptive evolution in mammalian fertilization proteins. Mol. Biol. Evol. 20: 18–20. [PubMed]
  • Tajima, F., 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595. [PMC free article] [PubMed]
  • Tram, U., and M. F. Wolfner, 1999. Male seminal fluid proteins are essential for sperm storage in Drosophila melanogaster. Genetics 153: 837–844. [PMC free article] [PubMed]
  • Tsaur, S. C., and C.-I Wu, 1997. Positive selection and the molecular evolution of a gene of male reproduction, Acp26Aa of Drosophila. Mol. Biol. Evol. 14: 544–549. [PubMed]
  • Tsaur, S. C., C. T. Ting and C.-I Wu, 1998. Positive selection driving the evolution of a gene of male reproduction, Acp26Aa, of Drosophila: II. Divergence versus polymorphism. Mol. Biol. Evol. 15: 1040–1046. [PubMed]
  • Wolfner, M. F., 1997. Tokens of love: functions and regulation of Drosophila male accessory gland products. Insect Biochem. Mol. Biol. 27: 179–192. [PubMed]
  • Wolfner, M. F., 2002. The gifts that keep on giving: physiological functions and evolutionary dynamics of male seminal proteins in Drosophila. Heredity 88: 85–93. [PubMed]
  • Wolfner, M. F., S. Applebaum and Y. Heifetz, 2004. Insect gonadal glands and their gene products, in Comprehensive Insect Physiology, Biochemistry, Pharmacology and Molecular Biology, edited by L. Gilbert, K. Iatrou and S. Gill. Elsevier, Amsterdam/New York.
  • Xue, L., and M. Noll, 2000. Drosophila female sexual behavior induced by sterile males showing copulation complementation. Proc. Natl. Acad. Sci. USA 97: 3272–3275. [PMC free article] [PubMed]
  • Yang, Z., 2000. Phylogenetic Analysis by Maximum Likelihood (PAML). University College London, London.
  • Yang, Z., and J. P. Bielawski, 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15: 496–503. [PubMed]
  • Yang, Z., R. Nielsen, N. Goldman and A. M. Pedersen, 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155: 431–449. [PMC free article] [PubMed]

Articles from Genetics are provided here courtesy of Genetics Society of America