Plant Physiol. Apr 2005; 137(4): 1211–1227.
PMCID: PMC1088315

Sequencing and Analysis of Common Bean ESTs. Building a Foundation for Functional Genomics


Although common bean (Phaseolus vulgaris) is the most important grain legume in the developing world for human consumption, few genomic resources exist for this species. The objectives of this research were to develop expressed sequence tag (EST) resources for common bean and assess nodule gene expression through high-density macroarrays. We sequenced a total of 21,026 ESTs derived from 5 different cDNA libraries, including nitrogen-fixing root nodules, phosphorus-deficient roots, developing pods, and leaves of the Mesoamerican genotype, Negro Jamapa 81. The fifth source of ESTs was a leaf cDNA library derived from the Andean genotype, G19833. Of the total high-quality sequences, 5,703 ESTs were classified as singletons, while 10,078 were assembled into 2,226 contigs producing a nonredundant set of 7,969 different transcripts. Sequences were grouped according to 4 main categories, metabolism (34%), cell cycle and plant development (11%), interaction with the environment (19%), and unknown function (36%), and further subdivided into 15 subcategories. Comparisons to other legume EST projects suggest that an entirely different repertoire of genes is expressed in common bean nodules. Phaseolus-specific contigs, gene families, and single nucleotide polymorphisms were also identified from the EST collection. Functional aspects of individual bean organs were reflected by the 20 contigs from each library composed of the most redundant ESTs. The abundance of transcripts corresponding to selected contigs was evaluated by RNA blots to determine whether gene expression determined by laboratory methods correlated with in silico expression. Evaluation of root nodule gene expression by macroarrays and RNA blots showed that genes related to nitrogen and carbon metabolism are integrated for ureide production. Resources developed in this project provide genetic and genomic tools for an international consortium devoted to bean improvement.

Common bean (Phaseolus vulgaris) is the most important grain legume for direct human consumption; it comprises 50% of the grain legumes consumed worldwide (McClean et al., 2004). Total production exceeds 23 million metric tons, of which 7 million metric tons are produced in Latin America and Africa (Food and Agriculture Organization of the United Nations, 2001). Diets in countries from Latin America and eastern Africa often contain sufficient carbohydrates (through cereals such as maize, rice, and wheat), but are poor in proteins. Dietary proteins can be found in scarce animal products but are usually derived from legumes. In several countries, such as Mexico and Brazil, common bean is important as a primary source of dietary protein (Broughton et al., 2003). Common bean is one of the most ancient crops in the Americas. A nucleus of diversity of common bean is located in Ecuador and northern Peru, from where beans were dispersed into South and Central America, where domestication led to their separation and the formation of two distinct gene pools, the Andean and the Mesoamerican (Gepts, 1998).

Partial sequencing of cDNA inserts or expressed sequence tags (ESTs) obtained from many plant tissues and organs has been used as an effective method of gene discovery, molecular marker generation, and transcript pattern characterization. It is an efficient approach for identifying a large number of plant genes expressed during different developmental stages and in response to a variety of environmental conditions. In addition, once ESTs are generated, they provide a resource for transcript-profiling experiments. Currently, only the grasses surpass the legumes (Fabaceae family) for the number of publicly available ESTs. There are nearly 986,000 nucleotide sequences representing the Fabaceae family available from the National Center for Biotechnology Information (NCBI) taxonomy browser (October, 2004; http://www.ncbi.nlm.nih.gov/Taxonomy). Over 92% of the ESTs deposited for the Fabaceae family are derived from the model legumes Medicago truncatula and Lotus japonicus and the crop legume soybean (Glycine max). Despite the importance of common bean as a crop legume, very little EST information is currently publicly available. Only 575 ESTs from common bean and 20,120 ESTs from the related species, runner bean (Phaseolus coccineus), have been deposited in GenBank's EST database. For this reason, we have undertaken a survey of the bean transcriptome by analyzing ESTs from diverse organs. Our research has been performed within the framework of Phaseomics, the international consortium for Phaseolus genomics (Broughton et al., 2003), developed to establish the necessary framework of knowledge and materials for the advancement of bean genomics, transcriptomics, and proteomics. A major goal of Phaseomics is to help generate new common bean varieties that are suitable for, and desired by, farmers and consumers.

Nitrogen (N) and phosphorus (P) are critical macronutrients required for plant growth. In the bean-growing regions of the developing world, soils are frequently depleted in N and P (Graham and Vance, 2003). Moreover, N and P fertilizer use is limited due to high costs and poor infrastructure. Understanding and improving mechanisms that lead to improved N and P nutrition are critical to food production and security. While root nodule symbiosis and P nutrition have been research objectives in the genomics of model legumes M. truncatula and L. japonicus, these species are forage crops and indigenous to temperate regions (Handberg and Stougaard, 1992; Cook, 1999). In addition, N and P nutrition have not been the major focus of soybean genomics research. In this article, we document the sequencing and contig assembly of more than 15,000 ESTs from organs of common bean. Our common bean EST project was originally initiated to develop EST profiles of N2-fixing root nodules and P-deficient roots. However, during the course of the project, it became apparent that EST resources also needed to be developed for common bean pods and leaves. We also report macroarray transcriptome analysis of root nodule contigs.


Features of Generated ESTs

In an effort to develop an EST platform for common bean, 5 cDNA libraries were constructed, 4 from the Mesoamerican cultivar Negro Jamapa and 1 from the Andean cultivar G19833. The sources of RNA to construct each library were pods, leaves, P-deficient roots, and nodules for the Mesoamerican genotype and leaves for the Andean genotype. Single-pass 5′ sequencing resulted in 3,400 to 4,900 ESTs from each of the Mesoamerican and Andean libraries (sequences deposited in GenBank, accession nos. CV528971–CV544303). In addition, single-pass 3′ sequencing of the Andean genotype yielded a further 854 sequences. In total, 21,026 ESTs were sequenced (Table I). This number includes the 575 common bean ESTs already present in GenBank. Between 19% and 33% of the sequenced ESTs from the 5 libraries were discarded and not considered for contig assembly due to low-quality sequence or the absence of insert in the clone. In addition, clones identified as chimeric or alternative splice products were not included in contig assembly. Redundant ESTs were grouped into contigs using the program Phrap (http://www.phrap.org/phredphrap/phrap.html). Of the total 15,781 EST sequences considered acceptable for contig assembly, 5,703 were classified as singletons and the remaining 10,078 assembled into 2,266 contigs ranging in EST redundancy from 2 to 264 (Table II). Library-specific contigs ranged from 44 to 228, depending upon the organ. Total contigs and singletons comprised a nonredundant gene set of 7,969 different transcripts. All EST sequences, contig images, single-nucleotide polymorphism (SNP), and gene family data analyses are available (http://www.ccg.unam.mx/phaseolusest/Data_download.htm; see also supplemental data online).
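The assembly bookkeeping in this paragraph can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis; the variable names are illustrative and the figures are those quoted in this section.

```python
# Figures quoted in the text; variable names are illustrative only.
total_sequenced = 21_026   # all ESTs sequenced (Table I)
accepted = 15_781          # ESTs considered acceptable for contig assembly
singletons = 5_703         # accepted ESTs that joined no contig
contigs = 2_266            # contigs reported in this section

# ESTs that were assembled into contigs.
ests_in_contigs = accepted - singletons

# Nonredundant set = one entry per contig plus one per singleton.
unigenes = singletons + contigs

# Overall share of ESTs discarded before assembly.
discarded_fraction = (total_sequenced - accepted) / total_sequenced

print(ests_in_contigs)               # 10078
print(unigenes)                      # 7969
print(f"{discarded_fraction:.1%}")   # 24.9%
```

The overall discard rate of about 25% falls within the 19% to 33% per-library range reported above.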

Table I.
Sequencing and contigging statistics of common bean ESTs
Table II.
Identification of tissue-specific contigs from common bean ESTs

Functional Annotation

To identify putative functions for genes encoding ESTs, BLASTX analysis was used to compare the common bean contigs and singletons to the Uniref 100 protein database (Apweiler et al., 2004). The 2,226 contigs were initially grouped into 4 main categories: metabolism (34%), cell cycle and plant development (11%), interaction with the environment (19%), and unknown function (36%). These were further subdivided into 15 subcategories, shown in Figure 1. The metabolism category was subdivided into genes from carbon (C)/energy, amino acid/protein, nucleic acid/nucleotide, fatty acid/lipid, and secondary metabolism, as well as nutrient assimilation and possible functions in other metabolic areas; the first two subcategories were most abundant. The cell cycle and development category was subdivided into genes for cell structure, differentiation, cell cycle, apoptosis, and plant development, nodulation, and senescence. The category of interaction with the environment was subdivided into genes involved in transport/membrane proteins, stress/defense, and signal transduction/regulation. In this category, genes involved in signal transduction/regulation were the most abundant. The unknown function category included genes with unknown function in plants, genes with homology to DNA or proteins with unknown function, and those with no hit found.

Figure 1.
Based on homology (E-values ≤10) the 2,226 contigs were grouped in 4 main categories, metabolism (34%), cell cycle and plant development (11%), interaction with the environment (19%), and unknown function (36%), and subdivided into 15 subcategories ...

Contigs Composed of Most Abundant ESTs

Analysis of EST frequency (abundance) comprising a contig and the source of the contig can provide insights with respect to gene expression levels and biochemical functions occurring in an organ or tissue. Therefore, to identify genes that were highly expressed in certain tissues, we identified the contigs that were most abundantly expressed in pods, leaves, P-deficient roots, and root nodules (Table III). The 20 contigs from each library composed of the most redundant ESTs are shown in Table III. Those contigs having ESTs from a single organ source are noted as specific. Given our methodology, contigs may appear in the top 20 of multiple tissues. A larger version of Table III, including the UniProt accession numbers, is available at our Web site (http://www.ccg.unam.mx/phaseolusest/Data_download.htm; see also supplemental data online).
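The ranking described here amounts to counting ESTs per contig per library, sorting, and flagging contigs whose ESTs come from a single organ. A minimal sketch follows, assuming a hypothetical `counts` mapping from contig name to per-library EST counts; the function name, data layout, and toy numbers are illustrative and not taken from the project database.

```python
from typing import Dict, List, Tuple


def top_contigs(
    counts: Dict[str, Dict[str, int]], library: str, n: int = 20
) -> List[Tuple[str, int, bool]]:
    """Return up to n (contig, EST count, is_specific) rows for one library.

    A contig is flagged specific when every one of its ESTs comes from
    the requested library.
    """
    rows = []
    for contig, by_lib in counts.items():
        k = by_lib.get(library, 0)
        if k == 0:
            continue  # contig has no ESTs from this library
        specific = all(v == 0 for lib, v in by_lib.items() if lib != library)
        rows.append((contig, k, specific))
    # Most redundant first; ties broken by contig name for reproducibility.
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows[:n]


# Toy data (hypothetical counts, not values from Table III).
counts = {
    "contig_A": {"pod": 20, "leaf": 3, "root": 1},
    "contig_B": {"pod": 44},
    "contig_C": {"leaf": 12, "nodule": 2},
}
print(top_contigs(counts, "pod", n=2))
# [('contig_B', 44, True), ('contig_A', 20, False)]
```

Because each library is ranked independently, the same contig can appear in the top list of more than one tissue, as noted above.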

Table III.
Common bean contigs composed of the most redundant ESTs

Since pods were collected over a range of maturity dates, contigs composed of abundant ESTs reflect genes involved in both pod and seed growth and development. Contigs related to seed traits have homology to albumins (UniProt accessions Q39837 and Q9ZQX0), lectin (Q8L683), lipoxygenases (O24320, P27481, and Q9FQF9), acid phosphatases (O49855), β-glucosidase (Q9XJ67), and lipid transfer proteins (O24440 and Q8W539). By comparison, contigs related to pod function included photosynthetic proteins such as chlorophyll a/b-binding proteins (Q39831, Q40512, Q43437, Q9LKI0, Q9LKI1, Q9SQL2, Q9XF89, and Q9XQB1), PSI reaction center protein (Q9S7N7), and storage protein (O23808). Unexpectedly, a contig annotated as nodulin 26 (contig 2,670, Q39882), which corresponds to a membrane transporter, contained 20 pod-derived ESTs. Nodulin 26 ESTs were also found in leaves and roots, but not in nodules. A contig containing numerous ESTs for alcohol dehydrogenase (contig 2,662, Q8LJR2) was also found in pods. Of the 20 contigs noted as those containing numerous ESTs in pods, 6 were pod specific.

The leaf contigs composed of the most abundant ESTs from both the Mesoamerican and Andean cultivars are shown in Table III. As expected, many contigs from leaf ESTs of both cultivars related to photosynthesis and similar processes. Among the contigs from the Andean cultivar are several involved in amino acid metabolism. These were not evident in the Mesoamerican sequences. Conversely, there are 9 contigs in the Mesoamerican group that had no comparable sequences in the Andean leaf group, including 2 nodulin 30s (Q39882 and Q41121), 1 leghemoglobin (Q03972), and a carbonic anhydrase (Q9XQB0), which is not represented in the Andean cultivar. Thus, the complement of contigs between the two germplasm sources was quite distinct. These differences in contigs may represent genotypic, growth condition, and/or developmental stage variables.

Because root ESTs were derived from P-stressed plants, contigs composed of abundant root ESTs reflect not only root function, but also those that may be related to stress. This is exemplified in the five root contigs containing the most abundant ESTs that have homology to a stress-related pathogenesis protein (P25985), an extensin (Q41707), a plasma membrane intrinsic protein (Q9XGG8), a metallothionein (Q75NH5), and an S-adenosyl-methionine (SAM) decarboxylase (Q8W3Y2), all of which are related to biotic/abiotic stress. Noticeably, several other contigs encode putative transport/membrane, oxidative stress, transcription factor, and phosphatase proteins. Five of the most abundant root contigs were found only in the root library.

Nodule contigs composed of the most abundant ESTs have homology to putative proteins involved in core functions related to N fixation, including oxygen control (leghemoglobin [Q03972] and ascorbate peroxidase [Q41712]), C metabolism (Suc synthase [Q8GTA3], Suc nonfermenting protein 1 [SNF1; Q9XIW0], aldolase [O65735], malate dehydrogenase [MDH; Q9FSF0]), amino acid synthesis (Gln synthetase N-1 [P00965]), and ureide synthesis (uricase [P53763] and inosine dehydrogenase [Q84XA3]). Interestingly, several of the nodule contigs encode putative proteins functioning in plant-microbe interactions, for example, CDR-1 (Q6XBF8), 2-on-2 hemoglobin (Q6QDC2), epoxide hydrolase (Q9ZP87), hypersensitive-induced response protein (Q6L4S3), and polygalacturonase (O81798). Putative membrane-trafficking and transport proteins (Q7XJQ3), nodulins 24 (P04145) and 55 (Q02917), and annexin (O65848) were also highly represented. Surprisingly, of the 20 nodule contigs shown in Table III, none were found only in nodules. Several other contigs composed of 2 to 10 ESTs were nodule specific. A complete list of the contigs containing a higher number of ESTs is available (http://www.ccg.unam.mx/phaseolusest/Data_download.htm; see also supplemental data online).

Comparisons to Other Legume EST Projects

In recent years, considerable effort has focused on the identification of nodule-enhanced or nodule-specific genes. To allow comparisons between projects, the 340 nodule-specific M. truncatula EST contigs identified by Fedorova et al. (2002) were compared to the 228 nodule-specific common bean contigs. Surprisingly, only 17 of the 340 contigs identified by Fedorova et al. had homology to nodule-specific common bean contigs. To determine whether this was due to differences in gene expression between M. truncatula and common bean, the 340 tentative consensus sequences identified by Fedorova et al. were BLASTed against all common bean EST sequences. Of the 340 contigs, only 25% had a homolog (E < 10⁻¹²) in common bean. This suggests that an entirely different repertoire of genes is expressed in common bean nodules. While further sequencing of nodule ESTs is necessary to confirm this observation, some support comes from the work of Lee et al. (2004). Comparison of the 20 most abundant contigs in soybean nodules to those in common bean revealed that only leghemoglobins and Suc synthase were shared.

Identification of Phaseolus-Specific Contigs

Ten contigs (477; 616; 642; 825; 917; 1,041; 1,067; 1,372; 1,843; and 2,376) were identified with no or Phaseolus-only BLASTX hits to the Uniref 100 protein database or to non-Phaseolus sequences in the database of legume sequences. To verify that these contigs were indeed Phaseolus specific, TBLASTX was used to compare them to the EST_others database and the Arabidopsis (Arabidopsis thaliana) genome. Comparisons to the EST_others database would detect homology to genes expressed in a variety of conditions. Comparisons to the Arabidopsis genome allowed identification of sequences whose expression had not been detected in other species and could also be used to find homology to genes that have not yet been predicted. These additional analyses confirmed that 9 of the 10 contig sequences were indeed Phaseolus specific. Full-length sequencing of the ESTs in these contigs and RNA-blot expression studies may provide further insight into the function of these genes.

Identification of Gene Families

Single-linkage clustering, as described by Graham et al. (2004), was used to assign common bean contigs and singletons to putative gene families. The common bean contig and singleton sequences were combined into a single dataset. This dataset was then compared to itself using TBLASTX and an E-value cutoff of 10⁻¹². Any sequences with overlapping BLAST reports were assigned to a putative gene family. Using this technique, we were able to identify 944 gene families ranging in size from 2 to 109 members. A full description of these data is available at our Web site (http://www.ccg.unam.mx/phaseolusest/Data_download.htm; see also supplemental data online). This type of analysis had two important benefits for our research. First, we could identify sequences that were likely to cross-hybridize in future northern and macroarray experiments. For example, group 8 was composed of 109 sequences mostly with homology to protein receptor kinases. While some members of this group show quite distant homology, others are very similar. A second advantage of this approach is that sequences that had no homology to known proteins often clustered into gene families with known proteins. In the case of group 8, 19 of the 109 sequences were annotated as hypothetical proteins and 2 had no BLAST homology to the UniProt database. By comparing sequence alignments of these sequences with representative members of group 8, we can determine whether they really are protein kinases.
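Single-linkage clustering of this kind is equivalent to finding connected components in a graph whose edges are significant BLAST hits. The sketch below illustrates the idea with a union-find structure; it is not the pipeline of Graham et al. (2004). The function name, the `(query, subject, evalue)` tuple layout, and the choice to keep hits with E-values at or below the cutoff are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def single_linkage_families(
    seq_ids: Iterable[str],
    hits: Iterable[Tuple[str, str, float]],
    e_cutoff: float = 1e-12,
) -> List[List[str]]:
    """Group sequences joined by any chain of hits with E-value <= e_cutoff.

    Every query and subject in `hits` is assumed to appear in `seq_ids`.
    Only groups with two or more members are returned.
    """
    seq_ids = list(seq_ids)
    parent = {s: s for s in seq_ids}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        if query == subject or evalue > e_cutoff:
            continue  # ignore self-hits and weak hits
        root_q, root_s = find(query), find(subject)
        if root_q != root_s:
            parent[root_s] = root_q  # merge the two groups

    groups = defaultdict(set)
    for s in seq_ids:
        groups[find(s)].add(s)

    families = [sorted(g) for g in groups.values() if len(g) >= 2]
    families.sort(key=lambda g: (-len(g), g))  # largest family first
    return families


# Toy example: a-b and b-c are significant, so a, b, and c form one family
# even though a and c never hit each other directly.
seqs = ["a", "b", "c", "d", "e"]
toy_hits = [("a", "b", 1e-30), ("b", "c", 1e-15), ("d", "e", 1e-5), ("a", "a", 0.0)]
print(single_linkage_families(seqs, toy_hits))  # [['a', 'b', 'c']]
```

The toy example also shows why distant members can end up in the same family: membership needs only one significant link to any existing member, which is consistent with the mixed homology noted for group 8.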

Analysis of SNPs

SNPs were identified between the Andean and the Mesoamerican genotypes by comparing the Andean leaf ESTs against all other ESTs from all other tissue libraries of the Mesoamerican genotype. A total of 645 contigs (28% of the total) contained at least 1 sequence from both genotypes and could be mined for potential SNPs. Two different criteria were used to identify SNPs. High-quality SNPs were confirmed by two or more sequences from each genotype showing the same base change. A total of 138 high-quality SNPs were found in 72 contigs. Lower quality SNPs were confirmed by one sequence in one genotype and at least two sequences in the other. A total of 421 SNPs, representing 196 contigs, were identified in this class. Together, these 559 SNPs corresponded to 199 contigs, giving an average SNP per contig number of 2.8. As expected, the majority of the SNPs were due to base pair mutations (94.9%) compared to insertion-deletion events (5.1%). Among the base pair mutations, transversions (34.5%) were less common than transitions (65.6%) and, among these, cytosine-to-thymine mutations (65.1%) were more common than adenine-to-guanine mutations (34.9%).
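The two confidence classes and the transition/transversion split can be expressed as simple rules. The following sketch is an illustration under stated assumptions rather than the code used for the analysis: the function names are hypothetical, "one sequence" is read as exactly one supporting sequence, and a gap character `-` is assumed to mark an insertion-deletion event.

```python
from typing import Optional

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def snp_quality(n_andean: int, n_mesoamerican: int) -> Optional[str]:
    """Classify a candidate SNP by the number of supporting sequences.

    'high': two or more sequences from each genotype.
    'low' : one sequence in one genotype and two or more in the other.
    None  : insufficient support under either criterion.
    """
    fewer, more = sorted((n_andean, n_mesoamerican))
    if fewer >= 2:
        return "high"
    if fewer == 1 and more >= 2:
        return "low"
    return None


def substitution_type(base_1: str, base_2: str) -> str:
    """Return 'indel', 'transition', or 'transversion' for two alleles."""
    b1, b2 = base_1.upper(), base_2.upper()
    if b1 == b2:
        raise ValueError("alleles are identical; not a polymorphism")
    if "-" in (b1, b2):
        return "indel"
    pair = {b1, b2}
    if not pair <= (PURINES | PYRIMIDINES):
        raise ValueError(f"unexpected base in {pair}")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


print(snp_quality(2, 3), snp_quality(1, 2), snp_quality(1, 1))  # high low None
print(substitution_type("C", "T"))  # transition
print(substitution_type("A", "C"))  # transversion
print(round(559 / 199, 1))          # 2.8 SNPs per contig, as reported
```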

SNPs were found in a range of contigs. Due to the nature of the comparison between EST libraries conducted here, where Andean ESTs were all from leaf tissue, many of the SNPs were found in contigs representing highly expressed leaf genes involved in the structure of the PSI and PSII, and in the CO2 assimilation process. Confirming their high level of expression in leaf tissue, the photosynthesis-related genes were homologous to the contigs with the greatest number of ESTs, ranging from >20 up to 161 individual sequences in the case of contig 2,685 with homology to the ribulose bisphosphate carboxylase precursor (Q43874).

Expression Analysis for Selected ESTs by RNA Gel Blots

Tissue-specific or tissue-enhanced ESTs were chosen from nodules, pods, leaves, and P-deficient root cDNA libraries to verify transcript abundance in different plant tissues by RNA blots. Five ESTs were selected to verify nodule-specific and/or nodule-enhanced expression (Fig. 2A). All were highly expressed in nodules, with a sulfate transporter (contig 2,167), SNF (contig 2,434), and leghemoglobin appearing to be expressed only in nodules. Two different-size RNAs were detected with the SNF-like cDNA probe. Most of the pod ESTs selected for RNA-blot analysis (Fig. 2B) are expressed in a pod-enhanced manner, independent of the EST redundancy, since pod storage protein (contig 2,671) is represented by 44 pod ESTs and myoinositol-1-P synthase (contig 2,532) is represented by 5 pod ESTs in this cDNA library. Lipoxygenase (contig 2,628) transcript is detected in pods, but also in leaves, with the greatest expression in stems.

Figure 2.
RNA blots for ESTs identified as highly expressed from nodule (A), pod (B), leaf (C), and P-deficient root (D) libraries. Total RNA (15 μg) from each organ was separated by electrophoresis, transferred to nylon membranes, and probed with each ...

Figure 2C shows that, with the exception of a hypothetical protein (contig 2,608), most of the ESTs selected from the 2 leaf cDNA libraries are expressed in a leaf-enhanced manner and leaf-specific expression was detected for a carbonic anhydrase (contig 2,534) transcript. Interestingly, a transcript of a lower Mr for plastidic aldolase (contig 2,668) was detected in nodules as compared to other organs. The unexpected hybridization of Rubisco (contig 2,682) to nodule RNA is puzzling. However, the different size of the transcript detected in nodules could reflect a chimeric clone or the presence of a very abundant transcript in nodules with high homology with Rubisco. Nodule ESTs annotated as Rubisco can be found in M. truncatula and L. japonicus databases.

Transcript abundance analysis of 7 selected ESTs from the P-deficient root library (Fig. 2D) shows that only 2 (pathogenesis-related protein [contig 2,665] and aquaporin [contig 2,522]) were more highly expressed in roots than in other tissue. The pathogenesis-related protein contig is composed of 38 root ESTs and the aquaporin contig is composed of 4 root ESTs. Independent of the number of ESTs, transcript levels of aquaporin are higher than those of pathogenesis-related protein in roots. The other ESTs in Figure 2D were selected as specific sequences of a P-deficient root cDNA library, but none show root-enhanced expression. The transcript of a putative phosphatase (contig 2,286) EST was clearly detected in P-deficient roots, but was not detected in any other tissue, including roots grown in the presence of P, suggesting that this phosphatase plays a specific role in phosphate release processes that take place in roots under P deprivation.

High-Density Macroarrays for Nodule ESTs

Macroarray approaches, as described previously (Fedorova et al., 2002; Uhde-Stone et al., 2003; Colebatch et al., 2004), were used to evaluate global expression of the nodule-isolated ESTs. Nylon filter arrays of 2,007 ESTs from the nodule cDNA library were prepared to evaluate nodule gene expression in comparison with other bean organs, such as root, leaf, stem, and pod, from which we used 2 experimentally independent sources of RNA isolated from plants grown under similar conditions. The spotted ESTs included 1,486 singletons and 300 contigs, representing a 1,786-unigene set.

From the 3 to 5 independent nylon filter arrays hybridized with first-strand cDNA from nodules, roots, leaves, stems, and pods, only those replicates (2 or 3) with a high determination coefficient (r² ≥ 0.8) were chosen to identify genes with reliable expression levels: those showing signal intensity values higher than 1.5-fold the local background through all selected test hybridizations. A total of 565 genes were obtained and subsequently used to calculate normalized expression ratios of the nodules relative to the other organs (see “Materials and Methods”). As expected, these genes exhibited significantly different expression levels across all organs, applying both the Student's t test for paired observations (P < 0.001) and the nonparametric Wilcoxon signed-rank test (P < 0.001). Figure 3, A to D, shows a graphic representation of the 565 EST expression ratios for nodule versus root, leaf, stem, and pod, respectively. Expression ratio values higher than 1 (top horizontal line at y = 1) represent ESTs with increased expression in nodules versus other organs (Fig. 3). Whenever the ratio was lower than 1, we estimated the inverse of that ratio and changed the sign such that these values will appear below the line at y = −1 (Fig. 3). Obviously, by definition, there will be no values between 1 and −1.
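The reliability filter and the signed expression ratio described in this paragraph can be sketched as follows. This is an illustration only: the function names are hypothetical, and the inputs are assumed to be already normalized, positive signal intensities.

```python
from typing import Sequence


def is_reliable(
    signals: Sequence[float], backgrounds: Sequence[float], factor: float = 1.5
) -> bool:
    """True if the signal exceeds factor x local background in every hybridization."""
    if len(signals) != len(backgrounds):
        raise ValueError("signals and backgrounds must have the same length")
    return all(s > factor * b for s, b in zip(signals, backgrounds))


def signed_ratio(nodule: float, other: float) -> float:
    """Nodule/other ratio when >= 1; otherwise the negated inverse ratio.

    By construction the result never falls strictly between -1 and 1.
    """
    if nodule <= 0 or other <= 0:
        raise ValueError("intensities must be positive")
    if nodule >= other:
        return nodule / other
    return -(other / nodule)


print(is_reliable([3.2, 4.0], [2.0, 2.0]))  # True: both spots exceed 3.0
print(signed_ratio(8.0, 1.0))               # 8.0, higher in nodules
print(signed_ratio(1.0, 12.0))              # -12.0, higher in the other organ
```

Writing the negative branch as `-(other / nodule)` rather than inverting an already computed ratio avoids a second floating-point division and gives the same signed value.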

Figure 3.
Macroarray expression ratios of common bean ESTs. Polyubiquitin-normalized expression ratios of nodules (N) versus root (R), leaf (L), stem (S), and pod (P) were obtained for 565 selected ESTs as explained in “Materials and Methods.” Expression ...

Figure 3A shows that nodule-to-root expression ratios were lower than the ratios of nodules to the other organs. This might be due to the fact that either (1) the roots used for RNA isolation and macroarray hybridization were obtained from nodulated bean plants after nodules were removed; or (2) nodules are derived from root cortical cells. The data shown in Figure 3A revealed that 31 ESTs had 5-fold or higher nodule-root expression ratios. From these, 2 ESTs identified as villin 2 (NOD_247_F07) and Suc synthase (contig 2,654) showed the highest expression ratio (8; Table IV). Forty-nine ESTs had a higher expression in roots as compared to nodules (Fig. 3A). From these, an EST identified as ring-H2 finger protein (contig 905) has the highest expression in roots versus nodules (expression ratio = −12).

Table IV.
Macroarray expression ratios of P. vulgaris ESTs identified as abundant in the root nodule library

Greater differences in ratios of gene expression were observed when comparing nodules with leaves and stems; these large ratios reflect very different functions between nodules and those source organs (Fig. 3, B and C). In nodules versus leaves and stems, 188 and 294 ESTs had expression ratios of 10 or higher, respectively (Fig. 3, B and C). From these, 99 and 138 ESTs were expressed 20-fold or more in nodules than in leaves and stems, respectively. In the comparisons of nodules versus leaves and nodules versus stems, totals of 6 and 26 ESTs, respectively, were found with expression ratios higher than 50. As shown in Table IV, at least 15 ESTs showed very high expression ratios (ranging from 52 to 135) both in nodule-leaf and nodule-stem. The functional categories of these ESTs were identified as proteins for nodulation or nodulins, such as leghemoglobin (contig 2,686), nodulin 30 (contig 2,679), and early nodulin 55-2 (contig 2,589), as well as proteins involved in C metabolism, defense, or regulation. Data from Figure 3, B and C, show that 61 and 44 ESTs, respectively, were more expressed in leaves and in stems than in nodules. The most highly expressed ESTs in leaves versus nodules were identified as VirF-interacting protein (NOD_225_E10; expression ratio = −9), ring-H2 finger protein (contig 905; expression ratio = −8), and one without homology to known genes (contig 2,009; expression ratio = −6); the first 2 were also highly expressed in roots and pods as compared to nodules.

Pods, as well as nodules, can be considered as sink organs; pods receive photosynthate from the leaves and mobilize N for pod development and seed formation. In general, expression ratios found in nodules versus pods were not as high as those found when comparing nodules with leaves and stems (Fig. 3). A total of 197 ESTs had nodule-pod expression ratios higher than 10. From these, 65 had 20-fold or higher expression ratios. Only 3 ESTs (nodulin 30, an unknown protein, and a hypothetical protein) had nodule-pod expression ratios higher than 50. Forty-three ESTs were more highly expressed in pods than in nodules (Fig. 3D); VirF-interacting protein (NOD_225_E10; expression ratio = −9) and a nonidentified EST showed the highest expression in pods versus nodules (expression ratio = −6).

Transcriptomic Analysis of Nodule C and N Metabolism

Although transcriptome studies of genes related to nodule N and C metabolism have been reported for the model legumes M. truncatula (Györgyey et al., 2000) and L. japonicus (Colebatch et al., 2002, 2004), temperate species that assimilate and transport fixed N as amides, comparable information for soybean, which assimilates and transports fixed N as ureides similar to common bean, was only recently published (Lee et al., 2004). Analysis of common bean root nodule contigs and ESTs by macroarray experiments showed that numerous genes encoding enzymes of N and C metabolism had enhanced expression in nodules and showed a high ratio of expression compared to other tissues.

At least 11 enzymes of C metabolism appeared to have enhanced expression as evidenced by either the expression ratio of nodule-root in macroarrays or abundant ESTs (Table V). Notably Suc synthase (contig 2,654) and phosphoenolpyruvate (PEP) carboxylase (PEPC; contig 2,265), enzymes that contribute to sugar use, had high expression (Figs. 2 and 4). Several enzymes involved in general glycolysis (triose phosphate isomerase [contig 2,550], phosphoglycerate kinase [contig 2,537], and enolase [contig 2,622]) also had enhanced transcript levels (Fig. 4), as well as Glc-6-P dehydrogenase (Table V), a key source of NADPH for nodules.

Figure 4.
RNA blots for ESTs involved in nodule C and N metabolism identified as highly expressed in root nodules. Total RNA (15 μg) from each organ was separated by electrophoresis, transferred to nylon membranes, and probed with each (32P)-labeled EST. ...
Table V.
Genes encoding enzymes of C and N metabolism having enhanced expression in nodules as evidenced by macroarrays and/or contigs composed of abundant ESTs

With respect to N metabolism, four enzymes related to initial assimilation of fixed N into Gln had enhanced expression. In addition, another two enzymes related to ureide metabolism had enhanced expression.

Confirmation of macroarray results and contig analysis for common bean root nodule genes involved in C and N metabolism was obtained through RNA blots (Fig. 4). Even though expression of most genes involved in C metabolism that we tested was not nodule specific, the greatest transcript abundance was usually found in nodules. Reflecting nodule function, those genes involved in N metabolism are most clearly expressed preferentially in nodules (Fig. 4). In contrast with soybean (Lee et al., 2004), in bean the abundance of those ESTs involved in ammonia assimilation and ureide synthesis clearly reflects the nodule metabolic profile. Consistent with other studies of nodule N metabolism, NADH-dependent Glu synthase (GOGAT) transcript levels were quite low. Moreover, no NADH-GOGAT ESTs were found in the root nodule sequences. Interestingly, pods have abundant transcript expression for many of the genes involved in nodule N and C metabolism.


In this article, we provide an initial platform for functional genomics of common bean by the identification of almost 8,000 unique genes assembled from more than 20,000 ESTs sequenced from various plant organs. These sequences enrich the collection of ESTs in this important crop and provide new understanding of bean metabolism, development, and adaptation to stress. Roughly 3,400 to 4,900 ESTs were sequenced from each of 5 cDNA libraries of different bean tissues, and we identified 2,226 contigs (with 2 or more ESTs each) which were classified into 15 functional subgroups. From these contigs, 36% represented sequences of unknown function or had no homology to previously identified proteins in the UniProt database (Apweiler et al., 2004). Another 34% corresponded to genes involved in C and N metabolism. These subgroup percentages are similar to those noted for nodules of M. truncatula (Györgyey et al., 2000) and L. japonicus (Colebatch et al., 2004) and proteoid roots of white lupin (Lupinus albus; Uhde-Stone et al., 2003). The third most abundant common bean functional subgroup was composed of contigs involved in signal transduction (8.7%). Transcripts involved in signal transduction were also large components of the ESTs noted in M. truncatula and L. japonicus (Fedorova et al., 2002; Colebatch et al., 2004). Some 5.7% of the bean contigs corresponded to genes implicated in biotic/abiotic stress. An abundance of contigs related to stress may be due to our selection of the libraries, root nodules, and P-deficient roots. Uhde-Stone et al. (2003) reported that 10.7% of the ESTs sequenced from P-deficient cluster roots of white lupin were related to stress. Although there are some 986,000 nucleotide sequences deposited in GenBank that are derived from the Fabaceae family, prior to this report only 575 came from common bean. We have extended that number by over 25-fold. These bean EST sequences provide the foundation for genome-wide transcript studies through either macro- or microarrays. In addition, they are a source of defined molecular markers for mapping bean linkage groups and anchoring physical maps.

A comparison of EST redundancy in contigs having sequences derived from multiple organs can provide a broad overview of gene expression and biochemical functions occurring within an organ (Colebatch et al., 2002, 2004; Fedorova et al., 2002; Journet et al., 2002; Uhde-Stone et al., 2003). Both common and unique features of plant organs may be identified by extracting and comparing a limited number of contigs containing redundant EST sequences, as evidenced by our identification of the 20 primary contigs from each library having the most EST sequences. Cursory examination of the top 20 contigs showed, not unexpectedly, that 82% of those from leaves encode proteins related to a single function, photosynthesis/light harvesting. Some 45% of the most prominent pod contigs encode proteins related to three functions, protein storage, lipid metabolism, and photosynthesis. Similarly, 75% of the 20 most prominent nodule contigs encode proteins related to nodulins, C and N assimilation, and oxygen control. Although there is more diversity of function among the 20 most prominent contigs from P-deficient roots, at least 13 encode proteins related to stress. For simplicity, we have noted the contigs involved in the primary functions of bean organs; more detailed analysis of the entire EST profile from each organ will reveal other features that may prove important to bean biology and improvement.
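The ranking of contigs by EST redundancy described above can be sketched in a few lines of Python. This is only an illustration: the function name, the input layout (one `(library, contig)` pair per EST), and the tie-breaking rule are assumptions, not part of the published analysis.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_contigs_per_library(
    est_assignments: Iterable[Tuple[str, str]], n: int = 20
) -> Dict[str, List[Tuple[str, int]]]:
    """Return the n contigs with the most ESTs for each library.

    est_assignments holds one (library, contig_id) pair per EST
    (a hypothetical input layout chosen for this sketch).
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for library, contig in est_assignments:
        counts[library][contig] += 1
    # Sort by descending EST count; ties are broken by contig id (an assumption).
    return {
        library: sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        for library, c in counts.items()
    }
```

Comparing the resulting per-library lists side by side is one simple way to spot functions that dominate a single organ.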

Transcript expression evaluated by macroarrays provides a detailed picture of nodule biology, particularly C and N metabolism. While whole-nodule transcript studies of L. japonicus and M. truncatula (Colebatch et al., 2002, 2004; Fedorova et al., 2002; Journet et al., 2002) show significant gene induction of enzymes related to amide production, particularly 4-C organic acids and Asn, common bean-nodule C and N enzymes favor ureide synthesis. Enhanced transcript expression for nodule Glc-6-P dehydrogenase, the first committed step in the oxidative branch of the pentose phosphate pathway, supports an interpretation that a portion of bean nodule metabolism favors production of NADPH and ribulose-5-P, the component sugar, for de novo purine synthesis (Table V). Induction of the nonoxidative branch of the pentose phosphate pathway is shown by increased expression of ribulose-5-P 3-epimerase, which can provide both 3- and 6-C intermediates for PEP and glycolysis, respectively (Fig. 4). Metabolism favoring ureide production is also reflected in the fact that all of the enzymes required for de novo purine synthesis can be found as nodule ESTs with some being nodule specific. Direct production of ureides from purines is demonstrated by increased transcript abundance for uricase and xanthine dehydrogenase along with a phosphoribosylformylglycinamide amidotransferase contig that is composed mainly of nodule ESTs (Fig. 4; Table V).

We found several enzymes related to sugar use and glycolysis to be up-regulated in nodules that are also reflected in the contigs containing abundant ESTs from nodules. Suc synthase, the initial enzyme in Suc cleavage, which is critical for N fixation, is highly expressed in bean nodules (Fig. 4) and the corresponding contig (2,654) has 21 ESTs from nodules. Interestingly, the five enzymes of glycolysis (Table V; Fig. 4) that we find enhanced in bean nodules are involved in the synthesis of 3-C intermediates that ultimately lead to PEP, which is the fundamental backbone for both malate and Asn synthesis (Deroche and Carrayol, 1988).

Malate is considered the primary C source in nodules used by bacteroids for energy to reduce N (Appels and Haaker, 1991). It is interesting that the two pivotal enzymes required for malate synthesis, PEPC and MDH, have enhanced expression in bean nodules and are represented in the most abundant contigs from nodules.

The initial assimilation of fixed N into the nodule amino acid pool is catalyzed by glutamine synthetase (GS) and NADH-GOGAT in concert with Asp aminotransferase (AAT; Gantt et al., 1992). Macroarray and RNA blots show that both GS isoforms, β and γ (Lara et al., 1984), as well as AAT have enhanced expression in bean nodules and have numerous ESTs in the nodule (Fig. 4). By contrast, we did not find any NADH-GOGAT ESTs, but RNA blots show the transcript is enhanced in nodules but expressed at a low level (Fig. 4). This is consistent with the suggestion that NADH-GOGAT may be the rate-limiting step in nodule N assimilation (Temple et al., 1998). We have isolated two distinct NADH-GOGAT cDNAs from nodules and one appears to be related to N2 fixation (M. Lara, L. Blanco-López, and C. Vance, unpublished data).

During the review of our submission, Lee et al. (2004) reported an analysis of the soybean root nodule-enhanced transcriptome. Surprisingly, a comparison of the 20 most abundant contigs in soybean nodules to those in common bean revealed that only leghemoglobins and Suc synthase were present in both. The remaining complement of the 20 most abundant nodule contigs was quite diverse. Comparisons to the nodule-specific ESTs identified by Fedorova et al. (2002) also demonstrated little overlap. Another noteworthy difference between the nodule contigs of various legumes is the absence in common bean and soybean of contigs encoding Cys cluster proteins and calmodulin-like proteins, which are highly abundant in M. truncatula nodules (Graham et al., 2004). These striking differences in nodule contigs between species illustrate not only the diversity of legume nodule genes but also the importance of transcriptome analysis of the same organ from different species.

Although the abundance of ESTs within a contig derived from an organ or tissue can frequently correlate with transcript expression within the tissue, conclusions drawn from in silico analyses can be misleading. We chose 20 ESTs that were specific to or highly enhanced in a particular organ and evaluated their expression in various organs by RNA blot (Fig. 2). Although many of the ESTs gave expression patterns similar to that expected from in silico data, several had abundant expression in organs other than the one from which they were selected. For example, lipoxygenase (contig 2,628), a hypothetical protein (contig 2,632), and zinc finger protein (contig 2,266) derived from pods, leaves, and roots, respectively, have quite high expression in other organs. These results could be due to several factors, including mRNA stability, growth conditions, and developmental stage. An added complexity in correlating in silico results with RNA blots is the occurrence of contigs as multigene families. In fact, of the 2,226 contigs we identified, 943 belonged to gene families. At this stage of limited sequencing, most (557) of the gene families are composed of 2 sequences, while 36 gene families contain 10 sequences, and 3 gene families contain 60+ sequences. From this analysis, we can conclude that 21% (3,358) of the ESTs used for contig assembly are members of gene families. Inclusively, our findings show the necessity of verifying in silico EST expression data by RNA blots and/or quantitative reverse transcription-PCR.

This study also showed the utility of mining EST collections in common bean for SNPs. To reduce errors caused by single-pass sequencing and low base quality values, we used two different criteria for identifying SNPs. Lower quality SNPs were supported by one sequence in one genotype and at least two sequences in the other. Using these criteria, a SNP could be found every 508 bp. High-quality SNPs were supported by at least two sequences from each genotype. Similarly, these criteria identified a SNP every 601 bp. By combining these data together, we identified 529 SNPs in 214 kb of SNP-containing contigs, giving a SNP every 387 bp. These values are similar to those found for equivalent comparisons made in other in-breeding species of plants, but less frequent than in maize (Tenaillon et al., 2001). It was promising to find this frequency of SNPs in coding regions of common bean and perhaps was not unexpected due to the large genetic differences between the source genotypes whereby each represented a different gene pool of the species (Andean and Mesoamerican, respectively; Broughton et al., 2003). It would be necessary to confirm SNP frequency in further analysis of a greater number of ESTs or whole-gene sequences and in a panel of representative common bean genotypes that could be screened by PCR amplification and resequencing as has been done in other crop species (Tenaillon et al., 2001; Zhu et al., 2003; Russell et al., 2004). Our discovery of a large number of SNPs in expressed sequences should allow the genetic mapping of many of the genes underlying agricultural characteristics in common bean and, given the leaf library source of ESTs evaluated in this study, it would be possible to begin with the genetic mapping of several genes important in photosynthetic efficiency. SNP marker development, however, will depend on the establishment of technology and experimental protocols that allow their routine use in plant breeding programs (Morales et al., 2004).

Because of our overriding interests in bean root nodule development and function, this project was initiated to focus on global characterization of bean nodule transcripts. This priority is evidenced by our in-depth analysis of bean nodule metabolism. As the Phaseomics consortium coalesced and defined its goals, it became apparent that the bean community needed EST profiles of additional bean organs. Thus, we sequenced ESTs from pods, leaves, and P-deficient roots. Future reports will concentrate on more detailed characterization of and research with ESTs from other bean organs and development of SNP-based molecular markers from the current set of EST sequences.


Plant Material

Two genotypes of common bean (Phaseolus vulgaris) were used for library construction. The first was the Mesoamerican cultivar Negro Jamapa 81, plants of which were grown in greenhouses at Centro de Investigación sobre Fijación de Nitrógeno (CIFN)/Universidad Nacional Autónoma de México (Cuernavaca, Mexico) and at University of Minnesota (St. Paul), as previously reported (Ortega et al., 1992). Negro Jamapa 81 is a black-seeded variety that was selected by F. Cárdenas at the Experimental Station in Cotaxtla, Veracruz, from a landrace collection. Plants of Negro Jamapa were inoculated with Rhizobium tropici CIAT 899 and watered with N-free nutrient solution, as reported by Silvente et al. (2003); mature nodules were collected 15 d postinoculation (dpi). Leaves were collected from inoculated plants 15 dpi; plants were at a vegetative developmental stage, prior to flowering. Pods at different stages of development were collected from inoculated plants. For P-deficiency stress conditions, seedlings of Negro Jamapa were germinated for 3 d, the cotyledons were cut, and the plantlets were watered for 3 weeks with nutrient solution deprived of P, showing evident symptoms of P deprivation. The second genotype used was the Andean cultivar G19833, which was grown in a greenhouse at CIAT in Cali, Colombia, under a 12-h photoperiod, average relative humidity of 74.7%, and night/day temperatures of 20°C/28°C. The cultivar G19833 is a landrace from Peru, with yellow and black mottled seed that is one parent of a genetic-mapping population that has been used at CIAT to map microsatellite markers (Blair et al., 2003) and to study low-P adaptation quantitative trait loci (Liao et al., 2004; Yan et al., 2005).

Preparation of cDNA Libraries

A total of 5 cDNA libraries were made, 4 from Negro Jamapa 81 and 1 from G19833. In the case of Negro Jamapa 81, total RNA was isolated from different plant organs: (1) young (1.5–5 cm) and mature (15 cm) pods from inoculated plants; (2) leaves from 15-d-old nodulated plants; (3) roots from P-deficient plants; and (4) mature effective nodules harvested after 15 dpi with R. tropici CIAT 899. For all the libraries made from Negro Jamapa 81, poly(A+) RNA was obtained from total RNA using oligo(dT) cellulose. The poly(A+) RNA used for the pod library was obtained from total RNA combined from young and mature pods in a 1:1 (w/v) ratio. Conversion of polyadenylated RNA to cDNA was performed in the phage Uni-ZAP XR with a Stratagene (La Jolla, CA) synthesis and cloning kit. The cDNA synthesis of poly(A+) mRNA was primed by oligo(dT)-XhoI adapter primer with MMLV-reverse transcriptase, while the second strand was synthesized via polymerase I ribonuclease H coincubation. EcoRI adapter was added to the blunted double-stranded cDNA followed by XhoI digestion. Recovered cDNA was directionally cloned into the EcoRI-XhoI Uni-ZAP XR vector, according to the manufacturer's instructions. The cDNA from all libraries was size selected via Sephacryl S-500 spin columns as part of the procedure described by the manufacturer (Stratagene). The fifth cDNA library, made for the genotype G19833, was prepared from total RNA isolated from leaves and vegetative meristems of 3-week-old plants. For this library, poly(A+) RNA was purified and reverse transcribed, and cDNAs were directionally cloned into the NotI/SalI sites of the pCMVSport6.0 vector (Invitrogen, Carlsbad, CA).

Generation of ESTs

For conversion of the 4 Negro Jamapa 81 cDNA phage libraries (ZAP XR vector) into the plasmid form (pBluescript), mass excision was performed, according to the procedure described by the manufacturer (Stratagene). Single colonies of Escherichia coli strain SOLR carrying the excised phagemid were replicated, and glycerol stocks were stored in microtiter plates at −80°C. Plasmid DNA from the nodule cDNA library was isolated using the QIAprep 96 Turbo Miniprep kit, according to the manufacturer's instructions (Qiagen, Valencia, CA). Plasmid DNA from the other three libraries was isolated by a modified alkaline lysis method. Sequencing of the plasmid cDNA was performed by the Advanced Genetic Analysis Center (St. Paul) for the pod, root, and nodule libraries and at the CCG (Cuernavaca, Mexico) for the leaf library. Standard T3 sequencing primer was used for 5′ single-stranded sequencing. For the G19833 library, the clones were transformed into E. coli EMDH12S cells, which were plated on Q plates with carbenicillin (100 mg L^-1). A Q-Bot was used to pick and array colonies into plates and filters. Plasmid DNA was isolated using a modified alkaline lysis method and the individual cDNAs were sequenced either from the 5′ end, using an SP6 primer, or from the 3′ end with a T7 primer at the Clemson University Genomics Institute (Clemson, SC) and at CIAT.

EST Processing and Contig Assembly

Common bean EST sequences were analyzed using a processing pipeline developed by the Center for Computational Genomics and Bioinformatics (CCGB) at the University of Minnesota (Lamblin et al., 2003). Sequence base calls were made using Phred (Ewing et al., 1998) with a quality cutoff of 10. Vector filtering was performed using the CCGB program gstvf4 (Lamblin et al., 2003). Processed ESTs 100 bases or longer were assembled into contigs using Phrap (http://www.phrap.org/phredphrap/phrap.html) with a minimum match of 50 and a minimum score of 100. Once contig assembly was completed, visualization software developed by CCGB was used to assess contig quality. Contigs were individually inspected for low-quality sequence, chimeras, and splice variants. Following the trimming of low-quality sequence and the removal of chimeras and splice variants, the ESTs were reassembled. This procedure was repeated three times or until no new ESTs were added to contigs. The final assembly constituted the common bean gene index.
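As a rough illustration of the read-filtering step described above, the following Python sketch trims low-quality bases from both ends of a read and keeps only reads of 100 bases or longer. The function names and the simple end-trimming rule are assumptions made for this example; the CCGB pipeline, Phred, and Phrap are not reproduced here.

```python
from typing import List, Optional, Tuple

QUALITY_CUTOFF = 10  # Phred quality cutoff stated in the text
MIN_LENGTH = 100     # minimum processed EST length stated in the text


def trim_low_quality_ends(
    seq: str, quals: List[int], cutoff: int = QUALITY_CUTOFF
) -> Tuple[str, List[int]]:
    """Drop bases from both ends until a base with quality >= cutoff is reached.

    This end-trimming rule is an assumption, not the published procedure.
    """
    start, end = 0, len(seq)
    while start < end and quals[start] < cutoff:
        start += 1
    while end > start and quals[end - 1] < cutoff:
        end -= 1
    return seq[start:end], quals[start:end]


def process_est(seq: str, quals: List[int]) -> Optional[str]:
    """Return the trimmed sequence, or None if it is shorter than MIN_LENGTH."""
    trimmed, _ = trim_low_quality_ends(seq, quals)
    return trimmed if len(trimmed) >= MIN_LENGTH else None
```

Reads that pass `process_est` would then be handed to the assembler; reads that return `None` are discarded before assembly.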

BLAST Analyses

BLASTX (Altschul et al., 1997) comparisons against the Uniref 100 protein database (August, 2004; Apweiler et al., 2004) were used to assign putative function to common bean contigs and singletons. In addition, TBLASTX was used to compare the common bean sequences to a database of legume sequences. This database included the Lotus japonicus, Medicago truncatula, and Glycine max/soja gene indices from The Institute for Genomic Research (TIGR; Quackenbush et al., 2001) and all publicly available sequences from the genera Arachis, Lupinus, Phaseolus, Robinia, and Pisum (available from the NCBI taxonomy browser [http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html]). For both BLAST searches, an E-value cutoff of 10^-4 was used and Perl scripts were used to parse the results.

Contigs with no or Phaseolus-only BLASTX (Altschul et al., 1997) hits (E < 10^-4) to the Uniref 100 protein database and the database of legume sequences were identified as candidate Phaseolus-specific contigs. For further verification, TBLASTX was used to compare these contigs to GenBank's EST_others database (August, 2004) and the Arabidopsis (Arabidopsis thaliana) genome (The Arabidopsis Information Resource [TAIR] version 5). Contigs with no hits more significant than 10^-4 to non-Phaseolus sequences in these databases were considered Phaseolus specific.
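The candidate-selection rule in the preceding paragraph (no significant hits, or significant hits only to Phaseolus) can be expressed as a small filter. The sketch below is written in Python rather than the Perl used by the authors, and the `Hit` record with its `subject_genus` field is a hypothetical layout for parsed BLAST results, not the format of any actual BLAST output.

```python
from typing import Dict, Iterable, List, NamedTuple, Set

E_VALUE_CUTOFF = 1e-4  # E-value cutoff stated in the text


class Hit(NamedTuple):
    query: str          # common bean contig identifier
    subject_genus: str  # genus of the matched sequence (hypothetical field)
    evalue: float


def candidate_specific_contigs(
    contigs: Iterable[str], hits: Iterable[Hit], cutoff: float = E_VALUE_CUTOFF
) -> List[str]:
    """Return contigs whose significant hits are absent or Phaseolus-only."""
    genera: Dict[str, Set[str]] = {c: set() for c in contigs}
    for hit in hits:
        # Only hits more significant than the cutoff are considered.
        if hit.evalue < cutoff and hit.query in genera:
            genera[hit.query].add(hit.subject_genus)
    # An empty set (no hits) and {"Phaseolus"} are both subsets of {"Phaseolus"}.
    return sorted(c for c, g in genera.items() if g <= {"Phaseolus"})
```

The same function could be rerun on the hits from the verification databases to narrow the candidates further.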

To allow comparisons between EST projects, the nodule-specific M. truncatula EST contigs identified by Fedorova et al. (2002) were compared to all common bean sequences. For this analysis, the program TBLASTX (Altschul et al., 1997) was used with an E-value cutoff of 10^-12.

RNA Gel Blots

For northern analysis, RNA was extracted from 0.2 g of frozen nodule, root, stem, and leaf using an RNA extraction kit (BIO-101, Irvine, CA). The RNA (10 μg) was denatured in 50% formamide, 17% formaldehyde, and 10% MOPS buffer (200 mM MOPS, pH 7.0, 50 mM Na-acetate, and 1 mM EDTA) at 65°C for 5 min. Twenty micrograms of total RNA were separated on 1.2% agarose gel containing 2.2 M formaldehyde in MOPS buffer and transferred to positively charged nylon membranes (Hybond-N+; Amersham, Buckinghamshire, UK) by downward capillary transfer in 20× SSC. After a 30-min prehybridization (300 mM Na2HPO4, pH 7.2, 7% SDS), the blot was hybridized for 24 h at 65°C with [32P]-labeled specific probes. After stringent washing, radioactive membranes were exposed to x-ray film (Kodak, Rochester, NY) overnight at −70°C. Three repetitions were done for each probe and similar results were obtained. The blots shown are representative of the three repetitions.

Nylon Filter Arrays

The cDNA portion of each nodule EST was amplified by PCR, using standard T3 and T7 primers. Before spotting, the quality of each PCR product was evaluated by gel electrophoresis. The PCR products were spotted in replicate onto Gene Screen Plus membranes (NEN Life Science Products, Boston) using the Q-bot (Genetix, Boston) automated spotting system with a 96-pin gravity gridding head with 0.4-mm pin diameter.

Total RNA was isolated from mature nodules elicited by R. tropici CIAT 899 and nodule-deprived roots, leaves, and stems from inoculated Negro Jamapa 81 bean plants at 18 dpi. Pod RNA was obtained from a mixture of young developing and mature pods taken from two independent sources. In two independent experiments, RNA was isolated from the organs of plants grown under similar conditions. Total RNA was also isolated from P-deficient roots. Radiolabeled cDNA probes were synthesized by reverse transcription of 30 μg of total RNA for 1 h in the presence of 50 μCi γ[32P]dATP using SuperScriptII reverse transcriptase, according to the manufacturer's instructions (Stratagene) at 42°C. To complete cDNA synthesis, the reaction was prolonged for 30 min with 1 μL of 5 mM cold ATP. Unincorporated γ[32P]dATP was removed by purification with a Sephadex G50 column and labeling efficiency was measured by scintillation counting. The final concentration of each probe was adjusted to 10^6 cpm mL^-1 hybridization solution. Hybridizations were performed in 50% (w/v) formamide, 0.5 M Na2HPO4, 0.25 M NaCl, 7% (w/v) SDS, and 1 mM EDTA at 42°C. Blots were washed with 3 successive washes: 2× SSC/0.1% SDS; 0.5× SSC/0.1% SDS; 0.1× SSC/0.1% SDS at 42°C in 200 mL of wash buffer. Four to seven independent nylon filter arrays were hybridized with cDNA from each organ.

Data Analysis of Nylon Filter Arrays

Radioactivity of each spot was quantified using a Phosphor Screen imaging system (Molecular Dynamics, Sunnyvale, CA). The signal intensity of each spot was determined automatically using the software Array-Pro Analyzer (Media Cybernetics, Carlsbad, CA). This program allows the normalization of quantified signals against the background. The normalized intensities were reported in Excel (Microsoft, Redmond, WA) files and linked to the corresponding cDNA clone. In order to work with highly reproducible experiments, linear regression analysis was performed for each pair of membrane replicas; only those replicas for which the linear model could explain at least 80% of the variation (determination coefficient r^2 ≥ 0.8) were further taken into consideration. This process yielded a total of 3, 3, 2, 2, and 2 well-correlated replicas for nodule, root, leaf, stem, and pod, respectively.
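The replica-selection criterion can be illustrated with a short Python sketch. For a simple linear regression of one replica on another, the determination coefficient equals the squared Pearson correlation, which is what the code computes. The function names and the dictionary input are assumptions for this example and do not reflect the software actually used.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

R2_THRESHOLD = 0.8  # minimum determination coefficient stated in the text


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Determination coefficient of a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant replica carries no usable information
    return (sxy * sxy) / (sxx * syy)


def well_correlated_pairs(
    replicas: Dict[str, Sequence[float]], threshold: float = R2_THRESHOLD
) -> List[Tuple[str, str]]:
    """Return every pair of replica names whose r^2 meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(replicas), 2)
        if r_squared(replicas[a], replicas[b]) >= threshold
    ]
```

Each replica is a list of spot intensities in the same spot order, so that position i in every list refers to the same cDNA clone.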

Genes were considered as reliably expressed if they showed intensity/background ratios greater than 1.5 through all related parallel hybridizations. A final gene set was obtained by joining the genes expressed in each organ and removing all duplications. Single expression values per organ were then calculated as the gene average expression in the sets of correlated replicas. Given that the expression differences between any two organs follow a bell-shaped distribution (data not shown), the t test for paired observations was applied to determine whether genes show significantly different expression values from organ to organ. Nevertheless, we also applied the nonparametric Wilcoxon signed-rank test for matched pairs, which does not rely upon the assumption of normality. Both tests strongly supported the hypothesis of differential expression (P < 0.001).
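A minimal Python sketch of the two steps in this paragraph follows: selecting genes whose intensity/background ratio exceeds 1.5 in every hybridization, and computing the paired t statistic for expression values in two organs. The function names and input layout are assumptions for illustration. Converting the statistic to a P value requires the t distribution, which is not implemented here.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List, Sequence

RATIO_THRESHOLD = 1.5  # intensity/background threshold stated in the text


def reliably_expressed(
    ratios_by_gene: Dict[str, Sequence[float]], threshold: float = RATIO_THRESHOLD
) -> List[str]:
    """Genes whose intensity/background ratio exceeds the threshold in all replicas."""
    return sorted(
        gene
        for gene, ratios in ratios_by_gene.items()
        if ratios and all(r > threshold for r in ratios)
    )


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic for paired observations (same genes measured in two organs)."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / sqrt(len(diffs)))
```

The statistic has len(diffs) - 1 degrees of freedom; a statistics package would normally be used to obtain the corresponding P value and to run the Wilcoxon signed-rank test.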

The housekeeping gene polyubiquitin (EST NOD_206_B07) served as an internal normalization control for calculating expression ratios between pairs of organs. The signal intensity value of each gene was divided by the signal value of the polyubiquitin EST in the respective organ. Normalized expression ratios were estimated by dividing the polyubiquitin-normalized signal intensities in nodules by the polyubiquitin-normalized signal intensities in the other organs. Original signal intensities and transformed data of all experiments are available from our Web site (http://www.ccg.unam/phaseolusest/Data_download.htm; see also supplemental data online).
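The normalization described above amounts to two divisions per gene, as the following Python sketch shows. The function names and the dictionary layout are assumptions for this example; only the control EST identifier comes from the text.

```python
from typing import Dict

CONTROL_EST = "NOD_206_B07"  # polyubiquitin EST named in the text


def normalize_to_control(
    signals: Dict[str, float], control_id: str = CONTROL_EST
) -> Dict[str, float]:
    """Divide every signal in one organ by that organ's control signal."""
    control = signals[control_id]
    if control <= 0:
        raise ValueError("control signal must be positive")
    return {gene: value / control for gene, value in signals.items()}


def expression_ratios(
    nodule: Dict[str, float], other: Dict[str, float], control_id: str = CONTROL_EST
) -> Dict[str, float]:
    """Ratio of control-normalized nodule signal to control-normalized signal elsewhere."""
    nod = normalize_to_control(nodule, control_id)
    oth = normalize_to_control(other, control_id)
    # Genes missing from, or with zero signal in, the other organ are skipped.
    return {g: nod[g] / oth[g] for g in nod if g in oth and oth[g] > 0}
```

For example, a gene with a signal of 800 in nodules (control 200) and 100 in roots (control 100) has normalized values of 4 and 1, giving a nodule/root ratio of 4.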

Identification of Gene Families Using Single-Linkage Clustering

In order to identify gene families, the common bean contigs and singletons were combined into a single dataset. TBLASTX (E-value cutoff of 10^-12) was used to compare the dataset against itself. As described by Graham et al. (2004), any sequences with at least one sequence in common in their BLAST reports were combined into a putative gene family.
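Single-linkage clustering of this kind is commonly implemented with a union-find (disjoint-set) structure. The Python sketch below assumes the significant TBLASTX hits have already been reduced to (query, subject) pairs; the function name and input layout are illustrative and are not taken from Graham et al. (2004).

```python
from typing import Dict, Iterable, List, Set, Tuple


def single_linkage_families(
    sequences: Iterable[str], hit_pairs: Iterable[Tuple[str, str]]
) -> List[List[str]]:
    """Group sequences into families; any shared hit links two sequences."""
    parent: Dict[str, str] = {s: s for s in sequences}

    def find(x: str) -> str:
        # Follow parent pointers to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hit_pairs:
        if a in parent and b in parent:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    groups: Dict[str, Set[str]] = {}
    for s in parent:
        groups.setdefault(find(s), set()).add(s)
    # Report only clusters with two or more members as putative families.
    families = [sorted(members) for members in groups.values() if len(members) >= 2]
    return sorted(families, key=lambda members: members[0])
```

Because linkage is transitive, two sequences that never hit each other directly still end up in the same family if both hit a common third sequence.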

Identification of SNPs

The ace file output of Phrap was used as input to the PolyBayes SNP detection program along with the base values assigned by Phred for each of the contigged sequences. Perl scripts were used to parse the PolyBayes output file and identify SNPs in two categories. High-probability SNPs had SNP probability values >0.5 and the specific SNP was found in two EST sequences from each genotype. Lower probability SNPs had SNP probability values >0.5 and the SNP was found in one EST from one genotype and at least two ESTs from the other. Perl scripts were used to identify and store 50 bp of sequence on either side of the SNP.
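The two SNP categories and the flanking-sequence extraction can be sketched as follows. This Python example is an illustration only: it does not parse PolyBayes output, the function names and inputs are hypothetical, and "two EST sequences from each genotype" is read as "at least two", following the wording used in the Discussion.

```python
from typing import Dict, Optional, Tuple

MIN_PROBABILITY = 0.5  # SNP probability threshold stated in the text
FLANK = 50             # bases stored on either side of the SNP


def classify_snp(probability: float, support_by_genotype: Dict[str, int]) -> Optional[str]:
    """Return "high", "lower", or None for a candidate SNP.

    support_by_genotype maps each of the two genotypes to the number of
    ESTs supporting its allele (a hypothetical input layout).
    """
    if probability <= MIN_PROBABILITY or len(support_by_genotype) != 2:
        return None
    low, high = sorted(support_by_genotype.values())
    if low >= 2:
        return "high"   # at least two ESTs from each genotype
    if low == 1 and high >= 2:
        return "lower"  # one EST from one genotype, two or more from the other
    return None


def flanking_sequence(contig: str, position: int, flank: int = FLANK) -> Tuple[str, str]:
    """Return up to `flank` bases on each side of a 0-based SNP position."""
    left = contig[max(0, position - flank):position]
    right = contig[position + 1:position + 1 + flank]
    return left, right
```

Near the ends of a contig the flanks are simply shorter than 50 bp.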

Upon request, all novel materials described in this publication will be made available in a timely manner for noncommercial research purposes, subject to the requisite permission from any third-party owners of all or parts of the material. Obtaining any permission will be the responsibility of the requester.

Sequence data from this article have been deposited with the EMBL/GenBank data libraries under accession numbers CV528971 through CV544303.


We acknowledge the technical assistance provided by Mike Atkins, Mike Palmer, and Jeff Tomkins at the Clemson University Genomics Institute and help from Monica C. Muñoz, Eliana Gaitan, and Joe Tohme at CIAT. We also gratefully acknowledge Guillermo Dávila and Rosa I. Santamaria for providing the facility and for technical assistance for DNA sequencing at CCG, Universidad Nacional Autónoma de México, and Eric Verdorn for assistance in bioinformatics at the University of Minnesota.


1This work was supported in part by Consejo Nacional de Ciencia y Tecnología, Mexico (grant no. G31751–B at CCG), U.S. Department of Agriculture, Agricultural Research Service, Current Research Information System (project no. 3640–21000–019–00D at the University of Minnesota), and by U.S. Agency for International Development at International Center for Tropical Agriculture. M.R. received a postdoctoral fellowship from Consejo Nacional de Ciencia y Tecnología, Mexico.

[w]The online version of this article contains Web-only data.



  • Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402
  • Appels MA, Haaker H (1991) Glutamate oxalacetate transaminase in pea root nodules. Plant Physiol 95: 740–747
  • Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al (2004) UniProt: the universal protein knowledgebase. Nucleic Acids Res 32: D115–D119
  • Blair MW, Pedraza F, Buendia HF, Gaitán-Solís E, Beebe SE, Gepts P, Tohme J (2003) Development of a genome-wide anchored microsatellite map for common bean (Phaseolus vulgaris L.). Theor Appl Genet 107: 1362–1374
  • Broughton WJ, Hernández G, Blair M, Beebe S, Gepts P, Vanderleyden J (2003) Beans (Phaseolus spp.)—model food legume. Plant Soil 252: 55–128
  • Colebatch G, Desbrosses G, Ott T, Krusell L, Montanari O, Kloska S, Kopka J, Udvardi MK (2004) Global changes in transcription orchestrate metabolic differentiation during symbiotic nitrogen fixation in Lotus japonicus. Plant J 39: 487–512
  • Colebatch G, Sebastian K, Ben T, Susanne F, Thomas A, Udvardi MK (2002) Novel aspects of symbiotic nitrogen fixation uncovered by transcript profiling with cDNA arrays. Mol Plant Microbe Interact 15: 411–420
  • Cook DR (1999) Medicago truncatula a model in the making. Curr Opin Plant Biol 2: 301–304
  • Deroche ME, Carrayol E (1988) Nodule phosphoenolpyruvate carboxylase: a review. Physiol Plant 74: 775–782
  • Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res 8: 175–185
  • Fedorova M, Van De Mortel J, Matsumoto PA, Cho J, Town CD, Vanden-Bosch KA, Gantt JS, Vance CP (2002) Genome-wide identification of nodule-specific transcripts in the model legume Medicago truncatula. Plant Physiol 130: 519–537
  • Food and Agriculture Organization of the United Nations (2001) FAOSTAT Agriculture Data. http://www.fao.org/Statistics
  • Gantt JS, Larson RJ, Farnham MW, Pathirana SM, Miller SS, Vance CP (1992) Aspartate aminotransferase in effective and ineffective alfalfa nodules. Plant Physiol 98: 868–878
  • Gepts P (1998) Origin and evolution of common bean: past events and recent trends. Hort Sci 33: 1124–1130
  • Graham MA, Silverstein KAT, Cannon SB, VandenBosch KA (2004) Computational identification and characterization of novel genes from legumes. Plant Physiol 135: 1179–1197
  • Graham PH, Vance CP (2003) Legumes: importance and constraints to greater use. Plant Physiol 131: 872–877
  • Györgyey J, Vaubert D, Jimenez-Zurdo JI, Charon C, Troussard L, Kondorosi A, Kondorosi E (2000) Analysis of Medicago truncatula nodule expressed sequence tags. Mol Plant Microbe Interact 13: 62–71
  • Handberg K, Stougaard J (1992) Lotus japonicus, an autogamous, diploid legume species for classical and molecular genetics. Plant J 2: 487–496
  • Journet EP, van Tuinen D, Gouzy J, Crespeau H, Carreau V, Farmer MJ, Niebel A, Schiex T, Jaillon O, Chatagnier O, et al (2002) Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis. Nucleic Acids Res 30: 5579–5592
  • Lamblin AF, Crow JA, Johnson JE, Silverstein KA, Kunau TM, Kilian A, Benz D, Stromvik M, Endre G, VandenBosch KA, et al (2003) MtDB: a database for personalized data mining of the model legume Medicago truncatula transcriptome. Nucleic Acids Res 31: 196–201
  • Lara M, Porta H, Padilla J, Folch J, Sánchez F (1984) Heterogeneity of glutamine synthetase polypeptides in Phaseolus vulgaris L. Plant Physiol 76: 1019–1023
  • Lee HL, Hur CG, Oh CJ, Kim HB, Park SY, An CS (2004) Analysis of the root nodule-enhanced transcriptome in soybean. Mol Cells 18: 53–62
  • Liao H, Yan X, Rubio G, Beebe SE, Blair MW, Lynch JP (2004) Basal root gravitropism and phosphorus acquisition efficiency in common bean. Funct Plant Biol 31: 959–970
  • McClean P, Kami J, Gepts P (2004) Genomic and genetic diversity in common bean. In RF Wilson, HT Stalker, EC Brummer, eds, Legume Crop Genomics. AOCS Press, Champaign, IL, pp 60–82
  • Morales M, Roig E, Monforte AJ, Arús P, Garcia-Mas J (2004) Single-nucleotide polymorphisms detected in expressed sequence tags of melon (Cucumis melo L.). Genome 47: 352–360
  • Ortega JL, Sánchez F, Soberón M, Lara M (1992) Regulation of nodule glutamine synthetase by CO2 levels in bean (Phaseolus vulgaris L.). Plant Physiol 98: 584–587
  • Quackenbush J, Cho J, Lee D, Liang F, Holt I, Karamycheva S, Parvizi B, Pertea G, Sultana R, White J (2001) The TIGR gene indices: analysis of gene transcript sequence in highly sampled eukaryotic species. Nucleic Acids Res 29: 159–164
  • Russell J, Booth A, Fuller J, Harrower B, Hedley P, Machray G, Powell W (2004) A comparison of sequence-based polymorphism and haplotype content in transcribed and anonymous regions of the barley genome. Genome 47: 389–398
  • Silvente S, Camas A, Lara M (2003) Molecular cloning of the cDNA encoding aspartate aminotransferase from bean root nodules and determination of its role in nodule nitrogen metabolism. J Exp Bot 54: 1545–1551
  • Temple SJ, Vance CP, Gantt JS (1998) Glutamate synthase and nitrogen assimilation. Trends Plant Sci 3: 51–56
  • Tenaillon MI, Sawkins MC, Long AD, Gaut RL, Doebley JF, Gaut BS (2001) Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc Natl Acad Sci USA 98: 9161–9166
  • Uhde-Stone C, Zinn KE, Ramírez-Yañez M, Li A, Vance CP, Allan DL (2003) Nylon filters array reveal different gene expression in proteoid roots of white lupin in response to phosphorus deficiency. Plant Physiol 131: 1064–1079
  • Yan X, Liao H, Beebe SE, Blair MW, Lynch JP (2005) Molecular mapping of QTLs associated with root hairs and acid exudation as related to phosphorus uptake in common bean. Plant Soil (in press)
  • Zhu YL, Song QJ, Hyten DL, Tassell CP, van Matukumalli LK, Grimm DR, Hyatt SM, Fickus EW, Young ND, Cregan PB (2003) Single-nucleotide polymorphisms in soybean. Genetics 163: 1123–1134

Articles from Plant Physiology are provided here courtesy of American Society of Plant Biologists