A High Quality Draft Consensus Sequence of the Genome of a Heterozygous Grapevine Variety

1 IASMA Research Center, San Michele all'Adige, Trento, Italy
2 Myriad Genetics Inc, Salt Lake City, Utah, United States of America
3 454 Life Sciences Corporation, Branford, Connecticut, United States of America
4 Roche Diagnostics Corporation, Roche Applied Science, Indianapolis, Indiana, United States of America
5 Amplicon Express Inc., Pullman, Washington, United States of America
6 Technology Park Lodi, Lodi, Italy
7 Department of Plant Systems Biology, VIB, Gent University, Gent, Belgium
8 Department of Biological Chemistry, Padova University, Padova, Italy
9 Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom

Brian Dilkes, Academic Editor, University of California at Davis, Genome Center, United States of America

#Contributed equally. *To whom correspondence should be addressed. E-mail: riccardo.velasco/at/iasma.it

Conceived and designed the experiments: RV AZ MT DP ST JLS MHS SKB AG FS RV. Performed the experiments: MT DAC MP LMF SV JR GM DI GC BW DM TM MF JTM GE RO NG MS YC CD AM KS QT TH AL CP BT KV RB. Analyzed the data: RV AZ MT DAC AC DP MP SV GM GC DM MF MP PG MM CS JB FC ASA CP BT AS VS JF LS SMG ST CM VS SKB PF AG YVP FS. Contributed reagents/materials/analysis tools: RV AZ AC MS LD AM KS QT TH AL JF LS KV ST RB PF YVP FS RV. Wrote the paper: RV AZ MT DAC AC DP MP SV GM MP PG JB FC ASA BT AS SMG ST CM JLS RB MHS VS SKB PF AG YVP FS RV.

Received October 5, 2007; Accepted November 21, 2007.

Copyright Velasco et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Background
Worldwide, grapes and their derived products have a large market.
The cultivated grape species Vitis vinifera has the potential to become a model for fruit tree genetics. Like many plant species, it is highly heterozygous, which poses an additional challenge to modern whole genome shotgun sequencing. In this paper a high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir is presented.

Principal Findings
We estimate the genome size of V. vinifera to be 504.6 Mb. Genomic sequences corresponding to 477.1 Mb were assembled in 2,093 metacontigs, and 435.1 Mb were anchored to the 19 linkage groups (LGs). The number of predicted genes is 29,585, of which 96.1% were assigned to LGs. This assembly of the grape genome provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality via secondary metabolites and those connected with the extreme susceptibility of grape to pathogens. Single nucleotide polymorphism (SNP) distribution was consistent with a diffuse haplotype structure across the genome. Of around 2,000,000 SNPs, 1,751,176 were mapped to chromosomes, and at least one SNP was identified in 86.7% of anchored genes. The relative age of duplicated grape genes was estimated, and this made it possible to reveal a relatively recent Vitis-specific large-scale duplication event involving at least 10 chromosomes (a duplication not reported before).

Conclusions
Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), together with dedicated assembly programs, resolved a complex heterozygous genome. A consensus sequence of the genome and a set of mapped marker loci were generated. Homologous chromosomes of Pinot Noir differ by 11.2% of their DNA (hemizygous DNA plus chromosomal gaps). SNP markers are offered as a tool with the potential to introduce a new era in the molecular breeding of grape.
Introduction
Grapes (67 million t; http://faostat.fao.org/site/336/DesktopDefault.aspx) and their derivatives have a large and expanding worldwide market. Grapes can be grown at latitudes from 50°N to 40°S and up to 3,000 meters above sea level, with almost 98% of grape vineyards planted with Vitis vinifera L. ssp. sativa cultivars of Eurasian origin. Ever since the development of wine-making in Iran between 5,440 and 5,000 B.C. [1], wine has been an important component of many cultures. It has been celebrated by Ecclesiastes, by Horace, Goethe, Jefferson and the Nobel laureate J. C. Cela. A traditional icon of the Mediterranean diet [2], the grape has more recently been extensively cultivated in the New World, and its cultivation is now moving to Asia. Given grape's content of resveratrol, quercetin and ellagic acid, grape products may contribute to reducing the incidence of cardiovascular and other diseases [3]. V. vinifera ssp. sativa, domesticated from the wild ssp. sylvestris [4], bears hermaphroditic self-fertilizing flowers. However, outbreeding by means of wind and insect pollination is the norm. As a result, cultivars are highly heterozygous and carry many deleterious recessive mutations [5]. Inbreeding depression is severe, so that sterility often ensues from the second or third generation of selfing. All wild Vitis species have 38 chromosomes (n = 19), and most interspecies hybrids are fertile [5]. The high chromosome number suggests a paleopolyploid state of the genome [6], an argument recently advanced on the basis of a partial assembly of the grape genome [7] but one that remains controversial. Grape has the potential to become a model organism for fruit trees. The species can be transformed [8] and micropropagated via somatic embryogenesis [9].
Compared with other perennials, the grape genome is relatively small, 475 Mb [10], similar to those of rice (Oryza sativa, 430 Mb; [11]), barrel medic (Medicago truncatula, 500 Mb, http://medicago.org/) and black cottonwood poplar (Populus trichocarpa, 465 Mb; [12]). In this paper we report a high-quality draft sequence of the grapevine genome. The genome is derived from the Pinot Noir clone ENTAV 115, a variety grown in a range of soils for the production of red and sparkling wines. The sequence provides information on the overall organization, gene content and structural components of the DNA of the 19 LGs of V. vinifera. The Sanger sequencing method was used to generate 6.5X coverage of the genome. This has been integrated with sequence reads generated by a scalable, highly parallel sequencing by synthesis (SBS) method with throughput significantly greater than that of capillary electrophoresis. The 4.2X coverage provided by SBS was crucial in identifying polymorphic sites and in closing most of the gaps between DNA contigs. This is the first project that uses both the longer Sanger reads and the shorter SBS reads to determine the sequence of a large eukaryotic genome.

Results and Discussion

Sequencing and assembly
The DNA of V. vinifera was extracted from young shoots, then sequenced and assembled using the whole genome shotgun (WGS) method. Two techniques were adopted: Sanger dye primer sequencing of paired reads [13] and 454 (SBS) sequencing of unpaired reads [14], which provided 6.5X and 4.2X genome coverage, respectively (see Materials and Methods). In order to develop criteria for assembly, a preliminary experiment was conducted to assess heterozygosity, which was found to correspond to approximately 1 SNP per 0.1 Kb and 1 indel per 0.45 Kb (see Text S1). The assembly program [11] was accordingly modified to accept a specified level of mismatches in overlapping sequences (details in Materials and Methods and in Text S1).
The program also incorporated information on clone size, which ranged from 2 to 130 Kb (Table S1). The assembly started with unique sequences and progressively included sequences with a higher degree of repetitiveness. To avoid merging repeats into a single genomic sequence, overlapping unique-sequence contigs were merged only if the rate of polymorphism did not exceed 2% and the resulting sequence coverage of the overlap did not exceed 150% of the average coverage (see Text S1). These criteria were relaxed so that contigs with many supporting links were merged. In most cases, this procedure produced a correct assembly. Applying the procedure to about 6.6 M reads from Sanger sequencing, 90.6% of which represented paired clone ends, generated 211,374 initial seed contigs of unique sequences. Using long clone links with non-repetitive clone ends, seed contigs were ordered into metacontigs (ordered assemblies of contigs, referred to as supercontigs or scaffolds in other publications). After the sequences had been merged into 120,000 contigs, the data were combined with 4.2 genome-equivalents of SBS data. This helped to identify polymorphic sites and closed 25% of the remaining gaps between contigs. After removal of 10,847 contigs composed only of tandemly repeated sequences and of 7,003 contigs shorter than 1,000 bp, the iterative assembly produced 58,611 contigs (Figure S1 and Table S2) corresponding to 530.9 Mb of genomic DNA. Of these 58,611 contigs, 44,179 were assembled into 2,093 metacontigs, and the remaining 14,432 were singletons. The final assembled sequences are deposited at the EMBL/GenBank/DDBJ databases (accession numbers: AM423240-AM489403, data released 2006-12-19). Metacontig data are available at http://genomics.research.iasma.it. The removed contigs represented mostly centromeric and rRNA gene sequences. Based on their read coverage, their sizes were estimated as 14.5 Mb and 16.3 Mb, respectively.
Cultivated V. vinifera is highly heterozygous. As a result, many of the resulting contigs were consensus sequences derived from an alignment of the two haplotypes. The set of Pinot Noir chromosome pairs included a considerable number of haplotype-specific gaps (sequences present in one haplotype but not in the other; see also the ‘Pinot Noir genome structure and evolution’ section). The total length of the 1,042,174 identified gaps was 48.9 Mb. In some chromosomal regions, the two alternative haplotypes were too different for the assembly algorithm to combine them into a single contig. Such separated contigs corresponded to hemizygous DNA (22,061 contigs with a total length of 65.1 Mb). The total size of the genome represented by the two sets of homologous chromosomes can be estimated as twice the length of the sequences in which the two haplotypes were merged into a consensus (416.8×2 = 833.6 Mb), plus the length of hemizygous DNA and gaps (65.1 and 48.9 Mb, respectively). After including the centromeric and rRNA regions (14.5×2+16.3×2 = 61.6 Mb), the size of the diploid genome was estimated to be 1,009.2 Mb, which gives an average of 504.6 Mb per haploid genome (Table 1).
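The merge test for overlapping unique-sequence contigs can be sketched as a simple predicate. This is a minimal illustration that assumes only the two thresholds stated above: at most 2% polymorphism in the overlap, and overlap coverage no higher than 150% of the average coverage. The `Overlap` record and the `should_merge` function are hypothetical names. The actual assembler (Text S1) also takes supporting clone links into account, and this sketch does not model them.

```python
from dataclasses import dataclass

# Thresholds as stated in the text; everything else here is illustrative.
MAX_POLYMORPHISM_RATE = 0.02   # at most 2% differing positions in the overlap
MAX_COVERAGE_RATIO = 1.5       # overlap coverage at most 150% of the average


@dataclass
class Overlap:
    """Hypothetical summary of an overlap between two unique-sequence contigs."""
    length: int               # aligned overlap length (bp)
    mismatches: int           # SNP/indel positions differing between the contigs
    overlap_coverage: float   # read coverage over the merged overlap
    average_coverage: float   # genome-wide average read coverage


def should_merge(ov: Overlap) -> bool:
    """True if the overlap looks like two haplotypes of one locus rather
    than two copies of a repeat collapsed onto each other."""
    if ov.length <= 0:
        return False
    polymorphism_rate = ov.mismatches / ov.length
    coverage_ratio = ov.overlap_coverage / ov.average_coverage
    return (polymorphism_rate <= MAX_POLYMORPHISM_RATE
            and coverage_ratio <= MAX_COVERAGE_RATIO)


if __name__ == "__main__":
    print(should_merge(Overlap(1000, 10, 10.0, 10.0)))  # ~1% divergence, normal coverage
    print(should_merge(Overlap(1000, 10, 25.0, 10.0)))  # 250% coverage: likely collapsed repeat
    print(should_merge(Overlap(1000, 30, 10.0, 10.0)))  # 3% divergence exceeds the 2% cap
```

An overlap with about 1% divergence, matching the roughly 1 SNP per 0.1 Kb seen in the preliminary experiment, passes at normal coverage. The same overlap at 2.5 times the average coverage is rejected.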
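The genome-size arithmetic above can be reproduced in a few lines. This sketch only restates the component sizes given in the text, all in Mb, together with the stated rule. Under that rule, sequence merged into a consensus and the removed centromeric and rRNA contigs count once per homologue, while hemizygous DNA and haplotype-specific gaps are added once. The function name is illustrative.

```python
def diploid_genome_mb(consensus: float, hemizygous: float, gaps: float,
                      centromeric: float, rrna: float) -> float:
    """Diploid genome size from the assembly components (all values in Mb).

    Consensus sequence and the removed centromeric/rRNA contigs are counted
    once per homologue (x2); hemizygous DNA and gaps are added as reported.
    """
    return 2 * consensus + hemizygous + gaps + 2 * (centromeric + rrna)


diploid = diploid_genome_mb(consensus=416.8, hemizygous=65.1, gaps=48.9,
                            centromeric=14.5, rrna=16.3)
haploid = diploid / 2
print(f"diploid: {diploid:.1f} Mb, haploid: {haploid:.1f} Mb")
# diploid: 1009.2 Mb, haploid: 504.6 Mb
```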
A region of 403,443 bp (preliminary experiment; see Text S1) was used to monitor the correctness of the assembly. Thirty-four of the 37 contigs that mapped to the preliminary experiment sequence belonged to the metacontig assembled from the full genome sequence and were in the correct order. The remaining three contigs were not included because they contained repetitive clone links. Twenty-two of the 36 boundaries between adjacent contigs were overlapping but not aligned, owing to large heterozygous inserts. The remaining 14 contig pairs were separated by gaps: nine short gaps between 52 and 354 bp and five gaps larger than 500 bp. The largest gap (2.4 Kb) contained tandem repeats. Most of the gaps were associated with heterozygous inserts of repetitive elements. The total gap size, 8,067 bp, corresponded to about 2% of the region considered.

Metacontig integration into the genetic map
The next phase of the assembly involved positioning metacontigs in the genome using a genetic map developed at the Istituto Agrario di San Michele all'Adige (IASMA). Genetic mapping was based on 94 individuals derived from an F1 Syrah × Pinot Noir cross in which Pinot Noir was the pollen donor. The map contained 1,006 markers [15], which were used both to anchor BAC contigs to a physical map (http://genomics.research.iasma.it) and to order metacontigs along linkage groups (LGs). A set of 799 additional SNP markers was developed from polymorphic sites identified in contigs and was used to anchor and orient metacontigs on LGs. This genetic map included 1,767 molecular markers arranged in 19 LGs covering 1,276 cM (Figure S2; http://genomics.research.iasma.it). The SNP-based markers were also helpful in merging adjacent metacontigs that had not previously been merged because of repetitive or low-quality links between them. Integration of the DNA sequence and genetic map of LG4 is shown in Figure 1.
Gene annotation and gene content
Five quality levels were adopted for transcript assignment (see Materials and Methods): i) transcripts confirmed by tentative consensus sequences (TCs) and gene predictions (8,110); ii) transcripts confirmed by TCs aligned to the genome (8,160); and, among transcripts not confirmed by TCs, iii) transcripts predicted consistently at the exon level by different methods (4,028); iv) transcripts predicted by several methods with differences at the exon level but with correct gene boundaries (308); v) transcripts found by different methods with contrasting results, of which only those encoding proteins with significant similarity to known proteins were accepted (8,979). In total, 29,585 genes were predicted. The grape gene content is comparable to that of Arabidopsis (26,819) and markedly lower than those of the rice (41,046) and poplar (45,555) genomes. Gene annotation followed a consensus approach, and more than 79% of the genes predicted for the grape genome were annotated. Conserved putative grape genes were identified with BLAST, using rice, poplar and Arabidopsis as references and a decision tree implemented for this purpose. Sets of gene clusters with different levels of similarity among species, as well as unique and putative species-specific genes, were built. Using strict rules for homology determination, the subset of grape-specific genes amounted to 16,859 (Figure 2).
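As a consistency check, the five quality levels listed above account for the full predicted gene set. The sketch below only restates the counts from the text. The dictionary labels are paraphrases, not category names from the annotation pipeline.

```python
# Genes accepted at each transcript-assignment quality level (counts from the text).
genes_per_level = {
    "i: TC and gene prediction": 8_110,
    "ii: TC aligned to the genome": 8_160,
    "iii: consistent exon-level predictions": 4_028,
    "iv: exon differences, correct gene boundaries": 308,
    "v: conflicting predictions, similar to known proteins": 8_979,
}

total_genes = sum(genes_per_level.values())
# Levels i and ii are the two levels supported by tentative consensus sequences (TCs).
tc_supported = (genes_per_level["i: TC and gene prediction"]
                + genes_per_level["ii: TC aligned to the genome"])

print(f"total: {total_genes:,} genes")                            # total: 29,585 genes
print(f"TC-supported: {100 * tc_supported / total_genes:.1f}%")   # TC-supported: 55.0%
```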
Functional classification of the predicted genes was carried out by an automatic procedure. The manually revised final classification (Figure S3) shows the functional classes and their percentage in the gene set. Putative grape-specific genes were not characterized by a particular annotation profile or by relative abundance in particular functional classes. A slight numerical difference in favour of grape was noted for genes related to lignin biosynthesis and to berry-specific pectins; these metabolic pathways are less significant in Arabidopsis and poplar, respectively. Genes related to disease resistance and wine quality are discussed in further detail below.

Disease resistance genes
Resistance to parasites in plants is controlled by the non-host and gene-for-gene pathways [16]. The non-host type was discovered only recently [17], [18]. The gene-for-gene pathway is frequently present in cultivated plants carrying dominant resistance (R) genes, whose products initiate the signal transduction leading to the deployment of defense mechanisms [19]. The majority of R proteins contain a nucleotide binding site (NBS) and a carboxy-terminal leucine-rich repeat (LRR) domain. The NBS is part of a conserved domain acting as a molecular switch for signal transduction. The LRR is credited with recognition specificity, acting as an antibody-like detector of pathogen effectors [20]. At the N-terminus, NBS-LRR proteins carry either a coiled coil (CC) domain or a domain homologous to the Toll/Interleukin-1 Receptor (TIR, [21]). This allows NBS genes to be classified into two groups: CC-NBS-LRR genes, present in all angiosperms, and TIR-NBS-LRR genes, specific to dicotyledonous species [22]. Based on resistance domain analyses, the grape genome was found to contain 341 NBS genes (Figure 3).
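The CC/TIR distinction described above lends itself to a rule-based classifier over predicted domain architectures. The sketch below makes simplifying assumptions. The domain labels ("CC", "TIR", "NBS", "LRR") stand in for the output of an upstream domain-annotation step. Any NBS protein that lacks either the N-terminal CC/TIR domain or the C-terminal LRR falls into a single "truncated NBS" bin, a category used in this section without a detailed definition.

```python
from typing import Sequence


def classify_nbs(domains: Sequence[str]) -> str:
    """Assign an NBS class from an ordered list of domain labels (N- to C-terminus).

    The labels are hypothetical; real pipelines derive them from profile searches.
    """
    if "NBS" not in domains:
        return "non-NBS"
    n_terminal, c_terminal = domains[0], domains[-1]
    if c_terminal == "LRR" and n_terminal == "TIR":
        return "TIR-NBS-LRR"   # specific to dicots
    if c_terminal == "LRR" and n_terminal == "CC":
        return "CC-NBS-LRR"    # present in all angiosperms
    return "truncated NBS"     # missing the N-terminal CC/TIR or the C-terminal LRR


if __name__ == "__main__":
    for arch in (["TIR", "NBS", "LRR"], ["CC", "NBS", "LRR"], ["NBS", "LRR"], ["TIR", "NBS"]):
        print("-".join(arch), "->", classify_nbs(arch))
```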
Besides NBS genes, the grape genome contains several signalling components of the plant disease response. These are encoded by the genes EDS1, PAD4, COI1, MPK4, JAR1, ETR1 and NDR1, which are known to be recruited by resistance gene products (Table S3). The NPR1 gene, a regulator of systemic acquired resistance to pathogens [23], is present in one copy in grape and in Arabidopsis but has five copies in poplar. Likewise, RAR1 and EIN2 are present in single copies in the grape genome. Genes encoding the pathogenesis-related proteins (PRs, [24]) include nine copies of PR-1, eight of PR-2, five of PR-3, one copy of PDF1, one of PDF2, and several copies of PR-5 and protease inhibitor-like genes (Figure 3B). In addition, the grape genome contains eight genes similar to the barley MLO gene for mildew resistance, compared with the 15 MLO-like genes known for Arabidopsis [25]. MLO proteins belong to a large plant-specific family of seven-transmembrane domain proteins encoded by genes homologous to barley MLO [25]. Recessive MLO alleles confer effective resistance against mildew pathogens. Furthermore, the powdery mildew non-host resistance genes PEN1, PEN2 and PEN3 [17], [18] were found in 5, 5 and 10 copies, respectively. In grape, disease-related genes represent a significant part of the genome. Despite this, many grape varieties, including Pinot Noir, are susceptible to several fungi, such as grey mould (Botrytis cinerea), downy mildew [26] and powdery mildew [27], which have to be kept under control by heavy fungicide treatments. The failure to mount an effective defense response is probably due to defective pathogen recognition. NBS-LRR genes are known to undergo diversifying selection [28]; for example, variation in the sequence of the Arabidopsis gene RPS2 shows a signature consistent with pathogen-stimulated selection [29]. Moreover, the extent of variation in the activity of NBS-LRR genes may have been affected by balancing selection [30]–[33].
Grape alleles of the same resistance genes did not co-evolve in the presence of the agriculturally most important grape pathogens [34]. Indeed, allelic variation due to SNPs in functional resistance domains was associated with the phenotypic divergence between resistant and susceptible genotypes only when susceptible V. vinifera and resistant non-vinifera clones were compared [34]. In addition, the long generation time of grape, together with its vegetative propagation, makes it difficult for the crop to keep pace with the evolutionary rates of microbial or insect pests, which in vineyards are boosted by massive use of chemicals [35]. Detailed knowledge of the grape genome will help accelerate the development of genetic strategies to counter crop loss due to dynamic and genetically diverse pathogens. The TIR-NBS-LRR genes are preferentially located in LG 18, the CC-NBS-LRR genes in LGs 9 and 13, and the truncated NBS genes in LGs 12 and 13 (Figure 3B). Several clusters of NBS genes mapped to chromosomal regions to which genetic resistance to fungal diseases, such as downy and powdery mildew, had previously been assigned (Figure 3B).

Phenolic and terpenoid pathways
Grape secondary metabolites, particularly polyphenols, have a strong influence on wine quality [40]. Most phenolics derive from phenylalanine via phenylalanine ammonia-lyase (PAL). They encompass a range of structural classes and biological functions and include lignins, phenolic acids such as hydroxycinnamic and hydroxybenzoic acids, and polyphenols such as flavonoids and stilbenes. Flavonoids are the most common plant phenolics. In flowers and fruits they attract pollinators and seed dispersers, and they are also involved in UV scavenging and disease resistance [41]. Flavonoids contribute to human health [42].
The flavonoid skeleton, synthesized by chalcone synthase (CHS), is converted to chalcones, flavanones, flavonols, flavanols, anthocyanins and proanthocyanidins (condensed tannins). In red grape, flavanols and anthocyanins are abundant, the latter accumulating mostly in the berry skin and the former in the seeds [43]. In the last decade, considerable effort has been made to identify and clone grape flavonoid biosynthetic genes [44]–[47]. The grape genome sequence now offers the opportunity to compile an exhaustive overview of the phenylpropanoid pathway. Gene predictions were found for all genes known to encode enzymes of the pathway. These include C4H and 4CL (acronyms are explained in note 1 of Figure 4A).
Within the phenylpropanoid pathway, relatively large gene families have been described for poplar compared with Arabidopsis [48]. Our results highlight some significant differences, such as the numbers of PAL and F3'5'H gene copies, which were even greater in grape. In general, grape and poplar secondary metabolism exhibits a tendency toward gene family expansion. Conversely, in Arabidopsis all enzymes of the central flavonoid metabolism except FLS are encoded by single genes [41]. This is consistent with the noted low metabolic investment of Arabidopsis in flavonoids, a species that reproduces without the need for insect pollination and lacks a perennial woody habit. In grape, as in a few other species, the condensation of p-coumaroyl-CoA with malonyl-CoA gives rise to stilbenes via stilbene synthase (StSy; [49]). Among stilbenes, monomers and oligomers (viniferins) of resveratrol contribute to resistance to fungal pathogens [50]. Resveratrol has gained attention due to its alleged beneficial effects on human health [51]. Stilbene synthase belongs to a large family: analysis of the grape genome predicts at least 21 copies. This number agrees well with a recent StSy sequence analysis in infected grape leaves [26] but differs from the number predicted in the PN40024 grape genome sequence [7]. Most of these copies, as well as most PAL genes, are clustered in LG 16. Further, several peroxidase genes were predicted, some of which could participate in the formation of viniferins, as previously suggested [50]. Recently, a resveratrol glucosyltransferase putatively involved in piceid synthesis has been isolated and biochemically characterized in V. labrusca grape berry [52]. Our analysis revealed that its homolog in Pinot Noir (99% sequence similarity) is present as a single gene mapping to LG 3. Terpenoids are among the most abundant and structurally diverse groups of natural metabolites.
Volatile and non-volatile terpenes are essential for plant growth and development (e.g., gibberellin phytohormones), but they are also key players in the interaction of plants with the environment [53]. The substrates for the biosynthesis of about 22,000 terpenes are isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP). The mevalonate (MVA) and the mevalonate-independent DOXP/MEP pathways are responsible for the synthesis of IPP and DMAPP in the cytosolic and plastidic compartments, respectively [54]. DOXP/MEP is the dominant route for monoterpene biosynthesis in the grape berry [55]. Three prenyltransferases produce the prenyl diphosphate precursors of terpenes: geranyl diphosphate (GPP), farnesyl diphosphate (FPP) and geranylgeranyl diphosphate (GGPP). Terpene synthases (TPSs) catalyze the formation of hemiterpenes (C5) [51], monoterpenes (C10), sesquiterpenes (C15) or diterpenes (C20) from the substrates DMAPP, GPP, FPP or GGPP, respectively (Figure 4B). All TPSs have similar physico-chemical properties. Moreover, the close sequence relatedness of their genes prevents discrimination of their catalytic functions, suggesting a rapid divergence of catalytic activity among closely related TPS genes [53]. Three classes of TPSs are described, and only classes II and III are specific to plant secondary metabolism [56]. Forty-seven TPS genes participate in secondary metabolism in poplar [12], while in grape only 35 TPSs were identified, a number close to the 32 found in Arabidopsis. In the grape genome, they are located mainly on LGs 9, 10 and 19 (class I TPSs on LGs 7, 9, 10 and 19; Table S5). Several higher plant genes of the terpenoid pathways have been cloned [57], but only a few of them had previously been identified in grape [58]. With the complete grape genome sequence available, 124 genes related to the terpenoid pathway were identified (Table S5). Of these, 110 were mapped, and they are distributed across all LGs.
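The substrate-to-product relationship for terpene synthases follows the C5 isoprene rule. Each prenyl diphosphate is one isoprene unit longer than the previous one (DMAPP plus successive IPP units), and the TPS product keeps the substrate's carbon count. A small lookup table makes the mapping stated above explicit. It is a didactic sketch, not a model of TPS catalysis, which the text notes cannot be discriminated from sequence. The names are illustrative.

```python
# Prenyl diphosphate substrate -> (carbon atoms, class of TPS product), per the text.
TPS_SUBSTRATES = {
    "DMAPP": (5, "hemiterpenes"),
    "GPP": (10, "monoterpenes"),
    "FPP": (15, "sesquiterpenes"),
    "GGPP": (20, "diterpenes"),
}
ISOPRENE_CARBONS = 5  # carbons in one isoprene (C5) unit


def tps_product(substrate: str) -> str:
    """Describe the terpene class a TPS forms from a given prenyl diphosphate."""
    carbons, product = TPS_SUBSTRATES[substrate]
    units = carbons // ISOPRENE_CARBONS
    # DMAPP is the starter unit; each further unit comes from condensing one IPP.
    ipp_added = units - 1
    return f"{product} (C{carbons}; DMAPP + {ipp_added} IPP)"


if __name__ == "__main__":
    for s in TPS_SUBSTRATES:
        print(s, "->", tps_product(s))
```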
Functionally, 24 are related to carotenoids, 24 to abscisic acid metabolism and 10 to gibberellin hormones. A further 6 cover steps of the core terpenoid pathway: 5 prenyltransferases and 1 isopentenyl diphosphate delta-isomerase. For the non-MVA (DOXP/MEP) and MVA pathways, nine (4 DXS, DXR, ISPD, ISPE, ISPF, ISPH) and eight (2 AACT, HMGS, 3 HMGR, MK/MVK, MVD) putative genes were identified, respectively. Plant monoterpenes are preferentially confined to specialized organs. They play an important role in defense and also act as allelopathic agents and attractants for pollinators [59]. In grape, monoterpenes contribute to the free volatiles of wine. Typical components of aroma-rich grape varieties are linalool, geraniol, nerol, citronellol and α-terpineol, which are stored in exocarps and vacuoles. Monoterpene biosynthesis has not yet been studied in depth, because several metabolic steps may take place without enzymatic catalysis. Moreover, knowledge of the mechanisms controlling monoterpene synthase activity is still largely incomplete. In the grape genome, four monoterpene synthase genes were identified, encoding linalool synthase, limonene synthase, myrcene synthase and α-terpineol synthase.

Transcription factors
In grape, 2,004 transcription factor (TF) genes were identified (Figure 5A).
Sixty-two families of TF genes were found, a number similar to the 64 for Arabidopsis, 62 for rice and 63 for poplar [63]. TF families such as MYB, AP2/EREBP, bHLH and MADS-box include a large number of members [11], [60]. We compared the number of genes in each of the 60 grape TF families shared with the other three plant genomes and found a nearly linear correlation (Figure 6).
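The text does not say how the near-linear relationship between per-family gene counts was quantified. This sketch assumes Pearson's correlation coefficient, one common choice, computed between the grape counts and those of another species for the shared families. The helper is self-contained. Its inputs would be two equal-length lists of per-family gene counts, which this section does not reproduce, so the example below uses toy numbers only.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences of counts."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant (r is undefined).
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy, perfectly linear data (not real TF family counts).
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```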
Across the species mentioned, MYB genes (279) are the most abundant [11], [64]. They play a role in controlling the accumulation of secondary metabolites in the grape berry [65]–[67]. A gene from this TF family is also known to play a key role in the regulation of anthocyanins and flavonols during the non-climacteric ripening of strawberry [68]. Non-climacteric ripening, which occurs in fruits such as strawberry and grapevine, is characterized by the absence of the respiratory peak and ethylene burst typical of climacteric fruit ripening. We also found 143 leucine-zipper genes in the grape genome. Together with EREBP TFs, they contribute to the plant's defense response [69]. In tepary and common bean, a bZIP gene plays an important role in the response to water deficit and in the regulation of abscisic acid levels [70]. In the grape genome, the MADS-box family is also over-represented. These TFs regulate flowering-related phenomena as well as other metabolic processes [71]. MADS-box TFs may have been important during plant evolution because they allow plant reproductive structures to adapt to variations in climatic conditions [72]. Two tandem MADS-box genes (MADS-RIN and MADS-MC) were found to regulate fruit ripening and inflorescence determinacy in tomato, a climacteric fruit. Mutation at the rin locus causes a failure of normal ripening physiology [73]. A ripening mechanism common to both climacteric and non-climacteric species, such as grape, has been hypothesized [74]. In support of this, we identified two TF classes in grape, AP2/EREBP and EIL, which contribute to ethylene signalling during the ripening of climacteric fruits. We also found ethylene receptors belonging to the ETR/ERS families.

Repetitive elements
Matching the sequences of assembled contigs with the original reads made it possible to characterize each DNA segment by the number of matching reads (see Materials and Methods).
With a read coverage of 10.7X, a DNA segment was considered unique when represented by 15 or fewer matches. Moderately repeated sequences (2 to 8 copies per genome) were expected to have 16–100 matches, and sequences with more than 100 matches were considered highly repetitive. Highly repetitive sequences were masked before gene prediction, which excluded most of the coding parts of repetitive elements from the putative gene set. Dispersed highly repetitive DNA sequences were identified by an iterative procedure, and the resulting collection of 90,483 repetitive segments was grouped into 136 types. Members of each type were translated and compared with each other, and the similarity scores were used in a UPGMA-like clustering. The similarity tree consisted of eight clusters lacking a common root (Figure S4), each of which was assigned to one of the known classes of repetitive DNA sequences (Table S7). Grape transposable elements (TEs), totalling 108.5 Mb, represent the most abundant set of repeats. Following Feschotte et al. [75], they were assigned to group I (retrotransposons: Copia, Gypsy, LINE) and group II (DNA transposons: Mutator, CACTA, hAT). The most abundant TEs were Gypsy/Athila-like elements, followed by Copia elements. DNA transposons were represented by 9,562 copies (7.1 Mb). TEs seem to be more abundant in grape than in poplar [12], Arabidopsis and rice [11]. Putatively autonomous TEs were identified by significant BLAST hits against the UniProt database; TEs without a significant BLAST hit were attributed to the non-autonomous group (Table S7). Of the 136 repeat types, 20 were classified as long tandem repeats with a unit size from 100 to 430 bp, grouped into ten major subclasses. Short tandem repeats (microsatellites) were also identified; their thresholds, copy numbers and total DNA length are reported in Table S8. Microsatellites cover 2.1 Mb, including the telomeric repeats (TTTAGGG).
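The match-count thresholds used to classify DNA segments can be written as a small function. With 10.7X read coverage, a single-copy segment is expected to attract roughly 10 to 11 matching reads. Dividing the match count by the coverage therefore gives a rough copy-number estimate. The thresholds come from the text. The function names and the copy-number helper are illustrative assumptions, not the authors' code.

```python
READ_COVERAGE = 10.7          # combined Sanger + SBS coverage (X)
UNIQUE_MAX_MATCHES = 15       # <= 15 matches: unique
MODERATE_MAX_MATCHES = 100    # 16-100 matches: moderately repeated (about 2-8 copies)


def repeat_class(n_matches: int) -> str:
    """Classify a DNA segment by the number of reads matching it."""
    if n_matches <= UNIQUE_MAX_MATCHES:
        return "unique"
    if n_matches <= MODERATE_MAX_MATCHES:
        return "moderately repeated"
    return "highly repetitive"  # masked before gene prediction


def approx_copies(n_matches: int, coverage: float = READ_COVERAGE) -> float:
    """Rough copy-number estimate: matching reads per genome-equivalent of coverage."""
    return n_matches / coverage


if __name__ == "__main__":
    for n in (11, 43, 250):
        print(n, repeat_class(n), round(approx_copies(n), 1))
```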
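The grouping of repeat types by pairwise similarity can be illustrated with a standard UPGMA (unweighted pair group method with arithmetic mean) sketch. The paper describes its procedure only as "UPGMA-like", so two details here are assumptions. First, similarity scores are taken to lie in [0, 1] and are converted to distances as 1 - similarity. Second, merging stops once the closest pair is farther apart than a cutoff. This leaves a forest of separate clusters rather than a single rooted tree, analogous to the unrooted clusters described above. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def _pair(a: int, b: int) -> Tuple[int, int]:
    """Canonical (smaller, larger) key for a pair of cluster ids."""
    return (a, b) if a < b else (b, a)


def similarity_to_distance(sim: Sequence[Sequence[float]]) -> List[List[float]]:
    """Assumes similarity scores scaled to [0, 1]."""
    return [[1.0 - s for s in row] for row in sim]


def upgma(labels: Sequence[str], dist: Sequence[Sequence[float]],
          max_distance: float = float("inf")) -> list:
    """Agglomerative UPGMA clustering.

    Repeatedly merges the closest pair of clusters; the distance from the new
    cluster to any other is the size-weighted mean of its parts' distances.
    Stops early when the closest pair exceeds max_distance and returns every
    remaining cluster as a nested-tuple tree.
    """
    clusters: Dict[int, Tuple[object, int]] = {i: (lab, 1) for i, lab in enumerate(labels)}
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i in range(len(labels)) for j in range(i + 1, len(labels))
    }
    next_id = len(labels)
    while len(clusters) > 1:
        (i, j), d_ij = min(d.items(), key=lambda kv: kv[1])
        if d_ij > max_distance:
            break
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            d_ik = d.pop(_pair(i, k))
            d_jk = d.pop(_pair(j, k))
            # New id is larger than every existing id, so (k, next_id) is canonical.
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    return [tree for tree, _ in clusters.values()]


if __name__ == "__main__":
    names = ["typeA", "typeB", "typeC", "typeD"]   # hypothetical repeat types
    sim = [[1.0, 0.9, 0.1, 0.1],
           [0.9, 1.0, 0.1, 0.1],
           [0.1, 0.1, 1.0, 0.8],
           [0.1, 0.1, 0.8, 1.0]]
    print(upgma(names, similarity_to_distance(sim), max_distance=0.5))
    # [('typeA', 'typeB'), ('typeC', 'typeD')]
```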
Of the 171 contigs with identified telomeric sequences, 42 had telomeric ends; in the linkage map, these represent potential markers for telomeres. An alternative estimate of the length of identified repetitive DNA was obtained from the number and total length of reads matching the repeat sequences identified above. This estimate gave a value of 138.5 Mb, corresponding to 27.4% of the 504.6 Mb genome.

Non-coding RNAs

MicroRNAs
MicroRNAs (miRNAs) and trans-acting siRNAs (ta-siRNAs) play a significant role in plant development and stress response [76], [77]. The majority of the 1,220 plant miRNAs listed in miRBase [78] are from Arabidopsis (184), rice (243) and poplar (215). A BLAST search for sequences similar to the Arabidopsis miRNA genes was performed on the grape genome. Allowing for three or fewer mismatches, 143 miRNA genes representing 28 families were identified ([78]; Table 3, Table S9).
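The miRNA screen allowed up to three mismatches. The paper used BLAST. As a swapped-in simplification for illustration, the sketch below performs an exhaustive ungapped scan of both strands and reports every window within the mismatch limit. It ignores gaps, scoring and seeding, and is practical only for short sequences. `find_mirna_hits` is a hypothetical helper name.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_mirna_hits(genome: str, mirna: str,
                    max_mismatches: int = 3) -> List[Tuple[int, str, int]]:
    """Return (0-based start, strand, mismatches) for every ungapped match.

    The miRNA may be given as RNA (U) or DNA (T); it is converted to DNA first.
    Minus-strand hits are found by scanning for the reverse complement.
    """
    query = mirna.upper().replace("U", "T")
    genome = genome.upper()
    k = len(query)
    hits = []
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        for start in range(len(genome) - k + 1):
            window = genome[start:start + k]
            mismatches = sum(a != b for a, b in zip(window, pattern))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


if __name__ == "__main__":
    site = "TGACAGAAGAGAGTGAGCAC"            # example 20-nt target site
    genome = "T" * 10 + site + "T" * 10
    print(find_mirna_hits(genome, "UGACAGAAGAGAGUGAGCAC"))
```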
Three types of miRNAs (miR827, miR828 and miR846) had not previously been found outside Arabidopsis and were considered “non-conserved” miRNAs [79], [80]. However, these genes are present in the grape genome, indicating that they were either lost in the lineage leading to Populus or are missing from its genome assembly. The miRNA passenger strands (miRNAs*) are highly conserved between grape and Arabidopsis [79]. Sequences predicted to produce ta-siRNAs [81] are conserved in several plant species, grape included. Putative grape miRNAs and siRNAs target the same classes of genes as they do in Arabidopsis, rice and poplar: transcription factor genes, genes involved in stress response and nutrient uptake, genes for RNA silencing, and the non-coding RNA TAS3 (Table 3). In grape, 56 RNA-dependent DNA polymerase genes are potentially targeted by miR396 and miR846, a phenomenon not reported in other plant species. BLAST searches identified four Dicer-like proteins (helicase and RNase IIIa/b domains), nine Argonautes (PAZ/PIWI domains) and six RNA-dependent RNA polymerases (RdRp domains). These results indicate that the grape genome encodes a complex RNA processing machinery (Figure S5).

Transfer RNA

Small nuclear RNA
The non-coding RNAs include five major and four minor snRNA families, all components of splicing factors. The Arabidopsis snRNA list of Wang and Brendel [83] was used to search for similar sequences in grape. We found 89 snRNA genes and pseudogenes, compared with 75 in Arabidopsis (Table S11). Several snRNA genes were clustered in the genome.

Ribosomal RNA
Large rRNA units consist of two segments: one hosting the 18S, 5.8S and 28S rRNA genes, and a variable segment containing three arrays of tandem repeats. In grape, the length of the rRNA unit is around 10.8 Kb. The three tandem arrays of the variable segment comprise about 40 copies of a 44–45 bp repeat, three copies of a 150 bp repeat and 5.5 copies of a 193 bp repeat. The unit is repeated 1,450–1,550 times in the genome (16.1 Mb).
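The copy-number figures for the large rRNA unit can be turned into a rough size estimate by multiplying unit length by copy number. This back-of-the-envelope check assumes that all units are full length and ignores any retrotransposon insertions. The reported 16.1 Mb falls inside the resulting range. The helper name is illustrative.

```python
def tandem_array_mb(unit_bp: int, copies: int) -> float:
    """Total length, in Mb, of `copies` full-length tandem units of `unit_bp` bp."""
    return unit_bp * copies / 1e6


LARGE_UNIT_BP = 10_800            # about 10.8 Kb per 18S-5.8S-28S unit
COPY_RANGE = (1_450, 1_550)       # estimated copies in the genome

low, high = (tandem_array_mb(LARGE_UNIT_BP, n) for n in COPY_RANGE)
print(f"large rRNA arrays: {low:.1f}-{high:.1f} Mb")  # large rRNA arrays: 15.7-16.7 Mb
```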
rRNA units may contain retrotransposon insertions of three different lengths (2870, 2950 and 5800 bp), and such retroelements may cause transposition of rRNA sequences. The small ribosomal RNA unit (1,250 bp) contains two 5S rRNA genes of 120 bp each, differing by a single nucleotide. This unit is present in 170–180 copies in the genome. Together, the large rRNA and 5S rRNA sequences were estimated to amount to 16.3 Mb.

Pinot Noir genome structure and evolution

Structural diversity between homologous chromosomes within plant species has been reported [85]. This type of molecular variation seems to be common in allogamous plants [86] and could also be a characteristic of autogamous species [33]. Grape does not tolerate long-term inbreeding [5], and high outcrossing rates maintain the genome in a heterozygous state, as evidenced by the remarkable variation found in collections of grape varieties [87]. The genome sequence of a cultivated grape variety provides unprecedented insight into the structural nature of heterozygosity in an outcrossing species. The variation within this Pinot Noir clone consists largely of chromosome-specific gaps and hemizygous DNA. Besides the regions in which haplotypes from both homologous chromosomes could be merged into a consensus sequence, chromosome-specific regions were found: either different DNA sequences flanked by orthologous regions of the two homologous chromosomes (hemizygous DNA), or gaps corresponding to sequences present in one chromosome but absent from the other. One million gaps, covering 48.9 Mb, and 65.1 Mb of hemizygous DNA distributed in 22,610 contigs were identified.
These data allow us to conclude that the homologous chromosomes of Pinot Noir differ on average by 11.2% of their DNA sequence, and that the grape genome exists in a dynamic state, mediated at least in part by transposable element (TE) activity, as reported for helitron TEs [88]. Indeed, the large grape genomic gaps are frequently bordered by 5 bp direct repeats, reminiscent of DNA excision mediated by a precise transposition process [89]. The genomic region represented in Figure 7A
In a preliminary experiment (see Text S1), SNP frequency was found to correlate with the frequency of insertions and deletions: segments with fewer than one in/del per Kb had 4.4 SNPs per Kb, whereas segments with one or more in/dels per Kb had 16.7 SNPs per Kb. A total of 2 million SNPs (1,751,176 anchored, the remainder in other assembled sequences) were discovered and validated, and more than a million in/dels were annotated at defined positions on the sequence. Our data extend the evaluation of nucleotide variation to the entire genome rather than to limited resequenced DNA regions [86]. Among recently sequenced animal genomes, high SNP frequencies were found in sea urchin [90] and Ciona intestinalis [91]. Across the grape genetic map (Figure 7B Coding and non-coding regions showed different degrees of polymorphism, with 2.5 and 5.5 SNPs per Kb, respectively. One or more SNPs were found in 86.7% of anchored genes, and 71.4% of genes had more than four SNPs (Figure 7C In several regions of the 19 LGs, SNP frequency peaks between 5 and 7.5 per Kb, although it may reach much higher values (Figure 7B Arabidopsis and poplar have likely undergone three rounds of whole genome duplication during evolution [12], [97], [98], although this has recently been challenged [7]. The first duplication (referred to as 1R [98], [99]) may have predated the divergence of monocots and eudicots, while the second (2R) probably occurred around the radiation of the core eudicots, before the divergence of poplar and Arabidopsis [12], [99]. The most recent duplications in poplar and Arabidopsis occurred after their divergence [94]. The current view is that Vitis is an early-diverging lineage within the rosids that split off before the divergence of Arabidopsis and poplar [100]. We determined the relative age of grape duplicated genes from the number of synonymous substitutions per synonymous site (KS).
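The text does not spell out the KS estimator used here. As a hedged illustration only, assuming the proportion of synonymous differences per synonymous site (pS) has already been counted for each paralog pair (for example with the Nei–Gojobori method), the Jukes–Cantor correction converts pS into KS, and binning the KS values yields an age distribution of duplicates. The function names, the 0.1 bin width and the saturation cutoff of KS = 2.0 are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional


def jukes_cantor_ks(p_s: float) -> Optional[float]:
    """Correct an observed proportion of synonymous differences for multiple hits.
    Returns None when p_s >= 0.75, where the correction is undefined (saturation)."""
    if not 0.0 <= p_s < 0.75:
        return None
    return -0.75 * math.log(1.0 - 4.0 * p_s / 3.0)


def ks_histogram(p_values: Iterable[float], bin_width: float = 0.1,
                 max_ks: float = 2.0) -> Dict[float, int]:
    """Bin KS values of paralog pairs by the lower edge of each bin; pairs that
    are saturated or exceed max_ks are dropped."""
    counts: Counter = Counter()
    for p in p_values:
        ks = jukes_cantor_ks(p)
        if ks is None or ks > max_ks:
            continue
        bin_start = round(math.floor(ks / bin_width) * bin_width, 10)
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical pS values for a handful of paralog pairs.
    print(ks_histogram([0.30, 0.32, 0.45, 0.48, 0.50, 0.80]))
```

Dropping saturated pairs matters because high KS values are unreliable, which is one reason the comparison of KS distributions across species calls for caution.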
The age distribution of Vitis duplicates shows a clear peak of KS values between 0.6 and 1.2, suggesting a relatively recent large-scale duplication event (Figure 7D Different approaches were taken to estimate the age of the youngest large-scale duplication event. First, the youngest peak lies to the left of the peak formed by KS values between Vitis and Arabidopsis orthologs (Figure S6), although comparisons between different KS distributions call for great caution because substitution rates differ among organisms. Second, we detected duplicated segments, covering about half of the genome, using a previously described method [102]. KS values of genes in these duplicated blocks (Figure 7E Jaillon et al. [7] propose that three ancestral genomes contributed to the Vitis lineage and suggest an ancestral hexaploidization for most eudicots, while finding no evidence for a recent duplication in grape. They further suggest that, since their split, poplar has undergone one additional whole genome duplication and Arabidopsis two. These results are at odds with our findings. Reanalysis of the Arabidopsis and poplar genomes (not shown) uncovers, for both, many homologous segments with a multiplication level between five and eight, which suggests three rounds of duplication for both genomes [97]. If the Arabidopsis and poplar genomes were ancient hexaploids with additional genome duplications (two in Arabidopsis, one in poplar), fragment multiplication of up to twelve would be expected in Arabidopsis and up to six in poplar. The substantial ambiguity in dating the duplicates in duplicated segments suggests that the most recent large-scale duplication event reported here for Vitis might have occurred close to the Vitis speciation event. We therefore put forward an alternative to the scenario presented by Jaillon et al. [7], shown in Figure 8.
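The multiplicity argument above rests on simple products: each polyploidy event multiplies the maximum number of homologous copies of a segment by its ploidy factor, and gene or segment loss can only lower the observed count. A minimal sketch (our arithmetic; the helper name and scenario labels are illustrative):

```python
from math import prod
from typing import Sequence


def max_multiplicity(ploidy_factors: Sequence[int]) -> int:
    """Upper bound on homologous copies of a segment after successive polyploidy
    events, each multiplying the copy number by its factor (losses ignored)."""
    return prod(ploidy_factors)


scenarios = {
    "Arabidopsis: hexaploid ancestor + 2 duplications": [3, 2, 2],
    "poplar: hexaploid ancestor + 1 duplication": [3, 2],
    "three rounds of duplication": [2, 2, 2],
}
for name, factors in scenarios.items():
    print(f"{name}: up to {max_multiplicity(factors)} copies")
```

The observed multiplicities of five to eight match the upper bound of 2 × 2 × 2 = 8 for three rounds of duplication.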
Concluding remarks

The Grapevine Genome Initiative was established to accelerate the breeding of a difficult perennial species. Breeding grapes for disease resistance, if not immunity, would address the emergence of aggressive races of microorganisms that are currently controlled by massive use of agrochemicals. The problem is not simple: how to modify a complex and highly heterozygous genome without altering wine quality. Precise knowledge of all the genes influencing quality and resistance traits is an absolute prerequisite for such modifications. A large number of disease-resistance genes have been identified; many of them have been mapped to LGs, and a large proportion are tagged with one or more SNPs. These resistance genes, however, did not co-evolve with the most important grape pathogens [34], which may have left the species insufficiently protected. This is in part why a deep knowledge of the grape genome is the starting point for developing genetic strategies against pathogens. The grape genome sequence opens opportunities for molecular breeding in grape. The fertility of hybrids between wild and domesticated grape species, which share 19 seemingly colinear chromosomes [5], [106]–[108], makes it feasible to introduce new resistance genes via traditional breeding. The NBS gene clusters identified here can be associated with QTLs affecting the disease resistance or tolerance of grape varieties (as is the case for LGs 12, 14, 15 and 18; [27], [39]). This large and underexploited reservoir of resistance genes could readily be moved in clusters across genomes by choosing appropriate molecular markers to introgress only the resistance traits selectively, which would prevent the loss of alleles important for grape and wine quality.
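One common way to implement such selective introgression is marker-assisted backcross selection. The sketch below is an illustration under assumptions not taken from this paper: each backcross plant is scored at markers coded "D" (donor allele present, e.g. from a resistant wild species) or "R" (only recurrent elite-parent alleles); foreground markers flank an NBS resistance cluster; and all marker and plant names are hypothetical. Foreground selection keeps plants carrying the donor cluster, and background selection ranks them by recovery of the recurrent parent elsewhere, which is what limits the loss of quality-related alleles.

```python
from typing import Dict, List, Sequence, Tuple

# Marker name -> "D" (donor allele present) or "R" (recurrent-parent alleles only).
Genotype = Dict[str, str]


def select_progeny(progeny: Dict[str, Genotype], foreground: Sequence[str],
                   top_n: int = 3) -> List[Tuple[str, float]]:
    """Keep plants carrying the donor allele at every foreground marker, then rank
    them by the fraction of recurrent-parent calls at all other scored markers."""
    ranked: List[Tuple[str, float]] = []
    for plant, genotype in progeny.items():
        if not all(genotype.get(m) == "D" for m in foreground):
            continue  # foreground selection: resistance cluster not retained
        background = [allele for m, allele in genotype.items() if m not in foreground]
        if not background:
            continue  # nothing to rank on
        recovery = sum(allele == "R" for allele in background) / len(background)
        ranked.append((plant, recovery))
    ranked.sort(key=lambda item: item[1], reverse=True)  # best background first
    return ranked[:top_n]
```

With dense SNP markers, the background set can be restricted to regions carrying known quality loci, for example the terpenoid or flavonoid genes discussed earlier.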
Thus, the anchored sequence of the grape genome, together with the large arsenal of SNP loci, now offers a tool to open a new era in the molecular breeding of grape. Longer dye-terminator WGS reads can be combined with shorter SBS reads using dedicated assembly programs; with this approach we have resolved a complex heterozygous eukaryotic genome. Future whole genome sequencing efforts should be able to combine these two methods to produce assemblies in less time and with fewer resources. The ability to resolve the haplotypes in Pinot Noir suggests that sequencing DNA mixtures, for example of more than one genotype of a given crop, is practical. Such an approach generates both a consensus sequence of the genome and a set of mapped marker loci for use in breeding programs.

Materials and Methods

DNA source

To prepare shotgun libraries, DNA was extracted from young shoots of Pinot Noir clone ENTAV115, randomly sheared and size-selected. Two BAC libraries were also constructed ([109]; Keygene, Wageningen, NL) and the clones assembled into a physical map (http://genomics.research.iasma.it). A population of 94 F1 plants from the cross between Syrah and Pinot Noir was the source of the DNA used for mapping markers and anchoring metacontigs.

Libraries

Fosmid and shotgun libraries were made from DNA purified by a CTAB method [110]. Sheared DNA (Gene Machines Hydroshear, Ann Arbor, MI) was size-selected to produce libraries with insert sizes of 2, 3, 6, 10 and 12 Kb. DNA was ligated into a high-copy plasmid vector and transformed into DH10B T1r E. coli cells (Invitrogen, Carlsbad, CA). The fosmid library was produced from DNA fragments between 30 and 45 Kb. DNA inserts were ligated into the pCC1FOS vector, packaged with MaxPlax lambda extracts and transfected into EPI300-T1r E. coli cells (Epicentre, Madison, WI).
LB agar contained chloramphenicol; 99,840 clones were picked (QPix2, Genetix, Hampshire, UK) into 384-well plates containing LB freezing medium, incubated for 18 h, replicated and stored at −80°C.

Sanger shotgun sequencing

DNA was amplified from bacterial cultures by rolling circle amplification (TempliPhi kit; GE Healthcare, Amersham) and Sanger sequenced on a MegaBACE 4500. Clones with inserts from 6 to 20 Kb, BAC clones and fosmid clones were amplified with the TempliPhi large kit. BAC clones were sequenced bidirectionally with dye terminators on an ABI Prism® 3730.

Sequencing by synthesis (SBS)

Pinot Noir DNA isolated as described was nebulized to generate fragments of approximately 620 bp. These were amplified as in Margulies et al. [14] and sequenced on the Genome Sequencer 20 (Roche Applied Science, Indianapolis, IN). The standard protocol for 454 sequencing with the Genome Sequencer 20 system calls for the generation of a library of tagged single-stranded DNA molecules (see Margulies et al. [14] for details). This single-stranded library is then titrated to determine optimal sequencing parameters, by generating sequencing beads through emulsion PCR with dilutions of the library. This titration step showed that three microliters of the single-stranded library were needed to generate 23 million beads. The standard GS20 pyrosequencing profile uses a sequential flow of each nucleotide in a repeating TACG pattern. Per the standard protocol this pattern is repeated for 42 cycles, generating about 100 bp of sequence on average. To obtain longer reads, the profile was changed from 42 to 100 cycles of nucleotide flows, which increased the average read length from 105 bp to 200 bp. The GS20 standard software recognizes high-quality reads and converts the signal (light) into base calls; this standard GS20 software package was used to generate the sequence files.
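Because each cycle is one pass of T, A, C and G, the standard 42-cycle profile corresponds to 168 nucleotide flows and the extended 100-cycle profile to 400. The principle of turning flow signals into bases can be sketched as a naive decoder, under the simplifying assumption that each normalized light signal is proportional to the homopolymer length incorporated in that flow. This is not the GS20 software, which additionally normalizes signals and filters reads by quality; the function name is hypothetical.

```python
from typing import Sequence

FLOW_ORDER = "TACG"  # repeating nucleotide flow order described for the GS20


def flowgram_to_sequence(signals: Sequence[float], flow_order: str = FLOW_ORDER) -> str:
    """Naive base calling: round each flow's normalized signal to the nearest
    integer homopolymer length of the nucleotide flowed at that step."""
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]  # flows cycle through T, A, C, G
        length = max(0, int(round(signal)))  # negative noise counts as no incorporation
        bases.append(nucleotide * length)
    return "".join(bases)


if __name__ == "__main__":
    # Hypothetical normalized signals for one cycle (T, A, C, G flows).
    print(flowgram_to_sequence([1.02, 0.05, 2.1, 0.9]))
```

Rounding is exactly where long homopolymers become error-prone, since the signal difference between, say, six and seven identical bases is small relative to noise.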
In total, 12.5 million reads corresponding to 2,111 million Q20 bases were produced.

Primer walking

Clones bridging neighboring contigs were selected for gap closure. The clones were grown in 384-well plates, and sequence-specific primers were designed and used in dye-terminator sequencing reactions resolved on a MegaBACE 4500.

Genome assembly

A total of 6.2 million reads, comprising 3.5 billion Q20 bases, were produced by Sanger sequencing from 43 libraries (Table S1); about 90.6% of the reads were paired. Chloroplast reads were identified and the chloroplast genome was assembled to assess the sequence quality and insert size distribution of each library, characteristics that were then used in the assembly. Chloroplast forward and reverse reads validated the correctness of data tracking and the contamination level of each sequencing plate. The chloroplast genome was 160,928 bp long. Remarkably, its sequence was identical (without a single mismatch) to the one already published [100]. SBS data were essential for identifying polymorphic sites and closing small gaps. Chloroplast and mitochondrial sequences made up 5.5% and 2.0% of the SBS data, respectively, versus 3.1% and 1.8% of the Sanger sequences. Four programs developed at Myriad Genetics Inc. were organized into a pipeline for WGS assembly: (1) Sanger and SBS sequences were compared by the Match program, which produced a table of pairwise sequence overlaps reporting sequence orientation, offset and match score. Overlaps were accepted if they spanned more than 50 bp with no more than 2% polymorphic positions. (2) Consensus sequences were built with the Assemble program, adapted to specified levels of heterozygosity (2% or less) and large gaps (up to 500 bp). The program reads sequence and quality data in Fasta or GDE format, takes clone sizes into account and performs multiple alignments, building the consensus sequence and reporting its polymorphisms.
(3) Sequences were aligned with the Align program in a two-step procedure: a fast search for identical segments followed by optimal alignment of gaps up to 7 Kb. Larger or multiple gaps may still hamper the alignment and leave some overlapping contigs unmerged. (4) Two sequences were compared visually with the Dotmap program. The result of the assembly is a Fasta file of assembled contig sequences, with a quality value assigned to each position, and a list of polymorphic positions. (5) Metacontigs were constructed as ordered and oriented groups of contigs linked by paired reads matching non-repetitive parts of the contigs. We also used marker information to avoid building chimeric metacontigs from different LGs (see Text S1 for more details).

Genetic maps and genome integration

Metacontigs were integrated into the 19 grape LGs based on the genetic map derived from the cross Syrah × Pinot Noir. To improve marker density, polymorphic sites identified during WGS were selected to develop 799 additional SNP-based markers (http://genomics.research.iasma.it) using the SNPlex™ Genotyping System [111]. DNA was prepared according to the instructions and the samples were analyzed on the ABI PRISM® 3730xl (Applied Biosystems, Foster City, CA). Data were analyzed with GeneMapper v. 4.0 (Applied Biosystems, Foster City, CA). The genetic maps followed a double pseudo-testcross strategy [112]. Marker phase was determined with the Phasing algorithm (http://math.berkeley.edu/dustin/tmap/; [113]), which provides LG assignment and ordering of loci. LGs were assembled with a minimum LOD of 8.0 and a maximum distance of 35 cM. Homologous LGs of the two parents were merged into a consensus map.

Genes and gene families

Gene prediction used FgenesH [114], homology-based FgenesH+ [114], Twinscan [101], GlimmerHMM [115] and Tentative Consensus (TC) [94] transcripts derived from 320,000 ESTs deposited in databases.
Trimmed sequences were clustered using MegaBLAST [116] and aligned using Cap3 [117]. After quality testing, 28,856 TCs were retained. Gene annotation was based on BLAST searches against UniProt and plant protein databases annotated with GO terms, and on searches of various domain libraries. GO terms were extracted from BLAST searches against the KEGG databases, KOBAS metabolic pathways and InterProScan [118], and clustered using their semantic similarity [119], an accuracy weight and the path from the root node of the ontology to the most detailed annotation. More than 79% of the gene models were annotated. Functional classification was based on Gene Ontology (www.geneontology.org) and checked manually. Homologs across species were established by BLAST searches against rice, poplar and Arabidopsis, considering alignment coverage, best multidirectional BLAST hits, sequence identity and protein domains. The resulting sets of clusters reflect different levels of similarity among species, as well as unique and putatively species-specific genes. For the analysis of specific gene families, the methodological variations reported in the text were introduced.

Genome duplication

Genes with similarity to TEs were removed and paralogs were identified as in Li et al. [120]. Age distributions were built as described by Maere et al. [105]. Duplicated segments were analyzed with i-ADHoRe [102] using the following parameters: gap size of 40 genes, Q value of 0.9, probability cutoff of 0.001, and a minimum of 3 homologs to define a duplicated segment. Phylogenetic trees for duplicated genes in duplicated segments (so-called anchors) were based on pairs of grape paralogs that were reciprocal best hits with an aligned length of >150 amino acids, together with proteins from Physcomitrella patens, used as outgroup, and the best Arabidopsis homolog. Proteins were aligned with CLUSTALW and only unambiguously aligned regions were considered.
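The reciprocal-best-hit criterion used above to pick grape paralog pairs can be sketched as follows. The input format, a list of (query, subject, bit score, aligned length) tuples such as might be parsed from tabular BLAST output, and the function names are assumptions for illustration; the >150 amino acid filter is the one stated in the text.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float, int]  # (query, subject, bit score, aligned length in aa)


def best_hits(hits: Iterable[Hit], min_aln_len: int = 151) -> Dict[str, str]:
    """Best-scoring subject per query, ignoring self-hits and short alignments."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score, aln_len in hits:
        if query == subject or aln_len < min_aln_len:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit],
                         min_aln_len: int = 151) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit. For paralogs,
    pass the same all-vs-all table twice; each pair then appears in both orders."""
    forward = best_hits(a_vs_b, min_aln_len)
    reverse = best_hits(b_vs_a, min_aln_len)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}
```

Ties in bit score are resolved here by keeping the first hit seen; a production pipeline would need an explicit tie rule.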
Trees were constructed with seqboot, protdist, neighbor and consense from the PHYLIP package [121], with 1000 bootstrap replicates. Only topologies with over 70% bootstrap support were considered. For each paralog pair, if the topology was ((Grape1, Grape2), Arabidopsis), the pair was concluded to have duplicated after the split of grape and Arabidopsis.

Repetitive elements

Based on 10.7X coverage, a DNA segment was defined as unique when it had 15 or fewer matches. This threshold was chosen as the midpoint between two Poisson distributions with expected coverages of 10X and 20X, corresponding to unique and duplicated segments, respectively. For dispersed repetitive sequences, an iterative procedure was developed. Each segment was searched against all sequences, starting with the repeat with the highest number of matches. At each iteration, the program identified repeats with decreasing similarity to the original seed repeat, eventually obtaining the complete set of copies of a particular repeat cluster. These DNA segments were then masked, and the remaining sequences were searched for the next repeat with the highest number of matches. Members of each identified repeat type were translated and compared using BLAST, and the similarity scores were used for UPGMA-like clustering. Short tandem repeat (microsatellite) motifs were identified by a dedicated program as runs whose number of repeat units exceeded a threshold. The threshold was chosen, based on the occurrence of the motif in the genome, so that the expected number of segments with more units than the threshold would be less than 1.

Non-coding RNAs

Methods used for miRNA detection and identification are cited in the caption of Figure S5. Methods and reference papers for tRNA, snRNA and snoRNA are cited in the text. Ribosomal RNA units were defined and their copy numbers computed with the assembly program of Myriad Genetics Inc. (Salt Lake City, Utah).
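The two thresholds in the Repetitive elements paragraph above can be examined numerically. In this sketch, which reflects our reading rather than the authors' code, misclassification() gives the fraction of truly unique (Poisson, mean 10) and truly duplicated (Poisson, mean 20) segments that fall on the wrong side of the 15-match cutoff; for reference, the two probability mass functions cross at k = 10/ln 2 ≈ 14.4. min_tandem_units() assumes one reading of the microsatellite criterion: the smallest number of tandem copies n of a motif for which the expected number of chance occurrences, G·q^n (G = genome size, q = probability of one motif copy), drops below 1, with equal base frequencies by default. Both functions are hypothetical and ignore overlapping occurrences and real base composition.

```python
import math
from typing import Dict, Optional, Tuple


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def misclassification(threshold: int, unique_cov: float = 10.0,
                      duplicated_cov: float = 20.0) -> Tuple[float, float]:
    """Error rates when segments with <= threshold matches are called unique:
    (unique segments called repeated, duplicated segments called unique)."""
    unique_called_repeat = 1.0 - poisson_cdf(threshold, unique_cov)
    duplicated_called_unique = poisson_cdf(threshold, duplicated_cov)
    return unique_called_repeat, duplicated_called_unique


def min_tandem_units(motif: str, genome_size: float,
                     base_freq: Optional[Dict[str, float]] = None) -> int:
    """Smallest n such that genome_size * q**n < 1, where q is the probability
    of one copy of `motif` under independent base frequencies."""
    base_freq = base_freq or {b: 0.25 for b in "ACGT"}
    q = math.prod(base_freq[b] for b in motif.upper())
    n = 1
    while genome_size * q ** n >= 1.0:
        n += 1
    return n


if __name__ == "__main__":
    print(misclassification(15))
    print(min_tandem_units("AT", 504.6e6))
```

Supplying the genome's actual base composition through base_freq would raise the threshold for AT-rich motifs.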
Transcription factors

The reference information came from PlnTFDB, an integrative plant transcription factor database [63] including genes from A. thaliana (ArabTFDB), P. trichocarpa (PoplarTFDB) and O. sativa (Rice TFDB) (available at http://plntfdb.bio.uni-potsdam.de). For each TF family, conserved domains were used as queries to search for similar sequences in the grape genome. The protein domains of the identified TFs were classified using the Pfam database [122].

Text S1. Supporting text; supporting references (0.04 MB DOC)

Figure S1. Histograms of contig size distribution. Histograms showing the distribution of the assembled contigs in size classes. The average contig size is 9.1 Kb. Half of the genome is covered by 7,878 contigs larger than 18.2 Kb. (0.08 MB TIF)

Figure S2. Anchored and oriented metacontigs along the 19 LGs. Representation of the 435.1 Mb of V. vinifera genomic sequence contained in 397 metacontigs aligned and oriented to the genetic map of the 19 LGs. Distances (shown in brackets on the left for some markers) refer to Troggio et al.'s dense map [1] (http://genomics.research.iasma.it). Most metacontigs were anchored to the map using markers with unique sequence locations: SSRs, BAC-end sequences or SNP-based markers derived from either ESTs or assembled sequences of the two haplotypes of the Pinot Noir genome. Metacontigs with no marker information were associated with other metacontigs anchored to the map. There are reliable links between them, but they were not merged for several reasons, e.g., too large an overlap between them because some contigs at the end of one metacontig are not in the proper place, gaps too large because of missing contigs, or poor quality or an insufficient number of links. The approximate size in Kb of each metacontig is indicated on the right. Gaps separating metacontigs are of undefined size.
(2.27 MB TIF)

Figure S3. Grape gene class assignment based on putative function. Functional classification of putative grape genes (total and grape-specific) based on Gene Ontology (www.geneontology.org). (4.92 MB TIF)

Figure S4. Repetitive element classification and clustering. Phenograms showing the relative similarities of 95 of the 136 types of repetitive elements identified in the assembled V. vinifera genome. The remaining 41 repeat types (without ORFs or with ORFs shorter than 200 bp) are not included. Repeat types were classified according to Feschotte et al. [2]. Clustering was performed by an all-vs-all comparison using BLAST and visualized with DrawTree (Myriad Genetics, Salt Lake City, Utah). (0.65 MB TIF)

Figure S5. Major RNA silencing proteins present in V. vinifera. Major proteins participating in the RNA silencing pathways in V. vinifera were identified by homology to Arabidopsis proteins using tBLASTN against the V. vinifera genome. Coding sequences of predicted genes were verified in the TC transcripts database. Protein alignments and trees were built using MEGA version 4 [3]. Aligned protein sequences are: A) Dicer-like proteins (DCLs): V. vinifera putative Dicer-like proteins and A. thaliana DCL1 (At1g01040/Q9SP32), DCL2 (At3g03300/NP_566199), DCL3 (At3g43920/NP_189978) and DCL4 (At5g20320/AAZ80387). B) AGO proteins: V. vinifera putative Argonaute proteins and A. thaliana AGO1 (At1g48410/NP_849784), AGO2 (At1g31280/NP_174413), AGO3 (At1g31290/NP_174414), AGO4 (At2g27040/NP_565633), AGO5 (At2g27880/Q9SJK3), AGO6 (At2g32940/NP_180853), AGO7 (At1g69440/AAQ92355), AGO8 (At5g21030/NP_197602), AGO9 (At5g21150/CAD66636) and AGO10 (At5g43810/Q9XGW1). C) RNA-dependent RNA polymerases (RDRs): V. vinifera putative RDRs and A. thaliana RDR1 (AT1g14790/NP_172932), RDR2 (AT4g11130/NP_192851), RDR3 (AT2g19910/NP_179581), RDR4 (AT2g19920/NP_179582), RDR5 (AT2g19930/NP_179583) and RDR6 (AT3g49500/NP_190519). (0.11 MB TIF)

Figure S6. Duplicated state of the grape genome. Age distributions of Vitis paralogs (pink line) and Vitis-Arabidopsis orthologs (blue bins). (0.40 MB TIF)

Table S1. Details of libraries used in sequencing and estimation of the V. vinifera genome coverage. (0.03 MB DOC)

Table S2. Summary of the whole genome shotgun assembly of V. vinifera. (0.04 MB DOC)

Table S3. Resistance-related genes of V. vinifera. (0.10 MB DOC)

Table S4. Gene family members involved in the core phenylpropanoid pathway and the flavonoid and stilbene branches in V. vinifera. (0.10 MB DOC)

Table S5. Putative genes encoding enzymes participating in the terpenoid pathway of V. vinifera. (0.15 MB DOC)

Table S6. Transcription factors of V. vinifera. (2.20 MB DOC)

Table S7. Repetitive elements in the assembled V. vinifera genome. (0.04 MB DOC)

Table S8. Microsatellites identified in the assembled V. vinifera genome. (0.03 MB DOC)

Table S9. Current state of the IASMA database dedicated to V. vinifera mature miRNAs and miRNAs*, including the predicted fold-back structures of the pre-miRNAs. (0.09 MB DOC)

Table S10. Number of identified tRNAs containing specified anticodons compared with the corresponding numbers in Arabidopsis.
(0.07 MB DOC)

Table S11. Number of genes identified in the V. vinifera genome for each of the nine families of snRNAs compared to Arabidopsis. (0.04 MB DOC)

Table S12. Number of genes included in the different snoRNA families identified in the V. vinifera genome by searching against Arabidopsis snoRNAs [10]. (0.10 MB DOC)

Acknowledgments

Special thanks to David Neale for critical reading of the manuscript, and to Jessica Zambanini, Monica Dallaserra, Alessandra Zatelli and Michelangelo Policarpo for technical support.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This work has been funded by the Province of Trento, Italy. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. McGovern PE. Ancient Wine: The Search for the Origins of Viniculture. Princeton: Princeton University Press; 2003. 2. Panagiotakos DB, Pitsavos C, Polychronopoulos E, Chrysohoou C, Zampelas A, et al. Can a Mediterranean diet moderate the development and clinical progression of coronary heart disease? A systematic review. Med Sci Monit. 2004;10:RA193–198. 3. Burns J, Gardner PT, O'Neil J, Crawford S, Morecroft I, et al. Relationship among Antioxidant Activity, Vasodilation Capacity, and Phenolic Content of Red Wines. J Agric Food Chem. 2000;48:220–230. 4. Levadoux L. Les populations sauvages et cultivées de Vitis vinifera L. Ann Amélior Plantes. 1956;6:59–117. 5. Olmo HP. Grapes. In: Simmonds NW, editor. Evolution of crop plants. London: 1979. 6. Lewis WH. Polyploidy: Biological Relevance. Plenum Press; 1979. Polyploidy in Angiosperm: dicotyledons. pp. 241–269. 7. Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, et al.
The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007;449:463–467. 8. Wang Q, Li P, Hanania U, Sahar N, Mawassi M, et al. Improvement of Agrobacterium-mediated transformation efficiency and transgenic plant regeneration of Vitis vinifera L. by optimizing selection regimes and utilizing cryopreserved cell suspensions. Plant Science. 2005;168:565–571. 9. Kikkert JR, Striem MJ, Vidal JR, Wallace PG, Barnard J, et al. Long-term study of somatic embryogenesis from anthers and ovaries of 12 grapevine (Vitis sp.) genotypes. In Vitro Cellular and Developmental Biology-Plant. 2005;41:232–239. 10. Lodhi MA, Reisch BI. Nuclear DNA content of Vitis species, cultivars, and other genera of the Vitaceae. Theor Appl Genet. 1995;90:11–16. 11. Goff SA, Ricke D, Lan TH, Presting G, Wang R, et al. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science. 2002;296:92–100. 12. Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, et al. The Genome of Black Cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006;313:1596–1604. 13. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A. 1977;74:5463–5467. 14. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380. 15. Troggio M, Malacarne G, Coppola G, Segala C, Cartwright DA, et al. A Dense Single-Nucleotide Polymorphism-Based Genetic Linkage Map of Grapevine (Vitis vinifera L.) Anchoring Pinot Noir Bacterial Artificial Chromosome Contigs. Genetics. 2007;176:2637–2650. 16. Jones JD, Dangl JL. The plant immune system. Nature. 2006;444:323–329. 17. Lipka V, Dittgen J, Bednarek P, Bhat R, Wiermer M, et al. Pre- and postinvasion defenses both contribute to nonhost resistance in Arabidopsis. Science. 2005;310:1180–1183. 18.
Stein M, Dittgen J, Sanchez-Rodriguez C, Hou BH, Molina A, et al. Arabidopsis PEN3/PDR8, an ATP binding cassette transporter, contributes to nonhost resistance to inappropriate pathogens that enter by direct penetration. Plant Cell. 2006;18:731–746. 19. Dangl JL, Jones JD. Plant pathogens and integrated defence responses to infection. Nature. 2001;411:826–833. 20. Takken FL, Albrecht M, Tameling WI. Resistance proteins: molecular switches of plant defence. Curr Opin Plant Biol. 2006;9:383–390. 21. Meyers BC, Kaushik S, Nandety RS. Evolving disease resistance genes. Curr Opin Plant Biol. 2005;8:129–134. 22. Bai J, Pennill LA, Ning J, Lee SW, Ramalingam J, et al. Diversity in nucleotide binding site-leucine-rich repeat genes in cereals. Genome Res. 2002;12:1871–1884. 23. Grant M, Lamb C. Systemic immunity. Curr Opin Plant Biol. 2006;9:414–420. 24. van Loon LC, Rep M, Pieterse CM. Significance of inducible defense-related proteins in infected plants. Annu Rev Phytopathol. 2006;44:135–162. 25. Chen Z, Hartmann HA, Wu MJ, Friedman EJ, Chen JG, et al. Expression analysis of the AtMLO gene family encoding plant-specific seven-transmembrane domain proteins. Plant Mol Biol. 2006;60:583–597. 26. Richter H, Pezet R, Viret O, Gindro K. Characterization of 3 new partial stilbene synthase genes out of over 20 expressed in Vitis vinifera during the interaction with Plasmopara viticola. Physiol Mol Plant Pathol. 2006;67:248–260. 27. Akkurt M, Welter L, Maul E, Topfer R, Zyprian E. Development of SCAR markers linked to powdery mildew (Uncinula necator) resistance in grapevine (Vitis vinifera L. and Vitis sp.). Mol Breed. 2007;19:103–111. 28. Meyers BC, Shen KA, Rohani P, Gaut BS, Michelmore RW. Receptor-like genes in the major resistance locus of lettuce are subject to divergent selection. Plant Cell. 1998;10:1833–1846. 29. Mauricio R, Stahl EA, Korves T, Tian D, Kreitman M, et al.
Natural selection for polymorphism in the disease resistance gene Rps2 of Arabidopsis thaliana. Genetics. 2003;163:735–746. 30. Stahl EA, Dwyer G, Mauricio R, Kreitman M, Bergelson J. Dynamics of disease resistance polymorphism at the Rpm1 locus of Arabidopsis. Nature. 1999;400:667–671. 31. Bakker EG, Toomajian C, Kreitman M, Bergelson J. A genome-wide survey of R gene polymorphisms in Arabidopsis. Plant Cell. 2006;18:1803–1818. 32. Shen J, Araki H, Chen L, Chen JQ, Tian D. Unique evolutionary mechanism in R-genes under the presence/absence polymorphism in Arabidopsis thaliana. Genetics. 2006;172:1243–1250. 33. Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, et al. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science. 2007;317:338–342. 34. Di Gaspero G, Cipriani G, Adam-Blondon A-F, Testolin R. Linkage maps of grapevine displaying the chromosomal locations of 420 microsatellite markers and 82 markers for R-gene candidates. Theor Appl Genet. 2007;114:1249–1263. 35. Belhadj A, Saigne C, Telef N, Cluzet S, Bouscaut J, et al. Methyl Jasmonate Induces Defense Responses in Grapevine and Triggers Protection against Erysiphe necator. J Agric Food Chem. 2006;54:9119–9125. 36. Leister D. Tandem and segmental gene duplication and recombination in the evolution of plant disease resistance genes. Trends Genet. 2004;20:116–122. 37. Richly E, Kurth J, Leister D. Mode of amplification and reorganization of resistance genes during recent Arabidopsis thaliana evolution. Mol Biol Evol. 2002;19:76–84. 38. Fischer BM, Salakhutdinov I, Akkurt M, Eibach R, Edwards KJ, et al. Quantitative trait locus analysis of fungal disease resistance factors on a molecular map of grapevine. Theor Appl Genet. 2004;108:501–515. 39. Dalbó MA, Ye GN, Weeden NF, Wilcox WF, Reisch BI. Marker-assisted selection for powdery mildew resistance in grapes. J Am Soc Hortic Sci.
2001;126:83–89. 40. Waterhouse AL. Wine phenolics. Ann N Y Acad Sci. 2002;957:21–36. [PubMed] 41. Winkel-Shirley B. Flavonoid biosynthesis. A colorful model for genetics, biochemistry, cell biology, and biotechnology. Plant Physiol. 2001;126:485–493. [PubMed] 42. Scalbert A, Manach C, Morand C, Remesy C, Jimenez L. Dietary Polyphenols and the Prevention of Diseases. CRC Crit Rev Food Sci Nutr. 2005;45:287–306. 43. Mattivi F, Zulian C, Nicolini G, Valenti L. Wine, biodiversity, technology, and antioxidants. Ann N Y Acad Sci. 2002;957:37–56. [PubMed] 44. Sparvoli F, Martin C, Scienza A, Gavazzi G, Tonelli C. Cloning and molecular analysis of structural genes involved in flavonoid and stilbene biosynthesis in grape (Vitis vinifera L.). Plant Mol Biol. 1994;24:743–755. [PubMed] 45. Bogs J, Downey MO, Harvey JS, Ashton AR, Tanner GJ, et al. Proanthocyanidin synthesis and expression of genes encoding leucoanthocyanidin reductase and anthocyanidin reductase in developing grape berries and grapevine leaves. Plant Physiol. 2005;139:652–663. [PubMed] 46. Bogs J, Ebadi A, McDavid D, Robinson SP. Identification of the flavonoid hydroxylases from grapevine and their regulation during fruit development. Plant Physiol. 2006;140:279–291. [PubMed] 47. Fujita A, Goto-Yamamoto N, Aramaki I, Hashizume K. Organ-specific transcription of putative flavonol synthase genes of grapevine and effects of plant hormones and shading on flavonol biosynthesis in grape berry skins. Biosci Biotechnol Biochem. 2006;70:632–638. [PubMed] 48. Tsai CJ, Harding SA, Tschaplinski TJ, Lindroth RL, Yuan Y. Genome-wide analysis of the structural genes regulating defense phenylpropanoid metabolism in Populus. New Phytol. 2006;172:47–62. [PubMed] 49. Schroder G, Brown JW, Schroder J. Molecular analysis of resveratrol synthase. cDNA, genomic clones and relationship with chalcone synthase. Eur J Biochem. 1988;172:161–169. [PubMed] 50. Jeandet P, Douillet-Breuil A-C, Bessis R, Debord S, Sbaghi M, et al. 
Phytoalexins from the Vitaceae: Biosynthesis, Phytoalexin Gene Expression in Transgenic Plants, Antifungal Activity, and Metabolism. J Agric Food Chem. 2002;50:2731–2741. [PubMed] 51. Baur JA, Pearson KJ, Price NL, Jamieson HA, Lerin C, et al. Resveratrol improves health and survival of mice on a high-calorie diet. Nature. 2006;444:337–342. [PubMed] 52. Hall D, De Luca V. Mesocarp localization of a bi-functional resveratrol/hydroxycinnamic acid glucosyltransferase of Concord grape (Vitis labrusca). Plant J. 2007;49:579–591. [PubMed] 53. Tholl D. Terpene synthases and the regulation, diversity and biological roles of terpene metabolism. Curr Opin Plant Biol. 2006;9:297–304. [PubMed] 54. Lichtenthaler HK, Rohmer M, Schwender J. Two independent biochemical pathways for isopentenyl diphosphate and isoprenoid biosynthesis in higher plants. Physiol Plant. 1997;101:643–652. 55. Luan F, Wust M. Differential incorporation of 1-deoxy-D-xylulose into (3S)-linalool and geraniol in grape berry exocarp and mesocarp. Phytochemistry. 2002;60:451–459. [PubMed] 56. Bohlmann J, Meyer-Gauen G, Croteau R. Plant terpenoid synthases: molecular biology and phylogenetic analysis. Proc Natl Acad Sci U S A. 1998;95:4126–4133. [PubMed] 57. McCaskill D, Croteau R. Some caveats for bioengineering terpenoid metabolism in plants. Trends Biotechnol. 1998;16:349–355. 58. Martin DM, Bohlmann J. Identification of Vitis vinifera (-)-alpha-terpineol synthase by in silico screening of full-length cDNA ESTs and functional characterization of recombinant terpene synthase. Phytochemistry. 2004;65:1223–1229. [PubMed] 59. Harborne JB. Ecological Chemistry and Biochemistry of Plant Terpenoids. Oxford, UK: 1991. Recent advances in the ecological chemistry of plant terpenoids. pp. 399–426. 60. Riechmann JL, Heard J, Martin G, Reuber L, Jiang C, et al. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science. 2000;290:2105–2110. [PubMed] 61. 
Yu J, Wang J, Lin W, Li S, Li H, et al. The Genomes of Oryza sativa: a history of duplications. PLoS Biol. 2005;3:e38. [PubMed] 62. Zhu QH, Guo AY, Gao G, Zhong YF, Xu M, et al. DPTF: a database of poplar transcription factors. Bioinformatics. 2007;23:1307–1308. [PubMed] 63. Riano-Pachon DM, Ruzicic S, Dreyer I, Mueller-Roeber B. PlnTFDB: an integrative plant transcription factor database. BMC Bioinformatics. 2007;8:42. [PubMed] 64. Shiu SH, Shih MC, Li WH. Transcription factor families have much higher expansion rates in plants than in animals. Plant Physiol. 2005;139:18–26. [PubMed] 65. Deluc L, Barrieu F, Marchive C, Lauvergeat V, Decendit A, et al. Characterization of a grapevine R2R3-MYB transcription factor that regulates the phenylpropanoid pathway. Plant Physiol. 2006;140:499–511. [PubMed] 66. Geekiyanage S, Takase T, Ogura Y, Kiyosue T. Anthocyanin production by over-expression of grape transcription factor gene VlmybA2 in transgenic tobacco and Arabidopsis. Plant Biotechnology Reports. 2007;1:11–18. 67. Bogs J, Jaffe FW, Takos AM, Walker AR, Robinson SP. The grapevine transcription factor VvMYBPA1 regulates proanthocyanidin synthesis during fruit development. Plant Physiol. 2007;143:1347–1361. [PubMed] 68. Aharoni A, De Vos CH, Wein M, Sun Z, Greco R, et al. The strawberry FaMYB1 transcription factor suppresses anthocyanin and flavonol accumulation in transgenic tobacco. Plant J. 2001;28:319–332. [PubMed] 69. Buttner M, Singh KB. Arabidopsis thaliana ethylene-responsive element binding protein (AtEBP), an ethylene-inducible, GCC box DNA-binding protein interacts with an ocs element binding protein. Proc Natl Acad Sci U S A. 1997;94:5961–5966. [PubMed] 70. Rodriguez-Uribe L, O'Connell MA. A root-specific bZIP transcription factor is responsive to water deficit stress in tepary bean (Phaseolus acutifolius) and common bean (P. vulgaris). J Exp Bot. 2006;57:1391–1398. [PubMed] 71. Alvarez-Buylla ER, Pelaz S, Liljegren SJ, Gold SE, Burgeff C, et al. 
An ancestral MADS-box gene duplication occurred before the divergence of plants and animals. Proc Natl Acad Sci U S A. 2000;97:5328–5333. [PubMed] 72. Ng M, Yanofsky MF. Function and evolution of the plant MADS-box gene family. Nat Rev Genet. 2001;2:186–195. [PubMed] 73. Vrebalov J, Ruezinsky D, Padmanabhan V, White R, Medrano D, et al. A MADS-box gene necessary for fruit ripening at the tomato ripening-inhibitor (rin) locus. Science. 2002;296:343–346. [PubMed] 74. Giovannoni JJ. Fruit ripening mutants yield insights into ripening control. Curr Opin Plant Biol. 2007;10:283–289. [PubMed] 75. Feschotte C, Jiang N, Wessler SR. Plant transposable elements: where genetics meets genomics. Nat Rev Genet. 2002;3:329–341. [PubMed] 76. Jones-Rhoades MW, Bartel DP, Bartel B. MicroRNAS and their regulatory roles in plants. Annu Rev Plant Biol. 2006;57:19–53. [PubMed] 77. Sunkar R, Chinnusamy V, Zhu J, Zhu JK. Small RNAs as big players in plant abiotic stress responses and nutrient deprivation. Trends Plant Sci. 2007;12:301–309. [PubMed] 78. Griffiths-Jones S. The microRNA Registry. Nucleic Acids Res. 2004;32:D109–111. [PubMed] 79. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP. A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev. 2006;20:3407–3425. [PubMed] 80. Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, et al. High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS ONE. 2007;2:e219. [PubMed] 81. Axtell MJ, Jan C, Rajagopalan R, Bartel DP. A two-hit trigger for siRNA biogenesis in plants. Cell. 2006;127:565–577. [PubMed] 82. Lowe T, Eddy S. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–964. [PubMed] 83. Wang BB, Brendel V. The ASRG database: identification and survey of Arabidopsis thaliana genes involved in pre-mRNA splicing. Genome Biol. 2004;5:R102. [PubMed] 84. 
Brown JW, Clark GP, Leader DJ, Simpson CG, Lowe T. Multiple snoRNA gene clusters from Arabidopsis. RNA. 2001;7:1817–1832. [PubMed] 85. Brunner S, Fengler K, Morgante M, Tingey S, Rafalski A. Evolution of DNA sequence nonhomologies among maize inbreds. Plant Cell. 2005;17:343–360. [PubMed] 86. Rafalski A. Applications of single nucleotide polymorphisms in crop genetics. Curr Opin Plant Biol. 2002;5:94–100. [PubMed] 87. This P, Lacombe T, Thomas MR. Historical origins and genetic diversity of wine grapes. Trends Genet. 2006;22:511–519. [PubMed] 88. Morgante M, Brunner S, Pea G, Fengler K, Zuccolo A, et al. Gene duplication and exon shuffling by helitron-like transposons generate intraspecies diversity in maize. Nat Genet. 2005;37:997–1002. [PubMed] 89. Chandler M, Mahillon J. Mobile DNA II. In: Craig NL, Craigie R, Gellert M, Lambowitz AM, editors. Insertion Sequences revisited. American Society for Microbiology. Washington D.C: 2002. pp. 305–366. 90. Sodergren E, Weinstock GM, Davidson EH, Cameron RA, Gibbs RA, et al. The genome of the sea urchin Strongylocentrotus purpuratus. Science. 2006;314:941–952. [PubMed] 91. Dehal P, Satou Y, Campbell RK, Chapman J, Degnan B, et al. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science. 2002;298:2157–2167. [PubMed] 92. Fridman E, Pleban T, Zamir D. A recombination hotspot delimits a wild-species quantitative trait locus for tomato sugar content to 484 bp within an invertase gene. Proc Natl Acad Sci U S A. 2000;97:4718–4723. [PubMed] 93. Thornsberry JM, Goodman MM, Doebley J, Kresovich S, Nielsen D, et al. Dwarf8 polymorphisms associate with variation in flowering time. Nat Genet. 2001;28:286–289. [PubMed] 94. Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005;438:803–819. [PubMed] 95. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, et al. The sequence of the human genome. 
Science. 2001;291:1304–1351. [PubMed] 96. Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, et al. The genome sequence of the malaria mosquito Anopheles gambiae. Science. 2002;298:129–149. [PubMed] 97. Simillion C, Vandepoele K, Van Montagu MC, Zabeau M, Van de Peer Y. The hidden duplication past of Arabidopsis thaliana. Proc Natl Acad Sci U S A. 2002;99:13627–13632. [PubMed] 98. Bowers JE, Chapman BA, Rong J, Paterson AH. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003;422:433–438. [PubMed] 99. De Bodt S, Maere S, Van de Peer Y. Genome duplication and the origin of angiosperms. Trends Ecol Evol. 2005;20:591–597. [PubMed] 100. Jansen RK, Kaittanis C, Saski C, Lee SB, Tomkins J, et al. Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids. BMC Evol Biol. 2006;6:32. [PubMed] 101. Korf I, Flicek P, Duan D, Brent MR. Integrating genomic homology into gene structure prediction. Bioinformatics. 2001;17:S140–148. [PubMed] 102. Simillion C, Vandepoele K, Saeys Y, Van de Peer Y. Building genomic profiles for uncovering segmental homology in the twilight zone. Genome Res. 2004;14:1095–1106. [PubMed] 103. Van de Peer Y, Taylor JS, Braasch I, Meyer A. The ghost of selection past: rates of evolution and functional divergence of anciently duplicated genes. J Mol Evol. 2001;53:436–446. [PubMed] 104. Byrne KP, Wolfe KH. Consistent patterns of rate asymmetry and gene loss indicate widespread neofunctionalization of yeast genes after whole-genome duplication. Genetics. 2007;175:1341–1350. [PubMed] 105. Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, et al. Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci U S A. 2005;102:5454–5459. [PubMed] 106. Grando MS, Bellin D, Edwards KJ, Pozzi C, Stefanini M, et al. Molecular linkage maps of Vitis vinifera L. 
and Vitis riparia Mchx. Theor Appl Genet. 2003;106:1213–1224. [PubMed] 107. Doucleff M, Jin Y, Gao F, Riaz S, Krivanek AF, et al. A genetic linkage map of grape, utilizing Vitis rupestris and Vitis arizonica. Theor Appl Genet. 2004;109:1178–1187. [PubMed] 108. Lowe KM, Walker MA. Genetic linkage map of the interspecific grape rootstock cross Ramsey (Vitis champinii) x Riparia Gloire (Vitis riparia). Theor Appl Genet. 2006;112:1582–1592. [PubMed] 109. Adam-Blondon AF, Bernole A, Faes G, Lamoureux D, Pateyron S, et al. Construction and characterization of BAC libraries from major grapevine cultivars. Theor Appl Genet. 2005;110:1363–1371. [PubMed] 110. Doyle JJ, Doyle JL. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemical Bulletin. 1987;19:11–15. 111. Tobler AR, Short S, Andersen MR, Paner TM, Briggs JC, et al. The SNPlex genotyping system: a flexible and scalable platform for SNP genotyping. J Biomol Tech. 2005;16:398–406. [PubMed] 112. Grattapaglia D, Sederoff R. Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: mapping strategy and RAPD markers. Genetics. 1994;137:1121–1137. [PubMed] 113. Cartwright DA, Troggio M, Velasco R, Gutin A. Genetic Mapping in the Presence of Genotyping Errors. Genetics. 2007;176:2521–2527. [PubMed] 114. Solovyev V, Kosarev P, Seledsov I, Vorobyev D. Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol 7 Suppl. 2006;1:S10.1–12. 115. Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20:2878–2879. [PubMed] 116. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PubMed] 117. Huang X, Madan A. CAP3: A DNA Sequence Assembly Program. Genome Res. 1999;9:868–877. [PubMed] 118. 
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, et al. New developments in the InterPro database. Nucleic Acids Res. 2007;35:D224–228. [PubMed] 119. Lord PW, Stevens RD, Brass A, Goble CA. Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics. 2003;19:1275–1283. [PubMed] 120. Li WH, Gu Z, Wang H, Nekrutenko A. Evolutionary analyses of the human genome. Nature. 2001;409:847–849. [PubMed] 121. Felsenstein J. PHYLIP—Phylogeny Inference Package. Cladistics. 1989;5:164–166. 122. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, et al. Pfam: clans, web tools and services. Nucleic Acids Res. 2006;34:D247–251. [PubMed] 123. Henikoff S, Henikoff JG. Position-based sequence weights. J Mol Biol. 1994;243:574–578. [PubMed] 124. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997;25:4876–4882. [PubMed] 125. Notredame C, Higgins DG, Heringa J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000;302:205–217. [PubMed] 126. Tamura K, Dudley J, Nei M, Kumar S. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) Software Version 4.0. Mol Biol Evol. 2007 127. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. [PubMed] 128. Adam-Blondon AF, Roux C, Claux D, Butterlin G, Merdinoglu D, et al. Mapping 245 SSR markers on the Vitis vinifera genome: a tool for grape genetics. Theor Appl Genet. 2004;109:1017–1027. [PubMed] 129. Jones-Rhoades MW, Bartel DP. Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell. 2004;14:787–799. [PubMed] |