Mol Ecol Resour. 2018 May;18(3):602-619. doi: 10.1111/1755-0998.12756. Epub 2018 Feb 12.

Draft genome and reference transcriptomic resources for the urticating pine defoliator Thaumetopoea pityocampa (Lepidoptera: Notodontidae).

Author information

CBGP, INRA, CIRAD, IRD, Montpellier SupAgro, Univ Montpellier, Montpellier, France.
INRA-CNRGV, Castanet Tolosan Cedex, France.
INRA, US 1426, GeT-PlaGe, Genotoul, INRA Auzeville, Castanet Tolosan Cedex, France.
Forest Research Center (CEF), Instituto Superior de Agronomia (ISA), University of Lisbon (ULisboa), Lisboa, Portugal.
INRA, UMR Institut de Génétique, Environnement et Protection des Plantes (IGEPP), BioInformatics Platform for Agroecosystems Arthropods (BIPAA), Rennes, France.
INRIA, IRISA, GenOuest Core Facility, Rennes, France.
BIOGECO, INRA, Univ. Bordeaux, Cestas, France.
Plateforme MGX-Montpellier GenomiX, c/o Institut de Génomique Fonctionnelle IGF-sud, UMR 5203 CNRS-U 661 INSERM-Université de Montpellier, Montpellier Cedex 05, France.
CBGP, IRD, CIRAD, INRA, Montpellier SupAgro, Univ Montpellier, Montpellier, France.
Edinburgh Genomics, Ashworth Laboratories, The University of Edinburgh, Edinburgh, UK.


The pine processionary moth Thaumetopoea pityocampa (Lepidoptera: Notodontidae) is the main pine defoliator in the Mediterranean region. Its urticating larvae cause severe human and animal health concerns in the invaded areas. This species shows high phenotypic variability for various traits, such as phenology, fecundity and tolerance to extreme temperatures. This study presents the construction and analysis of extensive genomic and transcriptomic resources, which are a prerequisite for understanding the genetic architecture underlying these traits. Using a well-studied population from Portugal with peculiar phenological characteristics, the karyotype was first determined and a first draft genome of 537 Mb total length was assembled into 68,292 scaffolds (N50 = 164 kb). From this genome assembly, 29,415 coding genes were predicted. To circumvent some limitations for fine-scale physical mapping of genomic regions of interest, a 3X coverage BAC library was also developed. In particular, 11 BACs from this library were individually sequenced to assess the assembly quality. Additionally, de novo transcriptomic resources were generated from various developmental stages sequenced with HiSeq and MiSeq Illumina technologies. The reads were de novo assembled into 62,376 and 63,175 transcripts, respectively. Then, a robust subset of the genome-predicted coding genes, the de novo transcriptome assemblies and previously published 454/Sanger data were clustered to obtain a high-quality and comprehensive reference transcriptome consisting of 29,701 bona fide unigenes. These sequences covered 99% of the CEGMA and 88% of the BUSCO highly conserved eukaryotic genes and 84% of the BUSCO arthropod gene set. Moreover, 90% of these transcripts could be localized on the draft genome. The described information is available via a genome annotation portal (


BAC library; Lepidoptera; de novo assembly; gene prediction; genome; transcriptome
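
The abstract reports the assembly contiguity as a scaffold N50 of 164 kb. For readers unfamiliar with the statistic, the following is a minimal sketch of how an N50 is conventionally computed from a list of scaffold lengths: it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions; they are not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration only; it follows the
    conventional N50 definition, not any specific tool's code.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk the sequences from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half
        # of the total assembly length is the N50.
        if running >= half_total:
            return length


# Toy scaffold lengths (arbitrary units), NOT taken from the paper.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
```

In practice the lengths would be read from the assembly FASTA file, and the statistic is usually reported by standard assembly-evaluation tools rather than computed by hand.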

[Indexed for MEDLINE]
