Ecol Appl. 2019 Jul;29(5):e01914. doi: 10.1002/eap.1914. Epub 2019 Jun 12.

Categorization of species as native or nonnative using DNA sequence signatures without a complete reference library.

Author information

Department of Environmental Science Policy and Management, University of California Berkeley, 130 Mulford Hall, Berkeley, California, 94720-3114, USA.
Essig Museum of Entomology, University of California Berkeley, Berkeley, California, 94720, USA.
Gump South Pacific Research Station, University of California Berkeley, Maharepa, Moorea, French Polynesia.
Biométrie et Biologie Évolutive, UMR CNRS, 69622, Villeurbanne, France.
Komohana Research and Extension Center, University of Hawai'i at Mānoa, Hilo, Hawaii, 96720, USA.
Smithsonian Institution, Washington, D.C., 20013, USA.
Department of Biogeography, Universität Trier, Trier, Germany.
Department of Integrated Biology, University of California Berkeley, 3040 Valley Life Sciences Building, Berkeley, California, 94720, USA.
Faculty of Agriculture and Marine Science, Kochi University, Kochi, Japan.
9 Quartier de la Glacière, 29900, Concarneau, France.


New genetic diagnostic approaches have greatly aided efforts to document global biodiversity and improve biosecurity. This is especially true for organismal groups in which species diversity has been underestimated historically due to difficulties associated with sampling, the lack of clear morphological characteristics, and/or limited availability of taxonomic expertise. Among these methods, DNA sequence barcoding (also known as "DNA barcoding") and, by extension, metabarcoding of biological communities have emerged as some of the most frequently used methods for DNA-based species identification. Unfortunately, the use of DNA barcoding is limited by the availability of complete reference libraries (i.e., collections of DNA sequences from morphologically identified species), and by the fact that the vast majority of species do not have sequences present in reference databases. These limitations are especially critical in tropical locations that are simultaneously rich in biodiversity and poorly explored and characterized at the DNA level by trained taxonomic specialists. To facilitate efforts to document biodiversity in regions lacking complete reference libraries, we developed a novel statistical approach that categorizes unidentified species as being either likely native or likely nonnative based solely on measures of nucleotide diversity. We demonstrate the utility of this approach by categorizing a large sample of specimens of terrestrial insects and spiders (collected as part of the Moorea BioCode project) using a generalized linear mixed model (GLMM). Using a training data set of known endemic (n = 45) and known introduced species (n = 102), we estimated the likely native/nonnative status for 4,663 specimens representing an estimated 1,288 species (412 identified species), including specimens that were unidentified or whose endemic/introduced status was uncertain. Using this approach, we were able to increase the number of categorized specimens by a factor of 4.4 (from 794 to 3,497) and the number of categorized species by a factor of 4.8 (from 147 to 707), with an accuracy much greater than chance (77.6%). The study identifies phylogenetic signatures of both native and nonnative species and suggests several practical applications for this approach, including monitoring biodiversity and facilitating biosecurity.
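The abstract does not specify which measures of nucleotide diversity were supplied to the GLMM. As a minimal sketch only, assuming the common definition of nucleotide diversity as the mean proportion of differing sites across all pairs of aligned sequences, such a predictor could be computed as follows. The function names and the handling of gaps and ambiguous bases are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def pairwise_difference(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Assumption: sites with a gap or ambiguous base in either sequence
    are skipped rather than counted as differences.
    """
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differing += 1
    return differing / compared if compared else 0.0


def nucleotide_diversity(sequences):
    """Mean pairwise proportion of differing sites (hypothetical helper).

    Returns 0.0 when fewer than two sequences are available, since no
    pairwise comparison can be made.
    """
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


# Toy aligned fragments for one putative species (made-up data).
toy_alignment = ["ACGT", "ACGA", "ACGT"]
diversity = nucleotide_diversity(toy_alignment)
```

In a workflow like the one described, a per-species value of this kind would be one candidate predictor in a classification model fitted to the species of known status; the model structure itself is not reproduced here.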


DNA barcoding; Moorea BioCode; alien invasive species; biomonitoring; biosecurity; community barcoding; metabarcoding

[Indexed for MEDLINE]
