NCBI RefSeq Targeted Loci Project

Targeted loci are specific molecular markers, such as protein-coding or ribosomal RNA loci (16S rDNA, 18S rDNA (SSU), 28S rDNA (LSU) and the internal transcribed spacer (ITS)), that are used for phylogenetic and barcoding analyses.

SCOPE

Targeted loci currently include genic and spacer regions of the nuclear ribosomal cistron. The scope includes curated RefSeq records (NCBI RefSeq Targeted Loci projects) and selected, validated GenBank sequences for curated BLAST databases. RefSeq records are available for Archaea, Bacteria and Fungi and are accessible via Entrez query and BLAST search interfaces. Selected, validated GenBank sequences are available for Animals, Plants and Protists and are accessible via BLAST search interfaces only.

BLAST DATABASES

REFSEQ records (Archaea, Bacteria and Fungi)

REFSEQ BLAST: Search against curated RefSeq records from ribosomal RNA loci. Select "Sequences from type material" to limit your search to type material only. See the RefSeq project descriptions below for more detail on how the source records are curated.

MOLE-BLAST: A tool that helps users find the closest database neighbors of submitted query sequences by generating a phylogenetic tree from BLAST results.

SELECTED GENBANK sequences (Animals, Plants and Protists)

SELECTED GENBANK BLAST: Search against selected GenBank sequences from ribosomal RNA loci. The 18S and 28S rDNA sequences were validated with the ribodbmaker pipeline (part of the ribovore package, available at https://github.com/nawrockie/ribovore). Ribodbmaker compares sequences against the rRNA Rfam models RF01960 (SSU) and RF02543 (LSU) to validate eukaryotic origin and rRNA continuity, identify potentially misassembled sequences, and check for sequences that are unexpectedly divergent relative to other eukaryotic sequences of the same rank. The pipeline also removes sequences with too many ambiguous nucleotides, vector subsequences recognized by VecScreen, or repeated subsequences that are indicative of misassembly. The ITS region in ITS sequences was verified with the ITSx program, available at https://microbiology.se/software/itsx/. Additionally, ITS sequences were checked for too many ambiguous nucleotides and for vector subsequences recognized by VecScreen.
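One of the filters above, removal of sequences with too many ambiguous nucleotides, can be illustrated with a minimal Python sketch. This is not ribodbmaker's implementation: the function names and the 1% cutoff are hypothetical, chosen only to show the idea.

```python
from typing import Dict

# Characters treated as unambiguous; every other IUPAC code (N, R, Y, ...) counts as ambiguous.
UNAMBIGUOUS = set("ACGTU")


def ambiguous_fraction(seq: str) -> float:
    """Return the fraction of characters in seq that are not A, C, G, T or U."""
    seq = seq.upper()
    if not seq:
        return 0.0
    ambiguous = sum(1 for base in seq if base not in UNAMBIGUOUS)
    return ambiguous / len(seq)


def filter_by_ambiguity(records: Dict[str, str], max_fraction: float = 0.01) -> Dict[str, str]:
    """Keep records whose ambiguous fraction is at most max_fraction.

    max_fraction is a hypothetical cutoff for illustration, not a value
    taken from the ribodbmaker pipeline.
    """
    return {
        name: seq
        for name, seq in records.items()
        if ambiguous_fraction(seq) <= max_fraction
    }
```

For example, calling filter_by_ambiguity with a cutoff of 0.1 on a record "ACNNACGT" drops it, because two of its eight positions are ambiguous.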

REFSEQ TARGETED LOCI PROJECTS

Archaea FTP: ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Archaea/

Bacteria FTP: ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Bacteria/

Fungi FTP: ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Fungi/
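The FTP directories above can be browsed programmatically. The following is a minimal sketch using Python's standard ftplib with an anonymous login; the helper names are hypothetical, and no assumption is made about which files the directories contain.

```python
from ftplib import FTP
from typing import List, Tuple
from urllib.parse import urlparse

# Directory URLs as listed in this document.
TARGETED_LOCI_URLS = {
    "Archaea": "ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Archaea/",
    "Bacteria": "ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Bacteria/",
    "Fungi": "ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Fungi/",
}


def split_ftp_url(url: str) -> Tuple[str, str]:
    """Split an ftp:// URL into (host, path)."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp" or not parsed.hostname:
        raise ValueError(f"not an FTP URL: {url!r}")
    return parsed.hostname, parsed.path


def list_directory(url: str, timeout: float = 30.0) -> List[str]:
    """Return the entry names in an FTP directory (requires network access)."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # anonymous login
        ftp.cwd(path)
        return ftp.nlst()
```

Calling list_directory(TARGETED_LOCI_URLS["Fungi"]) returns whatever entries the server currently exposes in that directory.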

Bacteria and Archaea: 16S ribosomal RNA project

The small subunit ribosomal RNA is a useful phylogenetic marker that has been used extensively for evolutionary analyses. The RefSeq dataset contains curated 16S ribosomal RNA sequences that correspond to bacterial and archaeal type material. The RefSeq records may contain corrections to the sequence or taxonomy relative to the original INSD submission, and may carry additional information that is not found in the original. See more details on the curation process.
16S RefSeq Nucleotide sequence records

Bacteria and Archaea: 23S ribosomal RNA project

The 23S ribosomal RNA is the largest rRNA in the microbial ribosome (~2800 nt) and is part of the 50S ribosomal subunit. The 23S rRNA is involved in the formation of peptide bonds during protein synthesis. Antibiotics that act by inhibiting translation, such as chloramphenicol, often bind to the 23S rRNA. The RefSeq 23S ribosomal RNA project is a collection of curated complete and near-full-length Reference Sequence records for Archaea and Bacteria. This set is used in the Prokaryotic Genome Annotation Pipeline.
23S RefSeq Nucleotide sequence records

Fungi: ITS project

The Fungi RefSeq ITS project contains curated and re-annotated records (ITS RefSeq Nucleotide sequence records) of sequences from the ITS region in the nuclear ribosomal cistron, sourced from INSD records. The RefSeq records may contain edits to the sequence or taxonomy relative to the original INSD submission and may carry additional information that is not found in the original. The ITS region includes ITS1, the 5.8S gene and ITS2, and the sequences are near full length to complete. Sequences are mostly derived from type material, and the RefSeq records contain the public collection identifiers of these specimens. Since the ITS region is the official barcode for Fungi, it is typically most useful for identification at the species level. The project began as an international collaboration between NCBI and taxonomy experts in the mycological research community. See more details on the curation process.

Fungi: 28S ribosomal RNA project

The 28S ribosomal RNA targeted loci project is a RefSeq curated data set sourced from INSD records. At a minimum, the sequences contain the hypervariable D1/D2 region, as determined by the ribodbmaker pipeline. The RefSeq records may contain edits to the taxonomy relative to the original INSD submission and may carry additional information that is not found in the original. LSU RefSeq records (28S ribosomal RNA Nucleotide sequence records) consist mostly of sequences obtained from type specimens and include collection identifiers from public collections, with codes curated at NCBI's Biocollection database. LSU rDNA is more conserved than the ITS region and is widely used for phylogenetic analyses, as well as for species identification in faster-evolving clades.

Fungi: 18S ribosomal RNA project

The 18S ribosomal RNA targeted loci project is a RefSeq curated data set sourced from INSD records. At a minimum, the sequences contain most of the variable V4 region and part of the V5 region, as determined by the ribodbmaker pipeline. The RefSeq records may contain edits to the taxonomy relative to the original INSD submission and may carry additional information that is not found in the original. SSU RefSeq records (18S ribosomal RNA Nucleotide sequence records) consist mostly of sequences obtained from type specimens and include collection identifiers from public collections, with codes curated at NCBI's Biocollection database. SSU rDNA is the most conserved region in the ribosomal cistron and is widely used for phylogenetic analyses.
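The minimum-region requirements in the 28S and 18S projects (the D1/D2 region; most of V4 and part of V5) amount to asking how much of a target region is covered by the model positions that a sequence aligns to. The sketch below shows that interval check in Python. It is not ribodbmaker's code: the function names are hypothetical, and the caller must supply the region coordinates, since none are given in this document.

```python
from typing import Tuple

Interval = Tuple[int, int]  # inclusive, 1-based (start, end) in model coordinates


def overlap_fraction(hit: Interval, region: Interval) -> float:
    """Return the fraction of region that is covered by hit."""
    hit_start, hit_end = hit
    region_start, region_end = region
    overlap = min(hit_end, region_end) - max(hit_start, region_start) + 1
    return max(overlap, 0) / (region_end - region_start + 1)


def spans_region(hit: Interval, region: Interval, min_fraction: float = 1.0) -> bool:
    """True if hit covers at least min_fraction of region.

    min_fraction is a hypothetical parameter: 1.0 demands the whole region,
    a lower value models a requirement such as "most of" a region.
    """
    return overlap_fraction(hit, region) >= min_fraction
```

With toy coordinates, a hit spanning positions 50 to 250 fully covers a region at 100 to 200, while a hit spanning 151 to 300 covers half of a region at 101 to 200.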

Last updated: 2019-10-22T15:38:58Z