BOLD Identification Engine

The library of sequences collected in BOLD can be used to identify unknown sequences. The BOLD Identification Engine searches all sequences uploaded to BOLD, from both public and private projects, to locate the closest match. To ensure data security, sequences from private records are never exposed.

Batch Identifications

BOLD now provides the ability to submit a batch of query sequences for identification. This service is available for up to 100 sequences at a time for users signed into the system.
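A query set larger than the 100-sequence limit has to be divided before submission. The following is a minimal sketch of that step, assuming the queries are held in FASTA format; the function names `parse_fasta` and `split_into_batches` are hypothetical helpers written for this illustration and are not part of BOLD. The sketch only prepares the batches and does not submit anything.

```python
from typing import List, Tuple

# Per-submission limit for batch identification, as stated in this handbook.
MAX_BATCH_SIZE = 100


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_batches(
    records: List[Tuple[str, str]], batch_size: int = MAX_BATCH_SIZE
) -> List[List[Tuple[str, str]]]:
    """Split records into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]
```

For example, 250 query sequences would be split into three batches of 100, 100, and 50 sequences, each of which fits within a single submission.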

Email Results

Users can choose to have identification results sent by email, so that several identification requests may be run in parallel. This option is located next to the Submit button. Upon submission of an ID Engine request, the system provides an estimated run-time.

Animal Identification (COI)

The BOLD ID Engine accepts sequences from the 5’ region of the mitochondrial gene COI and returns a species-level identification (when possible). BOLD uses the BLAST algorithm to identify single-base indels before aligning the protein translation to a profile Hidden Markov Model of the COI protein. Four databases can be used to identify COI sequences. The BOLD ID Engine also provides historical copies of the COI databases dating back to 2009, which can be used to replicate results from previous years. The Full-Length COI database is designed for use with short query sequences, as it provides maximum overlap in the barcode region of COI.

Fungal (ITS) and Plant (rbcL & matK) Identification

In the BOLD ID Engine, ITS is the default identification tool for fungal barcodes, and rbcL and matK are the defaults for plant barcodes. Both the fungal and plant searches return a species-level identification (when possible). The BLAST algorithm is employed in place of BOLD’s internal identification engine for these sequences. The number of fungal and plant sequences in BOLD is limited compared to the number of animal sequences, so a successful species match may not be possible. As new sequences are added to the database, the number of successful matches should improve. These databases include many species represented by only one or two specimens, as well as all species with interim taxonomy. Both searches return a list of the nearest matches but do not provide a probability of placement to a taxon.

Descriptions of the 6 types of identification databases on BOLD
Database Name	Description	Database Size
All Barcode Records	Every COI sequence on BOLD >500bp	>1,390,000 sequences
Species Barcode Records	Every COI sequence >500bp with species level identification	>1,150,000 sequences
Public Barcode Records	Every public COI sequence >500bp	>270,000 sequences
Full-Length Barcode Records	Every COI sequence on BOLD >640bp	>950,000 sequences
Fungal Records	Every ITS sequence on BOLD >100bp	>15,000 sequences
Plant Records	Every rbcL and matK sequence on BOLD >500bp	>95,000 & >70,000 sequences respectively
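The inclusion criteria in the table can be read as simple rules on marker, sequence length, identification level, and visibility. The sketch below restates them in code to make the overlap between databases explicit. The `Record` class and `databases_for` function are hypothetical names used only for this illustration; BOLD may apply further filters not listed in the table, such as quality checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    marker: str                   # "COI", "ITS", "rbcL", or "matK"
    length_bp: int                # sequence length in base pairs
    has_species_id: bool = False  # identified to species level
    is_public: bool = False       # record is publicly visible


def databases_for(record: Record) -> List[str]:
    """Return the databases whose listed criteria the record meets."""
    names: List[str] = []
    if record.marker == "COI":
        if record.length_bp > 500:
            names.append("All Barcode Records")
            if record.has_species_id:
                names.append("Species Barcode Records")
            if record.is_public:
                names.append("Public Barcode Records")
        if record.length_bp > 640:
            names.append("Full-Length Barcode Records")
    elif record.marker == "ITS":
        if record.length_bp > 100:
            names.append("Fungal Records")
    elif record.marker in ("rbcL", "matK"):
        if record.length_bp > 500:
            names.append("Plant Records")
    return names
```

Under these rules, a private 658 bp COI record with a species-level identification would appear in the All, Species, and Full-Length databases but not in the Public one.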

The results page for a typical animal sequence identification is illustrated below. For each sequence queried, an overview is provided describing the best match, with links to both the taxonomic page and the BIN cluster for the match, as well as a Taxon ID Tree placing the query sequence among 100 of the closest matches. The top matches listed in the table link to the public record where available. A map displays the collection locations of all the public records in the top 100 matches. For a batch of sequences queried, each result page is accessible via the accordion tabs on the page.

Figure: Identification Engine results page for batch identification
