GBIF provides access to extensive biodiversity data, primarily focused on species occurrences, including where and when species have been recorded. These records, gathered from collections, surveys, and citizen science efforts, contain species names, observation dates, locations, and other important details. In addition to traditional biodiversity records, GBIF also supports DNA-associated data, such as DNA metabarcoding datasets. A general guide – Publishing DNA-derived data through biodiversity data platforms – was released in 2021 to assist with these specific data types. However, consultations with the GBIF network and research community in 2022 and 2023 highlighted the need for user-friendly tools to facilitate the formatting and publishing of DNA metabarcoding data.
The Metabarcoding Data Toolkit (MDT) was developed to address this need. It enables users with basic knowledge of data standards, GBIF processes, and DNA metabarcoding to convert metabarcoding data – in the form of the OTU tables commonly used in the metabarcoding community – into the standardised format suitable for GBIF.
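To give a rough idea of what such a conversion involves, the sketch below reshapes a tiny, made-up OTU table (OTUs as rows, samples as columns, read counts as cells) into one record per detection. It is only an illustration and not the MDT's actual code: the toy data, the function name `otu_table_to_records`, and the `occurrenceID` scheme are assumptions, and the field names are Darwin Core terms chosen here to show how read counts could be expressed.

```python
# Illustrative sketch only: NOT the MDT's implementation.
# Hypothetical toy OTU table: rows are OTUs, columns are samples,
# and each cell holds the number of sequence reads.
otu_table = {
    "OTU_1": {"sample_A": 120, "sample_B": 0},
    "OTU_2": {"sample_A": 30, "sample_B": 50},
}


def otu_table_to_records(table):
    """Reshape a wide OTU table into one record per non-zero (sample, OTU) pair."""
    # Total reads per sample, used as the sample size for each record.
    totals = {}
    for counts in table.values():
        for sample, reads in counts.items():
            totals[sample] = totals.get(sample, 0) + reads

    records = []
    for otu_id, counts in table.items():
        for sample, reads in counts.items():
            if reads == 0:
                continue  # an OTU not detected in a sample yields no record
            records.append({
                "occurrenceID": f"{sample}:{otu_id}",  # hypothetical ID scheme
                "eventID": sample,
                "organismQuantity": reads,
                "organismQuantityType": "DNA sequence reads",
                "sampleSizeValue": totals[sample],
                "sampleSizeUnit": "DNA sequence reads",
            })
    return records


records = otu_table_to_records(otu_table)
for record in records:
    print(record)
```

In this toy case the four cells of the table yield three records, because `OTU_1` has no reads in `sample_B`. A real dataset would also carry the sequences, taxonomy, and sample metadata, which this sketch leaves out.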
Anyone interested in publishing DNA metabarcoding data to GBIF (or OBIS, etc.)
- eDNA researchers or other data publishers
- GBIF network people
- Managers of an MDT installation
Tip: New users may benefit from starting with the two simple quick start guides.
What the Metabarcoding Data Toolkit can do:
- Handle DNA metabarcoding datasets – specifically OTU tables and their associated metadata.
- Accept dataset uploads in a few supported template formats.
What the MDT cannot do:
- Process raw sequencing files (e.g., FASTQ).
- Handle DNA metagenomic datasets (e.g., shotgun sequencing).
- Support other DNA-associated biodiversity data types (e.g., specimen barcodes, qPCR).
A suggested approach could be:
- Watch this short video on DNA data and GBIF (originally made for the Technical support hour for GBIF nodes) and/or this ad hoc video showing an example use of the MDT.
- Work through the [simple_quick_start] and the [advanced_quick_start].
- Optionally, supplement this by processing one of the other [example_data].
- Select a dataset template, and have a look at the example datasets if helpful.
- Adapt your own dataset to the template (see [preparation_structure]).
- Process the data in the MDT.
- Refer to the [faq], the [glossary], and the general DNA-publishing guide where needed.
There is a full glossary at the end of the document. The list here contains some essential terms as a quick reference for new users. Terms link to the glossary.