Introduction

Overview

GBIF provides access to extensive biodiversity data, primarily focused on species occurrences, including where and when species have been recorded. These records, gathered from collections, surveys, and citizen science efforts, contain species names, observation dates, locations, and other important details. In addition to traditional biodiversity records, GBIF also supports DNA-associated data, such as DNA metabarcoding datasets. A general guide – Publishing DNA-derived data through biodiversity data platforms – was released in 2021 to assist with these specific data types. However, consultations with the GBIF network and research community in 2022 and 2023 highlighted the need for user-friendly tools to facilitate the formatting and publishing of DNA metabarcoding data.

The Metabarcoding Data Toolkit (MDT) was developed to address this need, enabling users with basic knowledge of data standards, GBIF processes, and DNA metabarcoding to convert metabarcoding data – in the shape of the so-called OTU table commonly used in the metabarcoding community – into the standardised format suitable for GBIF.
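To make the idea of an OTU table concrete, the following minimal sketch flattens a toy table into one record per OTU detected in a sample. It illustrates the general reshaping step only and is not the MDT's actual implementation. The table contents, the function name `otu_table_to_records`, and the choice of output fields are assumptions made for this example; the field names are borrowed from Darwin Core terms.

```python
# Hypothetical toy OTU table: rows are OTUs, columns are samples,
# and each cell holds a sequence read count. All identifiers are invented.
otu_table = {
    "OTU_1": {"sample_A": 120, "sample_B": 0},
    "OTU_2": {"sample_A": 5, "sample_B": 48},
}


def otu_table_to_records(table):
    """Flatten a wide OTU table into one record per OTU detected in a sample.

    Assumption: a read count of zero means the OTU was not detected,
    so no record is produced for that OTU/sample pair.
    """
    records = []
    for otu_id, counts in table.items():
        for sample_id, reads in counts.items():
            if reads > 0:
                records.append(
                    {
                        # Illustrative field mapping, not the MDT's own.
                        "occurrenceID": f"{sample_id}:{otu_id}",
                        "eventID": sample_id,
                        "organismQuantity": reads,
                        "organismQuantityType": "DNA sequence reads",
                    }
                )
    return records


records = otu_table_to_records(otu_table)
for record in records:
    print(record)
```

In this sketch each non-zero cell of the table becomes a separate occurrence-style record, which is why a single OTU table can expand into many records once standardised.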

Target audiences

Anyone interested in publishing DNA metabarcoding data to GBIF (or OBIS, etc.)

eDNA researchers or other data publishers
Members of the GBIF network
Managers of an MDT installation

Tip: New users may benefit from starting with the two simple quick start guides.

Scope

What the Metabarcoding Data Toolkit can do:

Handle DNA metabarcoding datasets – specifically OTU tables and their associated metadata.
Accept dataset uploads in a few supported template formats.

What the MDT cannot do:

Process raw sequencing files (e.g., fastq).
Handle DNA metagenomic datasets (e.g., shotgun sequencing).
Support other DNA-associated biodiversity data types (e.g., specimen barcodes, qPCR).

Getting started

A suggested approach could be:

Watch this short video on DNA data + GBIF (originally made for the Technical support hour for GBIF nodes) and/or this quickly made ad hoc video showing an example use of the MDT.
Work through the [simple_quick_start] and the [advanced_quick_start].
…maybe supplement with processing of one of the other [example_data].
Select a dataset template and maybe have a look at the example datasets.
Adapt your own dataset to the template (see [preparation_structure]).
Process the data in the MDT.
Refer to the [faq], the [glossary], and the general DNA-publishing guide where needed.

Essential terms

There is a full glossary at the end of the document. The list here contains some essential terms as a quick reference for new users. Terms link to the glossary.

[asv]
[dwc-standard]
[dwc-a]
[dwc-term]
DNA-derived data (extension)
[endpoint]
[occurrence]
[occurrence_core]
[otu]
[otu-table]
