OnSIDES

Just looking for the data? Click on the "Releases" tab on the right.

Introduction

OnSIDES is an international, comprehensive database of drugs and their adverse events using data from drug product labels. Information was extracted by fine-tuning a PubMedBERT language model on 200 manually curated labels available from Denmer-Fushman et al.. This comprehensive database will be updated quarterly, and currently contains more than 7.1 million drug-ADE pairs for 4,097 drug ingredients extracted from 51,460 labels, processed from all of the labels available to download from DailyMed (USA), EMA (EU), EMC (UK), and KEGG (Japan) as of April 2025.

Citation

If you use the OnSIDES database, results, or methods in your work, please reference our paper:

Tanaka Y, Chen HY, Belloni P, Gisladottir U, Kefeli J, Patterson J, Srinivasan A, Zietz M, Sirdeshmukh G, Berkowitz J, LaRow Brown K, Tatonetti NP. OnSIDES database: Extracting adverse drug events from drug labels using natural language processing models. Med. 2025 Mar 27:100642. doi: 10.1016/j.medj.2025.100642. PMID: 40179876.

Downloading the data

OnSIDES is released as a set of flat files (comma-separated) and as an interactive dataset at onsidesdb.org. Downloads are available in the Releases section.

Loading into a database

We provide utilities to simplify loading OnSIDES into a relational database of your choice (e.g. MySQL, PostgreSQL, etc.). We have pre-built database schema files for MySQL, PostgreSQL, and SQLite, along with example bash scripts to create a database and load the files in the proper order (MySQL, PostgreSQL, and SQLite). These example scripts use Podman to run the databases in containers, but they can be easily adapted to work with existing databases. We also provide OnSIDES as a single, pre-built SQLite database, which you could import into a database with specialized tools like pgloader or sqlite3-to-mysql.

Database Design and Organization

The OnSIDES database has the following seven tables:

product_label - Individual drug products
product_adverse_effect - Extracted adverse effects for drug products
product_to_rxnorm - Mapping from drug products to RxNorm products (many-to-many)
vocab_rxnorm_product - RxNorm products
vocab_rxnorm_ingredient_to_product - Mapping from RxNorm products to RxNorm ingredients (many-to-many)
vocab_rxnorm_ingredient - RxNorm ingredients
vocab_meddra_adverse_effect - MedDRA adverse effect terms

Tables starting with product_ contain data from drug labels, while vocab_ tables contain vocabulary mapping information from the UMLS Metathesaurus and the OMOP vocabularies (from Athena).

Here's a diagram of the database schema:

Usage

For some example queries, view the documentation.

High confidence sets

You may be interested in especially high confidence drug-effect relationships. In this case, we recommend using the intersection of the adverse drug effects that are extracted from difference sources. For example, a drug-effect relationship reported on both the US and UK labels for an ingredient would be higher confidence than a drug-effect relationshipi reported on on the US labels.

The highest confidence would be the intersection of all four sources (US, UK, EU, and Japan). Note, however, that the overlap in this case is relatively small. We found 663 ADEs across 71 ingredients and 180 effects fit this criteria. Below is some SQL to create a table from which any of the intersections can be queried.

Since this table is derived, it has not been included in the database schemas. A suitable definition (SQLite here) would be:

CREATE TABLE source_intersection AS
SELECT
pi.ingredient_id,
a.effect_meddra_id,
count(distinct source) as num_sources,
max((source = 'US')::int) as US,
max((source = 'EU')::int) as EU,
max((source = 'UK')::int) as UK,
max((source = 'JP')::int) as JP
FROM
product_label l
INNER JOIN product_to_rxnorm lp USING (label_id)
INNER JOIN vocab_rxnorm_product p ON lp.rxnorm_product_id = p.rxnorm_id
INNER JOIN vocab_rxnorm_ingredient_to_product pi ON p.rxnorm_id = pi.product_id
INNER JOIN product_adverse_effect a ON l.label_id = a.product_label_id
group by pi.ingredient_id, a.effect_meddra_id;

After creating the table, you can run a query to check the counts of advese drug effects by which sources you are intersecting with:

SELECT 'US-EU' AS source_combo, COUNT(*) as num_ades, count(distinct ingredient_id) as num_ingredients, count(distinct effect_meddra_id) as num_effects FROM source_intersection WHERE US = 1 AND EU = 1
UNION ALL
SELECT 'US-UK', COUNT(*), count(distinct ingredient_id), count(distinct effect_meddra_id) FROM source_intersection WHERE US = 1 AND UK = 1
UNION ALL
SELECT 'US-JP', COUNT(*), count(distinct ingredient_id), count(distinct effect_meddra_id) FROM source_intersection WHERE US = 1 AND JP = 1
UNION ALL
SELECT 'EU-UK', COUNT(*), count(distinct ingredient_id), count(distinct effect_meddra_id) FROM source_intersection WHERE EU = 1 AND UK = 1
UNION ALL
SELECT 'EU-JP', COUNT(*), count(distinct ingredient_id), count(distinct effect_meddra_id) FROM source_intersection WHERE EU = 1 AND JP = 1
UNION ALL
SELECT 'UK-JP', COUNT(*), count(distinct ingredient_id), count(distinct effect_meddra_id) FROM source_intersection WHERE UK = 1 AND JP = 1
UNION ALL
SELECT 'US-EU-UK', COUNT(*), count(distinct ingredient_id), count(distinct effect_meddra_id) FROM source_intersection WHERE US = 1 AND EU = 1 AND UK = 1
UNION ALL
SELECT 'US-EU-JP', COUNT(*), count(distinct ingredient_id), count(distinct effect_meddra_id) FROM source_intersection WHERE US = 1 AND EU = 1 AND JP = 1
UNION ALL
SELECT 'US-UK-JP', COUNT(*), count(distinct ingredient_id), count(distinct effect_meddra_id) FROM source_intersection WHERE US = 1 AND UK = 1 AND JP = 1
UNION ALL
SELECT 'EU-UK-JP', COUNT(*), count(distinct ingredient_id), count(distinct effect_meddra_id) FROM source_intersection WHERE EU = 1 AND UK = 1 AND JP = 1
UNION ALL
SELECT 'US-EU-UK-JP', COUNT(*), count(distinct ingredient_id), count(distinct effect_meddra_id) FROM source_intersection WHERE US = 1 AND EU = 1 AND UK = 1 AND JP = 1;

In OnSIDES version 3.1.1 this would yield the following table:

Source Combination	Num Ingredient-Effect Pairs	Num Ingredients	Num Effects
US-EU-UK-JP	663	71	180
US-EU-UK	3489	206	591
US-EU-JP	1047	85	261
US-UK-JP	1516	197	284
EU-UK-JP	1285	74	355
US-EU	11199	501	1143
US-UK	10141	620	1045
US-JP	4240	297	557
EU-UK	9541	267	1571
EU-JP	2270	87	569
UK-JP	3462	223	591

Manual annotations

As part of the v3.0.0 release and on, we include a set of manual annotations for download as well. For more information, see the annotation description.

Data Generation and Metrics

OnSIDES is generated through the following steps:

Find and download drug labels for all human prescription drugs from four sources (DailyMed, EMA, EMC, KEGG).
Parse raw formats (e.g. PDF, XML), extract relevant sections, and format all label texts.
Identify string matches of MedDRA terms in the extracted label texts (note, MedDRA Japan for Japanese labels).
(Non-Japan): Apply PubMedBERT to score each matched term, and apply a cutoff threshold on the predictions.
Map all drug products to RxNorm.
Gather extracted data into a database (products, adverse effects identified). Combine with product-to-ingredient information from RxNorm and MedDRA concept mappings.
Validate the database, and export tables as flat files.

Model accuracy

The fine-tuned PubMedBERT model used for OnSIDES achieved the following performance on 200 manually annotated FDA labels:

Section	F1	Precision	Recall	AUROC	N
Boxed Warning	0.964	0.971	0.957	0.977	686
Warnings and Precautions	0.882	0.883	0.881	0.933	10254
Adverse Reactions	0.935	0.946	0.924	0.956	12122

In a separate comparison, this modeling approach was compared to the TAC 2017 dataset, and achieved the following performance:

Performance metrics evaluated against the TAC gold standard

Metric	TAC (Best Model^†)	SIDER 4.1	OnSIDES v1.0.0	OnSIDES v2/3.0.0
F1 Score	82.19	74.36	82.01	87.54
Precision	80.69	43.49	88.76	91.29
Recall	85.05	52.89	77.12	84.08

Developer Documentation

Dependencies

Install nix.

If you prefer not to use nix, you'll need to install all the dependencies yourself, including Python (I recommend uv), Java, tabula, pandoc, and DuckDB, then create the virtual environment for Python (e.g. uv sync).

Building

nix develop
snakemake -s snakemake/us/download/Snakefile --resources jobs=1
snakemake -s snakemake/uk/download/Snakefile --resources jobs=1
snakemake -s snakemake/eu/download/Snakefile --resources jobs=1 --keep-going # Ignore errors
snakemake -s snakemake/jp/download/Snakefile --resources jobs=1
snakemake -s snakemake/us/parse/Snakefile
snakemake -s snakemake/uk/parse/Snakefile
snakemake -s snakemake/eu/parse/Snakefile
snakemake -s snakemake/jp/parse/Snakefile
snakemake -s snakemake/onsides/evaluate/Snakefile
snakemake -s snakemake/onsides/export/Snakefile

If you see red, there are errors. You will need to either fix or, if they're download errors, just re-run the Snakefile until it passes. This is particularly relevant for downloading from the EU!

This project uses Snakemake to reproducibly download and build the database. Snakemake is a workflow management tool that ensures work is not unnecessarily duplicated. Various steps are stored as Snakefiles, in the snakemake directory.

By default, the first rule in a file is run. Each file's first rule is a catch-all rule called all. To run a Snakefile, ensure it's installed (e.g. nix develop or install manually), then run e.g. snakemake -s snakemake/us/download/Snakefile. Adding -n makes it a dry run, meaning it tells you what it would do, without actually doing it. Because Snakefiles can overwrite things, I recommend an initial dry run before running any other command, just to verify that you want it to do is what it will do.

The snakemake directory is organized as follows, with each sub-directory having a Snakefile and any additional scripts that are needed. You should run these Snakefiles from the project root directory.

snakemake
├── eu
│   ├── download
│   └── parse
├── jp
│   ├── download
│   └── parse
├── onsides
│   ├── evaluate
│   └── export
├── uk
│   ├── download
│   └── parse
└── us
    ├── download
    └── parse

To run everything, you'll want to download each source, then parse, then evaluate (onsides/evaluate), then export (onsides/export).

To be transparent, the reason I didn't package everything into a single command is because these scripts will inevitably become stale in various ways (source URLs change, etc.), and this makes debugging significantly easier for the inheritors of this project.

Releasing

Once everything above has been run, create the final release ZIP archive. Here's how the v3.1.0 archive was created.

build-zip --version v3.1.0

Limitations

OnSIDES is strictly intended for academic research purposes. The adverse drug event term extraction method is far from perfect - some side effects will be missed and some predicted as true adverse events will be incorrect.

Patients/healthcare professionals seeking health information should not trust or use this data, and instead refer to the information available from their regions' respective drug regulatory agencies, such as the FDA (USA), EMA (EU), MHRA (UK), PMDA (Japan) and consult their healthcare providers for information.

Additionally, this project is under active development. We are continuing to further conduct independent validation of the performance of the models used, and improve the extraction methodology. As such, the data, methods, and statistics are subject to change at any time. Any updates to the database will be reflected on this page/in this repository.

Contact

If you would like to contribute to this project or have any suggestions on how the methods, data, or evaluation can be improved please reach out to Dr. Tatonetti via email or Twitter.

Name		Name	Last commit message	Last commit date
Latest commit History 82 Commits
analyses		analyses
database		database
docs		docs
snakemake		snakemake
src/onsides		src/onsides
train		train
.gitignore		.gitignore
.python-version		.python-version
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
flake.lock		flake.lock
flake.nix		flake.nix
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

OnSIDES

Table of Contents

Introduction

Citation

Downloading the data

Loading into a database

Database Design and Organization

Usage

High confidence sets

Manual annotations

Data Generation and Metrics

Model accuracy

Developer Documentation

Dependencies

Building

Releasing

Limitations

Contact

About

Uh oh!

Releases 13

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

tatonetti-lab/onsides

Folders and files

Latest commit

History

Repository files navigation

OnSIDES

Table of Contents

Introduction

Citation

Downloading the data

Loading into a database

Database Design and Organization

Usage

High confidence sets

Manual annotations

Data Generation and Metrics

Model accuracy

Developer Documentation

Dependencies

Building

Releasing

Limitations

Contact

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 13

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages