Dataset of Interacting Compound-Target Pairs in ChEMBL

Introduction

This code extracts a dataset of compound-target pairs from the open-source bioactivity database ChEMBL [Zdrazil2023].

The compound-target pairs are known to interact because they either have at least one corresponding measured activity value in ChEMBL or are part of a set of manually curated known interactions in ChEMBL.
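The inclusion rule above amounts to a set union, which a toy sketch can illustrate. All identifiers, values, and the record layout below are hypothetical and chosen for illustration only. They do not reflect the ChEMBL schema or the code in this repository.

```python
# Toy illustration only: identifiers, values, and record layout are hypothetical
# and do not reflect the actual ChEMBL schema or this repository's code.

# Measured activities as (compound_id, target_id, activity_value) records.
measured_activities = [
    ("CPD1", "TGT_A", 6.2),
    ("CPD1", "TGT_A", 6.5),  # a second measurement for the same pair
    ("CPD2", "TGT_B", 7.9),
]

# Manually curated known interactions as (compound_id, target_id) pairs.
curated_interactions = {("CPD2", "TGT_B"), ("CPD3", "TGT_C")}


def interacting_pairs(activities, curated):
    """Return pairs that have at least one measured activity or are curated."""
    pairs_with_activity = {(compound, target) for compound, target, _ in activities}
    return pairs_with_activity | set(curated)


pairs = interacting_pairs(measured_activities, curated_interactions)
print(sorted(pairs))
```

A pair backed by several measurements appears once, and a pair that satisfies both criteria is not duplicated.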

Furthermore, the dataset contains a number of compound and target annotations to enable future analyses.

A similar dataset was previously curated manually and used to investigate target-based differences in drug-like properties and ligand efficiencies [Leeson2021]. This code can generate an extended version of that dataset for every ChEMBL version from ChEMBL 26 onwards.

[Zdrazil2023]: Zdrazil et al., "The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods", Nucleic Acids Research, gkad1004, 2023, https://doi.org/10.1093/nar/gkad1004

[Leeson2021]: Leeson et al., "Target-Based Evaluation of “Drug-Like” Properties and Ligand Efficiencies", Journal of Medicinal Chemistry, 64(11), 7210-7230, 2021, https://doi.org/10.1021/acs.jmedchem.1c00416

Dataset

The dataset for different ChEMBL versions from ChEMBL 26 onwards is available here.

Quick Start

Dependencies

From the repository root, install the package and its required dependencies with

pip install .

Note: With pandas 2.2, running the code triggers warnings related to RDKit's PandasTools. The final dataset is not affected.

Generating the Dataset

The default version of the dataset (the full dataset as a CSV file based on the newest ChEMBL version) can be generated by calling

python main.py -o <output_path>

To list the arguments that modify the output, call

python main.py --help
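If the dataset needs to be generated from another Python script, the same command can be launched with the standard library. The following is a minimal sketch, not part of this repository: the helper names `build_command` and `generate_dataset` are hypothetical, and the location of `main.py` is an assumption to adjust for the local checkout. Only the `-o` flag shown above is used.

```python
import subprocess
import sys


def build_command(output_path, script="main.py"):
    """Assemble the command line shown above.

    `script` is an assumption: point it at wherever main.py lives locally.
    """
    return [sys.executable, script, "-o", str(output_path)]


def generate_dataset(output_path, script="main.py"):
    """Run the dataset generation and raise CalledProcessError on failure."""
    subprocess.run(build_command(output_path, script), check=True)
```

Calling `generate_dataset("<output_path>")` is equivalent to running the command manually. Using `sys.executable` keeps the run in the same Python environment in which the dependencies were installed.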

Documentation

The full documentation is available here.

The corresponding article is available here.
