scikit-mol

Scikit-Learn classes for molecular vectorization using RDKit

The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings.

As an example, with the needed scikit-learn and scikit-mol imports in place and RDKit mol objects in the mol_list_train and mol_list_test lists:

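from rdkit import Chem
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from scikit_mol.fingerprints import MorganFingerprintTransformer
pipe = Pipeline([('mol_transformer', MorganFingerprintTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
pipe.score(mol_list_test, y_test)
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])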

>>> array([4.93858815])

The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.

The first draft of the project was created at the RDKit UGM 2022 hackathon on 14 October 2022.

Installation

Users can install the latest tagged release from PyPI with pip

pip install scikit-mol

or from conda-forge

conda install -c conda-forge scikit-mol

The conda-forge package should be updated shortly after each new tagged release on PyPI.

Bleeding edge

pip install git+https://github.com/EBjerrum/scikit-mol.git

Documentation

Example notebooks and API documentation are now hosted on https://scikit-mol.readthedocs.io

We have also published a software note on ChemRxiv: https://doi.org/10.26434/chemrxiv-2023-fzqwd

Other use-examples

Scikit-Mol has been featured in blog posts and used in research; some examples are listed below:

Roadmap and Contributing

Help wanted! Are you a PhD student who wants a "side quest" to procrastinate on your thesis writing, or are you simply interested in computational chemistry, cheminformatics, QSAR modelling, Python programming, or open-source software? Do you want to learn more about machine learning with scikit-learn? Or do you use scikit-mol in your current work and would like to give a little back to the project and see it improved as well? With a little bit of help, this project can be improved much faster! Reach out to me (Esben) to discuss how we can proceed.

Currently we are working on fixing some deprecation warnings. It's not the most exciting work, but a little maintenance is important. Later on we need to review the scikit-learn compatibility and adopt some of the newer features of their estimator classes. We are also working on feature enhancements and tests, such as new fingerprints and a more versatile standardizer.

There is more information about how to contribute to the project in CONTRIBUTING.md.

BUGS

There are probably still some. Please check the issues on GitHub and report new ones there.

Contributors:

Scikit-Mol has been developed as a community effort with contributions from people from many different companies, consortia, foundations and academic institutions.

Cheminformania Consulting, Aptuit, BASF, Bayer AG, Boehringer Ingelheim, Chodera Lab (MSKCC), EPAM Systems, ETH Zürich, Evotec, Johannes Gutenberg University, Martin Luther University, Odyssey Therapeutics, Open Molecular Software Foundation, Openfree.energy, Polish Academy of Sciences, Productivista, Simulations-Plus Inc., University of Vienna

Esben Jannik Bjerrum, @ebjerrum
Carmen Esposito, @cespos
Son Ha
Oh-hyeon Choung
Andreas Poehlmann, @ap--
Ya Chen, @anya-chen
Anton Siomchen, @asiomchen
Rafał Bachorz, @rafalbachorz
Adrien Chaton, @adrienchaton
@VincentAlexanderScholz
@RiesBen
@enricogandini
@mikemhenry
@c-feldmann
