Support skipping deduplication with dataset factories #179

Open
chapincavender opened this issue Nov 24, 2021 · 0 comments

Comments

@chapincavender
Collaborator

It would be helpful to provide an option to skip deduplication in the create_dataset() method of dataset factories. Use cases include datasets containing multiple instances of the same molecule with different constraints or different initial conformers applied to each instance.

The constructor for the ComponentResult class already has a skip_unique_check flag that skips the initial deduplication. Propagating that flag to the create_dataset() methods and passing it through to each call of the ComponentResult constructor would partially support this feature.
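As a rough illustration of the propagation described above, here is a minimal, self-contained sketch. ToyComponentResult and ToyFactory are hypothetical stand-ins written for this example, not the real classes, and molecules are represented as plain strings. Only the names skip_unique_check and create_dataset() come from this issue; the real signatures may differ.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class ToyComponentResult:
    """Hypothetical stand-in for ComponentResult (not the real class)."""

    molecules: List[str]
    skip_unique_check: bool = False
    kept: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        seen = set()
        for mol in self.molecules:
            # Drop repeats unless the caller asked to skip the check.
            if not self.skip_unique_check and mol in seen:
                continue
            seen.add(mol)
            self.kept.append(mol)


class ToyFactory:
    """Hypothetical stand-in for a dataset factory."""

    def create_dataset(
        self, molecules: Iterable[str], skip_unique_check: bool = False
    ) -> List[str]:
        # The flag is simply forwarded to the result constructor.
        result = ToyComponentResult(
            molecules=list(molecules),
            skip_unique_check=skip_unique_check,
        )
        return result.kept


factory = ToyFactory()
deduplicated = factory.create_dataset(["CCO", "CCO", "C"])
all_instances = factory.create_dataset(["CCO", "CCO", "C"], skip_unique_check=True)
```

With the default, the repeated entry is dropped; with skip_unique_check=True, all three instances are kept. Defaulting the flag to False would leave existing behavior unchanged for current callers.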

Currently, dataset factories hash molecules using their InChI key, so multiple instances of the same molecule produce the same hash. An alternative hash would need to be implemented to fully support this feature.
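One possible shape for such an alternative hash, sketched under assumptions: keep the InChI key as a prefix and append a digest of the per-instance data (coordinates and constraints). The function instance_key, its parameters, and the placeholder key string below are all hypothetical and are not part of the existing code; the constraints are assumed to be JSON-serializable.

```python
import hashlib
import json
from typing import Any, Dict, Optional, Sequence


def instance_key(
    inchi_key: str,
    coordinates: Sequence[Sequence[float]],
    constraints: Optional[Dict[str, Any]] = None,
    decimals: int = 4,
) -> str:
    """Hypothetical per-instance key: InChI key plus a digest of instance data."""
    payload = {
        # Round coordinates so tiny floating-point noise does not change the key.
        "coords": [[round(float(x), decimals) for x in atom] for atom in coordinates],
        "constraints": constraints or {},
    }
    # sort_keys makes the serialization independent of dict insertion order.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(encoded).hexdigest()
    return f"{inchi_key}-{digest[:12]}"


# "EXAMPLE-INCHIKEY" is a placeholder, not a real InChI key.
conformer_a = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0]]
conformer_b = [[0.0, 0.0, 0.0], [1.3, 0.0, 0.0]]
key_a = instance_key("EXAMPLE-INCHIKEY", conformer_a)
key_b = instance_key("EXAMPLE-INCHIKEY", conformer_b)
key_a_constrained = instance_key(
    "EXAMPLE-INCHIKEY", conformer_a, constraints={"freeze": [0, 1]}
)
```

Two instances of the same molecule then share the InChI key prefix but get distinct keys when their conformers or constraints differ, while identical instances still collapse to one key.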
