No tasks made on compute expansions #93

Open
jthorton opened this issue Jan 27, 2021 · 0 comments
PR #80 introduced threading to submissions by chunking the list of newly added entries and passing one chunk to each thread. However, in a compute expansion no new entries are added; we only want to create more tasks for a new specification. Because the list of new entries is empty, no tasks are made. To fix this we should get the list of entries from the dataset instead. This will be fine for now but might cause issues in future if we want to add compute to only a subset of a dataset.
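
To make the failure mode concrete, here is a minimal, self-contained sketch of the chunk-and-thread pattern described above. It does not use the qcsubmit code; `chunk`, `make_tasks`, `submit` and the `fallback` flag are hypothetical names used only for illustration. With an empty list of new entries there are no chunks, so no thread ever creates a task; falling back to the full entry list of the dataset restores task creation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def make_tasks(entries: List[str], spec_name: str) -> List[str]:
    """Stand-in for task creation: one task label per (entry, spec) pair."""
    return [f"{entry}:{spec_name}" for entry in entries]


def submit(
    new_entries: List[str],
    all_entries: List[str],
    spec_name: str,
    chunk_size: int = 2,
    fallback: bool = False,
) -> List[str]:
    """Create tasks in threads, one chunk of entries per worker call.

    With fallback=False only the newly added entries are chunked, which is
    the behaviour described in this issue. With fallback=True an empty list
    of new entries is replaced by every entry already in the dataset.
    """
    entries = new_entries
    if fallback and not new_entries:
        entries = all_entries

    tasks: List[str] = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map yields results in the same order as the input chunks
        for result in pool.map(lambda c: make_tasks(c, spec_name), chunk(entries, chunk_size)):
            tasks.extend(result)
    return tasks
```

Calling `submit([], ["a", "b", "c"], "ani2x")` produces no tasks, while the same call with `fallback=True` produces one task per existing entry.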

It might be better to create a pathway that handles compute expansions explicitly and gives users more options. For example, we could create a new ComputeExpansion schema which takes some QCSpecification and a dataset and adds new tasks. We could also add subset support, which is not currently available: users could pass a list of entries or a list of molecules for which they want to add compute. We could also support some of the filters built into qcsubmit and only add tasks for molecules which pass the filter. This would help in cases like adding ani2x support, where we can filter out molecules not covered by the model. I imagine it would look something like this:

from openff.qcsubmit.compute_expansion import OptimizationExpansion, expand_compute
from openff.qcsubmit.common_structures import QCSpec
from openff.qcsubmit.procedures import GeometricProcedure
from openff.qcsubmit.workflow_components import ElementFilter

from openff.toolkit.topology import Molecule

# build the qcspec we want to add for ani2x
ani2x = QCSpec(method="ani2x", basis=None, program="torchani", spec_name="ani2x", spec_description="testing ani2x")
# try out the dlc coord system in geometric
geo = GeometricProcedure(coordsys="dlc")

# build the expansion schema
opt_expand = OptimizationExpansion(qc_specifications=ani2x, optimization_procedure=geo)

# add new tasks for all molecules in a dataset
# (`dataset` is assumed to have already been pulled from a FractalClient)
expand_compute(dataset=dataset, compute_schema=opt_expand)

# add compute to only a few molecules
target_mols = [Molecule.from_smiles("CC"), "CCO"] # support smiles or openff molecules
opt_expand.target_molecules = target_mols
expand_compute(dataset=dataset, compute_schema=opt_expand)

# add compute to only molecules which pass a filter
el_filter = ElementFilter(allowed_elements=["H", "C", "N", "O", "F", "S", "Cl"])
opt_expand.filters = el_filter
expand_compute(dataset=dataset, compute_schema=opt_expand)

This way each compute expansion would be its own schema, which would allow users to submit multiple new specs for different subsets of the molecules. Maybe we would also want to add the dataset name to the schema for provenance, so users know what dataset it was applied to. The expand_compute method should also return a list of all of the indices which have had new tasks created.
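
As a rough illustration of the return value and provenance idea, the sketch below models a dataset as a plain mapping from entry index to the set of specification names that already have tasks. Everything here is an assumption made for illustration: `ExpansionSketch`, `expand_compute_sketch`, `target_indices` and the string-based filters are hypothetical stand-ins, not the proposed qcsubmit classes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class ExpansionSketch:
    """Hypothetical stand-in for one compute expansion schema."""

    spec_name: str
    # filled in when the expansion is applied, for provenance
    dataset_name: Optional[str] = None
    # optional subset of entry indices; None means the whole dataset
    target_indices: Optional[Set[str]] = None
    # each filter is a predicate on the entry index; all must pass
    filters: List[Callable[[str], bool]] = field(default_factory=list)


def expand_compute_sketch(
    dataset: Dict[str, Set[str]],
    schema: ExpansionSketch,
    dataset_name: str,
) -> List[str]:
    """Add the schema's spec to matching entries and return their indices."""
    schema.dataset_name = dataset_name
    created: List[str] = []
    for index, specs in dataset.items():
        if schema.target_indices is not None and index not in schema.target_indices:
            continue  # not part of the requested subset
        if not all(passes(index) for passes in schema.filters):
            continue  # rejected by at least one filter
        if schema.spec_name in specs:
            continue  # tasks for this spec already exist
        specs.add(schema.spec_name)
        created.append(index)
    return created
```

Because the function only reports entries that gained a new specification, applying the same schema twice returns an empty list the second time, which gives users a simple way to check what an expansion actually did.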
