
Infrastructure: Communicating optional dependencies #4

Open
mattwthompson opened this issue May 26, 2021 · 2 comments


@mattwthompson
Member

Copying what I wrote several months ago in Confluence (which largely still holds true):

We should establish some guidelines for defining which dependencies are required and which are optional. The two major users of any of our software products are scientists who want to use our software to do science and CI bots that run test suites. Their needs clash: bots need to install everything to run the full test suites, but scientists may only use a small portion of the codebase to accomplish their tasks. It is expensive (in computer and human time) to simply list everything as a required dependency for all users, since that bloats conda environments (at present and over time) and increases the likelihood of dependency issues as upstream maintainers break APIs and/or abandon projects. Having fewer required dependencies also slightly reduces the maintenance burden, since there are fewer things that can break when building new releases.

For each package, we should aim to define a core set of use cases that must be supported "out of the box". This helps clarify which dependencies qualify as required and, by deduction, which qualify as optional. For example, the OpenFF Toolkit needs OpenMM to export a topology and force field to a simulation; that should definitely be a required dependency (until a possible future date when there are alternatives to consider). But functionality like molecule visualization (NGLview) and QCArchive interoperability relies on extra dependencies (nglview, qcelemental, etc.) and is less likely to be used by most users, so those could be moved to optional dependencies.
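One common way to make an optional dependency usable in code is to import it only inside the feature that needs it and to fail with a message that says what to install. The sketch below assumes that pattern; `import_optional` and `MissingOptionalDependencyError` are hypothetical names, not part of any existing OpenFF API.

```python
import importlib
from types import ModuleType
from typing import Optional


class MissingOptionalDependencyError(ImportError):
    """Raised when a feature needs a package that is not installed.

    Subclassing ImportError keeps existing `except ImportError` handlers working.
    """


def import_optional(
    module_name: str, feature: str, conda_package: Optional[str] = None
) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message.

    `feature` names what the user was trying to do, so the error explains why
    the package is needed, not just that it is missing. `conda_package` covers
    cases where the conda-forge package name differs from the import name.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        package = conda_package or module_name
        raise MissingOptionalDependencyError(
            f"{feature} requires the optional dependency '{module_name}'. "
            f"Install it with: conda install -c conda-forge {package}"
        ) from error
```

A visualization method could then call `import_optional("nglview", "Molecule visualization")` in its body, so only users who actually request visualization need NGLview installed.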

There could be reasons to have more than two lists of dependencies for each package. Thinking about them as concentric circles, a package might have a "core" set of dependencies that are always needed, another circle out from that which includes the core plus other dependencies that are not strictly necessary but commonly used, and then a bigger circle that encompasses everything. It's simplest to think about it as two lists/circles, but the scheme can take other shapes.
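The concentric-circle idea maps naturally onto nested sets, where each tier must be a superset of the one inside it. The sketch below is illustrative only: which package lands in which tier is an assumption loosely based on the examples in this issue, not an agreed-upon list.

```python
from typing import Dict, FrozenSet, List, Optional

# Illustrative tier assignments (assumed, not decided in this issue).
CORE: FrozenSet[str] = frozenset({"openmm"})
COMMON: FrozenSet[str] = CORE | {"rdkit", "ambertools"}
EVERYTHING: FrozenSet[str] = COMMON | {"nglview", "qcelemental"}

# Ordered from innermost to outermost circle; dicts preserve insertion order.
TIERS: Dict[str, FrozenSet[str]] = {
    "core": CORE,
    "common": COMMON,
    "everything": EVERYTHING,
}


def tiers_are_nested(tiers: List[FrozenSet[str]]) -> bool:
    """Check that each circle fully contains the one inside it."""
    return all(inner <= outer for inner, outer in zip(tiers, tiers[1:]))


def innermost_tier(package: str) -> Optional[str]:
    """Return the smallest circle that includes `package`, or None."""
    for name, members in TIERS.items():
        if package in members:
            return name
    return None
```

Checking `tiers_are_nested` in CI would catch a package accidentally added to an outer tier but dropped from an inner one.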

Another gray area that's unclear to me is which examples should be runnable "out of the box," i.e. with nothing but a conda one-liner. It may be appropriate to deal with this at the level of each example; if we can keep the required dependencies of a package light, some examples may have as their first cell "run these conda commands to install these other packages."
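An example's first cell could check for its extra packages and print the conda command only when something is missing. This is a minimal sketch assuming each example declares its own requirements; `EXAMPLE_REQUIREMENTS` and `missing_install_command` are hypothetical. Import names and conda-forge package names can differ, so the mapping keeps both, and only top-level module names are checked.

```python
import importlib.util
from typing import Dict, Optional

# Hypothetical per-example requirements: import name -> conda-forge package name.
EXAMPLE_REQUIREMENTS: Dict[str, str] = {
    "nglview": "nglview",
    "qcelemental": "qcelemental",
}


def missing_install_command(requirements: Dict[str, str]) -> Optional[str]:
    """Return a conda one-liner for any missing packages, or None if all are present."""
    missing = sorted(
        conda_name
        for import_name, conda_name in requirements.items()
        if importlib.util.find_spec(import_name) is None
    )
    if not missing:
        return None
    return "conda install -c conda-forge " + " ".join(missing)


command = missing_install_command(EXAMPLE_REQUIREMENTS)
if command:
    print(f"This example needs extra packages. Run:\n    {command}")
```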

@mattwthompson
Member Author

As an example of implementations that attempt to solve these issues, the toolkit's feedstock has been split into separate recipes (ignoring OpenEye, since it is not distributed on conda-forge and therefore cannot be part of the requirements):

  • openff-toolkit: Covers enough dependencies to use most - but not all - of the API.
  • openff-toolkit-base: Installs the same as above, but without RDKit or AmberTools, therefore missing out on most of the important functionality in the API, e.g. Molecule.from_smiles, Molecule.generate_conformers, ForceField.create_openmm_system, and many others. (OpenMM should eventually be stripped out of this package, but the toolkit currently has SimTK units too deeply interwoven to make it an optional dependency.)
  • openff-toolkit-examples (not merged yet): Covers enough to run all examples. (It's unclear to me how closely this matches the entirety of the API, but it should be pretty close.)

A benefit of doing this in packaging is that it directly specifies what users install based on which "kind" of the toolkit they want. The major downside, in my opinion, is that it only serves as implicit documentation, i.e. somebody not familiar enough with conda-forge infrastructure would have a hard time figuring out whether package X is required or optional.
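One way to make the required/optional split explicit rather than implied by recipe names would be to keep a registry of optional dependencies in the source tree and expose a report of what is installed. The sketch below assumes such a registry; `OPTIONAL_DEPENDENCIES`, `dependency_report`, and `format_report` are hypothetical, and the feature lists are illustrative rather than authoritative.

```python
import importlib.util
from typing import Dict, List, Tuple

# Hypothetical registry: optional dependency (import name) -> features that need it.
OPTIONAL_DEPENDENCIES: Dict[str, List[str]] = {
    "rdkit": ["Molecule.from_smiles", "Molecule.generate_conformers"],
    "nglview": ["Molecule visualization"],
    "qcelemental": ["QCArchive interoperability"],
}

Row = Tuple[str, bool, List[str]]


def dependency_report(registry: Dict[str, List[str]]) -> List[Row]:
    """Return (package, installed?, features) rows, sorted by package name."""
    return [
        (package, importlib.util.find_spec(package) is not None, features)
        for package, features in sorted(registry.items())
    ]


def format_report(rows: List[Row]) -> str:
    """Render the rows as a small aligned text table."""
    lines = []
    for package, installed, features in rows:
        status = "installed" if installed else "MISSING"
        lines.append(f"{package:<12} {status:<10} needed for: {', '.join(features)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(dependency_report(OPTIONAL_DEPENDENCIES)))
```

The same registry could also drive the docs page listing optional dependencies, so the documentation and the runtime checks cannot drift apart.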

@davidlmobley

OK, this is great. Are you asking for help identifying the core cases to be supported? I wonder if a poll of our team or something similar is an efficient way to get the feedback you need.
