About the mission and structure of alchemtest #56

Open
xiki-tempula opened this issue Aug 28, 2021 · 4 comments

Comments

@xiki-tempula
Collaborator

xiki-tempula commented Aug 28, 2021

From a user's perspective, I think the mission of alchemtest is to provide "real" datasets that users can analyse with alchemlyb, so that they can see what kind of data they will generate and what kind of results to expect.

From the developer perspective, some datasets are required to test edge cases, such as the dataset with restart #55.

However, datasets that exist only to test edge cases are probably not very useful for normal users, and their presence might distract users from the datasets they are actually interested in.

So I think there are two ways of handling this.

- We could have two sections in the docs: one for the datasets that users might be interested in, and another for the datasets that developers upload to extend the tests.

- Or we could move the test datasets that only developers are interested in to alchemlyb?

Opinions are welcome. (Or should I move this to the alchemlyb discussion?)

@orbeckst
Member

orbeckst commented Oct 6, 2021

At the moment the docs are API docs. If we had a more narrative part then we could highlight the datasets for users (e.g., for teaching the use of alchemlyb).

The problem with real datasets is that they can be big. In MDAnalysisData we solve this by not hosting the datasets ourselves and instead delegating to archive-grade repositories. I think that's the right approach for any real data. We could, in principle, use the MDAnalysisData approach here, too.

@richardjgowers

I'd also support a shift to the MDAnalysisData pattern, where this package just knows where to download datasets from.

@orbeckst
Member

For running the test suite for alchemlyb I'd want to keep the test data bundled to avoid further slow-down by having to download them every time in CI.

I think the question is whether alchemtest should also cater to the "teaching alchemlyb" angle. If so, we could add MDAnalysisData-style code (we can actually copy it from MDAnalysisData because it's all BSD-3). Or we create a new package, alchemdata (or whatever works), and then have a clean separation between use cases.

@orbeckst
Member

To bump this old issue: I'd be happy to also have alchemtest serve as a teaching tool that gives access to bigger datasets through accessor functions similar to what scikit-learn and MDAnalysisData have. I'd just want to ensure that the data needed for running the alchemlyb tests remain part of the repository itself.

If anyone wants to get started adding tooling (and tests) for external datasets then please do and submit a PR!!
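
As a starting point for discussion, here is a rough sketch of what such an accessor could look like. It is not the MDAnalysisData or scikit-learn implementation; `RemoteFile`, `fetch_file`, and `data_home` are hypothetical names chosen for illustration, and it assumes each externally hosted file is described by a URL plus a SHA-256 checksum. It uses only the standard library.

```python
import hashlib
import shutil
import urllib.request
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RemoteFile:
    """Hypothetical record: where one dataset file lives and how to verify it."""
    filename: str
    url: str
    sha256: str


def _sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_file(remote: RemoteFile, data_home: Path) -> Path:
    """Return a local path for `remote`, downloading it only if needed."""
    data_home.mkdir(parents=True, exist_ok=True)
    target = data_home / remote.filename
    # Reuse the cached copy when its checksum still matches.
    if target.exists() and _sha256(target) == remote.sha256:
        return target
    # Download under a temporary name so a failed transfer never looks complete.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(remote.url) as response, partial.open("wb") as out:
        shutil.copyfileobj(response, out)
    if _sha256(partial) != remote.sha256:
        partial.unlink()
        raise IOError(f"checksum mismatch for {remote.filename}")
    partial.replace(target)
    return target
```

A `load_*`-style function for a large teaching dataset could call `fetch_file` for each of its files and then hand the local paths to the usual parsers, while the small datasets needed by the alchemlyb test suite would stay bundled in the repository and never touch the network.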


3 participants