-
Notifications
You must be signed in to change notification settings - Fork 2
[ENH] Benchmarking framework and csv loader #114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…on tests and bug fixing
pyaptamer/benchmarking/_base.py
Outdated
y : array-like, optional | ||
Target vector. Used together with `X` if explicit train/test splits | ||
are not provided. | ||
train_X : array-like, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are too many options that could all be handled via cv
. From X
onwards, I would remove everything except X
, y
, and cv
- the additional parameters imo do not add to convenience or clarity.
pyaptamer/benchmarking/_base.py
Outdated
Returns | ||
------- | ||
pd.DataFrame | ||
Results table with rows = (estimator, metric), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please describe this better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an example output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please describe this better, say precisely which rows and columns there are and what the entries are
- target_name: str (the name of the target column) | ||
- filename: str (resolved path to the CSV file) | ||
If `return_X_y` is True, returns (X, y) where: | ||
- X : ndarray of shape (n_samples, n_features) built by dropping the target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these should be pd.DataFrame
and not np.ndarray
to comply with the design in #106
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sklearn outputs np arrays and not dfs, should we stick to their standard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on discord.
can you kindly summarize how requests were addressed and what the changes since last review are? |
The changes made were the ones requested, so:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
May I request to add a short notebook with a small benchmarking experiment and some reasonably chosen dataset?
I think that is important to to understand your design as well as for review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to do this in a different PR or this one? There are pros/cons for either option. In this one, it might allow to spot bugs which would otherwise need to be fixed in additional PR.
I would rather do it in a separate issue (#164), I can add bug fixing as part of the notebook PR as well. |
from pyaptamer.datasets._loaders import load_pfoa_structure | ||
|
||
|
||
def test_pfoa_loader(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we deleting this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in the description of the PR
could you kindly make sure you write a good PR description in the first post? |
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are making changes to AptaNetPipeline
- is this required for benchmarking? Would this not interact with other PRs, e.g., #153?
I would recommend to make this a separate PR.
closes #141
This PR:
AptaNetPipeline
AptaNetPipeline
inherit fromBaseObject
to prevent errors during benchmarkingtest_pfoa
), the loader is already being tested intest_loaders