Skip to content

Conversation

satvshr
Copy link
Collaborator

@satvshr satvshr commented Aug 25, 2025

closes #141

This PR:

  • Fixes a small bug in AptaNetPipeline
  • Makes AptaNetPipeline inherit from BaseObject to prevent errors during benchmarking
  • A csv loader
  • Removes an unnecessary test (test_pfoa), the loader is already being tested in test_loaders
  • The benchmarking framework

y : array-like, optional
Target vector. Used together with `X` if explicit train/test splits
are not provided.
train_X : array-like, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are too many options that could all be handled via cv. From X onwards, I would remove everything except X, y, and cv - the additional parameters imo do not add to convenience or clarity.

Returns
-------
pd.DataFrame
Results table with rows = (estimator, metric),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please describe this better

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an example output

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please describe this better, say precisely which rows and columns there are and what the entries are

- target_name: str (the name of the target column)
- filename: str (resolved path to the CSV file)
If `return_X_y` is True, returns (X, y) where:
- X : ndarray of shape (n_samples, n_features) built by dropping the target
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should be pd.DataFrame and not np.ndarray to comply with the design in #106

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sklearn outputs np arrays and not dfs, should we stick to their standard?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on discord.

@satvshr satvshr requested a review from fkiraly September 21, 2025 13:28
@fkiraly
Copy link
Contributor

fkiraly commented Sep 25, 2025

can you kindly summarize how requests were addressed and what the changes since last review are?

@satvshr
Copy link
Collaborator Author

satvshr commented Sep 26, 2025

The changes made were the ones requested, so:

  1. The class now only take 3 arguments, X, y, and cv.
  2. The csv loader returns dataframes instead of numpy arrays and bunches now.
  3. Added an example to show what output the run method yields.
  4. Added tests.
  5. Removed task_check and metaclass.

Copy link
Contributor

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

May I request to add a short notebook with a small benchmarking experiment and some reasonably chosen dataset?
I think that is important to to understand your design as well as for review.

Copy link
Contributor

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to do this in a different PR or this one? There are pros/cons for either option. In this one, it might allow to spot bugs which would otherwise need to be fixed in additional PR.

@satvshr
Copy link
Collaborator Author

satvshr commented Sep 29, 2025

In this one, it might allow to spot bugs which would otherwise need to be fixed in additional PR.

I would rather do it in a separate issue (#164), I can add bug fixing as part of the notebook PR as well.

@satvshr satvshr requested a review from fkiraly September 29, 2025 20:59
@satvshr satvshr changed the title [ENH] Benchmarking framework [ENH] Benchmarking framework and csv loader Sep 30, 2025
from pyaptamer.datasets._loaders import load_pfoa_structure


def test_pfoa_loader():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we deleting this file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in the description of the PR

@fkiraly
Copy link
Contributor

fkiraly commented Sep 30, 2025

could you kindly make sure you write a good PR description in the first post?

@satvshr satvshr requested a review from fkiraly September 30, 2025 10:13
@satvshr
Copy link
Collaborator Author

satvshr commented Sep 30, 2025

could you kindly make sure you write a good PR description in the first post?

Done.

Copy link
Contributor

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are making changes to AptaNetPipeline - is this required for benchmarking? Would this not interact with other PRs, e.g., #153?

I would recommend to make this a separate PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] Benchmarking framework
3 participants