[ENH] Benchmarking framework and csv loader #114

satvshr · 2025-08-25T08:04:47Z

closes #141

This PR:

Fixes a small bug in AptaNetPipeline
Makes AptaNetPipeline inherit from BaseObject to prevent errors during benchmarking
Adds a csv loader
Removes an unnecessary test (test_pfoa); the loader is already tested in test_loaders
Adds the benchmarking framework

fkiraly · 2025-09-17T19:17:23Z

pyaptamer/benchmarking/_base.py

+    y : array-like, optional
+        Target vector. Used together with `X` if explicit train/test splits
+        are not provided.
+    train_X : array-like, optional


I think there are too many options that could all be handled via cv. From X onwards, I would remove everything except X, y, and cv - the additional parameters imo do not add to convenience or clarity.
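
To illustrate the point about `cv`: a fixed train/test split can itself be expressed as a cv object, so separate train/test parameters are not strictly needed. The following is a minimal sketch using scikit-learn's `PredefinedSplit`; the toy data and the choice of the last two samples as the test set are assumptions made for illustration and are not taken from this PR.

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

# Toy data (assumption): 6 samples, the last 2 are the intended test set.
X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])

# -1 marks samples that are never used for testing (always in training);
# 0 assigns a sample to test fold 0.
test_fold = np.array([-1, -1, -1, -1, 0, 0])
cv = PredefinedSplit(test_fold)

# The splitter yields exactly one (train, test) pair of index arrays.
splits = list(cv.split(X, y))
train_idx, test_idx = splits[0]
print("train:", train_idx, "test:", test_idx)
```

Any object with this `split` interface could then be passed as `cv`, alongside the usual `KFold`-style splitters.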

fkiraly · 2025-09-17T19:17:58Z

pyaptamer/benchmarking/_base.py

+        Returns
+        -------
+        pd.DataFrame
+            Results table with rows = (estimator, metric),


please describe this better

Added an example output

please describe this better, say precisely which rows and columns there are and what the entries are
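
For reference, one way such a table could be laid out is sketched below. This is not the output of the `run` method in this PR: the column names `train` and `test`, the estimator names, and the chosen metrics are all assumptions for illustration. Only the row layout, (estimator, metric), follows the docstring under review. Each entry is the mean score over the cv folds.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic data and a plain 3-fold splitter (assumptions for illustration).
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
cv = KFold(n_splits=3, shuffle=True, random_state=0)

# Hypothetical estimator names and metrics.
estimators = {
    "dummy": DummyClassifier(strategy="most_frequent"),
    "logreg": LogisticRegression(max_iter=1000),
}
metrics = ["accuracy", "balanced_accuracy"]

rows = {}
for est_name, est in estimators.items():
    scores = cross_validate(
        est, X, y, cv=cv, scoring=metrics, return_train_score=True
    )
    for metric in metrics:
        # One row per (estimator, metric); entries are fold-averaged scores.
        rows[(est_name, metric)] = {
            "train": scores[f"train_{metric}"].mean(),
            "test": scores[f"test_{metric}"].mean(),
        }

index = pd.MultiIndex.from_tuples(list(rows.keys()), names=["estimator", "metric"])
results = pd.DataFrame(list(rows.values()), index=index)
print(results)
```

A docstring that names the index levels, the columns, and the meaning of each entry in this way would answer the "which rows and columns" question directly.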

fkiraly · 2025-09-17T19:18:55Z

pyaptamer/datasets/_loaders/_csv_loader.py

+            - target_name: str (the name of the target column)
+            - filename: str (resolved path to the CSV file)
+        If `return_X_y` is True, returns (X, y) where:
+            - X : ndarray of shape (n_samples, n_features) built by dropping the target


I think these should be pd.DataFrame and not np.ndarray to comply with the design in #106

sklearn outputs np arrays and not dfs, should we stick to their standard?

As discussed on discord.
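
A minimal sketch of the DataFrame-returning behaviour discussed here, assuming a loader that splits off a named target column. The function name `load_csv_sketch`, its signature, and the choice to return `y` as a one-column DataFrame are hypothetical and are not the loader implemented in this PR.

```python
import io

import pandas as pd


def load_csv_sketch(path_or_buffer, target_name, return_X_y=True):
    """Hypothetical loader: read a CSV and return pandas objects, not ndarrays."""
    df = pd.read_csv(path_or_buffer)
    if target_name not in df.columns:
        raise ValueError(f"target column {target_name!r} not found in CSV")
    if not return_X_y:
        # Return the full table, target column included.
        return df
    X = df.drop(columns=[target_name])  # feature columns as a DataFrame
    y = df[[target_name]]               # target kept as a one-column DataFrame
    return X, y


# Usage with an in-memory CSV; the column names are made up for illustration.
csv_text = "seq_len,gc_content,label\n20,0.45,1\n35,0.60,0\n28,0.52,1\n"
X, y = load_csv_sketch(io.StringIO(csv_text), target_name="label")
print(type(X).__name__, X.shape, type(y).__name__, y.shape)
```

Keeping both outputs as DataFrames preserves column names, which a NumPy array would drop.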

fkiraly · 2025-09-25T07:34:00Z

can you kindly summarize how requests were addressed and what the changes since last review are?

satvshr · 2025-09-26T05:40:26Z

The changes made were the ones requested, so:

The class now only takes 3 arguments: X, y, and cv.
The csv loader returns dataframes instead of numpy arrays and bunches now.
Added an example to show what output the run method yields.
Added tests.
Removed task_check and metaclass.

fkiraly

Great!

May I request to add a short notebook with a small benchmarking experiment and some reasonably chosen dataset?
I think that is important to understand your design as well as for review.

fkiraly

do you want to do this in a different PR or this one? There are pros/cons for either option. In this one, it might allow to spot bugs which would otherwise need to be fixed in additional PR.

satvshr · 2025-09-29T20:58:19Z

In this one, it might allow to spot bugs which would otherwise need to be fixed in additional PR.

I would rather do it in a separate issue (#164), I can add bug fixing as part of the notebook PR as well.

fkiraly · 2025-09-30T07:56:25Z

pyaptamer/datasets/tests/test_pfoa.py

-from pyaptamer.datasets._loaders import load_pfoa_structure
-
-
-def test_pfoa_loader():


why are we deleting this file?

Added in the description of the PR

fkiraly · 2025-09-30T07:57:08Z

could you kindly make sure you write a good PR description in the first post?

satvshr · 2025-09-30T10:13:27Z

could you kindly make sure you write a good PR description in the first post?

Done.

fkiraly

We are making changes to AptaNetPipeline - is this required for benchmarking? Would this not interact with other PRs, e.g., #153?

I would recommend to make this a separate PR.

satvshr added 30 commits July 7, 2025 13:21

Added the pseaac encoding algorithm

e37135c

Added Aptanet implementation

6ea5ff7

Made pseaac to a class and made the functions private, still working on tests and bug fixing

a5f01e0

Made a few readability changes

3773a90

Edited tests

9b9a3da

Added pytest to tests

2dfe0c7

Added numpy style docstrings and ruff formatting

1e182d3

Added docstrings, made functions pvt and made code more clean

20d7e37

Removed AptaNet from root

fc2f051

Added example

62f6c42

Removed AptaNet from root

848fc9b

Made requested changes

1515efe

Merge branch 'main' into issue28

75d4efb

Made requested changes and updated tests

733f908

Made suggested changes

04ab599

Removed lint. from pyproject, will push it as a separate PR

dc78e44

Refactored code

c347988

Added pandas as a dependancy

d9537f4

Renamed parent folder name to put it in the same level as AptaNet

1c46c55

Merge remote-tracking branch 'origin/main' into issue13

a716872

Refactored code and made architecture flexible

7781441

Edited docstrings and directory structure

e762cc8

Merge branch 'main' into issue28

e844d4f

weird rename experiment

f9392ef

weird rename experiment pt. 2

beb45ec

Made requested changes

d603d07

Made requested changes

6ecf576

Made requested changes

b91c511

chore: dummy commit to retrigger CI

b2428b0

Added missing init file to utils

2982954

fkiraly reviewed Sep 17, 2025

satvshr added 5 commits September 21, 2025 15:53

Update _base.py

2c06084

Update _base.py

7027d47

cleaning code remove tag checks

ec96bba

Update _base.py

d6111a9

Test suite added and bugs fixed

2e2d71f

satvshr requested a review from fkiraly September 21, 2025 13:28

satvshr added 3 commits September 21, 2025 19:12

arg name fixing

90af1ee

Update _csv_loader.py

971ae29

Update test_csv_loader.py

fe19150

Merge branch 'main' into issue109

bb80a81

fkiraly requested changes Sep 29, 2025

fixed docstring and example

18833f0

satvshr mentioned this pull request Sep 29, 2025

[ENH] Notebook for Benchmarking #164

Open

fkiraly reviewed Sep 29, 2025

satvshr requested a review from fkiraly September 29, 2025 20:59

satvshr mentioned this pull request Sep 30, 2025

[ENH] Notebook for Benchmarking #165

Open

satvshr changed the title ~~[ENH] Benchmarking framework~~ [ENH] Benchmarking framework and csv loader Sep 30, 2025

satvshr mentioned this pull request Sep 30, 2025

[ENH] Update AptaNet notebook to use AptaTrans data and a pdb for prediction #153

Open

fkiraly reviewed Sep 30, 2025

satvshr requested a review from fkiraly September 30, 2025 10:13

Update _aptanet_utils.py

84a2754

fkiraly requested changes Oct 1, 2025
