Replacing Pandas in EvaluationRunResult with CustomDataFrame #8793 #8838

Open
wants to merge 18 commits into base: main

Conversation

@blue-gitty blue-gitty commented Feb 10, 2025

Related Issues

  • #8793: Make EvaluationRunResult work without pandas

Proposed Changes:

The aim was to refactor the EvaluationRunResult class so that it no longer relies on pandas. While pandas.DataFrame is a powerful tool for handling tabular data, it is not a strict requirement for this use case. Instead, I implemented a solution using plain Python dictionaries, which are lightweight and sufficient for the task.

However, since the existing unit tests for EvaluationRunResult were designed around pandas.DataFrame behavior, I could not simply swap in plain dictionaries. To bridge this gap, I created a custom class called CustomDataFrame. It mimics the essential functionality of pandas.DataFrame but is built entirely on native Python data structures such as dictionaries and lists.

The CustomDataFrame class supports multiple output formats to stay compatible with the existing tests and use cases. When calling methods such as score_report, to_pandas, or comparative_individual_scores_report, the desired output format must be specified explicitly; otherwise the default, a dict-backed CustomDataFrame, is returned. For example:
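A minimal usage sketch for illustration (the run name and the shapes of inputs and results below are placeholders, not taken from the PR):

```python
inputs = {"question": ["What is Haystack?"]}                        # placeholder evaluation inputs
results = {"accuracy": {"score": 1.0, "individual_scores": [1.0]}}  # placeholder metric results

result = EvaluationRunResult("demo_run", inputs, results)

default_report = result.score_report()                        # default: dict-backed CustomDataFrame
csv_report = result.score_report(output_format="csv")         # CSV string (no file written)
pandas_report = result.score_report(output_format="pandas")   # pandas.DataFrame; requires pandas
```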

How It Works

The CustomDataFrame internally uses dictionaries to store data, with keys representing column names and values representing column data as lists.
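As a small illustration of that layout (hypothetical values, not taken from the PR):

```python
# Column-oriented storage: each key is a column name,
# each value is the list of cells in that column.
data = {
    "metrics": ["accuracy", "exact_match"],
    "score": [0.92, 0.85],
}
```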

Output Methods (a minimal sketch follows this list):

to_dict: Returns the data as a dictionary.

to_csv_file_with_name: Saves the data to a CSV file; if no filename is given, it returns the CSV content as a string.

to_pandas_dataframe: Converts the data to a pandas.DataFrame (optional; requires pandas).
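A minimal sketch of how such a class could look (an illustrative approximation based on the description above, not the exact code in the PR):

```python
import csv
import io
from typing import Any, Dict, List, Optional


class CustomDataFrame:
    # Lightweight, dict-backed stand-in for pandas.DataFrame:
    # keys are column names, values are equal-length lists of column data.

    def __init__(self, data: Dict[str, List[Any]]):
        self.data = data

    def to_dict(self) -> Dict[str, List[Any]]:
        return self.data

    def to_csv_file_with_name(self, filename: Optional[str] = None) -> Optional[str]:
        # Build the CSV in memory; write to disk only if a filename is given.
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(self.data.keys())
        writer.writerows(zip(*self.data.values()))
        if filename is None:
            return buffer.getvalue()
        with open(filename, "w", newline="", encoding="utf-8") as f:
            f.write(buffer.getvalue())
        return None

    def to_pandas_dataframe(self):
        # Optional dependency: pandas is only imported when explicitly requested.
        import pandas as pd

        return pd.DataFrame(self.data)
```

In this sketch, calling to_csv_file_with_name() without an argument returns the CSV text, while passing a filename writes it to disk, matching the behavior described above.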

How did you test it?

All functionality of CustomDataFrame has been tested to ensure it passes the existing unit tests designed around pandas.DataFrame behavior.

Notes for the reviewer

# Create an EvaluationRunResult instance
result = EvaluationRunResult(run_name, inputs, results)

# Get results as a dictionary (the default CustomDataFrame output)
dict_output = result.score_report(output_format="customdataframe")

# Get results as a CSV file or string; if csv_file is not specified, a CSV string is returned
csv_output = result.score_report(output_format="csv", csv_file="score_report.csv")

# Get results as a pandas DataFrame (if pandas is installed)
pandas_output = result.score_report(output_format="pandas")

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

mathislucka and others added 11 commits February 9, 2025 15:46
…eepset-ai#8812)

* add component checks

* pipeline should run deterministically

* add FIFOQueue

* add agent tests

* add order dependent tests

* run new tests

* remove code that is not needed

* test: intermediate from cycle outputs are available outside cycle

* add tests for component checks (Claude)

* adapt tests for component checks (o1 review)

* chore: format

* remove tests that aren't needed anymore

* add _calculate_priority tests

* revert accidental change in pyproject.toml

* test format conversion

* adapt to naming convention

* chore: proper docstrings and type hints for PQ

* format

* add more unit tests

* rm unneeded comments

* test input consumption

* lint

* fix: docstrings

* lint

* format

* format

* fix license header

* fix license header

* add component run tests

* fix: pass correct input format to tracing

* fix types

* format

* format

* types

* add defaults from Socket instead of signature

- otherwise components with dynamic inputs would fail

* fix test names

* still wait for optional inputs on greedy variadic sockets

- mirrors previous behavior

* fix format

* wip: warn for ambiguous running order

* wip: alternative warning

* fix license header

* make code more readable

Co-authored-by: Amna Mubashar <[email protected]>

* Introduce content tracing to a behavioral test

* Fixing linting

* Remove debug print statements

* Fix tracer tests

* remove print

* test: test for component inputs

* test: remove testing for run order

* chore: update component checks from experimental

* chore: update pipeline and base from experimental

* refactor: remove unused method

* refactor: remove unused method

* refactor: outdated comment

* refactor: inputs state is updated as side effect

- to prepare for AsyncPipeline implementation

* format

* test: add file conversion test

* format

* fix: original implementation deepcopies outputs

* lint

* fix: from_dict was updated

* fix: format

* fix: test

* test: add test for thread safety

* remove unused imports

* format

* test: FIFOPriorityQueue

* chore: add release note

* feat: add AsyncPipeline

* chore: Add release notes

* fix: format

* debug: switch run order to debug ubuntu and windows tests

* fix: consider priorities of other components while waiting for DEFER

* refactor: simplify code

* fix: resolve merge conflict with mermaid changes

* fix: format

* fix: remove unused import

* refactor: rename to avoid accidental conflicts

* fix: track pipeline type

* fix: and extend test

* fix: format

* style: sort alphabetically

* Update test/core/pipeline/features/conftest.py

Co-authored-by: Amna Mubashar <[email protected]>

* Update test/core/pipeline/features/conftest.py

Co-authored-by: Amna Mubashar <[email protected]>

* Update releasenotes/notes/feat-async-pipeline-338856a142e1318c.yaml

* fix: indentation, do not close loop

* fix: use asyncio.run

* fix: format

---------

Co-authored-by: Amna Mubashar <[email protected]>
Co-authored-by: David S. Batista <[email protected]>
@blue-gitty blue-gitty requested review from a team as code owners February 10, 2025 22:39
@blue-gitty blue-gitty requested review from dfokina and mpangrazzi and removed request for a team February 10, 2025 22:39
@CLAassistant

CLAassistant commented Feb 10, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the type:documentation Improvements on the docs label Feb 10, 2025
@davidsbatista davidsbatista self-requested a review February 12, 2025 14:05
@davidsbatista
Contributor

@blue-gitty thanks a lot for this - I'm starting a review

@davidsbatista
Contributor

@blue-gitty thanks again for this - there are a few issues with the type checker, and some aspects of the code could be cleaned up and improved a bit - do you mind if I take it from here?

@blue-gitty
Author

@davidsbatista Sure, please go ahead and take it from here. I realize the lint issues now, let me know if I could be of any help. Thanks!

@coveralls
Collaborator

Pull Request Test Coverage Report for Build 13287209985

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 48 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.4%) to 91.126%

Files with Coverage Reduction  | New Missed Lines | %
evaluation/base.py             | 10               | 0.0%
evaluation/eval_run_result.py  | 38               | 66.67%
Totals Coverage Status
Change from base Build 13284465135: -0.4%
Covered Lines: 9416
Relevant Lines: 10333

💛 - Coveralls

Labels: topic:tests, type:documentation

Successfully merging this pull request may close these issues.

Make EvaluationRunResult work without pandas
5 participants