Replacing Pandas in EvaluationRunResult with CustomDataFrame #8793 #8838
base: main
Conversation
…RunResult to work without pandas
@blue-gitty thanks a lot for this - I'm starting a review
@blue-gitty thanks again for this - there are a few issues with the type checker, and some aspects of the code could be cleaned up and improved a bit - do you mind if I take it from here?
@davidsbatista Sure, please go ahead and take it from here. I see the lint issues now - let me know if I can be of any help. Thanks!
Pull Request Test Coverage Report for Build 13287209985
Warning: this coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
💛 - Coveralls
Related Issues
- #8793
Proposed Changes:
The goal was to refactor the EvaluationRunResult class to work without relying on pandas. While pandas.DataFrame is a powerful tool for handling tabular data, it is not a strict requirement for this use case. Instead, the new implementation uses plain Python dictionaries, which are lightweight and sufficient for the task.
However, since the existing unit tests for EvaluationRunResult were designed around pandas.DataFrame behavior, I couldn't directly replace it with dictionaries. To bridge this gap, I created a custom class called CustomDataFrame. This class mimics the essential functionality of pandas.DataFrame but is built entirely from native Python data structures such as dictionaries and lists.
The CustomDataFrame class supports multiple output formats to ensure compatibility with existing tests and use cases:
When calling methods like score_report, to_pandas, or comparative_individual_scores_report, the desired output format must be specified explicitly; otherwise the default, a CustomDataFrame (dict-based), is returned. For example:
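A minimal sketch of that behavior, assuming `result` is an `EvaluationRunResult` instance (see the full example under "Notes for the reviewer" below):

```python
# No output_format given: the default CustomDataFrame (dict-backed) is returned
default_report = result.score_report()

# Explicitly requesting another format, e.g. a pandas.DataFrame
df_report = result.score_report(output_format="pandas")
```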
How It Works
The CustomDataFrame internally uses dictionaries to store data, with keys representing column names and values representing column data as lists.
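For instance, a small score report could be stored like this (illustrative values only, not taken from this PR):

```python
# Column-oriented storage: column name -> list of column values
data = {
    "metrics": ["exact_match", "recall"],
    "score": [0.81, 0.76],
}
```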
Output Methods (a minimal sketch of these methods follows the list):
- `to_dict`: Returns the data as a dictionary.
- `to_csv_file_with_name`: Saves the data to a CSV file; if no filename is given, it returns the CSV string instead.
- `to_pandas_dataframe`: Converts the data to a `pandas.DataFrame` (optional, requires pandas).
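A sketch of how such a class could be implemented (a hypothetical reconstruction from the description above; the actual code in this PR may differ in details such as CSV quoting and error handling):

```python
import csv
import io
from typing import Dict, List, Optional


class CustomDataFrame:
    """Lightweight, dict-backed stand-in for pandas.DataFrame."""

    def __init__(self, data: Dict[str, List]):
        # Column-oriented storage: column name -> list of values
        self.data = data

    def to_dict(self) -> Dict[str, List]:
        return self.data

    def to_csv_file_with_name(self, csv_file: Optional[str] = None) -> Optional[str]:
        # Transpose the column-oriented storage into CSV rows
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(self.data.keys())
        writer.writerows(zip(*self.data.values()))
        if csv_file is None:
            # No filename given: return the CSV content as a string
            return buffer.getvalue()
        with open(csv_file, "w", newline="") as f:
            f.write(buffer.getvalue())
        return None

    def to_pandas_dataframe(self):
        # Optional dependency: pandas is only needed for this conversion
        import pandas as pd

        return pd.DataFrame(self.data)
```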
How did you test it?
All functionality of CustomDataFrame has been tested to ensure it passes the existing unit tests designed for pandas.DataFrame.
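As a hypothetical illustration of that kind of equivalence check (not a test taken from this PR), one can assert that the dict-backed class and pandas agree on the same data:

```python
import pandas as pd


def test_custom_dataframe_matches_pandas():
    data = {"metrics": ["recall"], "score": [0.9]}
    cdf = CustomDataFrame(data)

    # The dict output should round-trip into an equivalent pandas.DataFrame
    pd.testing.assert_frame_equal(cdf.to_pandas_dataframe(), pd.DataFrame(data))
    assert cdf.to_dict() == data
```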
Notes for the reviewer
```python
# Create an EvaluationRunResult instance
result = EvaluationRunResult(run_name, inputs, results)

# Get results as a dictionary
dict_output = result.score_report(output_format="customdataframe")

# Get results as a CSV string; if no file name is specified, a CSV string is returned
csv_output = result.score_report(output_format="csv", csv_file="score_report.csv")

# Get results as a pandas DataFrame (if pandas is installed)
pandas_output = result.score_report(output_format="pandas")
```
Checklist
- The PR title uses one of the conventional commit types `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`, and adds `!` in case the PR includes breaking changes.