Conversation

Contributor

@jjurm jjurm commented May 1, 2025

Motivation

Dataframely currently doesn't support the pl.Array column type, and so dataframes using array columns cannot be validated. I'd like to be able to validate those, too.

Changes

This PR implements the dy.Array column type corresponding to polars' pl.Array type.

The implementation is very similar to that of dy.List. However, polars' List differs from Array mainly in the following ways:

  • Lists are flat, whereas Arrays can have shapes with multiple axes
  • Arrays must have a fixed size (or shape), whereas List elements can vary in length
  • pl.Expr.arr has less functionality than pl.Expr.list; most notably, there is no .arr.eval() function, which complicates validating rules on the inner column. One option is to recursively convert the array column to nested lists just for validation, but since the performance would be suboptimal, I left this part unimplemented and only allow inner column types without any validation rules for dy.Array.

Would be happy to hear any feedback!

@jjurm jjurm added the enhancement New feature or request label May 1, 2025

codecov bot commented May 1, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (b49692a) to head (217613b).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #27   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           37        38    +1     
  Lines         1808      1845   +37     
=========================================
+ Hits          1808      1845   +37     

Member

@borchero borchero left a comment

Thank you, this looks great already! I left a couple of small comments but I think it is almost ready :)

Great job finding all the right places to add code, esp. in the tests, and thanks for maintaining the 100% coverage 😁🚀

Member

@delsner delsner left a comment

Could you please also update the mypy plugin and tests/test_typing.py similar to #29? Thanks!

Member

@borchero borchero left a comment

Looks good to me (modulo the mypy plugin that @delsner mentioned), thanks! 🚀

Contributor Author

jjurm commented May 2, 2025

Thanks a lot for the great feedback, @borchero!

I am struggling to understand the problem with the mypy plugin. @delsner, could you please add a test for the mypy plugin that you would like to pass?

Member

@delsner delsner left a comment

Great work, thanks! 🚀 I added a typing test and updated the mypy plugin code.

@borchero borchero merged commit f053ab8 into main May 3, 2025
18 checks passed
@borchero borchero deleted the add-array-column branch May 3, 2025 22:24
Labels

enhancement New feature or request

4 participants