Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nrow and ncol fallback implementations #343

Merged
merged 9 commits into from
Sep 19, 2023
Merged

Add nrow and ncol fallback implementations #343

merged 9 commits into from
Sep 19, 2023

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Sep 7, 2023

To help with issues like #342

@codecov
Copy link

codecov bot commented Sep 7, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.09% 🎉

Comparison is base (dac28d9) 94.46% compared to head (88a9a79) 94.55%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #343      +/-   ##
==========================================
+ Coverage   94.46%   94.55%   +0.09%     
==========================================
  Files           7        7              
  Lines         723      735      +12     
==========================================
+ Hits          683      695      +12     
  Misses         40       40              
Files Changed Coverage Δ
src/dicts.jl 94.06% <100.00%> (+0.10%) ⬆️
src/fallbacks.jl 97.40% <100.00%> (+0.06%) ⬆️
src/matrix.jl 100.00% <100.00%> (ø)
src/namedtuples.jl 97.67% <100.00%> (+0.11%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bkamins
Copy link
Member Author

bkamins commented Sep 7, 2023

I can see that my implementation is incorrect, but let us wait for @quinnj to comment how this should be fixed (my implementation does not work correctly for all tables).

@juliohm
Copy link

juliohm commented Sep 7, 2023

Thank you Bogumil for addressing this issue. It would be really nice to get more explicit support in Tables.jl for nrow and ncol, especially nrow.

A few comments:

  1. Julia Base has a different naming convention for functions that return integers: ndims, nfields, ndigits, ... all end with "s". Is there a reason for nrow and ncol to be singular? Would it be too breaking to rename them to nrows and ncols for consistency?
  2. Is there a reason why nrow and ncol are implemented in DataAPI.jl? Could these names be moved to Tables.jl in place of Tables.rowcount? It feels like the most natural home for these functions.
  3. I understand that Tables.jl also targets the use case where tables have infinite streams of rows. I think this use case could be considered the exception instead of the default, and that nrow and ncol were part of the default table trait. Infinite tables could be formalized with a special trait that only a few users would benefit.
  4. It is important to emphasize that adding nrow to the default table trait doesn't imply that all tables will fit in memory. We can have the a priori knowledge of nrow from a file format specification, and still only be able to load chunks of the file in memory because it is too big.

@bkamins
Copy link
Member Author

bkamins commented Sep 7, 2023

Is there a reason for nrow and ncol to be singular?

This is just a convention that was already in DataFrames.jl before I started developing it. The convention originates from R I guess.
So this is kind of a situation where the names were picked because everyone used them. It is like in the English language (I am not a native speaker so my perspective does not have to be super correct): people say "gonna" because it is short and everyone says so, although I was taught that this is technically incorrect.

It feels like the most natural home for these functions.

They could. The reason is that Tables.jl provides implementations while DataAPI.jl provides only API. And as such nrow and ncol do not have currently a default implementation (they could get it in Tables.jl).

I understand that Tables.jl also targets the use case where tables have infinite streams of rows.

Yes. That is why I have commented above that what should nrow/ncol return in case of a lazy table that is e.g. schemaless and of unknown number of rows would have to be decided as a part of the design.

It is important to emphasize that adding nrow to the default table trait doesn't imply that all tables will fit in memory.

Yes. nrow should play with Tables.subset (so that users know what indices are valid)

@bkamins
Copy link
Member Author

bkamins commented Sep 17, 2023

@quinnj - I have updated the PR following your comment on rowcount usage. Can you please have a look if its design is OK now? If yes I will add tests.

Copy link
Member

@quinnj quinnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bkamins, this LGTM; would you mind adding some simple tests that cover the new nrow/ncol definitions for the various Tables-owned table types? With that, feel free to merge and we can tag a new release.

@bkamins bkamins merged commit a695836 into main Sep 19, 2023
@bkamins bkamins deleted the bkamins-patch-2 branch September 19, 2023 18:01
@bkamins
Copy link
Member Author

bkamins commented Sep 19, 2023

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants