Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude certain kinds of dictionary tables from being interpreted as tables #189

Merged
merged 5 commits into from
Oct 12, 2022

Conversation

ablaom
Copy link
Member

@ablaom ablaom commented Oct 10, 2022

This PR will:

@ablaom
Copy link
Member Author

ablaom commented Oct 10, 2022

According to local testing, this PR will resolve JuliaAI/MLJText.jl#25 .

@ablaom
Copy link
Member Author

ablaom commented Oct 10, 2022

@bkamins Be great if you could have a look at this, thanks.

@codecov-commenter
Copy link

codecov-commenter commented Oct 10, 2022

Codecov Report

Merging #189 (fb34924) into dev (6ab77cf) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #189      +/-   ##
==========================================
+ Coverage   93.50%   93.54%   +0.04%     
==========================================
  Files           6        6              
  Lines         431      434       +3     
==========================================
+ Hits          403      406       +3     
  Misses         28       28              
Impacted Files Coverage Δ
src/ScientificTypes.jl 100.00% <100.00%> (ø)
src/convention/scitype.jl 93.18% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@ablaom ablaom changed the title Exclude certain kinds of dictionary tables from being interpreted as dictionaries Exclude certain kinds of dictionary tables from being interpreted as tables Oct 10, 2022
@bkamins
Copy link

bkamins commented Oct 10, 2022

Apart from a comment related to JuliaData/Tables.jl#309 I have a general design question that I think would be worth discussing.

The issue is that Tables.istable is only a one-way promise:

Check if an object has specifically defined that it is a table. Note that not all valid tables will return true, since it's possible to satisfy the Tables.jl interface at "run-time", e.g. a Generator of NamedTuples iterates NamedTuples, which satisfies the AbstractRow interface, but there's no static way of knowing that the generator is a table.

This means that is we see true we know that something is a table, but when we see false it does not mean it is not a table. This is indeed unfortunate.

The problem is that some MLJ.jl users can use a table for which Tables.istable returns false. Then they have a problem because your current mechanisms will not let them such a table be treated as a table, but rather it will get :other treatment.

You know the MLJ.jl ecosystem much better than me 😄, so I do not feel qualified to give informed proposals what to do with this, but clearly the sentence:

any table type T supported by Tables.jl

is not true currently. It probably should be made more precise (if nothing is changed) to saying that table is only considered something for which Tables.istable returns true except these two excluded cases. The other solution could be that a separate method would be for something that user explicitly wants to be considered as table, and separate for things not to be considered tables (but I am not sure if this is feasible or convinient, it also might be breaking; so to do not treat this suggestion as something strong).

@bkamins
Copy link

bkamins commented Oct 12, 2022

Given the discussion in JuliaData/Tables.jl#309 and Tables.jl 1.10.0 release the PR should be updated to AbstractString.

@ablaom
Copy link
Member Author

ablaom commented Oct 12, 2022

@bkamins Thanks for the review and comments. I have now updated the excluded key type from String to AbstractString.

It probably should be made more precise (if nothing is changed) to saying that table is only considered something for which Tables.istable returns true except these two excluded cases.

I have clarified the ScientificTypes.jl documentation to that effect, and raised an issue to do the same in MLJ docs.

As far as allowing tables in MLJ for which Tables.istable is false, I can't think of anything better than adding a wrapper for objects to declare them as bona fide tables, with Tables.jl methods passing through, which I guess is what you are suggesting. I suppose the main use case in ML for these tables would come from tabular data not fitting into memory. MLJ hasn't really been designed to train using out-of-memory datasets, so this probably fit's into the broader design discussion at, e.g., JuliaML/MLUtils.jl#61 .

@ablaom ablaom merged commit 3f7cbc0 into dev Oct 12, 2022
@ablaom ablaom mentioned this pull request Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants