Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter doesn't work on generators #37146

Open
knuesel opened this issue Aug 21, 2020 · 7 comments
Open

filter doesn't work on generators #37146

knuesel opened this issue Aug 21, 2020 · 7 comments
Labels
feature Indicates new feature / enhancement requests iteration Involves iteration or the iteration protocol

Comments

@knuesel
Copy link
Member

knuesel commented Aug 21, 2020

The filter function cannot be used on generators:

julia> filter(isodd, i for i in 0:9)
ERROR: MethodError: no method matching filter(::typeof(isodd), ::Base.Generator{UnitRange{Int64},typeof(identity)})
Closest candidates are:
  filter(::Any, ::Tuple{Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Vararg{Any,N}} where N) at tuple.jl:267
  filter(::Any, ::Tuple) at tuple.jl:264
  filter(::Any, ::Array{T,N}) where {T, N} at array.jl:2457
  ...
Stacktrace:
 [1] top-level scope at REPL[59]:1

This is surprising because map works, and the documentation of both map and filter say they work on "collections".

Apparently it used to work, see examples mentioned in #18866 .

In support of the view that the current behavior is confusing, some problems reported by users on Discourse:

https://discourse.julialang.org/t/creating-an-array-of-tuples-why-doesnt-filter-work-on-a-zip/28173
https://discourse.julialang.org/t/filter-doesnt-work-on-grouped-dataframe/27322
https://discourse.julialang.org/t/filter-of-skipmissing-fails/21249
https://discourse.julialang.org/t/what-is-the-correct-way-to-ignore-some-files-directories-in-walkdir/26780
https://discourse.julialang.org/t/jump-filtering-the-output-of-jump-value-x/25391

(And my own use case: filtering the output of eachline.)

julia> versioninfo()
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
@quinnj
Copy link
Member

quinnj commented Aug 21, 2020

Note that Iterators.filter works on Generators and is how "guards" are implemented, like:

julia> f = (i for i = 1:10 if i % 2 == 0)
Base.Generator{Base.Iterators.Filter{var"#33#34",UnitRange{Int64}},typeof(identity)}(identity, Base.Iterators.Filter{var"#33#34",UnitRange{Int64}}(var"#33#34"(), 1:10))

@mcabbott
Copy link
Contributor

Perhaps there should be a method Base.filter(f, itr) = collect(Iterators.filter(f, itr)), to match the one map(f, A) = collect(Generator(f,A)).

@Gyoshi
Copy link

Gyoshi commented Dec 29, 2020

One thing that came up from a slack discussion with @mcabbott, @dpsanders, @SaschaMann and others, and @bramtayl mentioning it in the above PR as well: Though not everyone agreed(?), it seems questionable to automatically collect as a default behaviour. (I do not fully understand why map has this behaviour, but seems like the reason for it is in #15342.)

The current issue would also be fixed by simply exporting Iterators.filter. This would be more in line with the default lazy behaviour of generators in my opinion.

Similar issues could be fixed for accumulate and reverse which have eager versions in Base and lazy versions in Iterators.

@Gyoshi
Copy link

Gyoshi commented Dec 29, 2020

Seems like the origin of the separation of eager/lazy is #18839:

I separated Base.filter and IterTools.filter into two functions (since the latter is lazy and the former is eager). This is up for debate.

I don't know if the separation has since been debated and agreed upon conclusively, but I think the current issue highlights the problem with this separation rather well: Base filter not working on generators is evidently widely unexpected behaviour.

@bramtayl
Copy link
Contributor

I think the issue with iterators is that they are...messy from a type standpoint. They inherit some traits from their parents and not others. Iterators.map inherits size but not eltype. Iterators.filter inherits eltype but not size. So while I still think that it would make a ton of sense to move away from auto-collection, it would be critical to figure out a story for traits (or tags, or multiple inheritance, or whatever) first.

@JeffBezanson
Copy link
Member

Iterators.map inherits size but not eltype. Iterators.filter inherits eltype but not size. So while I still think that it would make a ton of sense to move away from auto-collection, it would be critical to figure out a story for traits

The story is: iterators preserve traits that they preserve. map inherently changes the eltype to something unknown, and filter inherently breaks any kind of N-d array structure since it removes arbitrary elements (and you don't even know how many until you've seen every element). What else could they do?

@bramtayl
Copy link
Contributor

bramtayl commented Jan 5, 2021

Right; AbstractArray is really just two separate traits (HasEltype and HasShape)

@brenhinkeller brenhinkeller added iteration Involves iteration or the iteration protocol feature Indicates new feature / enhancement requests labels Nov 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Indicates new feature / enhancement requests iteration Involves iteration or the iteration protocol
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants