RFC: clean up `map` and `collect` with iterator traits #15146
From #13276 I recall that it would be natural to assign an […]
We should probably be careful to distinguish "infinite" from "lazy", i.e. you can't get the length of an IO object without reading it to the end, but it does usually have an end (unless it's /dev/zero or something like that), whereas you can have iterators that are truly infinite as well.
Yes, if people generally like this approach then we can add an IsInfinite flavor.
(branch updated: 24ebce0 to 6b90b4d)
I decided to try making […]

    """
        collect(collection)

    Return an array of all items in a collection. For associative collections, returns Pair{KeyType, ValType}.
    """
    -collect(itr) = collect(eltype(itr), itr)
    +collect(itr) = _collect(itr, iteratoreltype(itr), iteratorsize(itr))

`collect(repeated(1))` now gives a method-error instead of hanging? That's nice.
Yes, I suppose so. Would you like to take a stab at adding more support for IsInfinite? I think mostly we need definitions for combining it correctly with other iterators.
Yes, I plan to do this; I am reading the code now to see how it works.
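
To illustrate the behavior discussed in this thread, here is a minimal sketch of a `collect`-like function that dispatches on an iterator's size trait. It is written against Julia 1.x, where the trait is spelled `Base.IteratorSize` rather than the `iteratorsize` used in this PR. The names `mycollect` and `_mycollect` are hypothetical and are not the PR's implementation.

```julia
# Hypothetical helper (not part of Base): a collect-like function that
# dispatches on the iterator's size trait, using the Julia 1.x spelling.
mycollect(itr) = _mycollect(itr, Base.IteratorSize(itr))

# Known length or shape: preallocate the result, then fill it in order.
function _mycollect(itr, ::Union{Base.HasLength, Base.HasShape})
    out = Vector{eltype(itr)}(undef, length(itr))
    for (i, x) in enumerate(itr)
        out[i] = x
    end
    return out
end

# Unknown size: start empty and grow with push!.
function _mycollect(itr, ::Base.SizeUnknown)
    out = Vector{eltype(itr)}()
    for x in itr
        push!(out, x)
    end
    return out
end

# Deliberately no method for Base.IsInfinite: calling mycollect on an
# infinite iterator raises a MethodError instead of looping forever.
```

With these definitions, `mycollect(Iterators.repeated(1))` fails at dispatch time because no `_mycollect` method accepts `Base.IsInfinite()`, which is the kind of failure mode the comment above describes.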
(branch updated: 6b90b4d to 3630dd3)
(branch updated: 6d8191b to ec3d68b)
(branch updated: 00c31f0 to c30cffc)
    @@ -208,6 +208,13 @@ reshape(a::AbstractArray, dims::Int...) = reshape(a, dims)

    ## from general iterable to any array

    function copy!(dest, src)
        for x in src
            push!(dest, x)

Doesn't `copy!` overwrite rather than append in all other cases?
Yes; I did this because `push!` is the only generic "add to collection" function we have. This definition only works if `copy!(dest::AbstractArray, src)` is also defined, which it is.
`push!` is key-preserving for non-arrays, but arrays can't do that trick since taking an element out of an array drops the key. Actually this definition can replace `union!` (sets) and `merge!` (dicts). Given that we have `copy!`, `append!`, `union!`, and `merge!`, it looks quite possible we have too many whole-collection functions and not enough single-element functions. `append!` goes with `push!`, but there isn't a function that pairs with `copy!`. Arrays could then implement `copy!`, but not the single-element partner for that function.
I feel like you'd have to `empty!(dest)` first to make this actually implement `copy!`; otherwise, as written, this is doing `append!`.
But what does `append!` mean on a dict or set? (It's not currently defined.)
Every other `append!` method is equivalent to this, doing `for x in src; push!(dest, x); end`, right?
There are non-AbstractArray collections other than dicts and sets, so defining this with such a loose signature seems wrong. Why should `copy!` append to existing keys for Associative (or other) collections but overwrite existing values for arrays?
I agree that in general `append!` is repeated `push!`. So it seems the solution is to have an `f!` such that `copy!` is repeated `f!`. `f!` works for unordered collections, or collections like dicts where the "elements" contain both a key and a value, so there is no question about "where" it will be inserted.
`append!` is in fact only defined for Vector and BitVector, so this hasn't been probed much. We also have `union!`, which is literally implemented with repeated `push!`, and `merge!`, which could easily be. These certainly overwrite existing values. So arguably we should replace `push!` with `f!` in these cases. `f!` could be, say, a 2-argument form of `insert!`.
A bit more background: in the use case in this PR, I'm either incrementally growing an Array, or building some other kind of collection. So `push!` kind of makes sense.
In the case of `copy!`, that `f!` seems like it should be `setindex!`. The generic implementation for `dest::AbstractArray` should probably be

    for (i, x) in zip(eachindex(dest), src)
        dest[i] = x
    end

Yes, that looks fine. But the problem is that if you want to implement `copy!` for non-arrays, you need to use `push!`, AFAICT, which is not correct for arrays; hence this comment thread.
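
The two behaviors debated in this thread can be put side by side. The following is only a sketch under stated assumptions: `fill_from!` is a hypothetical name (not a Base function and not what the PR defines), and the code uses Julia 1.x type names (`AbstractSet`, `AbstractDict`). Arrays are written through their indices, while sets and dicts receive elements through `push!`.

```julia
# Hypothetical function for illustration only; not part of Base.

# Arrays: positions belong to the container, so overwrite through indices.
# zip stops at the shorter of the two, so no length check is done here.
function fill_from!(dest::AbstractArray, src)
    for (i, x) in zip(eachindex(dest), src)
        dest[i] = x
    end
    return dest
end

# Sets and dicts: each element (a value, or a key => value pair) carries
# everything needed to place it, so push! is sufficient.
function fill_from!(dest::Union{AbstractSet, AbstractDict}, src)
    for x in src
        push!(dest, x)
    end
    return dest
end
```

For a dict destination, existing keys are overwritten and new keys are added, which matches the `merge!`-like behavior mentioned earlier in the thread. For an array destination, existing values are overwritten in place and the array never grows.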
I have to interrupt my review for the remainder of the day, but aside from a big 👍 one general comment: this seems to suffer from the difference in our iteration protocol for arrays (which returns values) and associative collections (which returns key-value pairs). The closest we have for arrays is […]
I'm behind a ridiculously slow connection or I'd provide a link, but if you haven't seen it, my recent "detangle..." and "iterate along a dimension..." PRs are also concerned with generalizing our iteration (over just arrays in this case). There seem to be a number of commonalities, so I'll try to keep a close eye on this.
Exactly; the problem is that a single element of a set or dict has all the information you need to insert it in a different collection, but for arrays the keys are only part of the container and not the elements. The only solution I can think of is to use a different name than […]
The other option is to introduce a […]
I originally thought that one other potential difference between arrays and associative collections is […]
I agree with @tkelman and was trying to come up with a generic iteration scheme that lets you use […]

    immutable EltypeUnknown <: IteratorEltype end
    immutable HasEltype <: IteratorEltype end

    iteratoreltype(x) = iteratoreltype(typeof(x))

`haseltype`? And then `EltypeUnknown` and `EltypeKnown`?
Sounds ok. I was aiming to have all the iterator traits share the `iterator` prefix, but I guess having an element type does not need to be tied to iteration.
There is merit in the `iterator` prefix, too. `iteratorhaseltype`? Getting kinda long without underscores, though.
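
For readers unfamiliar with the trait pattern in the hunk above, here is a self-contained sketch in Julia 1.x syntax (`struct` and `abstract type` instead of `immutable`). It defines its own trait types rather than using Base's, and the supertype name `IteratorEltypeTrait` is an assumption made for this illustration. Treating `HasEltype` as the default follows the commit message later in this conversation; marking generators as `EltypeUnknown` is an illustrative choice, not a quote from the PR.

```julia
# Standalone illustration of a "holy trait"; these are local definitions,
# not Base's own trait types.
abstract type IteratorEltypeTrait end
struct EltypeUnknown <: IteratorEltypeTrait end
struct HasEltype <: IteratorEltypeTrait end

# Calls on values forward to the type, so the trait is a property of the type.
iteratoreltype(x) = iteratoreltype(typeof(x))

# Default: assume eltype is meaningful.
iteratoreltype(::Type) = HasEltype()

# Generators apply an arbitrary function, so their element type is not known
# up front (illustrative choice for this sketch).
iteratoreltype(::Type{<:Base.Generator}) = EltypeUnknown()

# A consumer can then branch on the trait by dispatch.
describe(itr) = _describe(iteratoreltype(itr))
_describe(::HasEltype) = "element type is declared"
_describe(::EltypeUnknown) = "element type must be discovered while iterating"
```

Because the trait values are singleton instances, the branch taken in `describe` is chosen by dispatch on the iterator's type rather than by a runtime check of its contents.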
Sounds like the best thing for me to do for now is to restrict the `copy!` method to […]
Seems reasonable to me. Probably move it into […]
(branch updated: d0d8ae8 to 337074c)
@timholy going off your recent changes, would it be more correct to iterate the indexes instead?

    copy!(dest, src) = for i in eachindex(src)
        dest[i] = src[i]
    end

This might, for example, give a more reasonable result when the types of src and dest are not the same.
Ready to merge I think?
@vtjnash, right now (and even more so once ReshapedArrays merge), the best approach is to use a separate iterator for each array (so a […]
I was thinking that for the definition of copy, this isn't actually iterating […]
The trouble is that the notion of an iterator only specifies a sequence of values, not a sequence of values with indices. So one game to play here is to see how far we can get in generic functions assuming that collections have no more structure than, essentially, a multi-set. I think I've tended to assume that an array is an ordered multi-set, rather than a set of […]
An iterator could implicitly be considered to have integer indices.
That's a good definition, and maybe all it should be. But I'm quite interested in some generalizations that are guaranteed to cause trouble 😈, and I suspect the multi-set won't prove to be enough of a foundation for deciding what the right behavior should be.
…traits
make HasEltype and HasLength the default
this is a possible fix for #15342
type-accumulating Dict constructor
some tests for type accumulation in `map` and `Dict`
add `iteratorsize(::Type{StreamMapIterator})`
(branch updated: 337074c to bd02ba4)
Should I be using something like […]
From https://github.com/samoconnor/julia/blob/retry_branch/base/pmap.jl#L74:

    function batchsplit(c; min_batch_count=1, max_batch_size=100)
        # Split collection into batches, then peek at the first few batches...
        batches = partition(c, max_batch_size)
        head, tail = head_and_tail(batches, min_batch_count)
        # If there are not enough batches, use a smaller batch size...
        if length(head) < min_batch_count
            batch_size = max(1, div(sum(length, head), min_batch_count))
            return partition(flatten(head), batch_size)
        end
        return flatten((head, tail))
    end

The goal is this: […]
Does the new iterator traits stuff allow asking the question: "Do we know, without non-trivial computation, that this collection has at least n items?"
No, it doesn't support that distinction yet. I think it would be ok for […]
I can see the need for a "has at least n elements" function, since if […]
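
One possible shape for the "has at least n elements" query raised here is sketched below. It is an assumption-laden illustration, not something this PR provides: `hasatleast` is a hypothetical name, and the code uses the Julia 1.x trait `Base.IteratorSize`. For iterators of unknown size it has to iterate, so it is not free of computation in that case and may consume elements of a stateful iterator.

```julia
# Hypothetical query (not in Base): does `itr` have at least `n` elements?
hasatleast(itr, n::Integer) = _hasatleast(itr, n, Base.IteratorSize(itr))

# Known length or shape: answer from length, without iterating.
_hasatleast(itr, n, ::Union{Base.HasLength, Base.HasShape}) = length(itr) >= n

# Infinite iterators always have enough elements.
_hasatleast(itr, n, ::Base.IsInfinite) = true

# Unknown size: iterate, but stop as soon as n elements have been seen.
# Note that this consumes elements if the iterator is stateful.
function _hasatleast(itr, n, ::Base.SizeUnknown)
    n <= 0 && return true
    seen = 0
    for _ in itr
        seen += 1
        seen >= n && return true
    end
    return false
end
```

For example, `hasatleast(Iterators.repeated(0), 10^9)` answers immediately from the trait, while `hasatleast(Iterators.filter(iseven, 1:10), 5)` walks the filtered sequence until it has seen five elements.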
see #15123