Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Vec::drain_filter #43245

Merged
merged 1 commit into from
Aug 16, 2017
Merged

Add Vec::drain_filter #43245

merged 1 commit into from
Aug 16, 2017

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Jul 15, 2017

This implements the API proposed in #43244.

So I spent like half a day figuring out how to implement this in some awesome super-optimized unsafe way, which had me very confident this was worth putting into the stdlib.

Then I looked at the impl for retain, and was like "oh dang". I compared the two and they basically ended up being the same speed. And the retain impl probably translates to DoubleEndedIter a lot more cleanly if we ever want that.

So now I'm not totally confident this needs to go in the stdlib, but I've got two implementations and an amazingly robust test suite, so I figured I might as well toss it over the fence for discussion.

@rust-highfive
Copy link
Collaborator

r? @sfackler

(rust_highfive has picked a reviewer for you, use r? to override)

@Gankra
Copy link
Contributor Author

Gankra commented Jul 15, 2017

In case you're curious here's the other impl: https://gist.github.com/Gankro/ec6bf4f939b40ace16790447ed75152b

It basically does a ton of work to try to use ptr::copy() on contiguous blocks of elements instead of swapping them individually. As a result it ends up having to handle boundary cases, and ZSTs all over the place.

@Mark-Simulacrum
Copy link
Member

I think this is a good thing to include in the standard library. Even though out of tree implementation may be possible, I've found multiple times that vec.into_iter().partition() is just easier than finding and including a library. Having this would make it be the ideal performance wise for those situations.

@JordiPolo
Copy link
Contributor

I believe this is called delete_if in Ruby. That name is clearer IMHO

@Gankra
Copy link
Contributor Author

Gankra commented Jul 15, 2017

Neither delete nor if have any precedent in our standard library naming so I disagree.

So the three fundamentally similar operations we have are drain, retain, and filter.

  • Drain moves things out (but doesn't accept a predicate)
  • Filter takes an iter and produces a filtered one (but can't modify a source)
  • Retain removes based on a filter (but doesn't produce an iterator)

This gives us a few possibilities:

  • drain_by
  • drain_with
  • drain_retain
  • drain_filter
  • filter
  • retain_move(??)

The most important point, to me, is that filter and retain have reversed "polarity" when applied to this problem:

  • with filter, true means "I want this to be yielded"
  • with retain, true means "I want this to stay in the vec"

This is precisely why I didn't like drain_by/drain_with; they left the predicate ambiguous on their own. Ultimately I felt that filter was the clearer model. drain(x..y) says you want to remove x..y, so drain(|x| *x % 2 == 0) should say you want to remove all x that match x % 2 == 0.

Using bare filter as a name didn't sit well with me; we don't add iterator-ish methods directly on collections.

So ultimately drain_filter seemed like the clearest name. It also, hopefully, makes it easier to find.

@Mark-Simulacrum Mark-Simulacrum added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 17, 2017
@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jul 17, 2017
/// But `drain_filter` is easier to use. `drain_filter` is also more efficient,
/// because it can backshift the elements of the array in bulk.
///
/// Note that `drain_filter` also lets you mutate ever element in the filter closure,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every.

@sfackler
Copy link
Member

This seems like a reasonable API to me.

@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented Jul 19, 2017

Team member @sfackler has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Jul 19, 2017
@Gankra
Copy link
Contributor Author

Gankra commented Jul 19, 2017

typo fixed

@CryZe
Copy link
Contributor

CryZe commented Jul 19, 2017

Interestingly, drain_filter could be used in place of the missing retain_mut method. Would it make sense to use it as that, or is the fact that it's returning an Iterator causing some overhead over an actual implementation of retain_mut like the one in bluss' odds crate?

@Gankra
Copy link
Contributor Author

Gankra commented Jul 19, 2017

In my experiments iteraterification had little to no performance impact, but I don't 100% trust my results.

Note that drain_filter has a reversed predicate to retain_mut.

@carols10cents carols10cents added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 24, 2017
@alexcrichton
Copy link
Member

ping @Kimundi, @aturon, @brson, @dtolnay,

checkboxes!

/// let val = vec.remove(i);
/// // your code here
/// }
/// i += 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would skip every item after a removed item.

let some_predicate = |&x| x == 2;
let mut vec = vec![1, 2, 2, 3, 4, 5]; 

let mut i = 0;
while i != vec.len() {
    if some_predicate(&mut vec[i]) {
        let val = vec.remove(i);
        println!("removed {}", val);
    }
    i += 1;
}

println!("{:?}", vec);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, haha! Need to put the +=1 in an else

@nrxus
Copy link

nrxus commented Jul 31, 2017

Do we really need a whole new struct that implements Iterator for this?

Would it not be easier to re-use most of the retain function, and call drain after we have done the moving of the elements?

fn retain_or_drain<F>(&mut self, mut f: F) -> std::vec::Drain<T>
  where
    F: FnMut(&T) -> bool,
{
  // this is exactly the same as the current retain() method...
  let len = self.len();
  let mut del = 0;
  {
    let v = &mut **self;

    for i in 0..len {
      if !f(&v[i]) {
        del += 1;
      } else if del > 0 {
        v.swap(i - del, i);
      }
    }
  }
  //... until here
  // the existing retain() methods truncates
  // this new method would return a drain instead
  self.drain(len - del..)
}

The method above is very similar to the current retain method, and re-uses the existing Drain iterator instead of creating an entire new type.

type Item = T;

fn next(&mut self) -> Option<T> {
unsafe {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally not make the unsafe cover the entire block, just the raw_parts_mut method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also needed for the ptr::read. At that point might as well just say the whole thing is unsafe and call it a day (this matches many parts of this file).

@Gankra
Copy link
Contributor Author

Gankra commented Aug 1, 2017

@nrxus iterators are maximally flexible and composable, and should only be avoided when there's a performance or ergonomics reason. In this case I don't think there is.

@nrxus
Copy link

nrxus commented Aug 1, 2017

@gankro Drain is also an iterator though. What do we gain from a new type that implements iterator (FilterDrain) that outweights the benefit of reusing an existing working type that also implements iterator (Drain).

@Gankra
Copy link
Contributor Author

Gankra commented Aug 1, 2017

Oh I see, I misread your original comment.

I don't consider unifying DrainFilter and Drain to eliminate a struct to be intrinsically valuable to our users (and indeed, it would corner us into a very specific implementation). The maintenance burden is also fairly minimal for std.

I am also concerned that this implementation has worse cache behaviour when most elements need to be removed, as it runs over the entire array an extra time to actually yield the elements. It also requires actually swapping the elements, as opposed to just memcopying one over the other.

@aturon
Copy link
Member

aturon commented Aug 1, 2017

(I checked off @brson's box due to him being away)

@rfcbot
Copy link

rfcbot commented Aug 1, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Aug 1, 2017
@RalfJung
Copy link
Member

RalfJung commented Aug 4, 2017

One problem with all these closure-based design is that you cannot use ? inside the closure. I recently wanted exactly this drain_filter API (well I wasn't even actually interested in the visitor), but I am calling fallible operations per item, so I'd need to move the Err outwards using an &mut or so. But I suppose this is not specific to drain_filter, so this is probably not the right place to address this issue.

@rfcbot
Copy link

rfcbot commented Aug 11, 2017

The final comment period is now complete.

@kennytm
Copy link
Member

kennytm commented Aug 13, 2017

Ping @sfackler? FCP completed without much concern besides #43245 (comment) (OP's reply at #43245 (comment)).

@carols10cents carols10cents added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Aug 14, 2017
@sfackler
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Aug 15, 2017

📌 Commit 1af4226 has been approved by sfackler

@Gankra
Copy link
Contributor Author

Gankra commented Aug 15, 2017

\o/

@bors
Copy link
Contributor

bors commented Aug 15, 2017

⌛ Testing commit 1af4226 with merge 1e60a47...

bors added a commit that referenced this pull request Aug 15, 2017
Add Vec::drain_filter

This implements the API proposed in #43244.

So I spent like half a day figuring out how to implement this in some awesome super-optimized unsafe way, which had me very confident this was worth putting into the stdlib.

Then I looked at the impl for `retain`, and was like "oh dang". I compared the two and they basically ended up being the same speed. And the `retain` impl probably translates to DoubleEndedIter a lot more cleanly if we ever want that.

So now I'm not totally confident this needs to go in the stdlib, but I've got two implementations and an amazingly robust test suite, so I figured I might as well toss it over the fence for discussion.
@bors
Copy link
Contributor

bors commented Aug 16, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: sfackler
Pushing 1e60a47 to master...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.