-
Notifications
You must be signed in to change notification settings - Fork 288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Bytes::from_owner #742
Conversation
What do you think, Sean? It is interesting that it avoids exposing the vtable like previous suggestions. |
Yes, that was my goal: #437 (comment) |
At a quick glance, this looks like it would be forwards compatible with exposing the vtable? |
src/bytes.rs
Outdated
/// | ||
/// Note that converting [Bytes] constructed from an owner into a [BytesMut] | ||
/// will always create a deep copy of the buffer into newly allocated memory. | ||
pub fn from_owner<T: AsRef<[u8]>>(owner: T) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this need some kind of StableDeref bound to protect against weird edge cases where the referenced region changes over time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string
crate had a similar issue: carllerche/string#9
I do wonder if Sync
would be sufficient here... In general, we probably do need a Send
bound as well since Bytes
is Send
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sync wouldn't cover it since you can have a thread-safe implementation that still mutates internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for StableDeref
isn't calling as_ref
just once, never moving the memory afterwards because it's Box
ed and never calling any other method already guaranteeing the Bytes
invariants and soundness in general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah only calling as_ref once may actually make this fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a safe way for it to still be UB, even with only calling it once. If I pass in a form of Arc<Mutex<_>>
, and then afterwards change the internal buffer completely (maybe a reallocation), the captured pointer could be referring to invalid memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reference we get back from the as_ref needs to be valid for the remaining lifetime of the object though - it can't be swapped out without a call to an &mut method that would ensure the initial borrow is gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's sound as-is. You cannot write a safe thing where this breaks even with Arc
/Mutex
because if the buffer is inside the mutex, the implementer of AsRef
can't return the buffer from as_ref
since it borrows from the mutex guard.
As written, the as_ref
call essentially borrows the owner
field until we run the destructor. As long as we never make any mutable function calls on owner
during that time, it's okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use std::sync::{Arc, Mutex};
struct Evil {
within: Arc<Mutex<Vec<u8>>>,
}
impl AsRef<[u8]> for Evil {
fn as_ref(&self) -> &[u8] {
// error[E0515]: cannot return value referencing temporary value
self.within.try_lock().unwrap().as_ref()
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not mistaken the current API should guard against Owner
exposing a thread local as well.
src/bytes.rs
Outdated
/// | ||
/// Note that converting [Bytes] constructed from an owner into a [BytesMut] | ||
/// will always create a deep copy of the buffer into newly allocated memory. | ||
pub fn from_owner<T: AsRef<[u8]>>(owner: T) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we may want our own trait instead of AsRef<[u8]>
so we can add more methods in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<opinionated>
A custom trait is how I had originally started implementing this, but gave up early on for a few reasons:
- Worse ergonomics for custom trait:
- Many things already implement
AsRef<[u8]>
, soBytes::from_owner(mmap);
is easy and straight-forward. - Conversely, a custom trait would require
Bytes::from_custom(MyMmapWrapper::new(mmap))
.
- Many things already implement
- Safety:
AsRef<[u8]>
can be called during construction and only once. It's a trick that makes thefrom_owner
API safe.- Any additional APIs would - in practice - need to be called during the lifetime of bytes. As an example
trait CustomOwner { fn try_into_mut(self) -> Result<impl CustomOwnerMut, Self>; }
(unless I'm wrong) would make it into anunsafe trait
. - What's missing from the current approach is zero-copy conversion to
BytesMut
.
The implementor of a trait that that exposed this would need to be very careful here to respect theVec<u8>
semantics. Context: IfBytes
is not unique, the existing code creates a newly allocatedVec<u8>
copy of the referenced memory. Unless due care is taken - in the specific case of anMmap
- this would likely lead an implementor to erroneously assume that they should create anMmapMut
. This would be wrong, especially if the mmapped section is backed by a file as the resulting behaviour would sometimes update the contents of a file and sometimes not, depending onBytes
's internal refcount. - In general, also allowing any other calls to the object during its lifetime also allows for scope for the object to invalidate the slice returned from
.as_ref()
rendering that initial call unsafe too, since there's no borrow checker here to police usage.
- Feature creep:
- By all means, I think a custom trait approach might work in the future for a
fn from_custom <T: CustomOwner>
and as a good way of exposing more of the vtable functionality, but it is not where the current pain point in the API is and there are many hurdles to get past that as has been shown in previous efforts to tackle meta: ExposeBytes
vtable #437.
- By all means, I think a custom trait approach might work in the future for a
The full-fat custom use case would mostly cover - as far as I can tell - mixing Bytes
created and managed via multiple custom memory allocators (QuestDB would certainly care!), but I suspect there's a more elegant way to handle those anyway.
</opinionated>
.. that said, if you guys prefer that route I'm happy to amend my PR in that direction.
b5015b9
to
6271eff
Compare
Also, I want to thank @amunra for putting this PR together and moving the issue forward. I think this is a great/clever way to move I will try to do a closer review of the code tomorrow and we may want to consider other names than |
ddcffb3
to
d1aed1d
Compare
I shall address the latest feedback tomorrow. I would appreciate clarity around what the behaviour of |
Co-authored-by: Alice Ryhl <[email protected]>
Co-authored-by: Alice Ryhl <[email protected]>
Co-authored-by: Alice Ryhl <[email protected]>
Co-authored-by: Alice Ryhl <[email protected]>
Co-authored-by: Alice Ryhl <[email protected]>
Co-authored-by: Alice Ryhl <[email protected]>
Co-authored-by: Alice Ryhl <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Thank you! |
Is there a plan to release a new version with this feature soon? This is exactly what I needed to avoid allocating twice when streaming |
It makes sense to make a new release at this point. If you'd like feel free to send a release PR. |
Introduces
Bytes::from_owner
, allowing the creation of a bytes object from anotherAsRef<[u8]>
object which is responsible for the ownership of the exposed byte slice.The core use case is to safely construct from mapped memory.
It should practically address the vast majority of use cases for wanting to expose the VTable as has often been requested in #437, without actually exposing the VTable.