Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for overhaul of buffer representation #251

Open
ollie-etl opened this issue Feb 27, 2023 · 12 comments
Open

Proposal for overhaul of buffer representation #251

ollie-etl opened this issue Feb 27, 2023 · 12 comments
Labels

Comments

@ollie-etl
Copy link
Contributor

ollie-etl commented Feb 27, 2023

This follows on from #161 #216 at least, but lays out a concrete proposal and rational. This proposal would represent a major API change.

I am proposing unifying all buffers with a single, non-parameterized buffer type, similar to owned-buf. This proposed buffer has the following properties. For the sake of this Proposal, I'll refer to it as Buffer.

Properties of Buffer

  1. Concrete
  2. Non parameterized
  3. Support multiple underlying memory regions. That is, the data represented by a Buffer may be backed by multiple, non contiguos memory regions. This is inspired by the trait bytes::buf.
  4. Internally maintains it's representation of memory regions as contiguous iovec
  5. Allow bytes_init to shrink, making it a semantically a bytes_available count

Motivation

Many of the function in the API in tokio-uring either allocate, or require allocating a Vec externally when operating over vectored or multipart memory regions. Specifically readv, writev, recvmsg and sendmsg variants. This is a shame, because these calls tend to be called on when trying to optimise performance (in my experience). The allocation occurs because the underlying uring ops require a contiguous iovec to represent the operation.

Candidate design

/// The unified owned buffer type
///
/// Note: There is a case for employing SSO for the Vec's here, to account for the very common case of a single memory region.
/// This can be achieved without exceeding cacheline storage for Buffer
///
/// ```
/// assert!(std::mem::size_of<Buffer>() <= 64);
/// ```
pub struct Buffer {
   /// Holds the memory regions represented by the current buffer
   /// Because this is held in contiguous memory, it can be used directly in 
   /// iov_base: holds the pointer to the start of the region
   /// iov_len: holds the current length of each segment.
   iovec: Vec<libc::iovec>,
   /// Metadata associated with each buffer region
   /// state in AoS (Array of structures) form.
   /// Chosen to keep `Buffer` within a single cache line on 64 B cache width systems
   state: Vec<BufferState>,
}

/// The record of state for each memory region.
pub(crate) struct BufferState {
    /// The maximum size of this memory region.
    total_bytes: usize,
    /// Called to drop the memory
    dst: unsafe fn(libc::iovec, usize),
}

impl Drop for Buffer{
    fn drop(&mut self) {
        let Self{iovecs, state} = self;
        for i in 0..iovecs.len() {
            unsafe {state[i].dst(iovecs[i], state[i].total_bytes)}
        }
    }
}

impl From<Vec<u8>> for Buffer {
    fn from(buf: Vec<u8>) -> Self {
        let (base, iov_len, total_bytes) = buf.into_raw_parts();
        let iov = libc::iovec{
            iov_base: base as _,
            iov_len,
        };
        let state = BufferState::new(total_bytes, drop_buffer_vec);
        Buffer::new(vec![iov], vec![state])
    }
}

unsafe fn drop_buffer_vec(iovec: libc::iovec, total_bytes: usize){
    Vec::from_raw_parts(iovec.iov_base as _, iovec.iov_len, total_bytes);
}

Notes

  • The destructor is, obviously, completely horrible, but allows us the type erasure.
  • It may be that BufferState should hold some additional meta data, to cope with more generalized backing types. If this is required, I'd expect it to be held as a * c_void, to maintain type erasure. the destructor would take this as a parameter. I don't yet know if this is required, so leave it out the sketch

Impact on API

This is a bit hand-wavey, because I haven't given all aspects due consideration. However, it does seem that some rationalization becomes available. Want to write data?

   let buf = Vec![0,1,2,3].into();
   fd.write(buf);

Want to write vectored data?

   let mut buf = vec![0,1,2,3].into();
   let buf.append(vec![4,5,6,7].into());
   fd.write(buf);

Request for feedback

@Noah-Kennedy @carllerche @FrankReh @mzabaluev @redbaron @nrc @rrichardson

I'm prepared to flesh this out more into a PR, provided I can get an indication this is not dead in the water before I start. This would represent a significant time investment, which I'm unwilling to commit to without initial feedback.

@ollie-etl
Copy link
Contributor Author

Something which does strike me with this design is the similarity to the existing registries. I strongly suspect these could be unified too, probably by parameterising a Rawbuffer over the held state.

   type Buffer = Rawbuffer<BufferState>;
   type Registry = Rawbuffer<RegistryState>;

@FrankReh
Copy link
Collaborator

FrankReh commented Feb 27, 2023

If this works, this is a great idea.

If you didn't provide a straw man proposal for a FixedBuf, is that because you don't know yet or because it's obvious? I would want the BufX I have in my BufRing code to also use it of course.

In the frequent case of there being a single allocation being the backing store, do you see the ref and the ref_mut being as efficient as before? Perhaps with the exception of a test added, to the accessor itself (ref or ref_mut), that the iovec length is 1?

What would it do to the API if the buffer were made of multiple backing stores though. Doesn't it mean the ref and ref_mut can't be asked to return a &[u8] or &mut[u8]?

Maybe an approach would be those would panic if used on a multiple-backing-stored buffer. And there would be test (aka try) functions that returned an Option<&[u8]>. But it would be up to the user of the library to either know they are only using single-backing-stored buffers so they can safely use ref and ref_mut, or they have to be safe and use something like try_ref and try_ref_mut? Kind of like users have to know if dividing by their variable is safe or they should be adding their own test for zero.

Would it subsume the slice struct too or would that be a second concrete type?

@FrankReh
Copy link
Collaborator

Something which does strike me with this design is the similarity to the existing registries. I strongly suspect these could be unified too, probably by parameterising a Rawbuffer over the held state.

   type Buffer = Rawbuffer<BufferState>;
   type Registry = Rawbuffer<RegistryState>;

I don't follow. I thought you're thinking a concrete, non-parameterized type could do it all?

@ollie-etl
Copy link
Contributor Author

ollie-etl commented Feb 27, 2023

Something which does strike me with this design is the similarity to the existing registries. I strongly suspect these could be unified too, probably by parameterising a Rawbuffer over the held state.

   type Buffer = Rawbuffer<BufferState>;
   type Registry = Rawbuffer<RegistryState>;

I don't follow. I thought you're thinking a concrete, non-parameterized type could do it all?

Yes, badly phrased by me: I can unify the buffer types without parameterisation. It may be possible tounify both Buffer and BufferRegistries by parameterising the underlying state - given a buffer can now represent multiple memory regions, and a Registry holds multiple memory regions, this is perhaps unsurprising.

@ollie-etl
Copy link
Contributor Author

ollie-etl commented Feb 27, 2023

If you didn't provide a straw man proposal for a FixedBuf, is that because you don't know yet or because it's obvious? I would want the BufX I have in my BufRing code to also use it of course.

I've not yet fleshed this out. It is certinaly possible, and highly desireable, the question is only do I need some additional meta data in the state to achieve it: i.e, on drop, is some additional info required.

@ollie-etl
Copy link
Contributor Author

What would it do to the API if the buffer were made of multiple backing stores though. Doesn't it mean the ref and ref_mut can't be asked to return a &[u8] or &mut[u8]?

Looking at bytes::buf for guidance here, there is a api call called chunks, which returns regions which are contiguos. I'd expect to also be able to return an iterator over such regions

@FrankReh
Copy link
Collaborator

If you didn't provide a straw man proposal for a FixedBuf, is that because you don't know yet or because it's obvious? I would want the BufX I have in my BufRing code to also use it of course.

I've not yet fleshed this out. It is certinaly possible, the question is only do I need some additional meta data in the state to achieve it: i.e, on drop, is some additional info required.

I would hope other meta data would not be needed.

I've wondered why something like a buf id was needed in the first place. Couldn't we say our registries have to be built from a contiguous block of memory so the buffer address being returned in the destructor would be enough to identifier the portion of the region that was being returned to the registry?

I know in my own case of using a buf id in the BufRing implementation, it felt wrong. But I was just following what had already been established in the crate. But if we're talking about a better implementation for all, I would heavily lean towards requiring the registry to be able to use the buffer address itself as the identifier - even if it means we can't use a vec of vecs as a registry any more.

@FrankReh
Copy link
Collaborator

I don't love the idea that every buffer has two allocations before even talking about the backing store buffer.

And why is the state vec containing various destructors. Do we lose flexibility by saying there is one destructor for the entire buffer, and it has to know how to deal with the iovec parts?

I hesitate to mention this, but would it be a reason to bring a tinyvec into the discussion so the most normal case of a single backing store is handled with no extra allocations?

@ollie-etl
Copy link
Contributor Author

ollie-etl commented Feb 27, 2023

I hesitate to mention this, but would it be a reason to bring a tinyvec into the discussion so the most normal case of a single backing store is handled with no extra allocations?

Yes: you'll see I made mention to the case for SSO. I'd have thought this was a very good optimization. We have 16 bytes left on the cache line, so it doesn't even need to rely on nonnull optimizations, although it could.

@nrc
Copy link

nrc commented Mar 3, 2023

Support multiple underlying memory regions. That is, the data represented by a Buffer may be backed by multiple, non contiguos memory regions. This is inspired by the trait bytes::buf.

I'm curious if there are known use cases or concrete motivations for this (as well as just following bytes::Buf)?

Is the use of iovec to support vectored IO?

@ollie-etl
Copy link
Contributor Author

ollie-etl commented Mar 3, 2023

Is the use of iovec to support vectored IO?

Yes - motivated by writev, readv, sendmsg and readmsg

@ileixe
Copy link

ileixe commented Sep 11, 2024

Hi @ollie-etl . Sorry for very late mentioning you, but I found this request is very reasonable.

We're having somehow different motivation (sum type is required to abstract two different API signature: readv(bufs: Vec<X>) and read(buf: X) but anyway this request is the exact one that we're looking for.

Did you make any try for this change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants