-
Notifications
You must be signed in to change notification settings - Fork 501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reconsider the IMedia design to support low-copy/zero-copy rx/tx operation #352
Comments
Thinking aloud. At the moment, we're focusing only on transmission. For reception, see the LibCyphal design document, section "Subscriber". In line with what we discussed on the forum a while back, we could make IMedia provide a memory resource for serving dynamic memory for message objects: virtual std::pmr::memory_resource& getMemoryResource() = 0; Then, as also covered on the forum, we expose some easy-to-use handle at the top layer of the library that the client can use to allocate messages to be transmitted; that handle would (through indirection perhaps) eventually invoke the above pure virtual method implemented by the media layer. The simplest possible implementation would simply return something like the new_delete_resource, as covered by Scott on the forum; a more advanced one could leverage specific memory regions that are DMA-reachable or are otherwise advantageous for storing the data to be transmitted. Nunavut would then serialize the data such that the serialization buffer is allocated from the same memory resource in one or more fragments; if the message contains sections that happen to contain data in a wire-compatible format (little-endian, correct padding, IEEE 754, etc.), such sections will not be serialized but instead a view of them will be emitted as part of the output of the serialization routine; when data requires conversion it will be copied into a new temporary buffer from the same PMR. At the output we get some variation of The lizard will then push into its TX queue a sequence of TX items, where each item is an object referencing an ordered list of memory views pointing into the original list of fragments supplied by Nunavut. The lizard can also allocate additional memory for the payload from the same memory resource in order to inject protocol headers into the output TX queue; for libudpard this would mean the UDP frame headers and also the transfer CRC; for libcanard this includes the transfer CRC, padding, and the tail bytes. At this stage, we get an ordered list of TX queue items, where each item contains:
The transport implementation would then be responsible for deallocating the memory fragments as they are consumed by the media layer, finally deallocating the original message object when no references to it are left. This sounds convoluted but I think it should be possible to approach this sensibly by introducing some notion of reference counting on memory fragments; this is out of the scope right now. One major issue with this direct approach is that it doesn't easily allow splitting outgoing transfers across multiple network interfaces, as the allocator is tied to a specific media instance. We could consider two options:
At the moment, the second option appears more compelling, so it will be pursued moving forward. |
Before I can comment on fully, do we intend on having a hierarchy of redundancy where the top layer is managed by the transport layer across multiple media devices (e.g. when using CAN and UDP as redundant channels) and the next layer is managed within each media layer device (e.g. when using multiple CAN peripherals) but looks like a single device to the transport layer? |
No, per the design intention, one A lizard can manage an arbitrary number of redundant media instances. You may recall that we discussed in the past that there is some room for improvement regarding how it's currently done, but even when the improvements are in place, the fundamental model will stay the same: one IMedia = one NIC. As any given lizard implements only a particular Cyphal transport protocol, heterogeneous redundancy requires a higher-level aggregation where the specifics of a given transport protocol are abstracted away; we will be approaching this like in PyCyphal -- with the help of the |
My concern is, without serializing a message directly into an output buffer, the user is doomed to perform full deep copying |
If we are serializing into an output buffer then the application doesn't need to transfer ownership of the object-representation. The memory we are obtaining from IMedia for transmission would be distinct from the memory we obtained for reception where we de-serialized the data into object form*. * This thread is about transmission but when we get to reception we should talk about lazy deserialization as a feature |
This is true, but observe how the reference counting across several lizards required by this approach can potentially turn into a formidable can of worms 🪱
If we apply the low-copy approach across the stack, then the output of Nunavut-generated serialization routines may contain references to the original message object (remember the example with imagery data?), from which follows:
One way to achieve both is to allocate the message object from that DMA-compatible region and then hand it over to the transport layer upon transmission to let it dispose of it when it is no longer needed. I am not saying these are insurmountable issues. They are quite manageable. We just need to choose between |
Okay. I suppose your preferred solution is adequate. It may be that libcyphal will always be a bit less optimized then we'd like for MCUs. I've been wondering if we should start a "cyphal-ard" project which is a minimal application-layer in C where such close-to-the-metal integrations would be more plausible? |
We could entertain that thought just to see if there are sensible solutions to the problem of zero-copy transmission over heterogeneously redundant interfaces, and then try and transfer that back to LibCyphal. Suppose you started a new C library from scratch and want to send zero-copy messages over CAN and UDP. What would be different compared to where we are now? |
Here's a full design based on the second option from the second comment in this thread. We extend virtual std::pmr::memory_resource& getTxMemoryResource() = 0;
virtual std::pmr::memory_resource& getRxMemoryResource() = 0; The first one will be used for the allocation of TX frame payload buffers and ancillary data structures by the lizard. For example, LibUDPard allocates The second one will be used for the allocation of RX frame payload buffers by the IMedia, and their deallocation upon consumption by the lizard or the client. For example, LibUDPard takes ownership of the buffer memory via The TX memory resource (MR) will be used to allocate memory for the lizard when it needs to enqueue a new TX item. If that item never makes it to the IMedia (for example, if it times out or the transmission is canceled for other reasons like running out of queue space or memory), the memory is freed using the same MR. If the item actually makes it to IMedia, the The RX memory resource may map to a DMA-addressable or otherwise RX-optimized memory region. If it does, it may offer benefits to the media implementation, allowing it to forward data received from the hardware to the higher layers very efficiently. The lizard and the client are oblivious to that but it should be noted that the media has no control over how long the lizard/client will keep using the memory as it will typically make it all the way up to Nunavut deserializer, and then possibly even to the application shall Nunavut be able and choose to keep references to the memory instead of copying it during deserialization. Note that the RX memory resource will only be used for deallocation but never for allocation. In LibUDPard this is expressed through the type system via a special kind of memory resource called This design should be implemented both in |
For now, LibCyphal must require that all It is relatively easy to extend the lizards such that they maintain a dedicated memory resource per redundant interface. When that is done, the above requirement that all LibUDPardThe TX pipeline is already managing a dedicated MR per redundant transport; see The RX pipeline will require adjustments:
LibCANard
|
#343 (comment)
The text was updated successfully, but these errors were encountered: