-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SVD register clusters generate inefficient machine code #46
Comments
I spent some time today prototyping the idea of a cluster type that allows normal array indexing, and I wasn't able to come to a solution that I felt satisfied with. The idea of using a zero-sized type to be able to create a reference that can be returned from the I was hoping we could implement One solution would be to represent all register types as a reference to a zero-sized type rather than a struct with a pointer. It would even be a completely source-compatible change, and it would look something like this: pub struct _Gpio;
pub type Gpio = &'static _Gpio; However, the big disadvantage to this is that it makes documentation much less clear: the I experimented with different ways of representing indexing using slices, or trying to take other ideas from the bitvec crate, but unfortunately the trick they use to pull off indexing doesn't work here: impl<T, O> Index<usize> for BitSlice<T, O>
where
T: BitStore,
O: BitOrder,
{
type Output = bool;
fn index(&self, index: usize) -> &Self::Output {
match *index.index(self) {
true => &true,
false => &false,
}
}
} They're working with bits that only have two possible states, so they can store both states as constants in static memory and return references to them. Whereas indexing a register cluster can return thousands of possible values. Any approach using slices to represent the strides/sizes of registers is likely unsound because Rust currently requires any reference to a non-zero-sized type to be dereferenceable at all times (i.e. the compiler can insert spurious dereferences); thus, it is unsound to create a non-zero-sized references with an address in the MMIO register space. (See rust-lang/unsafe-code-guidelines#411 for more information on this.) So, I feel like the best way forward is the second option I originally proposed, ( |
My proposal is more exactly like that cluster[i].register_name().read() to read the register the register API will be implemented on the reference to cluster
pub struct _Gpio;
pub type Gpio = &'static _Gpio;
I don't find a too much complex type compared to other solutions as @andreasWallnerIFX what do you think ? If we are not able to come an agreement. Maybe we can prototype both solution to verify if there are some other technical issue and then present these prototype to more users. |
Is your feature request related to a problem? Please describe.
svd2pac generates accessor functions for register clusters that return an array of all the registers within the cluster:
Now consider a function that accesses a register:
If this function is inlined and the compiler is able to statically tell which index in the port cluster will be accessed, it can optimize out the array and issue a write directly to the appropriate port register. But if the compiler decides not to inline the function for code size reasons, or there's some other reason it does not statically know what port will be accessed, then the generated code is extremely poor: the program constructs the entire array, one element at a time, so that it can do the array index lookup to find the register to write: https://rust.godbolt.org/z/Pa31oW4PT
The SVD file I'm working with has some register clusters that can have up to 1023 registers, and a single write to one of these registers generates 7.8kB of code!
Describe the solution you'd like
I would like to be able to write the
write_to_port
function in a way that generates simple pointer arithmetic instead of creating the entire array only to access a single element.I experimented with constructing the array in different ways, but it seems LLVM is not able to optimize out the array entirely when the index of the element to access is not statically known. Thus, it's not possible to ensure efficient code generation without changing the function signature to return something other than an array.
I think the simplest thing to do is to change the accessor function (or add a new function) to take the index as a parameter: https://rust.godbolt.org/z/c3ePWTxYq
This has the disadvantage of not exposing the length and iterator methods of an array. This is not necessary (or even desirable) for PSOC, because our SVD files define clusters with the maximum possible number of registers that the MMIO interface supports, rather than the actual amount that exist on the hardware. But if that's useful functionality for other applications, it could be implemented by returning an accessor object that provides these methods: https://rust.godbolt.org/z/4496oxqoh
I'm happy to send a PR if either of these options are acceptable.
Describe alternatives you've considered
The code size bloat could be mitigated somewhat by using
core::array::from_fn
to initialize the array: https://rust.godbolt.org/z/f6vzW9YoaThis generates a loop to initialize the register array, instead of forcing it to be unrolled. But it's far from optimal as it still creates the entire array, computing the address of every register in the cluster instead of just the one that's needed.
The text was updated successfully, but these errors were encountered: