Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite MCU I2C to support non blocking execution. #6674

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

nefelim4ag
Copy link
Contributor

@nefelim4ag nefelim4ag commented Aug 30, 2024

I'm not a hardcore embedded guy - I can have wrong assumptions or understanding here.

If I understood correctly.
For now, SPI/I2C is working in synchronous mode, which may shift other events in the scheduler queue.
I2C is slower in general because it is used with speeds: of 100kHz and 400kHz.
According to #6141, we can easily calculate timings, 5us per pulse, 18 pulses per byte + ack.
The average read request looks like 2 bytes write 6 byte read.
1 byte ~ 18 * 5 = 90 us
8 byte ~ 90 * 8 = 720 us

400kHz is 4x faster, so 22.5us and 180us with same load.
SPI is freaking fast, SW works as fast as pins go, HW looks like expected to run at 4Mhz.
So, 10 times faster, 1 pulse per bit ~ 0.25us.
0.25 * 8 = 2 us, scheduling per byte looks not so good here.

The toolhead moved at a speed of 200 mm/s, 40mm rotational distance and 32 micro steps should have 32us between steps: 1 / ((200 / 40) * 200 * 32)) ~ 31.25 us

Because i2c is blocking it will affect other timings occasionally.
It feels important to me to honor TSTEP timing for TMC because they are used for interpolation/thresholds (stealth/coolstep).
I cannot argue that those timings create interpolation position lag or that is a reason why switching stealth/spread modes is unstable.

Current implementation often blocks and waits for each byte (HW)/bit(SW) just wasting CPU time.
Those graphs when my machine is at a standstill:
Current implementation BME680 on host, SHT3x on mcu. *HW i2c decreased mcu load to ~5.6%
image
Current implementation with disabled i2c sensors.
image
Such a difference in load is unexpected for me. I can guess it is calculated from time drift.
Because the actual Awake time difference is in the expected range (less than 1%).

Patched async I2C (MCU software):
image
mcu hw i2c bus:
image
Host MCU pthread:
image


Current master SW: mcu_awake=0.006 mcu_task_avg=0.000013 mcu_task_stddev=0.000114
Async I2C SW: mcu: mcu_awake=0.001 mcu_task_avg=0.000001 mcu_task_stddev=0.000001


image
image

New I2C SW timings with default 100k.
It looks like 6-7 us per pulse originated to a safe way of switching.
#6141 (comment)

The transition between bytes is a little suboptimal.


image
With bit time correction at 100kHz.
I was able to drive the bus around 500kHz, it looks like there is a limit on the GPIO toggle speed.


The main question is it worth it?
I tried different approaches, but for now, this is something more or less simple.

*Current i2c_read/write works as before.

The high-level Idea is to buffer klippy i2c requests and execute them in the background.
There is a new infrastructure defined around it.
Because i2c_read is currently used in different places, there is the possibility of defining a callback for i2c_async.
There is the ability to change ldc1612 and mpu9250, to use it with the nonblocking API.

Correct nonblocking HW i2c is written for stm32f0_i2c.c
Others I didn't touch for now, but I can also rewrite them - the event loop is the same.
Software i2c is tricky, I tried to not overcomplicate things and still emulate the bit-banging approach in tmcuart.c
(I unintentionally reinvented it somehow, with function overrides :D)

For now, the event loop is stored in each HW i2p implementation because it is easier to overcome specific restrictions, like:

  • Linux HW i2s can only be made in one ioctl.
  • AVR can be slow and such a thing will not work, so there count of jumps can be reduced.

This is not ready for merge.
TODO:

  • Make i2c_modify_bits async. There is Write-Read-Modify-Write cycle here. - Done
  • Write HW for other MCUs.
  • i2c bus is shared, there is the possibility to send 2 async requests to the same bus - it will not work
  • If there are decisions to get rid of old synchronous code here. I have a bad understanding of time measurements for i2c bulk data and how it will cooperate with async approach.
  • Does it make sense to have io pthread on Host MCU? - that one reduce jitter
  • Commit mess should be fixed (I leave it for now as a history of thoughts).

MCU: Octopus Pro - stm32h723
Host: RPI 5

@nefelim4ag nefelim4ag force-pushed the i2c-async branch 5 times, most recently from 7f80ee0 to 062dfa1 Compare September 6, 2024 15:15
@nefelim4ag nefelim4ag marked this pull request as draft September 8, 2024 17:45
@nefelim4ag nefelim4ag changed the title Draft: Rewrite MCU I2C to support non blocking execution. Rewrite MCU I2C to support non blocking execution. Sep 8, 2024
@nefelim4ag nefelim4ag force-pushed the i2c-async branch 6 times, most recently from 3bf2470 to 10e509b Compare September 9, 2024 13:46
@KevinOConnor
Copy link
Collaborator

Interesting, thanks.

There seems to be a lot of work going on here. Out of curiosity, what is the underlying issue that is being addressed? For example, is there some hardware that needs high-speed access that the current code can't support?

Some random thoughts:

  1. The reason for the high "mcu load" graphs is due to variability in the time it takes to idle when i2c devices are used. That is, the "mcu load" statistic is based on the total time the mcu spends running tasks ( mcu_awake) and the variability in the timing to complete all pending tasks (mcu_task_stddev). When the host sends periodic i2c requests it results in high variability, as most of the time tasks complete in a few microseconds, but when an i2c command is pending it may take a millisecond or so. Variability is frequently a symptom of high load (which is why it is used in the load calculation), but it isn't inherently a problem. Tasks are low-priority and as long as the delays are under a millisecond or so, there shouldn't be a degradation in performance.
  2. The i2c_modify_bits command is only used in sx1509.py, and that code doesn't require it. Likely better to remove i2c_modify_bits than to spend lots of time reworking it.
  3. If you change the i2c_read command to be asynchronous then the host code needs to also change. The host code expects the results of i2c_read to always be received synchronously with its command ack. (Note the is_async flag in lookup_query_command.)

Cheers,
-Kevin

@nefelim4ag
Copy link
Contributor Author

nefelim4ag commented Sep 10, 2024

My main motivation here was precious step timings and a little dislike for while() loops.
The only serious reason here, is that all TMC features use timings and data between steps, and when scheduler lags behind "now" and trying to catch up - that can confuse driver functions.

TMC drivers don't have(2209) or have small hysteresis for TSTEP, instantaneous change of TSTEP can cause lags of interpolation, instantaneous enabling, and disabling of stealth chop & other speed-related functionality.
But this is actually hard to prove.
That's it, no user-visible issue here, other stuff is actually not so important I think (like graphs).

(Maybe this is the wrong approach to such a theoretical problem, and just computing and limiting minimal time between steps, based on machine speed limits can be enough...).

I2C is just slightly faster than USART, but still can take a measurable amount of time.

My changes does not speedup i2c, so i2c devices will behave as before in terms of speed.
That changes only tradeoff code memory/complexity for CPU free time & tail latency.

There are also SPI coprocessors, but as mentioned above, crazy fast, even if we will poll driver status every ms, it is unlikely it will affect anything.

  1. Understood.
  2. Agree. I just can't suggest that without your approval and I did not have such expander to test. But I can open PR with such changes to sx1509, np.
  3. You are right, but I already set is_async=True for i2c_read, to avoid issues with resending before responding and broken communication orders, this one just increased timeouts. Did I miss something else?

Thanks for your time.

This is completely fine if you say this is too much overcomplication, and I can just extract useful bits from here:

  • Correct end of SW I2C transfer on NACK.
  • SX1509 modification and depreciation of i2c_modify_bits().

@KevinOConnor
Copy link
Collaborator

Thanks.

I don't object to you (or others) reworking the i2c implementation. From my perspective, the Klipper i2c implementation is a "hack" - I added it to support some DAC chips that configured stepper current on old boards - and it has since grown to a number of unexpected configurations.

FWIW, one significant limitation of the current i2c implementation is that it calls shutdown() on minor errors. In retrospect, it would have been preferable for both i2c_read() and i2c_write() to always respond to requests with a status (success or failure) and then let the host figure out how to handle the problem.

My main motivation here was precious step timings

I'm confused by this statement. Stepper step pulses are generated from the irq handlers which are high priority events. They will preempt Klipper "tasks" (such as the i2c command handler) and thus i2c should not impact step timing.

I'm also not clear on the relationship between i2c and TSTEP.

You are right, but I already set is_async=True for i2c_read

Oh, sorry - I missed that in your changes.

Cheers,
-Kevin

@nefelim4ag
Copy link
Contributor Author

nefelim4ag commented Sep 11, 2024

Nice hack BTW :D

significant limitation of the current i2c implementation is that it calls shutdown() on minor errors.

Yes, I2C errors right now are a little annoying sometimes, in comparison to SPI/ADXL which just returns 0x00/0xff.
I will be happy to fix that.
(I just assumed before this was intended).

BTW, STM32 missing check for I2C_ISR_NACKF

Looking at SDIO,
technically we can do the same - define "error=%c" or "status=%c" field, define enum with "typical" errors.
Then just do raise i2c_error_X() (in bus.py) and allow handling of it in high level code.
I can do that in other PR.

Least we can do actually, in case of errors, is only retry on NACK.
high level stuff like reinitialization of device seems, strange.

I'm a little afraid that this will lead to a little storm of patches with little fixes for bad wiring.

I'm confused by this statement.

That is my misunderstanding then, I somewhat thought that timers run periodic tasks, and as a consequence, they also have the same priority. So command.c -> i2c_(read/write) & other stuff are effectively blocking everything else.

So, as steps are unaffected, now this more like competitive programming and now I'm not sure that it is worth it :
Because it will only make nice graphs, and free a very little CPU time.
is completely up to your, does it still has any value.

Thanks.

@KevinOConnor
Copy link
Collaborator

I somewhat thought that timers run periodic tasks, and as a consequence, they also have the same priority

Okay, that's not the case. Most of the Klipper mcu code is contained in timers or tasks, and timers run at a higher priority than tasks. It's okay to have lot of tasks and long running tasks because they do not cause increased latency for timers.

is completely up to your, does it still has any value.

As before, it's fine with me if someone wants to improve the i2c handling. It's not a performance issue (as far as I know), but I'm fine with incremental changes that make improvements.

Cheers,
-Kevin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants