Rewrite MCU I2C to support non blocking execution. #6674

nefelim4ag · 2024-08-30T19:32:18Z

I'm not a hardcore embedded guy - I can have wrong assumptions or understanding here.

If I understood correctly.
For now, SPI/I2C is working in synchronous mode, which may shift other events in the scheduler queue.
I2C is slower in general because it is used with speeds: of 100kHz and 400kHz.
According to #6141, we can easily calculate timings, 5us per pulse, 18 pulses per byte + ack.
The average read request looks like 2 bytes write 6 byte read.
1 byte ~ 18 * 5 = 90 us
8 byte ~ 90 * 8 = 720 us

400kHz is 4x faster, so 22.5us and 180us with same load.
SPI is freaking fast, SW works as fast as pins go, HW looks like expected to run at 4Mhz.
So, 10 times faster, 1 pulse per bit ~ 0.25us.
0.25 * 8 = 2 us, scheduling per byte looks not so good here.

The toolhead moved at a speed of 200 mm/s, 40mm rotational distance and 32 micro steps should have 32us between steps: 1 / ((200 / 40) * 200 * 32)) ~ 31.25 us

Because i2c is blocking it will affect other timings occasionally.
It feels important to me to honor TSTEP timing for TMC because they are used for interpolation/thresholds (stealth/coolstep).
I cannot argue that those timings create interpolation position lag or that is a reason why switching stealth/spread modes is unstable.

Current implementation often blocks and waits for each byte (HW)/bit(SW) just wasting CPU time.
Those graphs when my machine is at a standstill:
Current implementation BME680 on host, SHT3x on mcu. *HW i2c decreased mcu load to ~5.6%

Current implementation with disabled i2c sensors.

Such a difference in load is unexpected for me. I can guess it is calculated from time drift.
Because the actual Awake time difference is in the expected range (less than 1%).

Patched async I2C (MCU software):

mcu hw i2c bus:

Host MCU pthread:

Current master SW: mcu_awake=0.006 mcu_task_avg=0.000013 mcu_task_stddev=0.000114
Async I2C SW: mcu: mcu_awake=0.001 mcu_task_avg=0.000001 mcu_task_stddev=0.000001

New I2C SW timings with default 100k.
It looks like 6-7 us per pulse originated to a safe way of switching.
#6141 (comment)

The transition between bytes is a little suboptimal.

With bit time correction at 100kHz.
I was able to drive the bus around 500kHz, it looks like there is a limit on the GPIO toggle speed.

The main question is it worth it?
I tried different approaches, but for now, this is something more or less simple.

*Current i2c_read/write works as before.

The high-level Idea is to buffer klippy i2c requests and execute them in the background.
There is a new infrastructure defined around it.
Because i2c_read is currently used in different places, there is the possibility of defining a callback for i2c_async.
There is the ability to change ldc1612 and mpu9250, to use it with the nonblocking API.

Correct nonblocking HW i2c is written for stm32f0_i2c.c
Others I didn't touch for now, but I can also rewrite them - the event loop is the same.
Software i2c is tricky, I tried to not overcomplicate things and still emulate the bit-banging approach in tmcuart.c
(I unintentionally reinvented it somehow, with function overrides :D)

For now, the event loop is stored in each HW i2p implementation because it is easier to overcome specific restrictions, like:

Linux HW i2s can only be made in one ioctl.
AVR can be slow and such a thing will not work, so there count of jumps can be reduced.

This is not ready for merge.
TODO:

~~Make i2c_modify_bits async. There is Write-Read-Modify-Write cycle here.~~ - Done
Write HW for other MCUs.
i2c bus is shared, there is the possibility to send 2 async requests to the same bus - it will not work
If there are decisions to get rid of old synchronous code here. I have a bad understanding of time measurements for i2c bulk data and how it will cooperate with async approach.
~~Does it make sense to have io pthread on Host MCU?~~ - that one reduce jitter
~~Commit mess should be fixed (I leave it for now as a history of thoughts).~~

MCU: Octopus Pro - stm32h723
Host: RPI 5

KevinOConnor · 2024-09-10T01:22:29Z

Interesting, thanks.

There seems to be a lot of work going on here. Out of curiosity, what is the underlying issue that is being addressed? For example, is there some hardware that needs high-speed access that the current code can't support?

Some random thoughts:

The reason for the high "mcu load" graphs is due to variability in the time it takes to idle when i2c devices are used. That is, the "mcu load" statistic is based on the total time the mcu spends running tasks ( mcu_awake) and the variability in the timing to complete all pending tasks (mcu_task_stddev). When the host sends periodic i2c requests it results in high variability, as most of the time tasks complete in a few microseconds, but when an i2c command is pending it may take a millisecond or so. Variability is frequently a symptom of high load (which is why it is used in the load calculation), but it isn't inherently a problem. Tasks are low-priority and as long as the delays are under a millisecond or so, there shouldn't be a degradation in performance.
The i2c_modify_bits command is only used in sx1509.py, and that code doesn't require it. Likely better to remove i2c_modify_bits than to spend lots of time reworking it.
If you change the i2c_read command to be asynchronous then the host code needs to also change. The host code expects the results of i2c_read to always be received synchronously with its command ack. (Note the is_async flag in lookup_query_command.)

Cheers,
-Kevin

nefelim4ag · 2024-09-10T14:50:03Z

My main motivation here was precious step timings and a little dislike for while() loops.
The only serious reason here, is that all TMC features use timings and data between steps, and when scheduler lags behind "now" and trying to catch up - that can confuse driver functions.

TMC drivers don't have(2209) or have small hysteresis for TSTEP, instantaneous change of TSTEP can cause lags of interpolation, instantaneous enabling, and disabling of stealth chop & other speed-related functionality.
But this is actually hard to prove.
That's it, no user-visible issue here, other stuff is actually not so important I think (like graphs).

(Maybe this is the wrong approach to such a theoretical problem, and just computing and limiting minimal time between steps, based on machine speed limits can be enough...).

I2C is just slightly faster than USART, but still can take a measurable amount of time.

My changes does not speedup i2c, so i2c devices will behave as before in terms of speed.
That changes only tradeoff code memory/complexity for CPU free time & tail latency.

There are also SPI coprocessors, but as mentioned above, crazy fast, even if we will poll driver status every ms, it is unlikely it will affect anything.

Understood.
Agree. I just can't suggest that without your approval and I did not have such expander to test. But I can open PR with such changes to sx1509, np.
You are right, but I already set is_async=True for i2c_read, to avoid issues with resending before responding and broken communication orders, this one just increased timeouts. Did I miss something else?

Thanks for your time.

This is completely fine if you say this is too much overcomplication, and I can just extract useful bits from here:

Correct end of SW I2C transfer on NACK.
SX1509 modification and depreciation of i2c_modify_bits().

KevinOConnor · 2024-09-11T20:45:27Z

Thanks.

I don't object to you (or others) reworking the i2c implementation. From my perspective, the Klipper i2c implementation is a "hack" - I added it to support some DAC chips that configured stepper current on old boards - and it has since grown to a number of unexpected configurations.

FWIW, one significant limitation of the current i2c implementation is that it calls shutdown() on minor errors. In retrospect, it would have been preferable for both i2c_read() and i2c_write() to always respond to requests with a status (success or failure) and then let the host figure out how to handle the problem.

My main motivation here was precious step timings

I'm confused by this statement. Stepper step pulses are generated from the irq handlers which are high priority events. They will preempt Klipper "tasks" (such as the i2c command handler) and thus i2c should not impact step timing.

I'm also not clear on the relationship between i2c and TSTEP.

You are right, but I already set is_async=True for i2c_read

Oh, sorry - I missed that in your changes.

Cheers,
-Kevin

nefelim4ag · 2024-09-11T22:43:13Z

Nice hack BTW :D

significant limitation of the current i2c implementation is that it calls shutdown() on minor errors.

Yes, I2C errors right now are a little annoying sometimes, in comparison to SPI/ADXL which just returns 0x00/0xff.
I will be happy to fix that.
(I just assumed before this was intended).

BTW, STM32 missing check for I2C_ISR_NACKF

Looking at SDIO,
technically we can do the same - define "error=%c" or "status=%c" field, define enum with "typical" errors.
Then just do raise i2c_error_X() (in bus.py) and allow handling of it in high level code.
I can do that in other PR.

Least we can do actually, in case of errors, is only retry on NACK.
high level stuff like reinitialization of device seems, strange.

I'm a little afraid that this will lead to a little storm of patches with little fixes for bad wiring.

I'm confused by this statement.

That is my misunderstanding then, I somewhat thought that timers run periodic tasks, and as a consequence, they also have the same priority. So command.c -> i2c_(read/write) & other stuff are effectively blocking everything else.

So, as steps are unaffected, now this more like competitive programming and now I'm not sure that it is worth it :
Because it will only make nice graphs, and free a very little CPU time.
is completely up to your, does it still has any value.

Thanks.

KevinOConnor · 2024-09-13T01:40:01Z

I somewhat thought that timers run periodic tasks, and as a consequence, they also have the same priority

Okay, that's not the case. Most of the Klipper mcu code is contained in timers or tasks, and timers run at a higher priority than tasks. It's okay to have lot of tasks and long running tasks because they do not cause increased latency for timers.

is completely up to your, does it still has any value.

As before, it's fine with me if someone wants to improve the i2c handling. It's not a performance issue (as far as I know), but I'm fine with incremental changes that make improvements.

Cheers,
-Kevin

Signed-off-by: Timofey Titovets <[email protected]>

nefelim4ag force-pushed the i2c-async branch 5 times, most recently from 7f80ee0 to 062dfa1 Compare September 6, 2024 15:15

nefelim4ag marked this pull request as draft September 8, 2024 17:45

nefelim4ag changed the title ~~Draft: Rewrite MCU I2C to support non blocking execution.~~ Rewrite MCU I2C to support non blocking execution. Sep 8, 2024

nefelim4ag force-pushed the i2c-async branch 6 times, most recently from 3bf2470 to 10e509b Compare September 9, 2024 13:46

nefelim4ag mentioned this pull request Sep 10, 2024

SX1509: drop i2c modify bits #6684

Open

nefelim4ag mentioned this pull request Sep 12, 2024

STM32: Check for NACK #6687

Open

nefelim4ag added 3 commits September 13, 2024 03:57

bus.py/i2c: declare as async

4de9438

Signed-off-by: Timofey Titovets <[email protected]>

i2ccmd: migrate to async execution

05ca30e

Signed-off-by: Timofey Titovets <[email protected]>

stm32f0_i2c: implement async i2c

a6c955b

Signed-off-by: Timofey Titovets <[email protected]>

nefelim4ag force-pushed the i2c-async branch from 10e509b to 6972689 Compare September 13, 2024 01:57

nefelim4ag added 4 commits September 13, 2024 04:49

i2c_software: rewrite to async bit-banging

0b36446

Signed-off-by: Timofey Titovets <[email protected]>

rp2040/i2c: implement async i2c

26f103e

Signed-off-by: Timofey Titovets <[email protected]>

avr/i2c: implement async i2c

6d698e1

Signed-off-by: Timofey Titovets <[email protected]>

linux/i2c: implement i2c async + pthread

072eaa4

Signed-off-by: Timofey Titovets <[email protected]>

nefelim4ag force-pushed the i2c-async branch from 6972689 to 072eaa4 Compare September 13, 2024 02:50

nefelim4ag mentioned this pull request Sep 14, 2024

I2C Error handling #6689

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite MCU I2C to support non blocking execution. #6674

Rewrite MCU I2C to support non blocking execution. #6674

nefelim4ag commented Aug 30, 2024 •

edited

Loading

KevinOConnor commented Sep 10, 2024

nefelim4ag commented Sep 10, 2024 •

edited

Loading

KevinOConnor commented Sep 11, 2024

nefelim4ag commented Sep 11, 2024 •

edited

Loading

KevinOConnor commented Sep 13, 2024

Rewrite MCU I2C to support non blocking execution. #6674

Are you sure you want to change the base?

Rewrite MCU I2C to support non blocking execution. #6674

Conversation

nefelim4ag commented Aug 30, 2024 • edited Loading

KevinOConnor commented Sep 10, 2024

nefelim4ag commented Sep 10, 2024 • edited Loading

KevinOConnor commented Sep 11, 2024

nefelim4ag commented Sep 11, 2024 • edited Loading

KevinOConnor commented Sep 13, 2024

nefelim4ag commented Aug 30, 2024 •

edited

Loading

nefelim4ag commented Sep 10, 2024 •

edited

Loading

nefelim4ag commented Sep 11, 2024 •

edited

Loading