Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I2C Error handling #6689

Merged
merged 9 commits into from
Oct 27, 2024
Merged

Conversation

nefelim4ag
Copy link
Contributor

@nefelim4ag nefelim4ag commented Sep 14, 2024

As mentioned: #6674 (comment)
That would be nice to handle I2C errors, instead of the current shutdown approach.

There is a refactoring part to make things more obvious, where is the dev and where is the bus.

Errors handling works, but:

  • Some MCUs do not clearly state what happens underneath.
  • There should be a better way to define & pass errors to Klippy. I think it would be possible to reuse static_stings_id for that.
  • When klippy inits and I2C returns an error, it is an unhandled exception, so it only appears in logs. There should be a fix actually to show it to frontends as before with shutdown msg.
  • Missing handling for ldc1612 - used in homing sequence, there is no reason or way to handle errors.
  • mpu9250 - used as an accelerometer, it looks safe to ignore anything, as with SPI sensors.
  • Most sensors for now do something like:
def _sample():
        try:
            <code here>
        except Exception:
            logging.exception(...)
            self.temp = self.humidity = .0
            return self.reactor.NEVER

So, they stop updating data, if there is no heater for them, the machine will continue running.


About the code itself:

  • Refactoring is done for clarity, it can be dropped or moved to separate PR.
    (I still somewhat think about non-blocking i2c, it feels right to store queue/buf at bus level),
  • int ret looks a little dirty to me, maybe it is a bad approach here, I will be glad to receive any feedback.
    (maybe with timers there is a simple way with callbacks, cause we don't need to do something with stack).

Thanks.

@nefelim4ag nefelim4ag marked this pull request as draft September 14, 2024 20:24
@nefelim4ag nefelim4ag force-pushed the i2c-error-handling branch 8 times, most recently from 4488c11 to 4648028 Compare September 18, 2024 18:39
@nefelim4ag nefelim4ag force-pushed the i2c-error-handling branch 2 times, most recently from e36f675 to ec33a81 Compare September 28, 2024 17:53
@nefelim4ag
Copy link
Contributor Author

I think this is more or less ready to gather feedback, so I will remove the "draft" label.
It feels stable, but it looks like initialization is now a little bit slower, because i2c write now is "synchronous".

@nefelim4ag nefelim4ag marked this pull request as ready for review September 28, 2024 18:02
Copy link

Thank you for your contribution to Klipper. Unfortunately, a reviewer has not assigned themselves to this GitHub Pull Request. All Pull Requests are reviewed before merging, and a reviewer will need to volunteer. Further information is available at: https://www.klipper3d.org/CONTRIBUTING.html

There are some steps that you can take now:

  1. Perform a self-review of your Pull Request by following the steps at: https://www.klipper3d.org/CONTRIBUTING.html#what-to-expect-in-a-review
    If you have completed a self-review, be sure to state the results of that self-review explicitly in the Pull Request comments. A reviewer is more likely to participate if the bulk of a review has already been completed.
  2. Consider opening a topic on the Klipper Discourse server to discuss this work. The Discourse server is a good place to discuss development ideas and to engage users interested in testing. Reviewers are more likely to prioritize Pull Requests with an active community of users.
  3. Consider helping out reviewers by reviewing other Klipper Pull Requests. Taking the time to perform a careful and detailed review of others work is appreciated. Regular contributors are more likely to prioritize the contributions of other regular contributors.

Unfortunately, if a reviewer does not assign themselves to this GitHub Pull Request then it will be automatically closed. If this happens, then it is a good idea to move further discussion to the Klipper Discourse server. Reviewers can reach out on that forum to let you know if they are interested and when they are available.

Best regards,
~ Your friendly GitIssueBot

PS: I'm just an automated script, not a human being.

@KevinOConnor
Copy link
Collaborator

What is the status of this PR? It looks like it is not passing the build tests.

In general the idea seems useful - thanks for working on it. I haven't done a full review but I have some initial comments:

  • This change would be notable and involve a risk of introducing a regression. I think we'll need to be careful about the rollout of changes so that we are prepared to rollback parts of the change if needed. So, I think any changes to the mcu code needs to be one commit per architecture. (Boiler plate function parameter changes could be done across all archs in a single commit, but anything that could change an arch's interaction with the hardware shouldn't be in the same commit with work on other architectures.) So, maybe consider changing the host/mcu i2c interface in one set of commits, then allow all arch's to return an error code (which always return success) in a commit, and then one-by-one convert each arch to return an error instead of shutdown in a series of commits.
  • If we want the mcu to report i2c errors to the host then I think we want it to always report the status. I don't think there should be a check=0 type of command parameter. If there is a concern with the round-trip latency when issuing many commands then I think we can change the host to send many commands at once. I can help with this if you want.
  • The python exception handling stuff can get confusing quickly. I'd avoid custom exception objects. I think the reality is that code is going to need error free communication or not care about any errors. I suspect the bus.py code should probably, therefore, handle the errors locally.

Thanks again,
-Kevin

@nefelim4ag
Copy link
Contributor Author

nefelim4ag commented Oct 20, 2024

I think it does not pass tests because I changed some mcu commands and the current dictionary does not fit.

  • Understandable, agree. I will split patches.

  • I made this because I'm not fully understand constructions like:

          if self.i2c_write_cmd is None:
              # Send setup message via mcu initialization
              data_msg = "".join(["%02x" % (x,) for x in data])
              self.mcu.add_config_cmd("i2c_write oid=%d check=%c data=%s" % (
                  self.oid, 0x0, data_msg), is_init=True)
              return
    

    So, I cannot simply rework it to query alike.
    This allows me to avoid bothering with it and allow all things to work as before.
    (And I don't know where it actually used :D)

    I would like to have that sort of API, to pass several commands, and later wait for responses. But here, for now, we can avoid that. It just slows down klippy initialization a little.
    (FWIW, That would be useful to gather TMC registers by one roundtrip, or allow to not always make new MCU bulk commands).

  • I think, the main target in this PR is a situation when there are transient issues with sensor communications and a request is retryable. I would like to decrease i2c exception to one general or even remove them here. But I do not see a clear way to handle errors at bus.py level.
    We can still move errors/shutdown() to klippy level and leave it here for a while.

Thanks

@KevinOConnor
Copy link
Collaborator

KevinOConnor commented Oct 21, 2024

I think it does not pass tests because I changed some mcu commands and the current dictionary does not fit.

I think it's failing because self.mcu.add_config_cmd("i2c_write oid=%d check=%c data=%s" % (self.oid, 0x0, data_msg), is_init=True) should have %d instead of %c.

I made this because I'm not fully understand constructions like:

That branch is for the case where some code calls i2c_write() before the connect phase has completed. (That is, when called from the config reading phase.) The idea is that we can add the commands to the init block instead of waiting for the data dictionary to be returned. I'm not sure how often that feature is used.

I would like to have that sort of API, to pass several commands, and later wait for responses.

Unfortunately, that's quite difficult with the current mcu/host communication protocol. There's no good way (that I know of) to figure out which response is associated with each command if multiple commands are sent at once. It should be possible to just ignore responses though.

We can still move errors/shutdown() to klippy level and leave it here for a while.

Yeah - if we can do this in phases I think that would help. Maybe first phase is support for returning error codes to the mcu i2c layer, which calls shutdown on an error. Then one-by-one convert mcu archs to return errors instead of shutdown. Then convert mcu i2c commands to return error instead of invoking shutdown (internal mcu i2c users still invoke shutdown) and have klippy invoke shutdown. After that point, the host code can implement retry or error propagation. Again, just thinking out loud.

Thanks,
-Kevin

@nefelim4ag nefelim4ag force-pushed the i2c-error-handling branch 10 times, most recently from e2b342c to 216e923 Compare October 21, 2024 15:05
@nefelim4ag
Copy link
Contributor Author

nefelim4ag commented Oct 21, 2024

Sum up for now:

  • I dropped internal refactoring of an i2c dev address, the address stored inside i2c_config as before.
  • I split patches, so we can more or less safe introduce mcu changes one by one (I did not retested them yet stm32/linux works)
  • Protocol and bus.py changes were also removed, cause we want made them later (I can push them here or move them to another PR.
  • ldc1612 die on i2c errors
  • mpu does not

I hope those reduced size of changes will be easier to review/check/cherrypick.

@KevinOConnor
Copy link
Collaborator

Thanks. In general, it looks good to me. I like this plan to update the mcu first and then update the host in a following PR.

Some minor things I noticed:

  • mpu9250 can't just return random data on an i2c error. So, I think it has to shutdown on an error until some later point where we can update it to return back an error code to the host (the accelerometers typically do this by returning a pattern that the accelerometer will never itself produce). How about adding a new read_mpu() helper function that replaces the current calls to read_i2c() and have that invoke the shutdown on an error.
  • For ldc1612, can we just check for an error in read_reg() and shutdown from there? This sensor actually has a defined mechanism for returning errors, and I can test that and propagate the error codes in a follow up PR.

Thanks again,
-Kevin

@nefelim4ag nefelim4ag force-pushed the i2c-error-handling branch 2 times, most recently from da7d039 to 96135d8 Compare October 23, 2024 00:22
@nefelim4ag
Copy link
Contributor Author

nefelim4ag commented Oct 23, 2024

JFYI, ldc/mpu - done

@KevinOConnor
Copy link
Collaborator

Thanks! It looks good to me. I'll give a few days for others to comment, but otherwise look to commit.

-Kevin

@KevinOConnor
Copy link
Collaborator

While we're waiting, would you be able to rebase and add a signed-off-by line to the rp2040 patch? Also, is the commit comments in "i2ccmds: abstract i2c dev from bus" still accurate?

Thanks,
-Kevin

@nefelim4ag
Copy link
Contributor Author

nefelim4ag commented Oct 24, 2024

"i2ccmds: abstract i2c dev from bus" still accurate?

I updated the comment, to be more "clear"
Basically, I don't only add a wrapper, I just hide which specific structure is used underneath from sensors - i2c_config or i2c_software.
Like it is done in SPI.

I'm not sure that I have better wording to describe that, feel free to change or suggest a better one

Added wrapper around sw/hw bus API,
pins.py code will ensure that pins will not mix
between HW/SW buses.

Signed-off-by: Timofey Titovets <[email protected]>
Signed-off-by: Timofey Titovets <[email protected]>
Signed-off-by: Timofey Titovets <[email protected]>
Signed-off-by: Timofey Titovets <[email protected]>
Signed-off-by: Timofey Titovets <[email protected]>
Signed-off-by: Timofey Titovets <[email protected]>
@KevinOConnor KevinOConnor merged commit 08102a0 into Klipper3d:master Oct 27, 2024
1 check passed
@KevinOConnor
Copy link
Collaborator

Thanks!

-Kevin

Misterke pushed a commit to Misterke/klipper that referenced this pull request Nov 1, 2024
* i2ccmds: abstract i2c dev from bus implementation

Added wrapper around sw/hw bus API,
pins.py code will ensure that pins will not mix
between HW/SW buses.

Signed-off-by: Timofey Titovets <[email protected]>

* i2c: handle errors at i2ccmds

Signed-off-by: Timofey Titovets <[email protected]>

* i2c_software: forward errors to i2ccmd

Signed-off-by: Timofey Titovets <[email protected]>

* linux: forward i2c errors to i2ccmd

Signed-off-by: Timofey Titovets <[email protected]>

* rp2040: forward i2c errors to i2ccmd

Signed-off-by: Timofey Titovets <[email protected]>

* stm32: forward i2c errors to i2ccmd

Signed-off-by: Timofey Titovets <[email protected]>

* i2ccmds: move status checks to function

Signed-off-by: Timofey Titovets <[email protected]>

* ldc1612: shutdown on i2c errors

Signed-off-by: Timofey Titovets <[email protected]>

* mpu: shutdown on i2c errors

Signed-off-by: Timofey Titovets <[email protected]>

---------

Signed-off-by: Timofey Titovets <[email protected]>
Co-authored-by: Timofey Titovets <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants