Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZHA (Deconz) on two systems: No response to 'Command.aps_data_indication' command #105913

Closed
basslet opened this issue Dec 17, 2023 · 33 comments
Closed
Assignees

Comments

@basslet
Copy link

basslet commented Dec 17, 2023

The problem

Running two HA systems:

  1. core-2023.10.5 on rpi3-64
  2. core-2023.12.3 on rpi4-64

Both systems with ConBee II

Both systems show frequent warnings from zigpy_deconz.api: No response to 'Command.aps_data_indication' command with seq id 'xxxxx'

On hardware side, I replaced USB extension cables, connected to USB2 and tried to minimize any potential interference.

I also completely reset and re-paired one of the ConBee II and its devices, no change.

I'm getting this error 120-250 times a day per system.

System Information

version core-2023.12.3
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.11.6
os_name Linux
os_version 6.1.58-haos-raspi
arch aarch64
timezone Europe/Berlin
config_dir /config
Home Assistant Community Store
GitHub API ok
GitHub Content ok
GitHub Web ok
GitHub API Calls Remaining 4866
Installed Version 1.33.0
Stage running
Available Repositories 1358
Downloaded Repositories 9
Home Assistant Cloud
logged_in true
subscription_expiration 8 March 2024 at 01:00
relayer_connected true
relayer_region eu-central-1
remote_enabled true
remote_connected true
alexa_enabled true
google_enabled false
remote_server eu-central-1-7.ui.nabu.casa
certificate_status ready
instance_id a9199596b0bf48e08fb81ce91f7b5b4a
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok
Home Assistant Supervisor
host_os Home Assistant OS 11.2
update_channel stable
supervisor_version supervisor-2023.11.6
agent_version 1.6.0
docker_version 24.0.7
disk_total 220.0 GB
disk_used 12.9 GB
healthy true
supported true
board rpi4-64
supervisor_api ok
version_api ok
installed_addons UniFi Network Application (3.0.0), Samba share (12.2.0), Studio Code Server (5.14.2), Terminal & SSH (9.8.1)
Dashboards
dashboards 7
resources 2
views 7
mode storage
Recorder
oldest_recorder_run 7 December 2023 at 06:47
current_recorder_run 17 December 2023 at 17:08
estimated_db_size 595.50 MiB
database_engine sqlite
database_version 3.41.2

System Information

version core-2023.10.5
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.11.5
os_name Linux
os_version 6.1.21-v8
arch aarch64
timezone Europe/Berlin
config_dir /config
Home Assistant Community Store
GitHub API ok
GitHub Content ok
GitHub Web ok
GitHub API Calls Remaining 4866
Installed Version 1.33.0
Stage running
Available Repositories 1362
Downloaded Repositories 2
HACS Data ok
Home Assistant Cloud
logged_in false
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok
Home Assistant Supervisor
host_os Home Assistant OS 11.1
update_channel stable
supervisor_version supervisor-2023.11.6
agent_version 1.6.0
docker_version 24.0.6
disk_total 116.5 GB
disk_used 5.1 GB
healthy true
supported true
board rpi3-64
supervisor_api ok
version_api ok
installed_addons Samba share (10.0.2), Studio Code Server (5.13.0), Advanced SSH & Web Terminal (15.1.0)
Dashboards
dashboards 1
resources 0
mode auto-gen
Recorder
oldest_recorder_run 14 December 2023 at 10:37
current_recorder_run 16 December 2023 at 15:37
estimated_db_size 13.57 MiB
database_engine sqlite
database_version 3.41.2

What version of Home Assistant Core has the issue?

core-2023.12.3

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Supervised

Integration causing the issue

ZHA

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha/

Diagnostics information

home-assistant_zha_2023-12-17T16-23-53.575Z.log

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Logger: zigpy_deconz.api
Source: runner.py:186
First occurred: 16 December 2023 at 16:03:32 (10 occurrences)
Last logged: 15:41:05

No response to 'Command.aps_data_indication' command with seq id '0x8a'
No response to 'Command.aps_data_indication' command with seq id '0x80'
No response to 'Command.aps_data_indication' command with seq id '0x82'
No response to 'Command.aps_data_indication' command with seq id '0x6f'
No response to 'Command.aps_data_indication' command with seq id '0x87'

Additional information

No response

@home-assistant
Copy link

Hey there @dmulcahey, @Adminiuga, @puddly, @TheJulianJES, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of zha can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign zha Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


zha documentation
zha source
(message by IssueLinks)

@puddly
Copy link
Contributor

puddly commented Dec 17, 2023

Could you also collect a debug log from 2023.10.5?

@basslet
Copy link
Author

basslet commented Dec 17, 2023

@basslet
Copy link
Author

basslet commented Dec 17, 2023

To give you a sense of severity, this is a counter for the No Response warnings errors in the system log
image

We were away between 15:00 and 16:30 with little activity at home (e.g. no lights going on or off, no motion sensors), and it overlaps with the error line staying flat.

Lmk if you need anything else

@puddly
Copy link
Contributor

puddly commented Dec 17, 2023

Thanks! The log file for 2023.10.5 contains debug info only for about 20s, starting from 22:03:49. This unfortunately isn't enough time to capture the entire context for one of these events (you seem to get about one every 8 minutes). Could you leave debug logging on for about 20 minutes and then re-upload a new one?

@basslet
Copy link
Author

basslet commented Dec 17, 2023

Sure, I will do it tomorrow morning on both systems.

@basslet
Copy link
Author

basslet commented Dec 18, 2023

@basslet
Copy link
Author

basslet commented Dec 21, 2023

Is there anything else you need in order to evaluate the problem @puddly?

@puddly
Copy link
Contributor

puddly commented Dec 21, 2023

It should be fixed once 2023.12.4 is released later today.

@basslet
Copy link
Author

basslet commented Dec 22, 2023

Thanks, I'll test it once it's out, and let you know.
Could you share the PR for the fix? I'd like to better understand what the issue is.

TBH these ZHA / ConBee II issues created such a poor HA experience in our home that I'm considering switching systems, so I'm trying to understand what caused the problem.

Thanks again for looking into it.

@puddly
Copy link
Contributor

puddly commented Dec 22, 2023

This fixes your specific problem zigpy/zigpy-deconz#241, there were a few other changes you can trace through.

@TekFan
Copy link

TekFan commented Dec 22, 2023

Wow, quite interesting, I just discovered this thread.
I've searched the forum in the past year and found many occurrences of the same topic but never a fix.
I have the exact same issue since months, also with two different systems both with Combee II sticks.
Since I had to update the firmware on one of them lately, due to problems with Aqara T1 water leak sensors, the frequency has skyrocketed, I'm now at 600 errors per day on this system.
I don't know if there is any link to that but I simultaneously noted a steep increase of missed events, particularly from my Aqara motion sensors.
I was relating that to the amount of devices, 70 on the most affected system, which I thought might be too much, but reading this, I can't wait to try 2023.12.4 and see if it resolves it for good.

@basslet
Copy link
Author

basslet commented Dec 25, 2023

Interestingl Thanks for sharing @TekFan. I also added the Aqara water leak sensor recently. HA was running fine for months, but the last 3-4 weeks have been a nightmare. The system even crashes and restarts, which I had never seen before.

I might just unpair the Aqara water leak sensor for now and see if it changes anything.

I'm at 200+ "no response" errors and around 600 syslog errors and warnings in total, per day.

@basslet
Copy link
Author

basslet commented Dec 26, 2023

Removing the Aqara water leak sensor doesn't show a significant change in the behavior. Still getting ~200 'No response' errors per day. Can't wait to test 2023.12.4 when it's out, I'm ready to ditch ConBee II for either a SkyConnect or ditch ZHA altogether for a better commercial solution.

My Philips Hue and Bosch SHC devices have been running flawlessly across the entire apartment, except for some small errors. But nothing like the ZHA integration.

@TekFan
Copy link

TekFan commented Dec 26, 2023

@basslet , I'm afraid the Aqara water leak sensor has nothing to do with the problem, I just mentioned it because the new T1 version behaves differently than the older one. Aqara is well known to deviate completely from the Zigbee specs and use their own protocol. The last times, it is to the point that even the Conbee stick had to be updated to cope with this.
And after the update, I noticed a steep increase of these "No response to 'Command.aps_data_indication' command with seq id 'xxxxx" errors, but the Aqara T1 leak sensor is completely indirect to this because I got them a month ago and the problem is latent since almost 2 years and increasing with the numbers of devices I add to the network.
I suspect these errors have something to do with polling, for instance I have a lot (I think about a dozen) of NOUS (tuya) smartplugs and they have a know bug where energy (kWh) updates don't work all the time so the specific ZHA quirk fetch the data by polling and I guess the response doesn't always come back synchronously.
Do you happen to have polled devices ?

@basslet
Copy link
Author

basslet commented Dec 26, 2023

@TekFan yes I have the same NOUS (tuya) plugs - see below. All of these entities are tuya.

Logger: homeassistant.helpers.entity
Source: helpers/entity.py:929
First occurred: 25 December 2023 at 13:15:55 (259 occurrences)
Last logged: 15:58:29

Update of sensor.vitrine_power is taking over 10 seconds
Update of sensor.liam_room_echo_power is taking over 10 seconds
Update of sensor.klavier_power is taking over 10 seconds
Update of sensor.liam_room_echo_summation_delivered is taking over 10 seconds
Update of sensor.vitrine_summation_delivered is taking over 10 seconds

And yes, the Aqara water sensor seems to make no real difference here, thanks for clarifying.

@TekFan
Copy link

TekFan commented Dec 26, 2023

Interesting.
I was fearing that adding too much of this plugs would end up flooding the network, due to this polling requirement.
I had to update the plugs with the latest firmware last summer because they began to shut themselves down with voltages beyond 250V, it is a security feature, but with all the PV installations in my neighborhood, this was too low, so due to many complaints, NOUS released a new firmware with increased threshold. This firmware has fixed that problem, but I don't know if they fixed the energy update bug because the quirk is still polling, so now way to know if it would update spontaneously now, rendering the polling superfluous.
Maybe @puddly could tell us if there is a way to disable polling temporarily (even by side-loading a modified quirk) to try without polling.
Polling has always been a bad idea on such network, unless strictly necessary of course, so it would be interesting to be able to disable it per kind of device, or at least globally for debug purpose, like with an entry is the ZHA section of the config file.
Otoh, puddly and other devs. I think, are working on this issue and we might just as well wait for the 2023.12.4 if it ever comes out, to see if it helps.

@basslet
Copy link
Author

basslet commented Dec 26, 2023

@TekFan How did you update the FW of the NOUS smart plugs? Do you have a link to the file and changelog?

@puddly
Copy link
Contributor

puddly commented Dec 26, 2023

@TekFan your missed Aqara motion sensor events are also probably due to the bug fixed in the above PR. The warning you're seeing is the Conbee failing to respond to the request zigpy-deconz sends, asking the radio to give it a packet.

@TekFan
Copy link

TekFan commented Dec 26, 2023

@puddly , good news and thank you so much to have worked on this issue, like I said I can't wait to try the fix and will update on the results.

@basslet, you can find the firmware in this thread then drop it in a folder on your HA server and finally you have to follow the ZHA OTA procedure. But I guess, so far, it is only needed if you suffer from the OV protection in your area.

@TekFan
Copy link

TekFan commented Dec 27, 2023

@puddly : Ok, so, 2023.12.4 is out so I updated from 12.1.
And...it doesn't look good, previous type of warning seem to have disappeared, but replaced by 3 new types, flooding my log.
Unexpected transmit confirm for request id xxx, Status: TXStatus.SUCCESS - about 130 per hour
Duplicate or delayed response for 0x:xx sequence - about 200 per hour
Update of sensor.blabla_switch_xxxxxxxx_electrical_measurement is taking over 10 seconds - about 100 per hour.

HA_log.txt

As far as I can see, they are mostly related to my NOUS plugs (tuya TS011F), which is interesting because that could help finding the root cause.
Like I said previously, afaik, these plugs are polled and probably the only devices I have needing this, so I'm not sure if these warning are related to these TS011F plugs or simply to polling in general.

What I also note is that the instantaneous power and current updates of these switch are now very slow, I have to wait at least a minute before having an update where it was a couple a second before. (if I remember correctly)

Now, I just discovered that you can disable polling globally in the system settings of the ZHA integration, so I did.
And the "Update of sensor.blabla_switch_xxxxxxxx_electrical_measurement is taking over 10 seconds" Stopped.
But the "Unexpected transmit confirm for request id xxx, Status: TXStatus.SUCCESS" and the "Duplicate or delayed response for 0x:xx sequence" continue to flood the log
So at least one of these warnings is directly linked to the TS011F plugs polling.

Now, with polling disabled, power and current entities don't update anymore on the plugs, but the Energy and voltage are.
So this answers my question about the need to continue to poll them after their last firmware update.

I have a ZHA detailed log at disposal, but It's huge and probably with sensitive info so I would prefer to send it to a dev. directly instead of publicly posting it.

Finally, right know it is too soon to report if my missing events are fixed by this release.

@basslet
Copy link
Author

basslet commented Dec 27, 2023

Same here. Will report tomorrow when I had a chance to test everything.

@puddly
Copy link
Contributor

puddly commented Dec 27, 2023

The warnings can usually be ignored: they're internal to the radio and should be DEBUG anyways so you aren't bothered by them.

Can you upload a ZHA debug log as well? If there are timing issues with commands, those can be adjusted.

@TekFan
Copy link

TekFan commented Dec 27, 2023

Like I said, I have a debug log ready, but would rather upload it privately.

@puddly
Copy link
Contributor

puddly commented Dec 27, 2023

You can email it to me at [email protected].

@TekFan
Copy link

TekFan commented Dec 27, 2023

Done

@TekFan
Copy link

TekFan commented Dec 29, 2023

Ok, so @puddly , after about 2 days now, this is what I see:

  • the 12.4 release seems to have fixed my missed events and the "no response" warnings, I didn't experience a single misfire from my Aqara motion sensors, but it's obviously too soon to be certain. Anyway, if it stays like that during the coming days, then it is very positive.

  • Oth, like I wrote previously, the log is now flooded with new warnings I never or rarely had before, but worse, I now experienced 3 random ZHA reloads with a side effect that it triggers many of my automations, including some TTS through my HomePods in the middle of the night...
    At that moment, I had this in the log:

Logger: zigpy_deconz.uart
Source: /usr/local/lib/python3.11/site-packages/zigpy_deconz/uart.py:32
First occurred: December 27, 2023 at 16:13:37 (3 occurrences)
Last logged: 00:03:50

Lost connection: SerialException('device reports readiness to read but returned no data (device disconnected or multiple access on port?)')
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/serial_asyncio_fast/__init__.py", line 132, in _read_ready
    data = self._serial.read(self._max_read_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/serial/serialposix.py", line 595, in read
    raise SerialException(
serial.serialutil.SerialException: device reports readiness to read but returned no data (device disconnected or multiple access on port?)

So, basically, this is a mixed result, even though I apparently don't have missed events anymore, ZHA seems much more instable right now.

@erkr
Copy link

erkr commented Dec 30, 2023

Since I upgraded from 2023.12.2 to 2023.12.4 my log is flooded with these warnings. I also own several tuya power plugs and a conbee 2 adapter.
But I see the 10 second delay warning not only with the Tuya plugs. I also happens with a innr sp120 plug.
These are the 4 loggings I see. Thousands of them in a matter of 8 hours:

Unexpected transmit confirm for request id 157, Status: TXStatus.SUCCESS
18:49:44 – (WARNING) runner.py - message first occurred at 10:20:43 and shows up 3526 times
Duplicate or delayed response for 0x:60 sequence
18:49:44 – (WARNING) runner.py - message first occurred at 10:20:55 and shows up 2167 times
Updating zha sensor took longer than the scheduled update interval 0:00:30
18:49:27 – (WARNING) Sensor - message first occurred at 10:21:24 and shows up 637 times
Update of sensor.innr_sp_120_38ea6103_electrical_measurement is taking over 10 seconds
18:49:07 – (WARNING) helpers/entity.py - message first occurred at 10:21:04 and shows up 567 times

@dmulcahey
Copy link
Contributor

Please try the new beta

@erkr
Copy link

erkr commented Dec 31, 2023

Please try the new beta

Can I try the beta as a custom integration in the latest core 2023.12? Or are there dependencies with the core beta?!

@erkr
Copy link

erkr commented Dec 31, 2023

It's that is possible, I will give it a try tomorrow.
Best wishes
Eric

@TekFan
Copy link

TekFan commented Jan 4, 2024

Wow, it happened !
I upgraded to release 2024.1.0 8 hours ago and not even one ZHA warning in the log !
Besides the recent floods, I had frequent ZHA warnings and errors since many months, I think during about a year and everything disappeared after last update.
I just hope something has really been fixed and not just changing the warnings to debug status :-)
Will see in the coming hours/days if random reloads and missed events are definitely gone as well.

@basslet
Copy link
Author

basslet commented Jan 4, 2024

Same here, looking good so far. I think we can close this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Feb 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants