Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: Add known issue for TX processor block #20282

Merged
merged 1 commit into from
Feb 14, 2025

Conversation

HaavardRei
Copy link
Contributor

Adds a known issue for the TX processor being blocked if a callback is blocking when run in the system workqueue, and the HCI command buffer pool is empty.

@HaavardRei HaavardRei requested a review from a team as a code owner February 11, 2025 07:14
@github-actions github-actions bot added doc-required PR must not be merged without tech writer approval. changelog-entry-required Update changelog before merge. Remove label if entry is not needed or already added. labels Feb 11, 2025
@NordicBuilder
Copy link
Contributor

NordicBuilder commented Feb 11, 2025

CI Information

To view the history of this post, clich the 'edited' button above
Build number: 12

Inputs:

Sources:

sdk-nrf: PR head: e67979c7397929c9fff5ac4b381078b10f977733

more details

sdk-nrf:

PR head: e67979c7397929c9fff5ac4b381078b10f977733
merge base: ba4a37e4f1565ffa4c15af0745950b237bc458b9
target head (main): ba4a37e4f1565ffa4c15af0745950b237bc458b9
Diff

Github labels

Enabled Name Description
ci-disabled Disable the ci execution
ci-all-test Run all of ci, no test spec filtering will be done
ci-force-downstream Force execution of downstream even if twister fails
ci-run-twister Force run twister
ci-run-zephyr-twister Force run zephyr twister
List of changed files detected by CI (1)
doc
│  ├── nrf
│  │  ├── releases_and_maturity
│  │  │  │ known_issues.rst

Outputs:

Toolchain

Version:
Build docker image:

Test Spec & Results: ✅ Success; ❌ Failure; 🟠 Queued; 🟡 Progress; ◻️ Skipped; ⚠️ Quarantine

  • ◻️ Toolchain
  • ◻️ Build twister
  • ◻️ Integration tests
Disabled integration tests
    • desktop52_verification
    • doc-internal
    • test_ble_nrf_config
    • test-fw-nrfconnect-apps
    • test-fw-nrfconnect-ble_mesh
    • test-fw-nrfconnect-ble_samples
    • test-fw-nrfconnect-boot
    • test-fw-nrfconnect-chip
    • test-fw-nrfconnect-fem
    • test-fw-nrfconnect-nfc
    • test-fw-nrfconnect-nrf-iot_lwm2m
    • test-fw-nrfconnect-nrf-iot_mosh
    • test-fw-nrfconnect-nrf-iot_positioning
    • test-fw-nrfconnect-nrf-iot_samples
    • test-fw-nrfconnect-nrf-iot_serial_lte_modem
    • test-fw-nrfconnect-nrf-iot_thingy91
    • test-fw-nrfconnect-nrf-iot_zephyr_lwm2m
    • test-fw-nrfconnect-nrf_crypto
    • test-fw-nrfconnect-proprietary_esb
    • test-fw-nrfconnect-ps
    • test-fw-nrfconnect-rpc
    • test-fw-nrfconnect-rs
    • test-fw-nrfconnect-tfm
    • test-fw-nrfconnect-thread
    • test-fw-nrfconnect-zigbee
    • test-low-level
    • test-sdk-audio
    • test-sdk-dfu
    • test-sdk-find-my
    • test-sdk-mcuboot
    • test-sdk-pmic-samples
    • test-sdk-sidewalk
    • test-sdk-wifi
    • test-secdom-samples-public

Note: This message is automatically posted and updated by the CI

doc/nrf/releases_and_maturity/known_issues.rst Outdated Show resolved Hide resolved
doc/nrf/releases_and_maturity/known_issues.rst Outdated Show resolved Hide resolved
doc/nrf/releases_and_maturity/known_issues.rst Outdated Show resolved Hide resolved
@NordicBuilder
Copy link
Contributor

You can find the documentation preview for this PR at this link.

Note: This comment is automatically posted by the Documentation Publish GitHub Action.

@HaavardRei HaavardRei force-pushed the ncsdk_31528_known_issue branch from e986b5a to 5b29fad Compare February 11, 2025 07:36
@HaavardRei HaavardRei force-pushed the ncsdk_31528_known_issue branch from 5b29fad to 5d688cc Compare February 11, 2025 08:12
Comment on lines 192 to 210
.. rst-class:: v2-9-0-nRF54H20-1 v2-9-0 v2-8-0

NCSDK-31528: Deadlock on sysworkq with ``tx_notify`` in Host
If a blocking callback is run on the system workqueue when the HCI buffer pool is full, the TX processor will be blocked.
This prevents TX buffers from being freed, deadlocking the application.
An example of a blocking callback is calling the :c:func:`bt_conn_get_tx_power_level` function in a receive callback, which is blocking since it uses the :c:func:`bt_hci_cmd_send_sync` function.

**Workaround:** Do not use blocking calls in the system workqueue.
Alternatively, increase the value of the :kconfig:option:`CONFIG_BT_BUF_CMD_TX_COUNT` Kconfig option to increase the HCI buffer.
This does not guarantee that the problem is solved, as multiple blocking calls may exhaust the buffer pool.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My two cents to improve clarity:

Suggested change
.. rst-class:: v2-9-0-nRF54H20-1 v2-9-0 v2-8-0
NCSDK-31528: Deadlock on sysworkq with ``tx_notify`` in Host
If a blocking callback is run on the system workqueue when the HCI buffer pool is full, the TX processor will be blocked.
This prevents TX buffers from being freed, deadlocking the application.
An example of a blocking callback is calling the :c:func:`bt_conn_get_tx_power_level` function in a receive callback, which is blocking since it uses the :c:func:`bt_hci_cmd_send_sync` function.
**Workaround:** Do not use blocking calls in the system workqueue.
Alternatively, increase the value of the :kconfig:option:`CONFIG_BT_BUF_CMD_TX_COUNT` Kconfig option to increase the HCI buffer.
This does not guarantee that the problem is solved, as multiple blocking calls may exhaust the buffer pool.
NCSDK-31528: Deadlock on sysworkq with ``tx_notify`` in Host
If a blocking callback (which in-turn needs to trigger an HCI command to complete its task) is run on the system workqueue when the HCI command buffer pool is full, the TX processing code will get blocked as well.
This situation prevents TX command buffers from being freed, deadlocking the application.
An example of a blocking callback is calling the :c:func:`bt_conn_get_tx_power_level` function in a receive callback. Calling such function can result in deadlock, since it uses the :c:func:`bt_hci_cmd_send_sync` function to complete its operation. The operation won't complete if HCI command buffer is full resulting in deadlock.
**Workaround:** Do not use blocking calls in tasks triggered on the system workqueue.
Alternatively, increase the value of the :kconfig:option:`CONFIG_BT_BUF_CMD_TX_COUNT` Kconfig option to increase the HCI buffer.
This does not guarantee that the problem is solved, as multiple blocking calls may exhaust the buffer pool.

Copy link
Contributor

@peknis peknis Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember, all sentences on their own lines.
And one more thing, won't -> will not, or even better does not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated per your suggestion. I think the last two sentences (before workaround) might be repeating themselves a bit, so suggestions for improvement there is welcome :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be any better if we leave out "resulting in deadlock"? I think that is the redundant part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think so, thanks!

Copy link
Contributor

@omkar3141 omkar3141 Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would retain that as an explainer for the previous paragraph. The example explains why deadlock happens. May be its ok to leave it out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last two sentences are explaining how deadlock occurs in the context of the chosen example. It is easy for us to see it through, but it may not be easy for a typical reader of this text. I suggest to not remove too much for the sake of clarity.

@HaavardRei HaavardRei force-pushed the ncsdk_31528_known_issue branch from 5d688cc to 9fb6030 Compare February 11, 2025 09:23
@HaavardRei HaavardRei requested a review from omkar3141 February 11, 2025 09:25
@HaavardRei HaavardRei force-pushed the ncsdk_31528_known_issue branch from 9fb6030 to 424f8e9 Compare February 11, 2025 09:35

NCSDK-31528: Deadlock on sysworkq with ``tx_notify`` in Host
If a blocking callback (which in turn needs to trigger an HCI command to complete its task) is run on the system workqueue when the HCI command buffer pool is full, the TX processing code will get blocked.
This prevents TX command buffers from being freed, deadlocking the application.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This prevents TX command buffers from being freed, deadlocking the application.
This situation prevents TX command buffers from being freed, deadlocking the application.

Otherwise it is not clear what does "this" refers to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's obviously referring to the statement/line above, and adding "situation" doesn't really contribute (to the situation :)).

@HaavardRei HaavardRei force-pushed the ncsdk_31528_known_issue branch 2 times, most recently from 84135b0 to 5cf4f64 Compare February 12, 2025 07:17
Copy link
Contributor

@omkar3141 omkar3141 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After re-write this is much much clearer. Thanks!

@@ -189,6 +189,26 @@ KRKNWK-14299: NRPA MAC address cannot be set in Zephyr
Bluetooth LE
============

.. rst-class:: v2-9-0-nRF54H20-rc1 v2-9-0 v2-8-0

NCSDK-31528: Blocking the system workqueue if the :kconfig:option:`CONFIG_BT_HCI_ACL_FLOW_CONTROL` Kconfig option is disabled can cause a deadlock in the Bluetooth Host when running out of buffers in the HCI commands pool.
Copy link
Contributor

@omkar3141 omkar3141 Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sentence parsing error. Something went wrong in the rephrasing and now it combination of two sentences with two verbs into single one.

Perhaps?

Suggested change
NCSDK-31528: Blocking the system workqueue if the :kconfig:option:`CONFIG_BT_HCI_ACL_FLOW_CONTROL` Kconfig option is disabled can cause a deadlock in the Bluetooth Host when running out of buffers in the HCI commands pool.
NCSDK-31528: Potential for deadlock. If the :kconfig:option:`CONFIG_BT_HCI_ACL_FLOW_CONTROL` Kconfig option is disabled, blocking of the system workqueue can cause a deadlock in the Bluetooth Host when running out of buffers in the HCI commands pool.

Copy link
Contributor

@peknis peknis Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first line with the Jira issue ID is usually the same as the title in the issue, without a fullstop. Therefore, suggesting:
NCSDK-31528: Deadlock on system workqueue with tx_notify in host
If the :kconfig:option:CONFIG_BT_HCI_ACL_FLOW_CONTROL Kconfig option is disabled, blocking of the system workqueue can cause a deadlock in the Bluetooth Host when running out of buffers in the HCI commands pool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second line must be indented a couple of spaces (does not seem to show it properly in my comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added suggestion!

@HaavardRei HaavardRei force-pushed the ncsdk_31528_known_issue branch from 5cf4f64 to 3aaa324 Compare February 13, 2025 07:01
Copy link
Contributor

@peknis peknis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with a nit.

Comment on lines 200 to 203
* The :kconfig:option:`CONFIG_BT_HCI_ACL_FLOW_CONTROL` Kconfig option is disabled.
* The system workqueue is blocked.
* The HCI commands pool is empty.
* A blocking Bluetooth Host API that uses the :c:func:`bt_hci_cmd_send_sync` function is called from any thread (including the system workqueue).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* The :kconfig:option:`CONFIG_BT_HCI_ACL_FLOW_CONTROL` Kconfig option is disabled.
* The system workqueue is blocked.
* The HCI commands pool is empty.
* A blocking Bluetooth Host API that uses the :c:func:`bt_hci_cmd_send_sync` function is called from any thread (including the system workqueue).
* The :kconfig:option:`CONFIG_BT_HCI_ACL_FLOW_CONTROL` Kconfig option is disabled.
* The system workqueue is blocked.
* The HCI commands pool is empty.
* A blocking Bluetooth Host API that uses the :c:func:`bt_hci_cmd_send_sync` function is called from any thread (including the system workqueue).

@HaavardRei HaavardRei force-pushed the ncsdk_31528_known_issue branch 2 times, most recently from 8e84506 to b227dda Compare February 13, 2025 07:15
@HaavardRei HaavardRei removed the changelog-entry-required Update changelog before merge. Remove label if entry is not needed or already added. label Feb 13, 2025
Adds a known issue for the TX processor being blocked if a callback is
blocking when run in the system workqueue, and the HCI command buffer
pool is empty.

Signed-off-by: Håvard Reierstad <[email protected]>
@HaavardRei HaavardRei force-pushed the ncsdk_31528_known_issue branch from b227dda to e67979c Compare February 14, 2025 07:07
@github-actions github-actions bot added the changelog-entry-required Update changelog before merge. Remove label if entry is not needed or already added. label Feb 14, 2025
@HaavardRei HaavardRei removed the changelog-entry-required Update changelog before merge. Remove label if entry is not needed or already added. label Feb 14, 2025
@rlubos rlubos merged commit 87fef23 into nrfconnect:main Feb 14, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-required PR must not be merged without tech writer approval.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants