sven-rosenzweig (Contributor) commented Feb 10, 2025

The active queue uses a heap-based priority queue, sorting jobs by priority.
Tasks received via RPC from Neutron have the highest priority.
Jobs from the sync loop have lower priority.
Jobs moved from the passive to the active queue retain their original priority.

Every interval x, the agent triggers a background sync, placing objects that
need synchronization into the passive queue.
These jobs may be moved to the active queue if capacity allows.

However, when the agent is fully occupied (handling 40 tasks simultaneously),
and more high-priority jobs are coming in, lower-priority jobs in the active
queue may become stuck for some time, as higher-priority
jobs are continuously processed first.
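
A minimal sketch of this starvation effect, not the agent's actual code. The priority values, the job names, and the simplified model (one worker, one new high-priority job per tick) are assumptions made for illustration only.

```python
import heapq
import itertools

# Hypothetical priority values; a smaller number is served first.
HIGHEST, LOWEST = 0, 10


def run(ticks):
    """Process one job per tick while one new HIGHEST job arrives per tick."""
    counter = itertools.count()  # tie-breaker so tuples never compare names
    heap = []
    # A low-priority job from the sync loop is already on the active queue.
    heapq.heappush(heap, (LOWEST, next(counter), "sync-job"))
    processed = []
    for tick in range(ticks):
        # A new RPC job arrives before the worker picks its next job.
        heapq.heappush(heap, (HIGHEST, next(counter), f"rpc-job-{tick}"))
        processed.append(heapq.heappop(heap)[2])
    waiting = [name for _, _, name in heap]
    return processed, waiting


processed, waiting = run(5)
```

As long as high-priority jobs arrive at least as fast as they are processed, `sync-job` stays in `waiting` and never reaches `processed`.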

Our approach also does not provide sort stability.
Jobs added with HIGHEST priority are not guaranteed to be
returned in the order they have been added. Details can be found in
the heapq documentation (https://docs.python.org/3/library/heapq.html#priority-queue-implementation-notes).
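
The stability issue can be sketched as follows. The `Job` class is a hypothetical stand-in that compares by priority only; it is not the agent's real job type. The second function uses the entry-count tie-breaker suggested in the heapq implementation notes, shown here only for comparison and not as the fix chosen in this PR.

```python
import heapq
import itertools


class Job:
    """Hypothetical job that compares by priority only."""

    def __init__(self, name, priority):
        self.name = name
        self.priority = priority

    def __lt__(self, other):
        return self.priority < other.priority


def drain_plain(jobs):
    """Push jobs as-is; equal priorities may pop in a different order."""
    heap = []
    for job in jobs:
        heapq.heappush(heap, job)
    return [heapq.heappop(heap).name for _ in range(len(heap))]


def drain_with_counter(jobs):
    """Push (priority, sequence, job); ties resolve by insertion order."""
    counter = itertools.count()
    heap = []
    for job in jobs:
        heapq.heappush(heap, (job.priority, next(counter), job))
    return [heapq.heappop(heap)[2].name for _ in range(len(heap))]


names = list("abcdefgh")
jobs = [Job(name, priority=0) for name in names]
plain_order = drain_plain(jobs)           # same jobs, order not guaranteed
stable_order = drain_with_counter(jobs)   # always insertion order
```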

To overcome these potential drawbacks, the active queue becomes a FIFO
queue. All jobs on the active queue are treated as equal and are
processed in insertion order.
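
A minimal sketch of the FIFO behavior, assuming a simple `collections.deque` as backing store. The class name `FifoActiveQueue`, its methods, and the priority values are hypothetical and may differ from the actual change in this PR.

```python
from collections import deque

# Hypothetical priority values, kept only as job metadata.
HIGHEST, LOWEST = 0, 10


class FifoActiveQueue:
    """Hypothetical active queue that ignores priority when ordering."""

    def __init__(self):
        self._jobs = deque()

    def put(self, name, priority):
        # Priority is stored but does not influence the position.
        self._jobs.append((name, priority))

    def get(self):
        # The oldest job is always returned first.
        return self._jobs.popleft()

    def __len__(self):
        return len(self._jobs)


queue = FifoActiveQueue()
queue.put("sync-job", LOWEST)
queue.put("rpc-job-0", HIGHEST)
queue.put("rpc-job-1", HIGHEST)
order = [queue.get()[0] for _ in range(len(queue))]
```

Here the low-priority `sync-job` is served before the RPC jobs that arrived after it, so a job that has reached the active queue can no longer be overtaken.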


github-actions bot commented Feb 10, 2025

Name                                                                      Stmts   Miss  Cover
---------------------------------------------------------------------------------------------
networking_nsxv3/api/rpc.py                                                 233    110    53%
networking_nsxv3/common/config.py                                            16      0   100%
networking_nsxv3/common/constants.py                                         23      0   100%
networking_nsxv3/common/locking.py                                           35     11    69%
networking_nsxv3/common/synchronization.py                                  189     51    73%
networking_nsxv3/db/db.py                                                    94     19    80%
networking_nsxv3/extensions/nsxtoperations.py                               104     40    62%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/agent.py                   162     50    69%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/cli.py                     299    195    35%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/client_nsx.py              187     49    74%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/constants_nsx.py             6      0   100%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/extensions/firewall.py      27      0   100%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/provider.py                169     10    94%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/provider_nsx_policy.py     765    113    85%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/realization.py             203     33    84%
networking_nsxv3/plugins/ml2/drivers/nsxv3/driver.py                        129     74    43%
networking_nsxv3/prometheus/exporter.py                                      19      5    74%
networking_nsxv3/services/logapi/drivers/nsxv3/driver.py                     41      1    98%
networking_nsxv3/services/qos/drivers/nsxv3/qos.py                           34      4    88%
networking_nsxv3/services/trunk/drivers/nsxv3/trunk.py                       71      3    96%
---------------------------------------------------------------------------------------------
TOTAL                                                                      2806    768    73%

sven-rosenzweig force-pushed the fix/stuck_in_active_queue branch 2 times, most recently from 0678f85 to 5b5ec6d, on February 11, 2025 08:06
sven-rosenzweig marked this pull request as ready for review on February 11, 2025 08:08
mutax previously approved these changes on Feb 11, 2025
mutax dismissed their stale review on February 11, 2025 09:55

overlooked the missing uniq

The active queue uses a heap-based priority queue, sorting jobs by priority.
Tasks received via RPC from Neutron have the highest priority.
Jobs from the sync loop have lower priority.
Jobs moved from the passive to the active queue retain their original priority.

Every interval x, the agent triggers a background sync, placing objects that
need synchronization into the passive queue.
These jobs may be moved to the active queue if capacity allows.

However, when the agent is fully occupied (handling 40 tasks simultaneously),
and more high-priority jobs are coming in, lower-priority jobs in the active
queue may become stuck for some time, as higher-priority
jobs are continuously processed first.

The starvation problem is amplified by the fact that lower-priority items
from the passive queue prevent new higher-priority jobs from being added
(see the __eq__ method of Runnable in line 78 and its comment).
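
One way this blocking can arise is sketched below. This is an assumption about the mechanism, not a copy of the real code: it supposes that `Runnable` equality ignores priority and that the queue rejects jobs equal to one already queued. The attribute names, the `UniqueQueue` class, and the priority values are hypothetical.

```python
# Hypothetical priority values; a smaller number is more urgent.
HIGHEST, LOWEST = 0, 10


class Runnable:
    """Hypothetical stand-in: two jobs are equal if they target the same object."""

    def __init__(self, object_id, priority):
        self.object_id = object_id
        self.priority = priority

    def __eq__(self, other):
        # Priority is deliberately not part of the comparison (assumption).
        return isinstance(other, Runnable) and self.object_id == other.object_id

    def __hash__(self):
        return hash(self.object_id)


class UniqueQueue:
    """Hypothetical queue that refuses jobs equal to an already queued one."""

    def __init__(self):
        self._items = []

    def put(self, job):
        if job in self._items:  # uses Runnable.__eq__
            return False
        self._items.append(job)
        return True

    def priorities(self):
        return [job.priority for job in self._items]


queue = UniqueQueue()
accepted_sync = queue.put(Runnable("port-1", LOWEST))   # from the sync loop
accepted_rpc = queue.put(Runnable("port-1", HIGHEST))   # from an RPC call
```

Under these assumptions the RPC job for `port-1` is rejected, and the object stays queued with the low priority of the earlier sync job.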

Our approach also does not provide sort stability.
Jobs added with HIGHEST priority are not guaranteed to be
returned in the order they have been added. Details can be found in
the heapq documentation (https://docs.python.org/3/library/heapq.html#priority-queue-implementation-notes).

To overcome these potential drawbacks, the active queue becomes a FIFO
queue. All jobs on the active queue are treated as equal and are
processed in insertion order.

sven-rosenzweig force-pushed the fix/stuck_in_active_queue branch from 5b5ec6d to dfc3b63 on February 12, 2025 08:41

mutax (Contributor) commented Feb 20, 2025

code cherry picked into PR #145

mutax closed this on Feb 20, 2025