Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue: 3898040 DOCA RX Data Path #174

Merged
merged 7 commits into from
Aug 5, 2024

Conversation

AlexanderGrissik
Copy link
Collaborator

@AlexanderGrissik AlexanderGrissik commented Jul 2, 2024

Description

  1. DOCA RX Data Path
  2. Changes in epoll os events handling
  3. Some cleanups
What
  1. It implements DOCA RX Data Path regular RQ including notification mechanism.
  2. It integrates CQ moderation and MSIX affinity DOCA features.
  3. In addition, it removes internal thread involvement in OS events handling for epoll. This improves performance and avoids lock contentions. Now OS polling will be done the same as it was for poll/select. i.e Polling each every loops or epolls for OS events depending on the configuration.
  4. Obsolete and unneeded flags cleanup.
Why ?

DOCA integration

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

Copy link
Member

@pasis pasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor comments

*/
if (safe_mce_sys().app.distribute_cq_interrupts && g_p_app->get_worker_id() >= 0 &&
m_p_ib_ctx_handler->is_notification_affinity_supported()) {
uint32_t core = g_p_app->get_worker_id() % std::thread::hardware_concurrency();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the hardware_concurrency() description:

Returns the number of concurrent threads supported by the implementation. The value should be considered only a hint.

Is it reliable approach? What if available CPUs have non-continuous numbering? What if number of completion channels (IRQs) is lower than number of CPUs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. This implementation is not good because CPU can be higher than comp channels. We need to ask from DOCA to give us the number of available channels.
I will change this section to use some temporary constant comp ch number just to test functionality.

{
if (!safe_mce_sys().doca_flow) {
int ret = -1;
m_lock_ring_rx.lock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion] replace manual locking with a lock_guard at the top

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -405,13 +404,12 @@ int epoll_wait_call::ring_poll_and_process_element()
return m_epfd_info->ring_poll_and_process_element(&m_poll_sn_rx, &m_poll_sn_tx, nullptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] do we still need "poll sn" logic?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed all leftovers of poll sn

if (safe_mce_sys().app.distribute_cq_interrupts && g_p_app->get_worker_id() >= 0 &&
m_p_ib_ctx_handler->is_notification_affinity_supported()) {
uint32_t core = g_p_app->get_worker_id() % std::thread::hardware_concurrency();
hwqrx_loginfo("Setting PE core affinity: %" PRIu32 ", pid: %d", core, getpid());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace info with debug log level.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -89,10 +87,6 @@ epfd_info::epfd_info(int epfd, int size)

xlio_stats_instance_create_epoll_block(m_epfd, &(m_stats->stats));

// Register this socket to read nonoffloaded data
g_p_event_handler_manager->update_epfd(m_epfd, EPOLL_CTL_ADD,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] Is "issue: 3502635 Avoid epoll OS polling through internal thread" generic enough to port to vNext?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Rebased and ported from vNext.

README Outdated
Similar to XLIO_RX_SKIP_OS, but in select() or poll() this will force the XLIO
In select(),poll() or epoll() this will force the XLIO
to check the non offloaded fd even though an offloaded socket has ready
packets found while polling.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make a pretty layout with up to 80 symbols in a row

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct layout ported from vNext

// (most likely in case of a wakeup and probably only under epoll_wait (Not
// select/poll))
ring_clear_rx_notification();
// DOCA WA: Unclearable manual triggered events generate false PE events.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion] Put "XXX" or "TODO" prefix. Some editors such as vim highlight it and this will raise attention to a temporary code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Adding RX data-path with DOCA ETH RX.
Utilizes DOCA regular RX queue.
DOCA path is completely moved to hw_queue_rx without using cq_mgr_rx.
DPCP path is mostly unchanged and is implemented with cq_mgr_rx.

Signed-off-by: Alexander Grissik <[email protected]>
Adding support for RX PE notification moderation with RXQ.

Signed-off-by: Alexander Grissik <[email protected]>
Adding interrupt distribution affinity for Nginx/Envoy use cases.
Each worker uses different CPU core for PE interrupts.

Signed-off-by: Alexander Grissik <[email protected]>
1. Remove global poll sn logic
2. Use lock instead of trylock for notifications
3. Differentiate between clear-notification and poll-and-process
4. Remove poll-and-process from clear-notification. We do check-all-offloaded-sockets after that anyway. This removes poll overhead.

Signed-off-by: Alexander Grissik <[email protected]>
This flag logic is unneccessary for epolls without offloaded sockets.
We must wait_os for such epolls, otherwise we break DOCA or other application logics.
This flag is never changed in practice and changing it will only introduce overhead without additional functionality.
In case we do not have offloaded sockets in epoll context there is no reason to go and try to do polling/blocking loops.

Signed-off-by: Alexander Grissik <[email protected]>
@AlexanderGrissik AlexanderGrissik changed the base branch from doca_xlio to doca_xlio_vNext August 4, 2024 15:17
If request notification called on non empty PE, DOCA will set manual triggered event.
This event is not cleared as part of clear notification.
Consequent calls to epoll_wait such as inside wait_os generate event again and again,
until request notification is called again on that PE.
We need to clear false PE events after wait_os to avoid redundant polling attempts.
This is a temporary solution.

Signed-off-by: Alexander Grissik <[email protected]>
Removing all global poll sn leftovers from all pathes.
Global poll SN is an optimization attempt to skip request notification if another thread polled that CQ.
This approach is not applicable for HPC application and may introduce polling overhead (It is likely that the second thread drains the CQ anyway).

Signed-off-by: Alexander Grissik <[email protected]>
@AlexanderGrissik AlexanderGrissik merged commit a8ab162 into Mellanox:doca_xlio_vNext Aug 5, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants