Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HLD for Memory_Statistics #1760

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

Arham-Nasir
Copy link

@Arham-Nasir Arham-Nasir commented Jul 23, 2024

This High-Level Design (HLD) document explains the Memory Statistics feature in SONiC. The aim is to improve memory monitoring and management to optimize network performance and reliability.

Repo PR title State
sonic-host-services Support for Memory Statistics Host-Services ![GitHub issue/pull request detail](https://img.shields.io/github/pulls/detail/state/Azure/sonic-host-services/167 [img.shields.io])
sonic-buildimage Add YANG Model and Configuration Support for Memory Statistics ![GitHub issue/pull request detail](https://img.shields.io/github/pulls/detail/state/Azure/sonic-buildimage/20354 [img.shields.io])
sonic-buildimage Support for Memory Statisticsd Process and Configurations ![GitHub issue/pull request detail](https://img.shields.io/github/pulls/detail/state/Azure/sonic-buildimage/20355 [img.shields.io])
sonic-utilities Memory Statistics Config and Show Commands ![GitHub issue/pull request detail](https://img.shields.io/github/pulls/detail/state/Azure/sonic-utilities/3575 [img.shields.io])
sonic-swss-common Add definition for MEMORY_STATISTICS table in schema ![GitHub issue/pull request detail](https://img.shields.io/github/pulls/detail/state/Azure/sonic-swss-common/898 [img.shields.io])

Copy link

linux-foundation-easycla bot commented Jul 23, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@zhangyanzhao
Copy link
Collaborator

reviewer is expected and please leave your comments if you want to be a reviewer. Thanks.

@zhangyanzhao
Copy link
Collaborator

Comment on lines +267 to +277
Metric Current High Low D1-D3 D3-D5 D5-D7 D7-D9 D9-D11 D11-D13 D13-D15
Value Value Value 01Jun24 03Jun24 05Jun24 07Jun24 09Jun24 11Jun24 13Jun24
-------------- -------- -------- -------- --------- --------- --------- --------- --------- --------- ---------
Total Memory 15.6G 15.6G 15.1G 15.1G 15.2G 15.3G 15.3G 15.4G 15.5G 15.5G
Used Memory 2.3G 2.5G 2.0G 2.1G 2.2G 2.2G 2.3G 2.4G 2.3G 2.2G
Free Memory 11.9G 12.4G 10.6G 11.0G 11.2G 11.4G 11.5G 11.7G 11.8G 11.9G
Available Memory 13.0G 14.3G 12.4G 12.6G 12.7G 12.8G 12.9G 13.0G 13.1G 13.2G
Cached Memory 1.2G 1.5G 1.0G 1.1G 1.2G 1.3G 1.4G 1.3G 1.4G 1.2G
Buffer Memory 0.3G 0.4G 0.2G 0.2G 0.3G 0.3G 0.4G 0.3G 0.3G 0.4G
Shared Memory 0.5G 0.6G 0.4G 0.5G 0.5G 0.5G 0.4G 0.5G 0.5G 0.5G

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Arham-Nasirare these stats collected from the output of free linux command?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data is not directly collected from the output of the free Linux command. Instead, they are gathered using the psutil Python library, which directly accesses system memory information via the same underlying mechanisms that the free command relies on. Specifically, the psutil.virtual_memory() function retrieves memory stats such as total, used, free, buffers, cached, and available memory.This approach allows for more direct access to memory data, avoiding the need to parse command output while providing flexibility for programmatic use.

Analysis Period: From 2024-06-01 09:00:00 to 2024-06-15 09:00:00
Interval: 2 days
--------------------------------------------------------------------------------
Metric Current High Low D1-D3 D3-D5 D5-D7 D7-D9 D9-D11 D11-D13 D13-D15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sonic-gnmi (telemetry) already has a comprehensive implementation https://github.com/sonic-net/sonic-gnmi/blob/64ed32b715e1673c3376701ee192d6292f5ec0ec/sonic_data_client/non_db_client.go#L235. Please check and ensure matrix are aligned.
Even better, suggest reuse:

  1. delegate telemetry container as the memory collector
  2. reduce this HLD scope to only memory-stats and storage, connecting to telemetry container by gnmi protocol
  3. ensure this design could run inside and outside sonic switch
  4. ensure this feature could be completely disabled inside a sonic switch and no performance burden if disabled

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Thank you for the suggestion. We will review the sonic-gnmi implementation to align our metrics with the existing telemetry framework for system consistency. We will also explore reusing the telemetry container for memory collection and narrow the HLD scope to focus on memory statistics and storage.
  • Currently, we use the psutil.virtual_memory() function from the psutil library to gather memory data, which is similar to the getProcMeminfo() function in non_db_client.go, as both methods pull information from /proc/meminfo. We believe that psutil offers a more efficient and Pythonic approach compared to relying solely on the telemetry container for this purpose.
  • Can you elaborate more on how the design could run inside and outside the sonic switch.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In enable/disable case, do you still use SIGHUP which means memstatsd will keep running? or just SIGTERM the memoryStats daemon.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hostcfgd will detect the enable/disable flag in ConfigDB. For disable, it sends SIGTERM to stop the daemon, where the daemon gracefully stops all processes, releases resources, and completely shuts down. For enable, it restarts the daemon using systemd, and SIGHUP is used for configuration reloads while the daemon is running.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, better to mention it explicitly, memorystats is not daemon but process controlled by hostcfgd, what if hostcfgd crashes, will it reload memorystats and recalculate retention?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a misunderstanding. Yes, we are introducing memorystatsd as a process and not a separate container. If the hostcfgd crashes, memorystats will restart and when the user enables memory statistics feature, it will use the default retention time unless the user explicitly mentions a custom retention period. Lets say, user enters 5 as retention period, then it will start the retention period from that time and will show the last 5 days data.

@Arham-Nasir
Copy link
Author

Arham-Nasir commented Sep 26, 2024

@qiluo-msft @xincunli-sonic @zbud-msft @FengPan-Frank We have linked the code PRs. Please help review the feature.

@ridahanif96
Copy link
Collaborator

ridahanif96 commented Oct 25, 2024

@qiluo-msft @xincunli-sonic @zbud-msft @FengPan-Frank We have linked the code PRs. Please help review the feature.
Target 202411, help with expedite review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🏗 In Progress
Development

Successfully merging this pull request may close these issues.

7 participants