CLI support for SmartSwitch PMON #3271

Open
wants to merge 145 commits into
base: master
Changes from 39 commits
11cc04d
CLI support for SmartSwitch PMON
rameshraghupathy Apr 14, 2024
02df0ea
imad minor fixes
rameshraghupathy Apr 16, 2024
e0e4700
Did some cleanup for backward compatibility
rameshraghupathy Apr 27, 2024
0a8fc5a
removed the column wrapping
rameshraghupathy Apr 27, 2024
6d61faa
Made it backward compatible and removed textwrap and added ut to PR
rameshraghupathy Apr 28, 2024
8d95dae
1. There was a duplication of part of a function and that has been
rameshraghupathy May 1, 2024
5c1b666
reboot_cause and system_health are obtained directly from chassisStateDB
rameshraghupathy May 8, 2024
fe4a8cf
The expected and result are the same but the test is throwing an error,
rameshraghupathy May 8, 2024
f896438
Let us get the build going and then look into the test mockup
rameshraghupathy May 9, 2024
9d6c093
Implemented as per the pmon hld, also made some improvements in the
rameshraghupathy May 10, 2024
0904515
Fixed the key for CHASSIS_MODULE_INFO_TABLE entries
rameshraghupathy May 15, 2024
ccc380b
Fixed "show reboot-cause all" and "show reboot-cause history all"
rameshraghupathy May 16, 2024
a8fa81d
Addressing review comments
rameshraghupathy May 31, 2024
1cf96a0
Checking if the test issue still exists
rameshraghupathy May 31, 2024
64fd559
Resolving SA errors triggered due to reboot_cause_test
rameshraghupathy Jun 1, 2024
d202e1c
Resolved pre-commit issues
rameshraghupathy Jun 7, 2024
b8c92ae
Resolved pre-commit issues
rameshraghupathy Jun 7, 2024
9986f7b
Improving coverage
rameshraghupathy Jun 7, 2024
0dc52f6
Fixed SA related warnings
rameshraghupathy Jun 7, 2024
93df26d
Did some cleanup
rameshraghupathy Jun 7, 2024
7a2aaf4
Minor improvements and fixes
rameshraghupathy Jun 7, 2024
26f9b8a
Adding tests for system health
rameshraghupathy Jun 7, 2024
3a592f8
Adding more system health related tests
rameshraghupathy Jun 7, 2024
71472a8
Fixed a minor issue
rameshraghupathy Jun 7, 2024
fd8bd6b
Fixed long line SA issue
rameshraghupathy Jun 7, 2024
5b15bc4
Trying to please SA
rameshraghupathy Jun 7, 2024
b35c987
Trying to improve coverage
rameshraghupathy Jun 7, 2024
ee10649
import mock
rameshraghupathy Jun 8, 2024
27546a6
Fixed a typo
rameshraghupathy Jun 8, 2024
883e35c
mocking DB
rameshraghupathy Jun 8, 2024
713ffa2
Fixed syntax issues
rameshraghupathy Jun 8, 2024
62fc3d0
DB mock fix
rameshraghupathy Jun 8, 2024
ecb2ecc
removed unused import
rameshraghupathy Jun 8, 2024
e2eb660
creating ut for dpu state
rameshraghupathy Jun 8, 2024
ef87cb5
Improving coverage
rameshraghupathy Jun 8, 2024
53c2277
Fixed a typo
rameshraghupathy Jun 8, 2024
fb989e4
Adjusted the reboot-cause key as per the updated hld
rameshraghupathy Jun 13, 2024
8ea7960
Added fix to gracefully handle sytem health DB keys not present case
rameshraghupathy Jun 30, 2024
76de68a
Addressed minor review comments
rameshraghupathy Jul 9, 2024
a08e0cb
Addressed review comments. Commented out system-health support until
rameshraghupathy Jul 29, 2024
766b303
Resolved minor issues and SA failures
rameshraghupathy Jul 29, 2024
c474940
Added role to PORT table in config_db. Using role to differentiate
rameshraghupathy Aug 31, 2024
1910163
Resolving pre-commit check error related to line > 120
rameshraghupathy Aug 31, 2024
851dc78
Trying to avoid pre-commit issues
rameshraghupathy Aug 31, 2024
cb54b73
Testing SA and precommit checks
rameshraghupathy Aug 31, 2024
4dfb5f8
Making it backward compatible
rameshraghupathy Aug 31, 2024
6941baf
Resolving column size and whitespace issue
rameshraghupathy Sep 1, 2024
f3c8e36
Working on SA issue
rameshraghupathy Sep 1, 2024
6d7d539
Testing SA and UT
rameshraghupathy Sep 1, 2024
433bc50
Added 2 spaces before inline comment
rameshraghupathy Sep 1, 2024
3ddcc9c
Merge branch 'sonic-net:master' into master
rameshraghupathy Sep 1, 2024
95da5c0
Enabling "show system-health dpu" cli alone. The rest of the dpu health
rameshraghupathy Sep 4, 2024
627dd5e
Fixed SA issues
rameshraghupathy Sep 4, 2024
934e6ef
Adde new line at EOF
rameshraghupathy Sep 4, 2024
64d06ec
Enabling the UT for the CLI "show system-health dpu"
rameshraghupathy Sep 4, 2024
4870a86
Resolved SA issues
rameshraghupathy Sep 4, 2024
fed3f67
Resolved a SA issue
rameshraghupathy Sep 4, 2024
68b6416
Added smartswitch specific "reboot-cause" and "reboot-cause history" CLI
rameshraghupathy Sep 24, 2024
d229307
Removed the phase:2 related system-health cli extensions as a seperate
rameshraghupathy Sep 24, 2024
78e71c5
Using smartswitch qualifier for the clie extensions
rameshraghupathy Sep 28, 2024
d7fbe9d
Fixed SA issues
rameshraghupathy Sep 28, 2024
313a9d2
mocking device_info for test cases
rameshraghupathy Sep 28, 2024
0ea1227
import patch in tests
rameshraghupathy Sep 28, 2024
f5f88bb
Debugging test failure
rameshraghupathy Sep 28, 2024
62817ea
Fixing SA issues
rameshraghupathy Sep 28, 2024
9fb005d
fixing sa issues
rameshraghupathy Sep 28, 2024
7c8c5d7
Debugging sa issues
rameshraghupathy Sep 28, 2024
b5b068b
trying to resolve sa issues
rameshraghupathy Sep 28, 2024
25259cb
fixed indentation
rameshraghupathy Sep 28, 2024
808e7b4
debugging
rameshraghupathy Sep 28, 2024
7eb8304
debugging
rameshraghupathy Sep 28, 2024
44bed5c
debugging
rameshraghupathy Sep 28, 2024
d7fd0ce
debugging
rameshraghupathy Sep 28, 2024
b0e51f8
Debugging
rameshraghupathy Sep 29, 2024
ed742fc
debugging
rameshraghupathy Sep 29, 2024
11f48f3
debugging
rameshraghupathy Sep 29, 2024
402887d
Debugging
rameshraghupathy Sep 29, 2024
8db11f3
Debugging
rameshraghupathy Sep 29, 2024
2ab48b5
Debuggingg
rameshraghupathy Sep 29, 2024
e843fff
Debugging
rameshraghupathy Sep 29, 2024
9ba21d2
Debugging
rameshraghupathy Sep 29, 2024
738634d
Debugging
rameshraghupathy Sep 29, 2024
c491687
Debugging
rameshraghupathy Sep 29, 2024
ee3f927
Debugging
rameshraghupathy Sep 29, 2024
d47a431
Debugging
rameshraghupathy Sep 29, 2024
04c520e
Debugging
rameshraghupathy Sep 29, 2024
c5abc01
Debugging
rameshraghupathy Sep 29, 2024
6ab7742
Debugging
rameshraghupathy Sep 29, 2024
4299ac3
Debugging
rameshraghupathy Sep 29, 2024
d30ead7
Debugging
rameshraghupathy Sep 29, 2024
a07e8c0
Debugging
rameshraghupathy Sep 29, 2024
a2cece6
Debugging
rameshraghupathy Sep 29, 2024
e2b65af
Debugging
rameshraghupathy Sep 29, 2024
53909f0
Debugging
rameshraghupathy Sep 29, 2024
9849436
Debugging
rameshraghupathy Sep 29, 2024
02152e3
Debuggingg
rameshraghupathy Sep 29, 2024
a75a4d3
Debugging
rameshraghupathy Sep 29, 2024
f8a1f57
Debugging
rameshraghupathy Sep 29, 2024
29000c3
Debugging
rameshraghupathy Sep 29, 2024
e273a16
Debugging
rameshraghupathy Sep 29, 2024
d720cf6
Debugging
rameshraghupathy Sep 29, 2024
c6040b3
Debugging
rameshraghupathy Sep 29, 2024
864c96c
Debugging
rameshraghupathy Sep 29, 2024
8580f76
Debugging
rameshraghupathy Sep 29, 2024
f4942b7
Debugging
rameshraghupathy Sep 29, 2024
3e44844
Debugging
rameshraghupathy Sep 29, 2024
e7355b0
Debugging
rameshraghupathy Sep 30, 2024
b132f90
Debugging
rameshraghupathy Sep 30, 2024
781270a
Debugging
rameshraghupathy Sep 30, 2024
2e8813b
Debugging
rameshraghupathy Sep 30, 2024
6cba5ed
Removing the test to build an image
rameshraghupathy Sep 30, 2024
5db0bc2
Removed mock import
rameshraghupathy Sep 30, 2024
807529f
Improving coverage
rameshraghupathy Sep 30, 2024
885b168
pleasing SA
rameshraghupathy Sep 30, 2024
b6efa8c
Fixing tests for design changes as per review comments
rameshraghupathy Sep 30, 2024
4c26a25
Resolving test failure
rameshraghupathy Sep 30, 2024
ed3d24b
fixed indentation
rameshraghupathy Sep 30, 2024
68a9efe
cleaned up the test case
rameshraghupathy Oct 1, 2024
d09d58f
Addressed review comments in Command-Reference.md and trying to improve
rameshraghupathy Oct 1, 2024
c217c18
Improving coverage
rameshraghupathy Oct 1, 2024
df87438
Fixed a test issue
rameshraghupathy Oct 1, 2024
2dfc2b5
Addressed review comments
rameshraghupathy Oct 7, 2024
c261b0c
Addressed review comment. Reading DPUs list from config_db.json
rameshraghupathy Oct 8, 2024
ab200bc
Improving coverage
rameshraghupathy Oct 8, 2024
5e36792
Resolved SA error
rameshraghupathy Oct 8, 2024
4a43780
Trying to improve coverage. Also, reading from platform.json
rameshraghupathy Oct 8, 2024
8b2c9cb
adding json import in the test
rameshraghupathy Oct 8, 2024
155ba3f
Fixed a test failure
rameshraghupathy Oct 8, 2024
e8c8b42
Fixed SA error
rameshraghupathy Oct 8, 2024
9601177
Exercising the new function in test
rameshraghupathy Oct 9, 2024
9713bf7
Removed a blank line
rameshraghupathy Oct 9, 2024
fdf8569
fixing mock issue
rameshraghupathy Oct 9, 2024
4b30138
Trying a different approach
rameshraghupathy Oct 9, 2024
e725add
working on coverage
rameshraghupathy Oct 9, 2024
d2e7590
debugging
rameshraghupathy Oct 9, 2024
3e1fc12
debugging
rameshraghupathy Oct 9, 2024
51dce03
Debugging
rameshraghupathy Oct 9, 2024
a016ead
Increasing coverage
rameshraghupathy Oct 9, 2024
041fad6
improving coverage
rameshraghupathy Oct 9, 2024
5c85cf4
Adjusting the show cli implementation to align with the reboot-cause
rameshraghupathy Oct 23, 2024
1b3fabb
Fixing a minor issue
rameshraghupathy Oct 23, 2024
9a0225b
Removed ID column from the "show system-health dpu DPUx" cli as per t…
rameshraghupathy Oct 25, 2024
8f191d6
Addressed default dpu admin status for dark-mode and seamless migration
rameshraghupathy Oct 29, 2024
523a42c
Resolving SA issue
rameshraghupathy Oct 29, 2024
a90b878
Resolved a typo
rameshraghupathy Oct 30, 2024
5 changes: 3 additions & 2 deletions config/chassis_modules.py
@@ -30,8 +30,9 @@ def shutdown_chassis_module(db, chassis_module_name):

     if not chassis_module_name.startswith("SUPERVISOR") and \
        not chassis_module_name.startswith("LINE-CARD") and \
-       not chassis_module_name.startswith("FABRIC-CARD"):
-        ctx.fail("'module_name' has to begin with 'SUPERVISOR', 'LINE-CARD' or 'FABRIC-CARD'")
+       not chassis_module_name.startswith("FABRIC-CARD") and \

@gpunathilell (Oct 15, 2024):

We need additional validation to check that chassis_module_name refers to a module that is actually present on the system. If a user runs "config chassis modules startup DPU5" on a system that has no DPU5, the SmartSwitchConfigManagerTask in chassisd crashes, which prevents any further startup or shutdown calls. The command still prints "Starting up chassis module DPU1" or "Shutting down chassis module DPU1", but the only operation it actually performs is adding the entry to, or removing it from, CONFIG_DB.

+       not chassis_module_name.startswith("DPU"):
rameshraghupathy marked this conversation as resolved.
+        ctx.fail("'module_name' has to begin with 'SUPERVISOR', 'LINE-CARD', 'FABRIC-CARD', 'DPU'")

     fvs = {'admin_status': 'down'}
     config_db.set_entry('CHASSIS_MODULE', chassis_module_name, fvs)
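A minimal sketch of the extra validation requested in the review comment on this hunk. It assumes the caller can supply the set of module names that exist on the system; `validate_module_name` and `known_modules` are hypothetical names, not part of this PR, and where the list would come from (platform.json, CONFIG_DB, STATE_DB) is left open.

```python
# Hypothetical helper: not part of the PR. It combines the prefix check from
# the diff with a membership check against modules known to exist.
VALID_PREFIXES = ("SUPERVISOR", "LINE-CARD", "FABRIC-CARD", "DPU")


def validate_module_name(module_name, known_modules):
    """Return an error string, or None when the name is acceptable.

    known_modules is an assumed iterable of module names present on the
    system; how it is obtained is outside the scope of this sketch.
    """
    # str.startswith accepts a tuple, so one call covers all prefixes.
    if not module_name.startswith(VALID_PREFIXES):
        return ("'module_name' has to begin with "
                "'SUPERVISOR', 'LINE-CARD', 'FABRIC-CARD', 'DPU'")
    # Reject names with a valid prefix that do not exist on this system.
    if module_name not in set(known_modules):
        return "module '{}' is not present on this system".format(module_name)
    return None


if __name__ == "__main__":
    present = ["DPU0", "DPU1", "DPU2", "DPU3"]
    for name in ("DPU1", "DPU5", "FOO"):
        print(name, "->", validate_module_name(name, present) or "ok")
```

In the CLI, a non-None result would presumably be passed to ctx.fail() before CONFIG_DB is written, so that chassisd never sees the invalid name.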
8 changes: 4 additions & 4 deletions show/chassis_modules.py
@@ -37,14 +37,14 @@ def status(db, chassis_module_name):
     header = ['Name', 'Description', 'Physical-Slot', 'Oper-Status', 'Admin-Status', 'Serial']
     chassis_cfg_table = db.cfgdb.get_table('CHASSIS_MODULE')

-    state_db = SonicV2Connector(host="127.0.0.1")
+    state_db = SonicV2Connector(host="127.0.0.1", port="6379")
     state_db.connect(state_db.STATE_DB)

-    key_pattern = '*'
+    key_pattern = 'CHASSIS_MODULE_TABLE|*'
     if chassis_module_name:
-        key_pattern = '|' + chassis_module_name
+        key_pattern = 'CHASSIS_MODULE_TABLE|' + chassis_module_name
rameshraghupathy marked this conversation as resolved.

-    keys = state_db.keys(state_db.STATE_DB, CHASSIS_MODULE_INFO_TABLE + key_pattern)
+    keys = state_db.keys(state_db.STATE_DB, key_pattern)
     if not keys:
         print('Key {} not found in {} table'.format(key_pattern, CHASSIS_MODULE_INFO_TABLE))
rameshraghupathy marked this conversation as resolved.
         return
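The hunk above hard-codes the 'CHASSIS_MODULE_TABLE|' prefix in the key pattern while the error message still uses CHASSIS_MODULE_INFO_TABLE. A small sketch of building the pattern from one constant is shown here. The constant's value is an assumption, `build_key_pattern` and `match_keys` are hypothetical helpers, and fnmatch is used only to approximate Redis glob matching on sample keys.

```python
from fnmatch import fnmatchcase

# Assumed value of the table-name constant; the PR does not show its definition.
CHASSIS_MODULE_INFO_TABLE = "CHASSIS_MODULE_TABLE"
TABLE_SEPARATOR = "|"


def build_key_pattern(chassis_module_name=None, table=CHASSIS_MODULE_INFO_TABLE):
    """Return 'TABLE|*' for all modules or 'TABLE|<name>' for one module."""
    suffix = chassis_module_name if chassis_module_name else "*"
    return table + TABLE_SEPARATOR + suffix


def match_keys(keys, pattern):
    """Approximate a glob-style key lookup over an in-memory key list."""
    return [k for k in keys if fnmatchcase(k, pattern)]


if __name__ == "__main__":
    sample_keys = [
        "CHASSIS_MODULE_TABLE|DPU0",
        "CHASSIS_MODULE_TABLE|DPU1",
        "REBOOT_CAUSE|2024_10_01_10_00_00",
    ]
    print(match_keys(sample_keys, build_key_pattern()))
    print(match_keys(sample_keys, build_key_pattern("DPU1")))
```

Deriving both the pattern and the error message from the same constant keeps them from drifting apart if the table is renamed.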
139 changes: 112 additions & 27 deletions show/reboot_cause.py
100755 → 100644
@@ -1,6 +1,7 @@
 import json
 import os
 import sys
+import redis

 import click
 from tabulate import tabulate
@@ -9,7 +10,8 @@


 PREVIOUS_REBOOT_CAUSE_FILE_PATH = "/host/reboot-cause/previous-reboot-cause.json"
-
+STATE_DB = 6
rameshraghupathy marked this conversation as resolved.
+CHASSIS_STATE_DB = 13

vvolam marked this conversation as resolved.
 def read_reboot_cause_file():
     reboot_cause_dict = {}
@@ -24,6 +26,98 @@ def read_reboot_cause_file():
     return reboot_cause_dict


+# Function to fetch reboot cause data from database
+def fetch_data_from_db(module_name, fetch_history=False, use_chassis_db=False):
Contributor:

@rameshraghupathy How are these functions prevented from executing on non-SmartSwitch platforms?

Author:

@prgeor The key carries the unique identifier "DPUx", which keeps other platforms from hitting this path.

+    prefix = 'REBOOT_CAUSE|'
+    if use_chassis_db:
+        try:
+            rdb = redis.Redis(host='redis_chassis.server', port=6380, decode_responses=True, db=CHASSIS_STATE_DB)
rameshraghupathy marked this conversation as resolved.
rameshraghupathy marked this conversation as resolved.
+            table_keys = rdb.keys(prefix+'*')
+        except Exception:
+            return []
+    else:
+        rdb = SonicV2Connector(host='127.0.0.1')
+        rdb.connect(rdb.STATE_DB, False)  # Make one attempt only
+        table_keys = rdb.keys(rdb.STATE_DB, prefix+'*')
+
+    if table_keys is not None:
+        table_keys.sort(reverse=True)
+
+    table = []
+    d = []
+    for tk in table_keys:
rameshraghupathy marked this conversation as resolved.
+        r = []
+        append = False
+        if use_chassis_db:
+            entry = rdb.hgetall(tk)
+        else:
+            entry = rdb.get_all(rdb.STATE_DB, tk)
+
+        if module_name is not None:
+            if 'device' in entry:
+                if module_name != entry['device'] and module_name != "all":
+                    continue
+                if entry['device'] in d and not fetch_history:
+                    append = False
+                    continue
+                elif not entry['device'] in d or entry['device'] in d and fetch_history:
+                    append = True
+                    if not entry['device'] in d:
+                        d.append(entry['device'])
+            r.append(entry['device'] if 'device' in entry else "SWITCH")
+        suffix = ""
+        if append and "DPU" in entry['device']:
+            suffix = entry['device'] + '|'
+        r.append(tk.replace(prefix, "").replace(suffix, ""))
+        r.append(entry['cause'] if 'cause' in entry else "")
+        r.append(entry['time'] if 'time' in entry else "")
+        r.append(entry['user'] if 'user' in entry else "")
+        if append and not fetch_history:
+            table.append(r)
+        elif fetch_history:
+            r.append(entry['comment'] if 'comment' in entry else "")
+            if module_name is None or module_name == 'all' or module_name.startswith('SWITCH') or \
+               'device' in entry and module_name == entry['device']:
+                table.append(r)
+
+    return table
+
+
+# Wrapper-function to fetch reboot cause data from database
+def fetch_reboot_cause_from_db(module_name):
+    table = []
+    r = []
+
+    # Read the previous reboot cause
+    reboot_cause_dict = read_reboot_cause_file()
+    reboot_gen_time = reboot_cause_dict.get("gen_time", "N/A")
+    reboot_cause = reboot_cause_dict.get("cause", "Unknown")
+    reboot_time = reboot_cause_dict.get("time", "N/A")
+    reboot_user = reboot_cause_dict.get("user", "N/A")
+
+    r.append("SWITCH")
+    r.append(reboot_gen_time if reboot_gen_time else "")
+    r.append(reboot_cause if reboot_cause else "")
+    r.append(reboot_time if reboot_time else "")
+    r.append(reboot_user if reboot_user else "")
+    table.append(r)
+
+    table += fetch_data_from_db(module_name, fetch_history=False, use_chassis_db=True)
+    return table
+
+
+# Function to fetch reboot cause history data from database
+def fetch_reboot_cause_history_from_db(module_name):
+    if module_name == "all":
+        # Combine data from both Redis containers for "all" modules
+        data_switch = fetch_data_from_db(module_name, fetch_history=True, use_chassis_db=False)
+        data_dpu = fetch_data_from_db(module_name, fetch_history=True, use_chassis_db=True)
+        return data_switch + data_dpu
+    elif module_name is None or module_name == "SWITCH":
+        return fetch_data_from_db(module_name, fetch_history=True, use_chassis_db=False)
+    else:
+        return fetch_data_from_db(module_name, fetch_history=True, use_chassis_db=True)
+
 #
 # 'reboot-cause' group ("show reboot-cause")
 #
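When fetch_history is false, fetch_data_from_db appears to keep only the newest entry per device: keys are sorted in reverse order and a device is skipped once it has been seen. The sketch below isolates that selection over an in-memory dict instead of Redis. The key layout "REBOOT_CAUSE|DPUx|timestamp" is an assumption inferred from how the diff strips the device suffix, and `latest_entry_per_device` is a hypothetical name.

```python
PREFIX = "REBOOT_CAUSE|"


def latest_entry_per_device(entries):
    """Return one row per device, taken from the newest key for that device.

    entries maps a key such as 'REBOOT_CAUSE|DPU0|2024_10_02_09_30_00'
    (assumed layout) to a dict of fields like those read with hgetall.
    """
    seen = set()
    rows = []
    # Reverse lexical order puts the newest timestamp of each device first.
    for key in sorted(entries, reverse=True):
        entry = entries[key]
        device = entry.get("device", "SWITCH")
        if device in seen:
            continue  # an entry with a newer key for this device was already kept
        seen.add(device)
        name = key.replace(PREFIX, "", 1)
        if device.startswith("DPU"):
            # Drop the 'DPUx|' part so only the timestamp-style name remains.
            name = name.replace(device + "|", "", 1)
        rows.append([device, name, entry.get("cause", ""),
                     entry.get("time", ""), entry.get("user", "")])
    return rows


if __name__ == "__main__":
    sample = {
        "REBOOT_CAUSE|DPU0|2024_10_01_10_00_00": {"device": "DPU0", "cause": "reboot"},
        "REBOOT_CAUSE|DPU0|2024_10_02_09_30_00": {"device": "DPU0", "cause": "Power Loss"},
        "REBOOT_CAUSE|DPU1|2024_09_30_08_00_00": {"device": "DPU1", "cause": "reboot"},
    }
    for row in latest_entry_per_device(sample):
        print(row)
```

Using a set for the seen devices and dict.get for optional fields avoids the nested membership tests in the diff, though the PR's behaviour for entries without a 'device' field may differ from this sketch.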
@@ -62,33 +156,24 @@ def reboot_cause(ctx):
     click.echo(reboot_cause_str)


-# 'history' subcommand ("show reboot-cause history")
+# 'all' command within 'reboot-cause'
 @reboot_cause.command()
-def history():
-    """Show history of reboot-cause"""
-    REBOOT_CAUSE_TABLE_NAME = "REBOOT_CAUSE"
-    TABLE_NAME_SEPARATOR = '|'
-    db = SonicV2Connector(host='127.0.0.1')
-    db.connect(db.STATE_DB, False)  # Make one attempt only
-    prefix = REBOOT_CAUSE_TABLE_NAME + TABLE_NAME_SEPARATOR
-    _hash = '{}{}'.format(prefix, '*')
-    table_keys = db.keys(db.STATE_DB, _hash)
-    if table_keys is not None:
-        table_keys.sort(reverse=True)
+def all():
+    """Show cause of most recent reboot"""
+    reboot_cause_data = fetch_reboot_cause_from_db("all")
+    header = ['Device', 'Name', 'Cause', 'Time', 'User']
+    click.echo(tabulate(reboot_cause_data, header, numalign="left"))

-        table = []
-        for tk in table_keys:
-            entry = db.get_all(db.STATE_DB, tk)
-            r = []
-            r.append(tk.replace(prefix, ""))
-            r.append(entry['cause'] if 'cause' in entry else "")
-            r.append(entry['time'] if 'time' in entry else "")
-            r.append(entry['user'] if 'user' in entry else "")
-            r.append(entry['comment'] if 'comment' in entry else "")
-            table.append(r)

-        header = ['Name', 'Cause', 'Time', 'User', 'Comment']
-        click.echo(tabulate(table, header, numalign="left"))
+# 'history' command within 'reboot-cause'
+@reboot_cause.command()
+@click.argument('module_name', required=False)
+def history(module_name):
+    """Show history of reboot-cause"""
+    reboot_cause_history = fetch_reboot_cause_history_from_db(module_name)
+    if module_name is not None:
+        header = ['Device', 'Name', 'Cause', 'Time', 'User', 'Comment']
+        click.echo(tabulate(reboot_cause_history, header, numalign="left"))
     else:
-        click.echo("Reboot-cause history is not yet available in StateDB")
-        sys.exit(1)
+        header = ['Name', 'Cause', 'Time', 'User', 'Comment']
+        click.echo(tabulate(reboot_cause_history, header, numalign="left"))
Contributor:

@rameshraghupathy Please test this CLI on a non-SmartSwitch platform.

Author:

@prgeor Done; the UT log is attached.