Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit logging of soft errors (draft) #245

Draft
wants to merge 8 commits into
base: split-arch
Choose a base branch
from
Draft

Conversation

ffoulkes
Copy link
Collaborator

@ffoulkes ffoulkes commented Jul 4, 2024

Revised and expanded on the changes made in PR #240.

  • Defined ALREADY_EXISTS and NOT_FOUND to be soft errors.
  • Implemented IsSoftError() predicates to check whether a status code is a soft error.
  • Suppressed logging of soft errors.

- Revised earlier changes to limit logging of ALREADY_EXISTS
  errors. Defined ALREADY_EXISTS and NOT_FOUND as soft errors,
  and implemented IsSoftError() and IsSoftTdiError() predicates
  to check whether a status code is a soft error.

Signed-off-by: Derek Foster <[email protected]>
Signed-off-by: Derek Foster <[email protected]>
@ffoulkes ffoulkes marked this pull request as draft July 4, 2024 22:23
- Renamed filtering macro to RETURN_IF_ERROR_WITH_FILTERING().

- Add filtering to calls to GetTableEntry().

Signed-off-by: Derek Foster <[email protected]>
Signed-off-by: Derek Foster <[email protected]>

// Checks whether a TDI status code is a soft error.
static inline bool IsSoftTdiError(tdi_status_t err) {
return (err == TDI_ALREADY_EXISTS) || (err == TDI_OBJECT_NOT_FOUND) ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the definition of a soft error in this implementation? Why is "object/table not found" considered a soft error?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. When I reviewed the updated HSD-ES ticket, I discovered that it contained log messages like this:

    E20231201 00:28:53.970028 5140 tdi_table_manager.cc:624] Return Error: tdi_sde_interface_->GetTableEntry(device_, session, table_id, table_key.get(), table_data.get()) failed with StratumErrorSpace::ERR_ENTRY_NOT_FOUND: 'table->entryGet(*real_session->tdi_session_, *dev_tgt, flags, *real_table_key->table_key_, real_table_data->table_data_.get())' failed with error message: Object not found. 
    E20231201 00:28:53.970144 5140 es2k_node.cc:278] StratumErrorSpace::ERR_AT_LEAST_ONE_OPER_FAILED: One or more read operations failed.
    E20231201 00:28:53.970456 5140 p4_service.cc:353] Failed to read forwarding entries from node 1: One or more read operations failed.
    
  2. The commentary showed that they were also bothered about these issues (i.e., my previous commit had addressed only half their problem).

  3. ALREADY_EXISTS and NOT_FOUND are two sides of the same coin. The conditions are easily recoverable, which is what makes them "soft" errors.

  4. The customer is complaining about the fact that the intermediate code is logging, not just ALREADY_EXISTING, but ALREADY_EXISTING and NOT_FOUND. They are therefore complaining about our logging soft errors.

  5. When TDI status codes are converted to Stratum status codes, TDI_TABLE_NOT_FOUND and TDI_OBJECT_NOT_FOUND are both mapped to ERR_ENTRY_NOT_FOUND. I therefore treat them as identical.

  6. The idea is to reduce the amount of special-purpose code by treating all errors with this classification identically.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point taken. I've added some additional documentation to the header file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More to the point, there is code (in ovsp4rt, and perhaps elsewhere) that checks for the existence of a table entry before it proceeds. TDI does not provide an EntryExists() method, so the code does a read and checks the status code.

And think about it. If "entry already exists" is a soft error, why wouldn't "entry does not exist" be a soft error? They're complementary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chief effect of this change is to go from saying that a specific status code (ALREADY_EXISTS) deserves special handling to saying that there's a class of status codes that deserve special handling, and ALREADY_EXISTS and NOT_FOUND are both members.

The other effect is to call these soft errors instead of appending "(may not be an error)" to the message text. I suppose I could have changed the text to "(note that the application may not consider this to be an error)", but calling it a "soft error" is more elegant.

@ffoulkes ffoulkes requested a review from 5abeel July 8, 2024 16:09
@ffoulkes ffoulkes added the enhancement New feature or request label Jul 12, 2024
@ffoulkes ffoulkes changed the title Limit logging of soft errors Limit logging of soft errors (draft) Oct 22, 2024

// Checks whether a TDI status code is a soft error.
static inline bool IsSoftTdiError(tdi_status_t err) {
return (err == TDI_ALREADY_EXISTS) || (err == TDI_OBJECT_NOT_FOUND) ||
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More to the point, there is code (in ovsp4rt, and perhaps elsewhere) that checks for the existence of a table entry before it proceeds. TDI does not provide an EntryExists() method, so the code does a read and checks the status code.

And think about it. If "entry already exists" is a soft error, why wouldn't "entry does not exist" be a soft error? They're complementary.


// Checks whether a TDI status code is a soft error.
static inline bool IsSoftTdiError(tdi_status_t err) {
return (err == TDI_ALREADY_EXISTS) || (err == TDI_OBJECT_NOT_FOUND) ||
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chief effect of this change is to go from saying that a specific status code (ALREADY_EXISTS) deserves special handling to saying that there's a class of status codes that deserve special handling, and ALREADY_EXISTS and NOT_FOUND are both members.

The other effect is to call these soft errors instead of appending "(may not be an error)" to the message text. I suppose I could have changed the text to "(note that the application may not consider this to be an error)", but calling it a "soft error" is more elegant.


// Special version of RETURN_IF_ERROR() that suppresses logging if this
// is a soft error.
#define RETURN_IF_ERROR_WITH_FILTERING(expr) \
Copy link
Collaborator Author

@ffoulkes ffoulkes Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RETURN_IF_ERROR_LOG_IF_NOT_SOFT()?

I keep trying to come up with a good name for this macro, to no avail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants