Add Azure event(s) that log send failures, retries, and last mile failures on item basis #17300

Open
jsutantio opened this issue Feb 7, 2025 · 7 comments
Labels
onboarding-ops Work related to onboarding with a partner. Addressed by the Onboarding & Operations team in RS. platform Platform Team

Comments

@jsutantio
Collaborator

jsutantio commented Feb 7, 2025

User Story

As an Engagement Engineer and Support Team member, I need to know more about messages that have failed so that I can help senders diagnose why their messages are not being delivered.

Description/Use Case

The UP Message Monitoring Dashboard in Azure will eventually be a tool used by the Support Team to monitor messages that fail to be properly processed in the UP. Although a bit old, this SOP document provides good context on the goal of the message monitoring effort.

1. For the [Items routed (destination assigned) but not sent AND did not trigger a filter] dashboard module, we need Azure event(s) that enable us to do the following:

  • Need to know which item failed to send to the receiver at the send step.
  • Need to know whether that item has been retried and how many retry attempts have been made.
    Consider: The event that covers the first two bullets could be named ITEM_SEND_FAIL or ITEM_SEND_ATTEMPT.
  • Need to know whether that item has exhausted all 5 retries without being sent to the receiver, meaning it hit Last Mile Failures.
    Consider: The event that covers the third bullet could be named ITEM_LAST_MILE_FAILURES.
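Taken together, these requirements come down to choosing which event to log each time a send attempt for an item fails. Below is a minimal sketch, assuming the 5-retry limit stated in this issue and the candidate event names proposed above. The function name `event_for_failed_attempt` is hypothetical, and this is not the pipeline's actual retry logic.

```python
from enum import Enum

MAX_RETRIES = 5  # "completed 5 retries" per this issue


class ItemEvent(Enum):
    # Candidate names proposed in this issue; not confirmed event names.
    ITEM_SEND_ATTEMPT = "ITEM_SEND_ATTEMPT"
    ITEM_LAST_MILE_FAILURES = "ITEM_LAST_MILE_FAILURES"


def event_for_failed_attempt(retries_completed: int, max_retries: int = MAX_RETRIES) -> ItemEvent:
    """Pick the event to log after a send attempt for one item fails.

    retries_completed counts the retries already made before this failure
    (0 means the initial attempt just failed).
    """
    if retries_completed < 0:
        raise ValueError("retries_completed must be non-negative")
    if retries_completed >= max_retries:
        # All retries are used up: the item will not be sent (Last Mile Failure).
        return ItemEvent.ITEM_LAST_MILE_FAILURES
    # Another retry will be scheduled; log the failure together with the retry count.
    return ItemEvent.ITEM_SEND_ATTEMPT
```

Under this convention, a failure after 0 to 4 completed retries yields ITEM_SEND_ATTEMPT, and a failure after the fifth retry yields ITEM_LAST_MILE_FAILURES. The team would still need to decide whether the count includes the initial attempt.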

Metadata requested to be logged during these events:

  • Retry count
  • Receiver it was destined for
  • submittedReportIds []
  • submittedItemIndex
  • parentReportId
  • parentItemIndex
  • childReportId
  • childItemIndex
  • sender
  • blobUrl
  • pipelineStepName
  • topic
  • trackingId / unique ID (there must be a way to identify the same item across retries and trace it back to the original sender's submission/report)
  • retryTime (planned time for next retry)
  • queueMessage (the queue message that initiates the event step; used to speed up finding the queue message, which normally requires an Engagement Engineer to look through all the traces)

Nice to have: Time it takes to send (each attempt)
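The requested metadata could be carried as a single record per item event. Below is a minimal sketch of that record. The class name, the snake_case field names, and the choice to omit unset fields are all hypothetical. The camelCase property names follow the list above, and `send_duration_ms` stands in for the nice-to-have send time.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Optional


def _camel(snake: str) -> str:
    """Convert snake_case to camelCase, e.g. blob_url -> blobUrl."""
    head, *rest = snake.split("_")
    return head + "".join(part.capitalize() for part in rest)


@dataclass
class ItemSendEventMetadata:
    # Hypothetical record; fields mirror the metadata list in this issue.
    retry_count: int
    receiver: str
    sender: str
    tracking_id: str  # must stay identical for the same item across retries
    submitted_report_ids: list[str] = field(default_factory=list)
    submitted_item_index: Optional[int] = None
    parent_report_id: Optional[str] = None
    parent_item_index: Optional[int] = None
    child_report_id: Optional[str] = None
    child_item_index: Optional[int] = None
    blob_url: Optional[str] = None
    pipeline_step_name: Optional[str] = None
    topic: Optional[str] = None
    retry_time: Optional[datetime] = None  # planned time of the next retry
    queue_message: Optional[str] = None
    send_duration_ms: Optional[float] = None  # "nice to have": per-attempt send time

    def to_properties(self) -> dict:
        """Return a flat, JSON-serializable dict keyed by camelCase names."""
        props = {}
        for key, value in asdict(self).items():
            if value is None:
                continue  # omit fields that were not set for this event
            if isinstance(value, datetime):
                value = value.isoformat()
            props[_camel(key)] = value
        return props


# Usage with placeholder values:
example = ItemSendEventMetadata(
    retry_count=2,
    receiver="example-receiver",
    sender="example-sender",
    tracking_id="example-tracking-id",
    retry_time=datetime(2025, 2, 11, 12, 0),
)
print(json.dumps(example.to_properties()))
```

Omitting unset fields keeps each event small, which may help with the size concerns raised below.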

Risks/Impacts/Considerations

  • How do we log events at the item level when items reach the UP send step already batched?
  • Will the event payload hit the character limit?
  • If we need to potentially send thousands of ITEM events in the send step, what does that mean for performance? (ITEM_ACCEPTED is similar in this regard and is currently implemented)
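One possible approach to the first two risks is to fan a batched send result back out into one event per item, copy the batch-level fields into each event, and guard each payload against a size limit. Below is a minimal sketch of that approach. The function, its parameters, and the policy of blanking `queueMessage` are hypothetical. `max_chars` is a stand-in, because this issue does not state the actual limit, which depends on the logging backend.

```python
import json
from typing import Iterable, Iterator


def per_item_events(
    event_name: str,
    batch_properties: dict,
    items: Iterable[dict],
    max_chars: int,
) -> Iterator[tuple[str, dict]]:
    """Yield one (event_name, properties) pair per item in a batched send.

    batch_properties holds values shared by the whole batch (e.g. receiver,
    topic, retryCount, queueMessage); each entry in items holds item-specific
    values (e.g. trackingId, childItemIndex).
    """
    for item in items:
        # New dict per item, so batch_properties itself is never mutated.
        props = {**batch_properties, **item}
        if len(json.dumps(props)) > max_chars and "queueMessage" in props:
            # Assumption: the queue message is the bulkiest field, so replace it
            # with a marker rather than drop the whole event. The result may
            # still exceed max_chars; a real implementation would need a policy.
            props["queueMessage"] = "<omitted: over size limit>"
        yield event_name, props
```

Because this is a generator, the caller can emit events one at a time instead of building thousands of them in memory first. This does not remove the cost of sending each event, which is the performance concern the issue raises.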

Dev Notes

Acceptance Criteria

  • Azure event(s) created that tell us which item failed to send to the receiver at the send step, whether that item has been retried, and how many retry attempts have been made
  • Azure event created that tells us whether that item has exhausted all 5 retries without being sent to the receiver
@jsutantio added the platform and onboarding-ops labels Feb 7, 2025
@jsutantio changed the title from "Add Azure event(s) that log send failures, retries, and last mile failures on item basis" to "Add Azure event(s) that log send failures, retries, last mile failures, and result interpretations on item basis" Feb 7, 2025
@jack-h-wang
Collaborator

@arnejduranovic
Collaborator

arnejduranovic commented Feb 10, 2025

@jsutantio @chris-kuryak @victor-chaparro For number two ("Need to include the test result interpretation (positive, negative, etc.)"), do we have a specific FHIR field in mind?

Also, can we make number 2 a separate ticket?

@arnejduranovic
Collaborator

@jsutantio can we discuss what you mean by this comment?

timestamp (I assume this would be the most recent retry time)

@arnejduranovic
Collaborator

Another question: do we need an event when an item is successfully sent?

@jsutantio
Collaborator Author

jsutantio commented Feb 11, 2025

Another question: do we need an event when an item is successfully sent?

@arnejduranovic Is this not ITEM_SENT, which exists already?

@jsutantio
Collaborator Author

jsutantio commented Feb 11, 2025

For number two ("Need to include the test result interpretation (positive, negative, etc.)"), do we have a specific FHIR field in mind?

The task related to the dashboard module Items routed but filtered out has been split into its own ticket: #17330.

@jsutantio changed the title from "Add Azure event(s) that log send failures, retries, last mile failures, and result interpretations on item basis" to "Add Azure event(s) that log send failures, retries, and last mile failures on item basis" Feb 11, 2025
@jsutantio
Collaborator Author

timestamp (I assume this would be the most recent retry time)

Because the failure timestamp is already logged automatically when the event is generated, this metadata request has been changed to the planned time of the next retry: retryTime.
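Since the failure timestamp is recorded automatically, retryTime is the value the code emitting the event would have to compute itself. Below is a minimal sketch, assuming a per-retry delay table. The delays are placeholders, not the pipeline's real backoff schedule, and the function name is hypothetical. The table has five entries to match the 5-retry limit in this issue. Once all five retries are used, the function returns None, which corresponds to the Last Mile Failures case where no further retry is planned.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence

# Placeholder backoff schedule in minutes, one entry per retry (hypothetical values).
RETRY_DELAYS_MINUTES: Sequence[int] = (1, 5, 15, 60, 240)


def planned_retry_time(
    failed_at: datetime,
    retries_completed: int,
    delays: Sequence[int] = RETRY_DELAYS_MINUTES,
) -> Optional[datetime]:
    """Return when the next retry is planned, or None if no retry remains."""
    if retries_completed >= len(delays):
        return None  # retries exhausted: Last Mile Failure, nothing to schedule
    return failed_at + timedelta(minutes=delays[retries_completed])


# Usage: first failure of an item at 12:00 UTC.
failed = datetime(2025, 2, 11, 12, 0, tzinfo=timezone.utc)
print(planned_retry_time(failed, 0))
```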
