Log branchToken on potential data loss error #7027

Open: wants to merge 1 commit into base: main

yycptt (Member) commented Dec 20, 2024

What changed?

  • Log the branchToken on potential data loss errors. The logged branchToken is base64-encoded and can be decoded with the command below. The treeID is usually the workflow runID. Because the branchToken is constructed by the persistence layer, it may also carry additional information about the workflow.
./tdbg decode proto --type temporal.server.api.persistence.v1.HistoryBranch --hex-data "$(echo "CiQwMTkzZTY1OC1lZjc0LTc1ZDAtYTU4NS1hZWI0MzZlYThmNDESJDAxOTNlNjU4LWVmNzQtNzYyNi05ZTlmLWI1ODAwMGI2MjNkMQ==" | base64 --decode | hexdump -ve '/1 "%02x"')"
{
 "treeId": "0193e658-ef74-75d0-a585-aeb436ea8f41",
 "branchId": "0193e658-ef74-7626-9e9f-b58000b623d1"
}
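The shell pipeline above only converts the base64 token to hex so that tdbg can decode it. When tdbg is not at hand, the same two IDs can be pulled out directly. The following is a minimal Go sketch that uses only the standard library and hand-parses the protobuf wire format. It assumes that treeId and branchId are length-delimited fields 1 and 2 of HistoryBranch, which is what the decoded output above suggests. The names `decodeBranchToken`, `tokenToHex` and `readVarint` are hypothetical helpers and are not part of the Temporal codebase.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"errors"
	"fmt"
)

// readVarint decodes a protobuf base-128 varint at the start of b.
// It returns the value and the number of bytes consumed.
func readVarint(b []byte) (uint64, int, error) {
	var v uint64
	for i := 0; i < len(b) && i < 10; i++ {
		v |= uint64(b[i]&0x7f) << (7 * uint(i))
		if b[i] < 0x80 {
			return v, i + 1, nil
		}
	}
	return 0, 0, errors.New("malformed varint")
}

// tokenToHex mirrors `base64 --decode | hexdump -ve '/1 "%02x"'`:
// it produces the string passed to tdbg via --hex-data.
func tokenToHex(token string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(token)
	if err != nil {
		return "", err
	}
	return hex.EncodeToString(raw), nil
}

// decodeBranchToken extracts fields 1 and 2 from a base64 branchToken.
// ASSUMPTION: field 1 is treeId and field 2 is branchId (both strings).
// Other length-delimited fields (e.g. ancestors) are skipped.
func decodeBranchToken(token string) (string, string, error) {
	raw, err := base64.StdEncoding.DecodeString(token)
	if err != nil {
		return "", "", err
	}
	var treeID, branchID string
	for i := 0; i < len(raw); {
		key, n, kerr := readVarint(raw[i:])
		if kerr != nil {
			return "", "", kerr
		}
		i += n
		field, wireType := key>>3, key&0x7
		if wireType != 2 { // this sketch only handles length-delimited fields
			return "", "", fmt.Errorf("field %d: unsupported wire type %d", field, wireType)
		}
		length, n, lerr := readVarint(raw[i:])
		if lerr != nil {
			return "", "", lerr
		}
		i += n
		if uint64(len(raw)-i) < length {
			return "", "", errors.New("truncated field")
		}
		value := raw[i : i+int(length)]
		i += int(length)
		switch field {
		case 1:
			treeID = string(value)
		case 2:
			branchID = string(value)
		}
	}
	return treeID, branchID, nil
}

func main() {
	token := "CiQwMTkzZTY1OC1lZjc0LTc1ZDAtYTU4NS1hZWI0MzZlYThmNDESJDAxOTNlNjU4LWVmNzQtNzYyNi05ZTlmLWI1ODAwMGI2MjNkMQ=="
	hexData, err := tokenToHex(token)
	if err != nil {
		panic(err)
	}
	fmt.Println("hex-data:", hexData)
	treeID, branchID, err := decodeBranchToken(token)
	if err != nil {
		panic(err)
	}
	fmt.Printf("treeId=%s branchId=%s\n", treeID, branchID)
}
```

Run against the example token, this recovers the same treeId and branchId that tdbg prints. The tdbg route remains the safer choice for real tokens, because it uses the actual HistoryBranch schema rather than the assumed field layout.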

Why?

  • We recently introduced retries for data loss errors in the persistence client, which means application logic may no longer see those errors and so won't emit any logs with workflow info. Inside the persistence layer there is no workflow key info (e.g. workflowID, runID), so the best we can do is log the branchToken, which gives a hint about which workflow is running into the error.

How did you test it?

  • Tested locally.

Potential risks

Documentation

Is hotfix candidate?

yycptt requested a review from a team as a code owner on December 20, 2024 23:27