Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

observability: annotate Session+SessionPool events #1207

Merged

Conversation

odeke-em
Copy link
Contributor

@odeke-em odeke-em commented Sep 19, 2024

This change adds annotations for session and session pool events to aid customers in debugging latency issues with session pool malevolence and also for maintainers to figure out which session pool type is the most appropriate.

Updates #1170

BEGIN_COMMIT_OVERRIDE
feat: add additional opentelemetry span events for session pool
END_COMMIT_OVERRIDE

@odeke-em odeke-em requested review from a team as code owners September 19, 2024 03:48
@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Sep 19, 2024
@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/python-spanner API. label Sep 19, 2024
@odeke-em odeke-em force-pushed the trace-annotation-SessionPools branch 3 times, most recently from 8d6f2b7 to 4bc937e Compare September 19, 2024 07:37
@odeke-em
Copy link
Contributor Author

Exhibit

On running examples/grpc_instrumentation_enabled.py we get back these traces
Screenshot 2024-09-18 at 9 39 29 PM
Screenshot 2024-09-18 at 9 39 47 PM
Screenshot 2024-09-18 at 9 40 01 PM
Screenshot 2024-09-18 at 9 40 25 PM

@odeke-em
Copy link
Contributor Author

cc @harshachinta

@odeke-em odeke-em force-pushed the trace-annotation-SessionPools branch from 4bc937e to 0d5bf26 Compare September 19, 2024 07:45
@harshachinta harshachinta added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 19, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 19, 2024
@odeke-em odeke-em force-pushed the trace-annotation-SessionPools branch from 0d5bf26 to 3617921 Compare September 19, 2024 13:43
@harshachinta harshachinta added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 19, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 19, 2024
@odeke-em odeke-em force-pushed the trace-annotation-SessionPools branch 2 times, most recently from f71d05c to eb9dd5a Compare September 20, 2024 01:27
@harshachinta harshachinta added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 20, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 20, 2024
@odeke-em odeke-em force-pushed the trace-annotation-SessionPools branch from b1ea772 to ab23f09 Compare September 20, 2024 22:09
@odeke-em
Copy link
Contributor Author

Kindly cc-ing you @harshachinta.

@harshachinta harshachinta added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 23, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 23, 2024
google/cloud/spanner_v1/pool.py Outdated Show resolved Hide resolved
google/cloud/spanner_v1/_opentelemetry_tracing.py Outdated Show resolved Hide resolved
google/cloud/spanner_v1/pool.py Show resolved Hide resolved
google/cloud/spanner_v1/pool.py Show resolved Hide resolved
google/cloud/spanner_v1/pool.py Outdated Show resolved Hide resolved
google/cloud/spanner_v1/pool.py Outdated Show resolved Hide resolved
google/cloud/spanner_v1/pool.py Outdated Show resolved Hide resolved
google/cloud/spanner_v1/session.py Outdated Show resolved Hide resolved
google/cloud/spanner_v1/session.py Outdated Show resolved Hide resolved
google/cloud/spanner_v1/session.py Outdated Show resolved Hide resolved
@odeke-em odeke-em requested a review from harshachinta October 7, 2024 02:36
@odeke-em
Copy link
Contributor Author

odeke-em commented Oct 7, 2024

Kindly please take another look @harshachinta, feedback addressed, thank you for the code review!

)

if requested_session_count > 0:
current_span.add_event(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current_span is not initialized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sheesh, sorry I had a bunch of benchmarks running locally and they interrupted my nox -s unit-3.8 tests so didn't see those but all fixed now.

@odeke-em odeke-em force-pushed the trace-annotation-SessionPools branch from e60b662 to 3a1611e Compare November 27, 2024 04:08
google/cloud/spanner_v1/_opentelemetry_tracing.py Outdated Show resolved Hide resolved
google/cloud/spanner_v1/session.py Outdated Show resolved Hide resolved
Comment on lines 286 to 287
if self._transaction_id is None and len(self._mutations) > 0:
self.begin()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self._transaction_id is None and len(self._mutations) > 0:
self.begin()

Why is this added here? This is already getting executed in the beginning of commit() call, so not needed

if self._transaction_id is None and len(self._mutations) > 0:

)
session = self._sessions.get(block=True, timeout=timeout)
except queue.Empty as e:
add_span_event(current_span, "No session available", span_event_attributes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
add_span_event(current_span, "No session available", span_event_attributes)
add_span_event(current_span, "No session available in the pool", span_event_attributes)

google/cloud/spanner_v1/pool.py Outdated Show resolved Hide resolved
Comment on lines 261 to 262
"exception",
"exception",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are there 2 exception events?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a quirk with opentelemetry-python; they add "exception" on every exit call whether or not you set your own, I'll file a bug later on but for now I'll just remove our explicit invocation of "span.record_exception" and instead add our comment.

tests/unit/test_pool.py Show resolved Hide resolved
tests/unit/test_pool.py Show resolved Hide resolved
tests/unit/test_pool.py Show resolved Hide resolved
tests/unit/test_pool.py Outdated Show resolved Hide resolved
@harshachinta harshachinta added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 28, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 28, 2024
@odeke-em
Copy link
Contributor Author

Thank you @harshachinta for the review, I've addressed the feedback. Please take a look again.

@odeke-em odeke-em force-pushed the trace-annotation-SessionPools branch from b6187d8 to e654029 Compare November 28, 2024 14:26
@harshachinta harshachinta added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 28, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 28, 2024
@odeke-em
Copy link
Contributor Author

@harshachinta kindly help me run the bots.

@harshachinta harshachinta added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 29, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 29, 2024
Copy link
Contributor

@harshachinta harshachinta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix lint

@@ -102,7 +103,21 @@ def trace_call(name, session, extra_attributes=None, observability_options=None)
yield span
except Exception as error:
span.set_status(Status(StatusCode.ERROR, str(error)))
span.record_exception(error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not remove this exception. We are not sure if there are any cases where the span will end up not recording an exception.
I would suggest adding this back here and let us discuss more during our demo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I was wondering that this behavior of exception getting added twice was not seen earlier since this code exists from very long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because OpenTelemetry was upgraded only recently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have locked tests to check for the exceptions to ensure that they are in there and from Span.enter. I had to dive back into OpenTelemetry-Python's code as it isn't even documented and in our demos it was very distracting to have mysteriously both errors. I think for the sake of our sanity and project stability let's leave that comment in and if anything happens it is a trivial one to add back @harshachinta

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. But the opentelemetry documentation for Python guides to record exception for instrumentation libraries.
https://opentelemetry.io/docs/languages/python/instrumentation/#record-exceptions-in-spans

Can you share the code pointer on where the opentelemetry records exception by default when exiting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harshachinta it was the cause of us seeing 2 exceptions and took a ton of confusion and time for me to debug, they don't seem to document this condition.

@harshachinta harshachinta added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 2, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 2, 2024
@odeke-em
Copy link
Contributor Author

odeke-em commented Dec 2, 2024

@harshachinta kindly help me re-run the bots; all unit tests pass locally.

@sakthivelmanii sakthivelmanii added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 2, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 2, 2024
@odeke-em odeke-em force-pushed the trace-annotation-SessionPools branch from 498a70a to 060d17c Compare December 3, 2024 10:04
@harshachinta harshachinta added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 3, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 3, 2024
@odeke-em odeke-em force-pushed the trace-annotation-SessionPools branch from 060d17c to b400718 Compare December 5, 2024 10:31
@sakthivelmanii sakthivelmanii added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 5, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 5, 2024
This change adds annotations for session and session pool
events to aid customers in debugging latency issues with
session pool malevolence and also for maintainers to
figure out which session pool type is the most appropriate.

Updates googleapis#1170
@odeke-em odeke-em force-pushed the trace-annotation-SessionPools branch from b400718 to 1d7e440 Compare December 6, 2024 07:54
@rahul2393 rahul2393 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 6, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 6, 2024
@harshachinta harshachinta merged commit a6811af into googleapis:main Dec 6, 2024
11 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/python-spanner API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants