
Cloud Run example #62

Open · 8 tasks
kintel opened this issue May 15, 2020 · 18 comments
Labels: enhancement accepted (An actionable enhancement for which PRs will be accepted) · enhancement (New feature or request) · good first issue (Good for newcomers) · priority: p3

Comments

kintel commented May 15, 2020

To make sure OpenTelemetry and the JS exporter support Cloud Run, write an example that can be deployed to Cloud Run and includes:

  • Auto-instrumentation of incoming requests
  • Auto-instrumentation of outgoing requests
  • (optional) Manual instrumentation of an outgoing request

Then:

  • Documentation for how to deploy
  • Verification that all spans are visible in Google Cloud Console
  • Verification that canonical labels exist for all spans: SERVICE, REVISION
  • Verification that canonical label naming follows the OpenTelemetry spec

Optional:

  • Automated tests for as much of the above as possible

This ticket can be split into sub-tickets if appropriate.

kintel changed the title from "Cloud Run: example w/docs" to "Cloud Run Support" on May 15, 2020
punya added the enhancement (New feature or request), good first issue (Good for newcomers), and help wanted (Extra attention is needed) labels and removed the help wanted label on Feb 1, 2021
@patryk-smc

Any progress on this? OpenTelemetry doesn't work on Cloud Run / Cloud Functions yet, correct?

aabmass (Contributor) commented Jul 7, 2021

@patryk-smc I don't have an actual example but tracing should work fine. Use a regular BatchSpanProcessor and call TracerProvider.shutdown() before your program ends. For Cloud Run, you can add a SIGTERM handler to call shutdown. Not sure about Cloud Functions.

Metrics are a little more complicated unfortunately.
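
For reference, a minimal sketch of that setup, assuming the @opentelemetry/sdk-trace-node SDK and this repo's Cloud Trace exporter; the exact wiring is illustrative rather than a definitive implementation:

import { NodeTracerProvider, BatchSpanProcessor } from '@opentelemetry/sdk-trace-node';
import { TraceExporter } from '@google-cloud/opentelemetry-cloud-trace-exporter';

const provider = new NodeTracerProvider();
provider.addSpanProcessor(new BatchSpanProcessor(new TraceExporter()));
provider.register();

// Cloud Run sends SIGTERM before stopping an instance; shutting the provider
// down here flushes any spans still buffered in the BatchSpanProcessor.
process.on('SIGTERM', () => {
  provider
    .shutdown()
    .catch((err) => console.error('Error shutting down tracer provider', err))
    .finally(() => process.exit(0));
});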


pebo commented Sep 3, 2021

Is there any news on the metrics side of things? I did a quick test run and got errors like:

Send TimeSeries failed: One or more TimeSeries could not be written: Points must be written in order. One or more of the points specified had an older start time than the most recent point

Checking the code, a label opentelemetry_task gets added automatically and the value is set to 'nodejs-' + pid + '@' + hostname;

In Cloud Run the label value will always be something like nodejs-1@localhost, so it does not help at all. To avoid problems with out-of-order or too-frequent updates of a single time series, we could assign a custom label, e.g. with the container instance id as the value, but this would lead to an explosion in the number of time series created and, I guess, seriously affect time series query performance.

What's the plan for supporting OpenTelemetry metrics from Cloud Run? Is there a workaround for these problems?

aabmass (Contributor) commented Sep 14, 2021

Thanks for the report, @pebo. I was not aware of that, but our plans should fix it. We are not actively working on the metrics exporter right now because the upstream OTel metrics API is going through many breaking changes.


legendsjohn commented Jul 8, 2022

@aabmass Any update on this? Looking at the documentation here: https://cloud.google.com/trace/docs/setup/nodejs-ot, support seems to be implied.

For anyone else looking at this, switching from a SimpleSpanProcessor to a BatchSpanProcessor, as @aabmass mentioned, was the key to getting tracing to work well for me, although I don't fully understand why this would be the case.


legendsjohn commented Aug 3, 2022

For anyone else having issues with tracing and Cloud Run, there were two primary changes I had to implement for traces to be exported properly:

  1. Use a BatchSpanProcessor with Google's TraceExporter, i.e.:

provider.addSpanProcessor(new BatchSpanProcessor(new TraceExporter()));

  2. Force all traces to be sampled with an AlwaysOnSampler, i.e.:
export const provider = new NodeTracerProvider({
  sampler: new AlwaysOnSampler(),
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: processName,
  }),
  //...
});

Obviously an AlwaysOnSampler might not be the best choice for everyone, but I wanted to ensure all of the traces I expected were being recorded in Cloud Trace.

The second change, forcing all traces to be sampled with an AlwaysOnSampler, was very time-consuming for me to figure out. I had similar issues to open-telemetry/opentelemetry-js#3057 (albeit on a completely different platform).

If I ran my instance locally and exported traces to Cloud Trace, all of them would appear. However, after pushing my instance to Cloud Run, only traces associated with serving HTTP OPTIONS requests would be exported. Changing the sampler fixed this issue, and traces for all requests were exported.
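
Pulling the two changes above together, a sketch of a full provider setup, assuming the standard OTel JS packages and this repo's Cloud Trace exporter (the service name is a placeholder):

import { NodeTracerProvider, BatchSpanProcessor, AlwaysOnSampler } from '@opentelemetry/sdk-trace-node';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { TraceExporter } from '@google-cloud/opentelemetry-cloud-trace-exporter';

export const provider = new NodeTracerProvider({
  // Sample everything; trades export volume for not losing spans to sampling.
  sampler: new AlwaysOnSampler(),
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'my-service', // placeholder name
  }),
});
provider.addSpanProcessor(new BatchSpanProcessor(new TraceExporter()));
provider.register();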


legendsjohn commented Aug 3, 2022

@aabmass shutdown in TraceExporter doesn't seem to be currently implemented.

Regardless, calling shutdown didn't solve or change any of my issues with Cloud Run and Cloud Trace.

aabmass (Contributor) commented Feb 6, 2023

It's interesting that you needed AlwaysOnSampler; that shouldn't be strictly required. The default OTel sampler is ParentBased(root=AlwaysOn), which would only not sample if the parent indicates not to sample (via the incoming request header). Which propagator were you using? I believe Cloud Run supports W3C traceparent propagation (OTel's default) and will actually do adaptive sampling for you.
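
For illustration, a sketch of what those defaults amount to when spelled out explicitly, assuming @opentelemetry/sdk-trace-node and @opentelemetry/core:

import { NodeTracerProvider, ParentBasedSampler, AlwaysOnSampler } from '@opentelemetry/sdk-trace-node';
import { W3CTraceContextPropagator } from '@opentelemetry/core';

// ParentBased(root=AlwaysOn): sample unless the incoming traceparent header
// marks the parent span as not sampled.
const provider = new NodeTracerProvider({
  sampler: new ParentBasedSampler({ root: new AlwaysOnSampler() }),
});
provider.register({
  propagator: new W3CTraceContextPropagator(),
});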

@aabmass shutdown in TraceExporter doesn't seem to be currently implemented.

We don't have any buffering in the trace exporter, so the empty implementation is intentional. Calling TracerProvider.shutdown() will also call shutdown() on the BatchSpanProcessor, which will flush the spans it has batched. In cases with very sparse/bursty traffic, many serverless processes may be short-lived, and the buffered spans would never be sent without calling shutdown().

It sounds like at least a minimal example would be useful here, so I'll leave this issue open and up for grabs.

aabmass added the enhancement accepted (An actionable enhancement for which PRs will be accepted) label on Feb 6, 2023
aabmass changed the title from "Cloud Run Support" to "Cloud Run example" on Feb 6, 2023
aabmass (Contributor) commented Feb 7, 2023

@AkselAllas what do you mean by "samples by default"? Afaik it will do some adaptive sampling based on current QPS, which is why the author of the first blog post had missing spans unless they used always-on sampling.


AkselAllas commented Feb 8, 2023

Ok. It might be adaptive sampling. 🤔

I experienced a sampling rate of roughly 0.5 even with very low QPS on Cloud Run.

@eduardosanzb

Hi, just checking whether anyone else is having issues getting the route attribute populated for the Cloud Run request_count metric.

aabmass (Contributor) commented Mar 16, 2023

@eduardosanzb those metrics are not related to this repo or OpenTelemetry. I'd recommend reaching out to support, but I don't think the route label is ever populated.

@eduardosanzb

@aabmass Thanks!

@steve-marmalade

For anyone else having issues with tracing and Cloud Run, there were two primary changes I had to implement for traces to be exported properly:

  1. Use a BatchSpanProcessor with Google's TraceExporter, i.e.:

provider.addSpanProcessor(new BatchSpanProcessor(new TraceExporter()));

@legendsjohn, regarding this point, did you also set CPU allocation to "CPU always allocated"? The documentation states that:

Selecting CPU always allocated allows you to execute short-lived background tasks and other asynchronous processing work after returning responses. For example:

  • Leveraging monitoring agents like OpenTelemetry that may assume they can run in the background.

My hope is to run with CPU only allocated during request processing, as this has major cost implications, but it is not clear to me if the solution @aabmass proposed of trapping SIGTERM and calling TracerProvider.shutdown() handles the case when the container is still "Serving" but CPU has been deallocated, per:

[screenshot from the Cloud Run documentation referenced above]

aabmass (Contributor) commented Jul 5, 2023

it is not clear to me if the solution @aabmass proposed of trapping SIGTERM and calling TracerProvider.shutdown() handles the case when the container is still "Serving" but CPU has been deallocated, per:

That's fair; I don't think that alone would help. However, with retries (#523), hopefully the request would succeed in the background when CPU is next allocated. Another option is to call await TracerProvider.forceFlush() before responding to your request so the spans are flushed while CPU is allocated.
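
A sketch of that per-request flush, assuming an Express handler and a tracer provider exported from a hypothetical ./tracing module:

import express from 'express';
import { provider } from './tracing'; // hypothetical module that builds the NodeTracerProvider

const app = express();

app.get('/work', async (_req, res) => {
  // ... handle the request, creating spans along the way ...
  // Flush before responding so spans are exported while CPU is still allocated.
  await provider.forceFlush();
  res.json({ ok: true });
});

app.listen(Number(process.env.PORT) || 8080);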

aabmass (Contributor) commented Jul 5, 2023

Anyone else coming to this issue: have you tried the OTLP exporter pointed at an OpenTelemetry collector sidecar in Cloud Run? The collector has more robust retry and batching logic and could solve your issues. You can flush data from the OpenTelemetry SDK to the collector at the end of each request, which should be very fast.
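
For anyone trying that route, a sketch of pointing the SDK at a collector sidecar over OTLP/HTTP; the localhost URL is the collector's default OTLP/HTTP endpoint and is an assumption about how the sidecar is configured:

import { NodeTracerProvider, BatchSpanProcessor } from '@opentelemetry/sdk-trace-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const provider = new NodeTracerProvider();
provider.addSpanProcessor(
  new BatchSpanProcessor(
    // The collector sidecar listens on localhost inside the same Cloud Run instance.
    new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' })
  )
);
provider.register();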

@JonathanHope

I had a ton of trouble trying to get Cloud Trace exports working on Cloud Run. The suggestions in this thread were super helpful, and @aabmass in particular was right on the money a couple of times. I pushed up a minimal working example here in the hope that it will save someone else some time.
