-
Notifications
You must be signed in to change notification settings - Fork 50
Proposal: record first local span's ID in SpanContext #229
Comments
Brave very recently introduced the concept of "local root", which is conceptually the same thing: openzipkin/brave#801 |
yes we also used this in testing aggregation techniques with expedia
haystack/adaptive alerting last week. in cases where you want to reduce
volume to just remote spans this can be handy
…On Mon, Jan 21, 2019, 5:46 PM Andrew Wilkins ***@***.*** wrote:
Brave very recently introduced the concept of "local root", which is
conceptually the same thing: openzipkin/brave#801
<openzipkin/brave#801>
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#229 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAD6156HEPBBeZsC29pOP-Pv1i-_7Epoks5vFYyFgaJpZM4aKA9h>
.
|
perf implications aside, I wonder if this will be the job for |
FYI, we are implementing the W3C Trace-Context draft, but we currently only use It could hypothetically be useful to propagate transaction IDs between processes, but we do not currently have a need for that. In our model, a transaction's parent may be either another (remote) transaction, or a (remote) span. We set the "span" ID in Regardless, it would require vendor-specific code in the application, right? And so the exporter service wouldn't be possible? |
FWIW the reason we said localRoot is to imply this isnt likely useful in
remote (propagated) contexts.
so it is a library concern but not likely in scope for headers
…On Tue, Jan 22, 2019, 1:45 AM Sergey Kanzhelev ***@***.*** wrote:
perf implications aside, I wonder if this will be the job for tracestate?
The whole idea of tracestate is to hold a position of a span in a
multiple distributed traces graphs. Holding Elasic's transaction ID it in
tracestate will also give you parent/child relationship between
transactions as tracestate will be propagated across processes.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#229 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAD614bmcu20n6sjQH07kVYarK3arT3sks5vFfyjgaJpZM4aKA9h>
.
|
Got it. So it's not an issue of how to implement it - there are at least two ways OC SDK should be able to support this scenario - via local-only scoped tags or via Is transaction ID the only thing needed? In Application Insights we used to have a notion of local operation name (basically localRoot.Name). This name is also useful (maybe even more) for aggregations. User ID and Session ID obtained by the incoming request handler are also good candidates to store in this localRoot context natively. Do you think those are in scope of this issue? |
so your question is whether to add more scope than the id? specifically
names are problematic as they change over time where the id does not. That
said, I wonder if you couldn't create a mapping to any data given the local
root.
…On Wed, Jan 23, 2019, 1:01 AM Sergey Kanzhelev ***@***.*** wrote:
Got it. So it's not an issue of how to implement it - there are at least
two ways OC SDK should be able to support this scenario - via local-only
scoped tags or via tracestate. It is about a new concept to be introduced
natively.
Is transaction ID the only thing needed? In Application Insights we used
to have a notion of local operation name (basically localRoot.Name). This
name is also useful (maybe even more) for aggregations. User ID and Session
ID obtained by the incoming request handler are also good candidates to
store in this localRoot context natively. Do you think those are in scope
of this issue?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#229 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAD61zEvWRK9iNK2YPkHFztv4QeYjybPks5vF0PMgaJpZM4aKA9h>
.
|
For us: currently, yes.
I think User ID and Session ID might fit better in Correlation-Context, since they may be useful for aggregations across services. IIANM, this is already handled with instrumentation-provided tags. For Elastic APM's exception reporting, we could make use of a locally propagated transaction name/type for aggregation/filtering. That doesn't fit with OpenCensus anyway though, so that's very hypothetical. |
@adriancole I'm wondering if local root ID is just one of properties alongside other things that may be promoted to local/remote tags or to the From API cleaness - implementing local root ID natively may require new methods like
Can you please elaborate on why local root ID fits and name/type doesn't from your perspective? |
I was referring to exception reporting here, not local root IDs. Elastic APM provides a means of reporting exceptions that occur within a transaction or span; OpenCensus does not (aside from setting the status of spans). If there were some mapping, then there might be additional local root/transaction properties that we would want to propagate locally.
I'm relatively inexperienced with OpenCensus, so can you please expand on that a bit? Specifically how is that cleaner than my proposal? In case my proposal was not clear, here it is again with references to the Tracer and SpanBuilder from the Java API: For Spans created via a SpanBuilder obtained with Tracer.spanBuilderWithRemoteParent, the span's ID will be recorded in its SpanContext as "local root ID". This will be inherited by descendant spans created within the same process. |
hi Sergey. having implemented this feature I can assure there is zero api
surface area. the first entry is implicit and can be done internal to the
tracer. in brave there was no new Api method needed to track local root as
it is defined.
like I said.. a span id is immutable. when you ask how things are different
between this and mutable data such as span name please ponder this. if it
isnt clear practice on your own and you will notice the difference.
…On Thu, Jan 24, 2019, 4:03 AM Sergey Kanzhelev ***@***.*** wrote:
so your question is whether to add more scope than the id?
@adriancole <https://github.com/adriancole> I'm wondering if local root
ID is just one of properties alongside other things that may be promoted to
local/remote tags or to the tracestate. There may be elaborate
configuration for this kind of attributes to tags promotion.
From API cleaness - implementing local root ID natively may require new
methods like StartScopedLocalRoot or ResetLocalRootId on span builder.
When implementing it via tags or tracestate extension module will keep APIs
simpler.
That doesn't fit with OpenCensus anyway though, so that's very
hypothetical.
Can you please elaborate on why local root ID fits and name/type doesn't
from your perspective?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#229 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAD619SHvuubWjKWDEavQ7eK41q6Vncnks5vGL_9gaJpZM4aKA9h>
.
|
Yeah, I meant "fullness", not "cleaness". Sorry. I looked at brave PR @adriancole posted above. My comment was that with any new concept there will be people who want to have better control over when local root being set and reset. One example may be if incoming request is tracked by automatic module, but one want to signal that the "real" local root is the span that is tracked from customer code and which has all the information representing local root like span name and attributes. Another scenario is a trace that initiate another trace as a background job. Given those may be just theoretical - @adriancole I wonder if anybody asked for this after this concept was introduced? Another two questions for the new concept I was trying to raise with my questions:
First is enough to have it. Second is nice to think about in advance. I'm all for the scenarios enabled via local root. And I understand desire to make this feature to be pre-installed. My first reaction on this is that configurable extension approach would be a better option here. What are your thoughts on whether it's universal and generic. |
I think the target is exporters not end users. I don't expect end users to
employ this data. it could seem fine to wait for more people to ask here
specifically. line brave we already waited until multiple use cases
presented themselves before adding the feature. it is no harm waiting for
people to make this popular if you have some gauge for what would make it
popular enough.
I would offer that the local root functionality is technically difficult if
possible to do as a bolt on. If there is an efficient way that doesnt add
api surface area personally I wasnt able to find it. for example the tracer
itself is the only thing that knows when a span id has ever hit it and
tracking that externally ironically might mean adding more api change to
make it possible (or reliance on tools like bytecode weaving)
…On Mon, Jan 28, 2019, 7:37 PM Sergey Kanzhelev ***@***.*** wrote:
Yeah, I meant "fullness", not "cleaness". Sorry. I looked at brave PR
@adriancole <https://github.com/adriancole> posted above. My comment was
that with any new concept there will be people who want to have better
control over when local root being set and reset. One example may be if
incoming request is tracked by automatic module, but one want to signal
that the "real" local root is the span that is tracked from customer code
and which has all the information representing local root like span name
and attributes. Another scenario is a trace that initiate another trace as
a background job. Given those may be just theoretical - @adriancole
<https://github.com/adriancole> I wonder if anybody asked for this after
this concept was introduced?
Another two questions for the new concept I was trying to raise with my
questions:
1. Is it universal and needed for every (or most) open census user?
2. Is it generic enough and whether the very next request will be to
expose other local root span properties?
First is enough to have it. Second is nice to think about in advance.
I'm all for the scenarios enabled via local root. And I understand desire
to make this feature to be pre-installed. My first reaction on this is that
configurable extension approach would be a better option here. What are
your thoughts on whether it's universal and generic.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#229 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAD61_gdAlu9DG0_MHDG5h0DDGHqbrt2ks5vH0NjgaJpZM4aKA9h>
.
|
Is it what Skywalking requested here? I might be missing bigger context here |
As Adrian says above, the target is exporters and not (directly) end-users. For anyone wanting to export OpenCensus spans to Elastic APM, this is necessary. I can't speak for any other solution, though it's clearly not universal as there are other exporters without this need. Every solution is a little different of course, taking advantage of their particular constraints.
I do not anticipate such requests coming from our end. |
Hi @axw and everyone. First: Sorry that I am late in this discussion. Second: The current design of the library assumes that SpanContext is the "propagated" between processes (client to server RPCs), which I don't think it is the case for this. I am trying to think to not add any other non-propagated property in the SpanContext, what if add this to the Span? PS: this is a bit of brainstorming so ideas are welcome. |
@bogdandrutu no worries.
Makes sense, and you're right that this does not need to be propagated inter-process.
Yes, that would work: record it on Span, and copy across to SpanData for the exporter. I've updated the opencensus-go PR accordingly. |
To make a design that does this in the Span, assumes the context in
process is tightly coupled with the span anyway (a single object w/1-1
relationship). It also assumes things that want to snoop on context
properties such as trace IDs are doing that via the span (ex logging
integration). If some of these aren't the case, I'd recommend putting
this with the rest of the properties in the same place so as to make
integrations coherent.
Also, there's the immutability factor. local root needn't be mutable.
|
@bogdandrutu census-instrumentation/opencensus-go#1029 was merged; not my intention, but that's fine by me if you're happy with it. How should we proceed? |
Rather than extending the span data can't this be an attribute? |
Attributes aren't mentioned at https://opencensus.io/tracing/span/, but based on what I've gleaned from code, I understand that attributes are meant to describe properties of the operation that a span represents, and should not be required for describing causality or anything else crucial to the tracing system. At least in the OpenCensus Go implementation, attributes are capped and can be dropped: https://github.com/census-instrumentation/opencensus-go/blob/57c09932883846047fd542903575671cb6b75070/trace/trace.go#L47-L49 |
Yea, true, it is meant to to drop when a limit is reached. If it isn't a "good to have" to improve experience then I guess that isn't an option. |
Is this information used during the trace? Or only on the elastic backend when bringing everything together? Spans can already be set that their parent is remote, not in the same process, which would signal that the span id for that span is the transaction id. It just wouldn't be available if needed when other children are creating/handling spans. |
The API spec of the Elastic APM Server requires the The UI needs that information in order to be able to see all spans of a transaction. This view is especially interesting to service owners which can use it to dig into why a particular transaction was slow, without the noise of the rest of the trace. This view also allows to "zoom out" and see the full trace. Another thing where the entry span can be useful is to collect skeleton traces (Adrian already elaborated about this use case). Basically, you'd only send up the entry spans aka transactions. In order for the parent ids to make sense, the client spans don't propagate their ids but rather their transaction ids. This request is not about adding support for skeleton traces, which may be a follow up for this. This is to enable a basic Elastic APM exporter. As mentioned, the APM Server requires the |
Yea, I wasn't suggesting it would fit the api today but what I thought was an alternative way for the Elastic APM Server to put this information together, instead of extending the opencensus span definition. |
…1029)" This reverts commit 57c0993. It was merged without updating the spec and resolving census-instrumentation/opencensus-specs#229
This reverts commit 57c0993. It was merged without updating the spec and resolving census-instrumentation/opencensus-specs#229
After some more investigation, we've come up with an alternative that unblocks creating an exporter for Elastic APM: make the transaction ID optional for spans. This may mean loss of future functionality, but the initial impact will be minimal. So this can be downgraded to "nice to have" for us. |
by the way, this is actively desired by zipkin users. I'm not sure why people are theoretical about this. It exists in multiple APMs systems and there seems to be a disbelief in the value. anyway I've seen you've reverted this, so we can add it to the "no idea why census' doesn't get it list" along with messaging spans |
I maintain the Elastic APM Go Agent. I've spent some time investigating whether we can implement an OpenCensus exporter, and I've hit a blocker that I think requires a change to opencensus-go.
The issue is that we have an additional concept of "transaction", which you can think of as the entry span in a service (e.g. an incoming HTTP request). Every span within a transaction records the trace ID, transaction ID, and (if it has one) parent span ID. Every span has a transaction ID, which may used for grouping. For each service in a trace, we would map the first local OpenCensus span (i.e. spans with a remote parent) to an Elastic APM transaction.
Because the IDs are all flowing to the exporter from opencensus-go, which has no concept of transaction, we cannot propagate the transaction ID to child spans in process.
Describe the solution you'd like
Record the first local span's ID in SpanContext / record the IDs of spans which have remote parents.
Describe alternatives you've considered
Alternative 1
I have considered keeping a map of OpenCensus span IDs to Elastic APM transaction IDs. This would be expensive to maintain, and would only work if the parent ends (i.e. is exported) before the child, which would be unusual.
Alternative 2
Another option would be something long the lines of the callbacks described in #70. Our exporter/plugin would be invoked when a span is created, along with the parent span context; we would store the transaction ID in tracestate, and propagate via this callback. There are a couple of issues with this approach:
Alternative 3
Change Elastic APM to make the transaction ID on spans optional. As of its current release (6.6), they are required. While we may lose some future functionality for trace data originating in the OpenCensus, but for now the impact would be minimal.
The text was updated successfully, but these errors were encountered: