Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] Rename db|messaging|gen_ai.system to *.provider.name, rpc.system to rpc.protocol.name #1613

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

lmolkova
Copy link
Contributor

@lmolkova lmolkova commented Nov 26, 2024

Fixes #1581

Warning

THIS IS A VERY-VERY-VERY BREAKING CHANGE (even though it affects experimental attributes only).
Backends use presence of these attributes as an indication that corresponding spans follow certain conventions.

  • *.provider.name
  • rpc.protocol.name is more precise for RPC where it captures different application-or-higher-level protocol that may work on top of another application protocol (grpc/thrift over http/2)

Despite being problematic, we believe this change is necessary to accommodate future extension of semantic conventions for *.system.* and this is the last chance to make this attribute name right before we declare any of these attributes stable.

Merge requirement checklist


Given potentially high impact of this change and also upcoming holiday season, we'll need to keep it open for a while to collect the feedback

@lmolkova lmolkova requested review from a team as code owners November 26, 2024 01:40
schema-next.yaml Outdated
- rename_attributes:
attribute_map:
feature_flag.provider_name: feature_flag.system
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to extract it to another PR that we can merge right now to reduce churn in feature-flag area

@@ -33,7 +33,6 @@ with the naming guidelines for RPC client spans.

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`rpc.system`](/docs/attributes-registry/rpc.md) | string | The value `aws-api`. | `aws-api` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe it's used correctly here

Copy link
Member

@Flarna Flarna Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is, AWS SDK is modeled as remote call. There is #1622 to add aws_api to the well known enums.

Anyhow, removing rpc.system but keep rpc.method and rpc.service doesn't seem right.

Copy link
Contributor Author

@lmolkova lmolkova Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not a remote call though - it's a logical call that uses HTTP underneath. OTel RPC semconv represent actual remote calls using remote calling protocols.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's more or less the same for GRPC which uses HTTP/2 as transport.
Anyhow, likely a topic for a different issue.

But still I think you should either keep or remove all rpc attributes here.

@lmolkova lmolkova changed the title Rename db|messaging|gen_ai.system to *.provider.name, rpc.system to rpc.protocol.name, feature_flag.system back to feature_flag.provider_name [DO NOT MERGE] Rename db|messaging|gen_ai.system to *.provider.name, rpc.system to rpc.protocol.name Nov 26, 2024
@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Dec 10, 2024

I have some opinions on this, which I don't expect to be accepted, but would like on the record even if dismissed.

As genai is typically accessed from web service apis which are commodity and easier than the typical case of database protocols, I would prefer changes to be discussed independently.

For the case of genai, I do strongly prefer whatever word we use for api (e.g. openai which is often emulated) to be decoupled from the provider of that. However, I would like this to advise to use the same constant even if in some cases the client might get it wrong. For example, don't say something like "azure inference client" as the system or provider name, for when it is being used to access azure openai service with the openai api. Especially a proxy client (litellm also has a proxy client library), should attempt to do more than say its name.

The field will be used for both sides, and it seems a step back to end up with client SDK names in either of these fields, vs a web service api, possibly private provider of that. The edge case of local llms, e.g. running llama.cpp embedded, we can discuss, too, but most of the time this is about web services.

more below if you like


I think there is ongoing tension treating genai like database, when in practice they are typically accessed via web services apis and so more like a normal cloud service than a database. This becomes even more the case as more and more are instrumented server side, or as proxies. For example, while there are indeed some L7 proxies for database protocols, it is a lot more common and easier to do on genai as most if not all well known services are web services. So, there are backends who instrument the http server side like a normal private cloud product, as well as proxies who instrument both ways today.

So, I would like to decouple the topic of database from genai, especially this sense of provider being implied as a client identifier. I'm not sure "provider" or "system" for that matter intuitively mean "client perceived name" especially when the attribute isn't namespaced as client and also used for server data (in practice even if only in genai metrics are the only server thing so far).

I cofounded a project back in 2009 called jclouds which has very much overlap on the problem of api vs provider of that. When the community got together, probably the single best feature we did was formalizing "api" independent of "provider" of that. So, while I do like the idea of using provider, I would like it to be more strong, so it at least suggests to use an authoritative name for the actual service as opposed to a client SDK name, especially in SDKs that target multiple services.

https://jclouds.apache.org/start/concepts/#:~:text=Providers%20implement%20APIs%2C%20and%20describe,those%20offered%20by%20the%20API.

@lmolkova
Copy link
Contributor Author

@codefromthecrypt can you suggest what'd you like to be changed/added ?

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 26, 2024
@lmolkova lmolkova removed the Stale label Dec 26, 2024
Copy link
Contributor

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are my suggestions, and most are limited to gen_ai, however one isn't.

Generically:

  • use less dots: unless we expect provider.something_else just call it provider. Similar for protocol.name. e.g. protocol instead of protocol.name.

In GenAI:

  • Correct the text about provider as it has a very indirect relationship with models in many cases. For example, if it is about models, rename it to model_family or something. Concretely, many LLM providers host model families owned elsewhere.
  • The "Provider" concept should be the, possibly off the shelf, platform in use, for example OpenAI (the platform) vs Azure OpenAI vs xAI vs ollama. In some cases, the client won't know this without looking at the hostname.
  • As many are emulating apis, and the client may not be able to guess the "provider", or choose not to based on hostname etc. use "api" to establish that. For example, commonly for at least several systems it can be "openai" even if the provider is not that. Similarly, on a proxy, you may have an inbound "openai" compatible request despite the provider being that proxy.

If these changes aren't specific enough, @lmolkova lemme know and I'll try again!

@lmolkova
Copy link
Contributor Author

@codefromthecrypt this PR is limited to renaming gen_ai.system to something else without changing the meaning of it. Can you suggest a specific name we should use for the thing we want to capture there?

You're welcome to propose any additional clarifications, caveats and changes to gen_ai.system meaning (but they would need to be sent as a separate PR).

@codefromthecrypt
Copy link
Contributor

@lmolkova renaming gen_ai.system to gen_ai.provider then my going back with other comments elsewhere is fine. Mainly on the rename topic, I don't like adding a .name suffix for reasons mentioned earlier.

Hope this helps!

@codefromthecrypt
Copy link
Contributor

specifically I think it will be the "api" that has a version associated with it, which would be something we can resolve elsewhere. The provider itself doesn't need to add dots in defense of adding a version to it. I would prefer it simple and us take the api/ api version stuff in another PR. Make sense?

@lmolkova
Copy link
Contributor Author

@codefromthecrypt thanks for the suggestion. I agree provider makes sense. Adding .name is important for the reasons mentioned in the PR description and in the original issue.

…ider.name, pc.system to pc.protocol.name, feature_flag.system back to feature_flag.provider_name
@@ -0,0 +1,5 @@
change_type: breaking
component: db, messaging, gen_ai, rpc
note: Rename `db|messaging|gen_ai.system` to `db|messaging|gen_ai.provider.name`, `rpc.system` to `rpc.protocol.name`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this comment here for lack of a better place and so it can be marked resolved when appropriate.

I think I prefer the word "system" over "provider".

When I read "db.provider.", "messaging.provider.", "gen_ai.provider.*", my first reaction is: provider of what? Provider of the db? No.. the concept we're trying capture is the "The database management system (DBMS) product as identified by the client instrumentation." Provider of the messaging? That doesn't seem right. Provider of the gen_ai? In all these cases we're trying to capture client's understanding of the type of system its communicating with.

"provider" is also strange when you consider expanding into the server side. In #1581 I proposed we may one day want to add an attribute representing the version of the db system, which would yield db.provider.version. db.system.version is more intuitive to me.

The word "system" seems just generic enough to be applicable to a broad set of domains. Its intuitive for a "system" to have name and version attributes describing it. Its not obvious to me what a "provider" is in the context of different domains, and what a provider's name and version attributes are describing. (I.e. why does a provider need a name and version in the first place?)

The use of "protocol" for the rpc domain is an interesting divergence. The "protocol" according to the client's POV is really the thing we're capturing in all these cases. I.e. a db client may report "postgres" even though the system behind the scenes it not postgres but has added postgres compatibility - "postgres" is just the protocol. This suggests we could go in a different direction and use a client-oriented naming convention, choosing a word that makes it abundantly clear that this is the name of the protocol as far as the client is concerned. But if we did this, we'd have invent new attributes to refer to the server side of things and correlation would suffer.

That's another reason I like the word "system". It seems to make enough intuitive sense from both a client and server perspective that we can get away with using the same word in both places.

Copy link
Contributor

@codefromthecrypt codefromthecrypt Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me system is better if we are trying to fit a name to multiple domains, however it is clear as mud here. It is used for a model family, and api and a platform. When we use "provider" it fits into "LLM provider" but then again this namespace is likely to be used for things besides pure LLM proxies.

I have seen "cloud provider" used, and it has potentially multiple apis, and that's the salient point of being not terrible for genai. Most services are web services even if they aren't always a cloud platform. platform could be a better word.

In any case, the concrete problem I've found that should be changed in genai isn't about system not having a .name suffix, rather system being used for unclear reasons. If it were broken down into what impacts things, it would be time better spent, vs justifying a global rename.

system ~= platform, cloud, etc, e.g. openai (the platform), xai, etc
api ~= the api used on that system which hints at the SDK, e.g. openai (used by many) sometimes others also re-use non-open apis. Even openai has a v2 now, and soon that's likely to be emulated! Also, some systems you could argue the invocation style matters. e.g. bedrock has a converse api which is more portable than its other ones. Long story short, the api has a lot more structured info needed, than the name of the system. an end user could convince me easily otherwise.
model_family ~= if we really want to know the model family separate from the model requested and responded with, we could make a bucket for that, but conflating model family with system and api breaks down quickly.

I know I'll be told to make separate issues on these, etc, but I think those should happen before doing things for global namespaces as some of these change the motivation entirely. I also think these things should be led by the leads, I'm not even an approver, so take it for what it's worth.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me system is better if we are trying to fit a name to multiple domains, however it is clear as mud here. It is used for a model family, and api and a platform

Yeah system (or whatever the word) is overloaded to refer to multiple things, and semantics vary based on whether we're talking about client or server's perspective. Tradeoff between distinct attributes for distinct concepts, or reusing attributes in different contexts with reduced conceptual purity.

I have seen "cloud provider" used, and it has potentially multiple apis, and that's the salient point of being not terrible for genai. Most services are web services even if they aren't always a cloud platform. platform could be a better word.

The "provider" in "cloud provider" probably refers to the many services or apis provided by the cloud. Services being things like ec2, or rds, or gke. Applying "provider" to domains like messaging, database, etc doesn't seem right from this perspective. I don't mind platform - it has some of the same properties as system.

system ~= ...
api ~= ...

Whether or not we should differentiate between these is the tradeoff I was referring to above. Also, API might be the right word in genai, but seems like protocol is a better word in database, messaging, and rpc (if we do want to differentiate).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think you are making my most important point, which is that we shouldn't make a change that couples db|messaging|gen_ai and maybe also rpc. We shouldn't have to defend one of these in the middle of a big change on other things. The pressure isn't helpful to making the model clear, even if the pressure is helpful for less PRs and more naming consistency regardless of domain it applies to.

TL;DR; These should be individually discussed and not assumed they should converge or be pressured to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR; These should be individually discussed and not assumed they should converge or be pressured to.

That seems right, since its unlikely that we can find a single word which is the best across all domains. But I think what all these domains have in common is whether or not they should all follow this advice:

I have an arbitrary (and optional) rule of thumb for attribute names to follow {domain}.{thing}.{a property of the thing} pattern.

In all these domains, the current word "system" is being used as "{a property of the thing}" when the current semantics are actually more of a "{thing}".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the "system" point, when in a hole, stop digging. We should leave it alone until a conceptually sound alternative is made and in the case of genai it is likely multiple attributes vs multiple sub-attributes one being .name.

That's because breaking once to rename system to whatever.name, then later correct the real conceptual problem, so put it into a different X.name, that's two hard breaks.

Leaving system alone until the correct destination of the values are found is less breaks, less arduous work for folks who may not even see this PR, only find out later everything is broken and you have to re-do your indices, metrics, alerts etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should totally minimize churn. If we're going to make a breaking change, let's make sure we have the best possible shot at getting it right this time around.

To be fair, this PR is annotated with "do not merge" in the title and has language indicating it will be open for a while to encourage feedback from a wide array of people. I think this is a tactic to minimize churn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

Should db.system be db.system.name?
4 participants