Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add db.cosmosdb.partition_key as a span attr on item ops #23664

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kjg
Copy link
Contributor

@kjg kjg commented Oct 28, 2024

It would be useful to have the partition_key as an attribute on Item operations to make it easier to see if there are any performance based patterns in certain partitions, what are you thoughts on including it?

I know that db.cosmosdb.partition_key isn't in the OpenTelemetry Semantic Conventions (yet) as an attribute, but we could open a PR for that as well.

  • The purpose of this PR is explained in this or a referenced issue.
  • The PR does not update generated files.
  • Tests are included and/or updated for code changes.
  • Updates to module CHANGELOG.md are included.
  • MIT license headers are included in each file.

@github-actions github-actions bot added Community Contribution Community members are working on the issue Cosmos customer-reported Issues that are reported by GitHub users external to the Azure organization. labels Oct 28, 2024
Copy link

Thank you for your contribution @kjg! We will review the pull request and get back to you soon.

@Pilchie
Copy link
Member

Pilchie commented Oct 30, 2024

@FabianMeiswinkel @sourabh1007 @jcocchi - any thoughts here on the impact of this?

@analogrelay
Copy link
Member

My main concern is PII leakage. The query text and parameters are in the Semantic Conventions but with explicit notes about sanitization. The partition key could be considered PII in some cases. I don't know for sure if we currently emit db.query.text or db.query.parameter.<key> but they're all in the same "class" of value to me, and should be opt-in.

I'm OK having it as an option, it does seem quite useful if you know your PKs are not PII.

@kjg
Copy link
Contributor Author

kjg commented Oct 30, 2024

azcosmos does not yet emit db.query.text but it actually is on my list of PRs I'd like to make. I'll see what I can do about making it optional. Any thoughts on the best place for consumers to specify the option and if it should be opt-in or opt-out?

For my team's use case, I could see us wanting to emit db.query.text and db.cosmosdb.partition_key but probably not db.query.parameter.<key>, so I'm not sure if we'd want to make them separately optional?

@analogrelay
Copy link
Member

azcosmos does not yet emit db.query.text but it actually is on my list of PRs I'd like to make.

I absolutely think we need db.query.text to be opt-in, given the requirement in the OpenTelemetry guidance (which aligns with our own privacy/security guidance) to sanitize query text. I don't think it's feasible for us to add sanitization logic, so having it be opt-in is the appropriate alternative.


One thought is to have a list of "enabled tracing attributes" in azcosmos.ClientOptions. This would be a list of opt-in tracing attributes which the user is opting in to. Alas, my Go is rusty, but I don't think you can initialize a slice inside a struct initializer. What I'd like is to be able to write this:

options := azcosmos.ClientOptions {
    EnabledTracingAttributes: []string { "db.query.text", "db.cosmosdb.partition_key" },
}

But I don't know if that's feasible.

There's also the element of having to scan this list every time we build a span, to identify which attributes to specify.

I'd be curious what, if anything, other SDKs (both other Azure services' Go SDKs and Azure SDKs for other languages) do here 🤔 . Might poke around a bit and see what I can find.

@sourabh1007
Copy link

azcosmos does not yet emit db.query.text but it actually is on my list of PRs I'd like to make.

I absolutely think we need db.query.text to be opt-in, given the requirement in the OpenTelemetry guidance (which aligns with our own privacy/security guidance) to sanitize query text. I don't think it's feasible for us to add sanitization logic, so having it be opt-in is the appropriate alternative.

One thought is to have a list of "enabled tracing attributes" in azcosmos.ClientOptions. This would be a list of opt-in tracing attributes which the user is opting in to. Alas, my Go is rusty, but I don't think you can initialize a slice inside a struct initializer. What I'd like is to be able to write this:

options := azcosmos.ClientOptions {
    EnabledTracingAttributes: []string { "db.query.text", "db.cosmosdb.partition_key" },
}

But I don't know if that's feasible.

There's also the element of having to scan this list every time we build a span, to identify which attributes to specify.

I'd be curious what, if anything, other SDKs (both other Azure services' Go SDKs and Azure SDKs for other languages) do here 🤔 . Might poke around a bit and see what I can find.

@analogrelay We have added db.query.text as an opt-in attribute (ref. here) in cosmos DB SDK.

@sourabh1007
Copy link

@FabianMeiswinkel @sourabh1007 @jcocchi - any thoughts here on the impact of this?

As @analogrelay pointed out, the primary concern here is protecting PII, particularly if referencing logical partition keys tied to customer data. Instead, we could use physical identifiers such as partitionid (and replicaid) or pkrangeid, which would help in identifying potential hot partition issues without exposing sensitive information.

In the Cosmos DB SDK, we currently don't emit traces at the network level, as network interactions can involve multiple partition IDs per operation. Therefore, we haven't included this data in traces yet. However, we are planning to introduce it as an optional dimension in network-level OpenTelemetry (OTel) metrics (ref here). This would allow users to opt in as needed, helping to manage cardinality effectively.

@analogrelay
Copy link
Member

Since the Go SDK is purely gateway mode and doesn't support queries that cross logical partitions, I think using physical identifiers would be difficult. I don't believe we have any logic to fetch the pkrangeid in the Go SDK today. So, I'm OK with providing an option to trace the logical partition key. I don't think pkrangeid is what a user would expect to be able to trace here, though I'm fine if we also want to add that at some point.

We definitely need an opt in. I think my earlier suggestion wouldn't be a good fit here, since we've already established some patterns in other SDKs for opt-in telemetry. I think what I'd propose is to add a CosmosClientTelemetryOptions struct, embed it in CosmosClientOptions, and add some kind of setting for enabling partition key logging (IncludePartitionKey?). We can start there, and if we want, we can also embed a CosmosClientTelemetryOptions in some operation-specific options (like QueryOptions) to allow per-operation overrides, but that's not critical right now if you don't need it @kjg .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Community Contribution Community members are working on the issue Cosmos customer-reported Issues that are reported by GitHub users external to the Azure organization.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants