AI Agent Span Semantic Convention #1657

Draft

@gyliu513 (Member) commented Dec 6, 2024

Fixed part of #1530

Changes

Please provide a brief description of the changes here.

Note: if the PR is touching an area that is not listed in the existing areas, or the area does not have sufficient domain experts coverage, the PR might be tagged as experts needed and move slowly until experts are identified.

Merge requirement checklist

@gyliu513 gyliu513 requested review from a team as code owners December 6, 2024 16:49
@gyliu513 gyliu513 marked this pull request as draft December 6, 2024 16:49
@gyliu513 (Member Author) commented Dec 6, 2024

@lmolkova @lzchen @nirga @karthikscale3 @drewby this is a very early draft. We may need a long discussion on this, but I hope we can start from here.

Please share your comments here. I am not sure whether we want to put the AI agent semantic conventions in the same folder as gen-ai or whether we need a new folder for ai-agent. Thanks!

@lmolkova (Contributor) left a comment

A few general points:

  • we should not create attributes that would be the same as existing gen_ai attributes. We should use those instead of defining agent ones by default
  • we need to define everything in yaml and stay compatible with the schema

I have a draft here (microsoft#3) for an OpenAI assistant-like API, which covers a lot of similar things. PTAL.


| Attribute | Type | Description | Example | Requirement Level | Stability |
| ------------------------------ | ------ | ------------------------------------------ | -------------------------------- | ----------------- | --- |
| `ai_agent.agent.name` | string | Name of the agent. | `Researcher Bot` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Contributor

Those are still genai agents, so I think this one should be gen_ai.agent.name

| `ai_agent.agent.role` | string | Role assigned to the agent. | `Data Collector` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.agent.backstory` | string | Background story or context for the agent. | `Specializes in web data mining` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.agent.workflow_name` | string | Name of the workflow the agent is part of. | `Data Processing Pipeline` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.agent.model` | string | Underlying model powering the agent. | `gpt-4` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Contributor

How is the agent model different from gen_ai.request.model?

Contributor

+1

| `ai_agent.agent.tools` | array | List of tools available to the agent. | `["Web Scraper", "Analyzer"]` | Recommended | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Contributor

How would it be different from the generic gen_ai tool?

@svrnm (Member) commented Dec 16, 2024

hey there,

I think this is a great starting point for a very hot topic right now. I have one major comment, and it's around the workflow and task. If I understand correctly, there is a workflow-to-task relationship, and I would assume it is modelled as a parent span/child span relationship. So a workflow is a "parent span" and its tasks are "child spans"; is this a correct assumption?

If that's correct, and if I look at your examples, a workflow (and maybe even a task?) can be very long running, which is a currently unsolved piece of the otel specification. So maybe the need to model AI agents could be a driving force behind providing a better specification for that, because I also see workflow and task as not being unique to AI agents; see the CICD pipeline attributes, for example.
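To make the parent/child assumption above concrete, here is a minimal sketch in plain Python. It is deliberately not the OpenTelemetry SDK: `Span`, `Tracer`, and `run_workflow` are hypothetical stand-ins, the attribute keys are the draft names from this PR, and the second task name is invented for illustration. It only shows the shape being asked about, where a workflow span stays open until all of its task spans have ended.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass(eq=False)
class Span:
    """Plain-Python stand-in for a tracing span (not an OpenTelemetry type)."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    parent: Optional[Span] = field(default=None, repr=False)
    children: List[Span] = field(default_factory=list)
    start_ns: int = 0
    end_ns: Optional[int] = None  # None while the span is still in progress


class Tracer:
    """Hypothetical tracer that nests spans using a simple stack."""

    def __init__(self) -> None:
        self._stack: List[Span] = []

    @contextmanager
    def start_span(self, name: str, attributes: Optional[Dict[str, str]] = None) -> Iterator[Span]:
        # The innermost open span (if any) becomes the parent.
        parent = self._stack[-1] if self._stack else None
        span = Span(name, dict(attributes or {}), parent, start_ns=time.monotonic_ns())
        if parent is not None:
            parent.children.append(span)
        self._stack.append(span)
        try:
            yield span
        finally:
            span.end_ns = time.monotonic_ns()
            self._stack.pop()


def run_workflow(tracer: Tracer) -> Span:
    # Workflow = parent span; each task = child span.
    with tracer.start_span(
        "workflow", {"ai_agent.agent.workflow_name": "Data Processing Pipeline"}
    ) as workflow:
        # "Data Analysis" is a made-up second task, used only for illustration.
        for task_name in ("Data Collection", "Data Analysis"):
            with tracer.start_span("task", {"ai_agent.task.name": task_name}):
                pass  # the task's work would happen here
    return workflow
```

Note that in this model the workflow span has no end time until every task has finished, which is exactly why a long-running workflow is awkward: nothing about the parent can be reported as complete while it is still open.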

@gyliu513 (Member Author) commented Dec 16, 2024

> If that's correct, and if I look at your examples, a workflow (and maybe even a task?) can be very long running, which is a currently unsolved (open-telemetry/opentelemetry-specification#373) piece of the otel specification. So maybe the need to model AI agents could be a driving force behind providing a better specification for that, because I also see workflow and task as not being unique to AI agents; see the CICD pipeline attributes, for example.

@svrnm Yes, this is the case, at least from my point of view: the workflow and task relationship is very similar to the CICD pipeline attributes.

Let me review microsoft#3 from @lmolkova first, and I will try to update my PR soon after some discussion on microsoft#3

> • we should not create attributes that would be the same as existing gen_ai attributes. We should use those instead of defining agent ones by default
> • we need to define everything in yaml and stay compatible with the schema

@lmolkova yes, let me consolidate the agent attributes into gen_ai, but let me go through your PR microsoft#3 first, thanks!
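As a rough illustration of what that consolidation could look like, the sketch below renames only the two attributes for which the review comments name a gen_ai counterpart (`gen_ai.agent.name` and `gen_ai.request.model`) and leaves everything else untouched. `SUGGESTED_RENAMES` and `consolidate` are hypothetical names, and the mapping assumes the open review questions resolve in favour of reuse; nothing here has been agreed in the thread.

```python
from typing import Dict

# Renames taken from the review comments above. This assumes the reviewers'
# questions resolve toward reusing gen_ai attributes; it is not a decision.
SUGGESTED_RENAMES: Dict[str, str] = {
    "ai_agent.agent.name": "gen_ai.agent.name",
    "ai_agent.agent.model": "gen_ai.request.model",
}


def consolidate(attributes: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `attributes` with the suggested renames applied.

    Keys without a suggested gen_ai counterpart (for example role or
    backstory) are passed through unchanged.
    """
    return {SUGGESTED_RENAMES.get(key, key): value for key, value in attributes.items()}


draft_attributes = {
    "ai_agent.agent.name": "Researcher Bot",
    "ai_agent.agent.model": "gpt-4",
    "ai_agent.agent.role": "Data Collector",
}
consolidated = consolidate(draft_attributes)
```

Attributes such as role and backstory stay under their draft names here because the thread has not settled where they should live.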

@karthikscale3 (Contributor)

> hey there,
>
> I think this is a great starting point for a very hot topic right now. There is one major comment I have on that, and it's around the workflow and task. If I understand it correctly there is a workflow to task relationship and the task, I would assume they are modelled in a parent span, span relationship? So a workflow is a "parent span" and then tasks are "child spans" to that, is this a correct assumption?
>
> If that's correct and if I look at your examples, a workflow (and maybe even a task?) can be very long running, which is a currently unsolved piece of the otel specification, so maybe this need for AI Agents being modelled could help to be a driving force behind providing a better specification for that, because I additionally see workflow and task not being unique to AI agents, see CICD pipeline attributes for example.

@svrnm / @gyliu513 - Just wanted to add my 2 cents here. This is very much an issue, and thanks for bringing it up. The OTEL instrumentation we have (at Langtrace) for frameworks like CrewAI, DSPy, etc. runs into this issue from time to time, where traces on occasion have 100s of spans as part of the same trace. An option to flush in-progress spans would be ideal for these scenarios, so the user can see real-time feedback in the UI for ongoing agentic sessions. Having said that, the vast majority of the agents we are seeing (from our perspective) still work well with the existing capabilities. But we definitely need to think about this sooner rather than later.

| ------------------------------ | ------ | ------------------------------------------ | -------------------------------- | ----------------- | --- |
| `ai_agent.agent.name` | string | Name of the agent. | `Researcher Bot` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.agent.role` | string | Role assigned to the agent. | `Data Collector` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.agent.backstory` | string | Background story or context for the agent. | `Specializes in web data mining` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Contributor

AFAICT, role and backstory are very crewAI-specific concepts that may or may not be applicable to other frameworks. So maybe we should consider moving them to a crewAI-specific namespace?


| Attribute | Type | Description | Example | Requirement Level | Stability |
| --------------------------- | ------- | ------------------------------------------------------------------------ | ---------------------------------- | ----------------- | --- |
| `ai_agent.task.name` | string | Name of the task. | `Data Collection` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Contributor

Do we also need an ai_agent.task.input in addition to these?

Contributor

I'm curious if we can consolidate it under gen_ai.system|user|tool|assistant.message events rather than attributes
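The difference between the two options can be sketched with plain dictionaries. This is only an illustration: `task_input_as_attribute` and `task_input_as_event` are hypothetical helpers, `ai_agent.task.input` is the attribute name floated in the comment above, and the event body shape (a single `content` field) is an assumption rather than something this thread defines.

```python
from typing import Any, Dict

# Roles named in the review comment: gen_ai.system|user|tool|assistant.message
MESSAGE_ROLES = ("system", "user", "tool", "assistant")


def task_input_as_attribute(task_input: str) -> Dict[str, str]:
    """Option 1: carry the task input as a span attribute (name from the comment above)."""
    return {"ai_agent.task.input": task_input}


def task_input_as_event(task_input: str, role: str = "user") -> Dict[str, Any]:
    """Option 2: carry the task input as a gen_ai.<role>.message event.

    The body layout below is an assumption made for this sketch.
    """
    if role not in MESSAGE_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return {"name": f"gen_ai.{role}.message", "body": {"content": task_input}}


as_attribute = task_input_as_attribute("Collect pricing data")
as_event = task_input_as_event("Collect pricing data")
```

With the event form, no new task-specific attribute would be needed; the input travels through the same message events already used for model calls.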

| Attribute | Type | Description | Example | Requirement Level | Stability |
| ------------------------ | ------ | -------------------------------------------- | ----------------- | ----------------- | --- |
| `ai_agent.tool.name` | string | Name of the tool utilized by the agent. | `Web Scraper` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.tool.function` | string | Specific function or capability of the tool. | `Data Extraction` | Recommended | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Contributor

Do we also need an ai_agent.tool.input in addition to these?

@karthikscale3 (Contributor)

Nice first draft, @gyliu513. Thanks for starting this.

@codefromthecrypt (Contributor)

@gyliu513 I'm giving you unsolicited advice and being intentionally not specific to a change here, because I think more research would lead you to your own changes. That's always the best (in my mind). Hope it helps.

One of the main gains from re-organizing the LLM (now GenAI) SIG to have a space in otel-contrib Python was being able to practice specs before committing to them. I have seen this done in practice in Java, and it helps quite a bit.

  • Are you keen on instrumenting a draft PR on some open source agent library you believe is valid for this semconv?
  • How about an example PII washed feed from agentic cloud provider data, which would translate

Another thing to look into: bookend timestamps (start_xx, end_xx) in particular sound like a discussion that would already have happened here in another domain. Certainly, it happened way back in the Zipkin days with "cs" and "cr", though these were separate events. Can you research some prior work in otel where a spec like this was accepted or denied?

@gyliu513 (Member Author)

@codefromthecrypt good comment, thanks and happy holidays!

> How about an example PII washed feed from agentic cloud provider data, which would translate

Can you please share more detail on your comment here?

I am now reviewing microsoft#3, and that PR has really helped a lot. I will probably update my PR soon after the new year, based on microsoft#3.

@codefromthecrypt (Contributor)

@gyliu513 regarding the comment I made, "How about an example PII washed feed from agentic cloud provider data, which would translate":

What I mean is that most of the time we assume the data is coming from the application. For example, we instrument langchain or something similar, and spans and metrics are collected directly from the app.

While I don't know what services exist, another way is cloud integration, where a platform is generating the signals. One example is AWS Bedrock, where you can get data regardless of what the developers do https://aws.amazon.com/blogs/mt/monitoring-generative-ai-applications-using-amazon-bedrock-and-amazon-cloudwatch-integration/

So, for this PR, I mean that if its scope is only for application instrumentation, then we should look at which frameworks we are considering and maybe a draft/experiment/proof of concept that exercises the specs you are making.

Beyond that, if you are thinking about a specific cloud integration (I don't know if you are), some sample data or documentation on what that agentic feed looks like could help us translate if the semantic conventions here are valid for it or not.

Does that help? If not you can also quiz me on slack, but anyway happy holidays!

5 participants