- Capture reasoning/thinking content in generation output, matching what
the TUI displays between tool calls
- Fix generation input/output: step 0 gets the user message, subsequent
steps get tool results from the previous step as input
- Structure generation output as { thinking, text, toolCalls } so each
LLM call is fully inspectable in Langfuse
- Also fix kimi-k2p5 TODO in transform.ts (resolved upstream)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:
Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know.
Hey! Your PR title doesn't follow our naming convention. Please update it to start with one of the allowed prefixes. See CONTRIBUTING.md for details.
The following comment was made by an LLM, it may be inaccurate: Based on the search results, I found a potentially related PR: PR #6629: "feat(telemetry): add OpenTelemetry instrumentation with Aspire Dashboard support". This PR appears to be related to instrumentation, specifically for telemetry and OpenTelemetry. Since PR #20311 has the title "instrument" but lacks a detailed description, this existing PR on instrumentation could be a duplicate or related work. However, the PR description for #20311 is incomplete (just the template with no actual details filled in), so I cannot confirm if they are truly duplicates without more context about what #20311 is trying to accomplish.
Exactly — here's the description:
Langfuse Tracing for the OpenCode Agentic Loop
Why this exists
OpenCode's core loop is not a simple request/response — it's a multi-step agent that thinks, calls tools, gets results, thinks again, and repeats until it decides it's done. When you type a message in the TUI, a lot happens that's invisible: multiple LLM calls, chains of tool executions, reasoning traces, token accumulation. This instrumentation makes all of that visible in Langfuse as a structured trace, so you can read a session the same way you'd read source code.
The core loop, explained
The entry point is `prompt()` in `packages/opencode/src/session/prompt.ts`, which creates your user message and calls `loop()`. The loop is a `while (true)` that keeps running until the model decides it's done.
The key insight: one `while` iteration ≠ one LLM call. The AI SDK's `streamText()` handles an internal sub-loop: if the model calls tools, it sends results back and calls the model again, all within one `processor.process()` call. So one loop step can contain multiple LLM calls chained together.
Why we instrumented where we did
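To make this concrete, here is a hypothetical, heavily simplified sketch of the loop shape, with comments marking where the trace, step spans, and generations attach. The real code in `packages/opencode/src/session/prompt.ts` is more involved, and all names here (`processStep`, `LlmResult`) are illustrative, not real APIs:

```typescript
type ToolCall = { name: string; args: unknown };
type LlmResult = { thinking: string; text: string; toolCalls: ToolCall[] };

// Stand-in for processor.process() / the AI SDK's streamText(): one call here
// may itself chain several LLM calls when the model uses tools.
async function processStep(step: number): Promise<{ result: LlmResult; done: boolean }> {
  if (step === 0) {
    // First iteration: the model thinks, then asks for a tool.
    return {
      result: {
        thinking: "need the file",
        text: "",
        toolCalls: [{ name: "read", args: { path: "a.ts" } }],
      },
      done: false,
    };
  }
  // Later iteration: tool results are in context, the model finishes.
  return { result: { thinking: "enough context now", text: "done", toolCalls: [] }, done: true };
}

// One loop() invocation = one Langfuse trace.
async function loop(): Promise<LlmResult[]> {
  const results: LlmResult[] = [];
  for (let step = 0; ; step++) {
    // Each iteration = one loop.step-N span; the LLM calls inside
    // processStep() become llm-call-N generations under it.
    const { result, done } = await processStep(step);
    results.push(result);
    if (done) break; // the model decided it's done
  }
  return results;
}
```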
- Trace = one `loop()` invocation. This is the natural unit of a "coding session turn": one user message through to final response.
- `loop.step-N` span = one `while` iteration. Each iteration is one attempt to make progress: resolve the agent, call the LLM (possibly multiple times with tool use), and land on a result. Seeing steps lets you understand how many times the agent had to "go back" (e.g. compaction, subtasks, or continuing after tools).
- `llm-call-N` generation = one internal LLM call within a step. This is where the actual model activity is. Each generation captures `{ thinking, text, toolCalls }`: exactly what you see in the TUI, now queryable. This is the most granular and important level. It lets you answer: how much of the token budget is going to reasoning vs. output? Which tool results are being fed back in? Is the model thinking before or after tool calls?
- `tool.*` spans = individual tool executions. These are children of the iteration span, not the generation, because they happen between LLM calls: the model requests them, they execute, and their results become the input to the next LLM call. Seeing them as spans with timing lets you identify which tools are slow or returning large outputs.
The data flow as a trace
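As a rough picture, one turn might produce a tree like the following. This is a plain in-memory structure showing the parent/child relationships, not the Langfuse SDK:

```typescript
// Hypothetical trace tree; span names follow the loop()/loop.step-N/llm-call-N
// convention described above, but the builder itself is illustrative.
type Node = { name: string; children: Node[] };
const span = (name: string, children: Node[] = []): Node => ({ name, children });

// One user message → one trace. Step 0 needs a tool, so it contains two
// LLM calls with the tool execution between them, all children of the step.
const trace = span("loop()", [
  span("loop.step-0", [
    span("llm-call-0"), // model reasons, requests the read tool
    span("tool.read"),  // runs between LLM calls → child of the step span
    span("llm-call-1"), // model consumes the tool result, answers
  ]),
]);
```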
The rising input token counts across `llm-call-N` within one step show context accumulation: the model is carrying more and more tool results forward. The reasoning tokens tell you how much of the output budget went to thinking vs. the actual response.
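For instance, the pattern looks like this with made-up usage numbers (every figure below is illustrative, not from a real trace):

```typescript
// Illustrative usage for three chained LLM calls within one step. Input
// tokens rise as each tool result is appended to the context.
type Usage = { inputTokens: number; outputTokens: number; reasoningTokens: number };

const llmCalls: Usage[] = [
  { inputTokens: 1200, outputTokens: 40, reasoningTokens: 300 },  // llm-call-0
  { inputTokens: 1900, outputTokens: 30, reasoningTokens: 250 },  // llm-call-1: + first tool result
  { inputTokens: 2600, outputTokens: 220, reasoningTokens: 400 }, // llm-call-2: final answer
];

// Context accumulation: strictly rising input counts across the step.
const accumulating = llmCalls.every(
  (u, i) => i === 0 || u.inputTokens > llmCalls[i - 1].inputTokens,
);

// Share of each call's output budget spent on reasoning vs. visible text.
const reasoningShare = llmCalls.map(
  (u) => u.reasoningTokens / (u.reasoningTokens + u.outputTokens),
);
```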