fix(experiments): move evals out of root span by hassiebp · Pull Request #1437 · langfuse/langfuse-python

hassiebp · 2025-11-12T14:40:48Z

Important

Move evaluator execution out of the root span in _process_experiment_item() in client.py to ensure evaluations run independently of task execution.

Behavior:
- Move evaluator execution and score creation out of the root span in _process_experiment_item() in client.py.
- Evaluations now run after the experiment-item span closes rather than inside it. They still run only when the task succeeds, because a task exception propagates before the evaluator loop is reached.
Error Handling:
- Maintains error logging for evaluator failures with langfuse_logger.error().
Misc:
- Adjusted indentation for clarity and separation of concerns in client.py.
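
The restructuring described above can be sketched with a toy tracer. This is a minimal illustration, not the code in client.py: `ToyTracer`, `process_item`, and the evaluator functions are hypothetical names, and the tracer only records which span was open when each new span started.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("sketch")


class ToyTracer:
    """Hypothetical stand-in for a tracer; records (span name, parent name) pairs."""

    def __init__(self):
        self._stack = []
        self.records = []

    @contextmanager
    def span(self, name):
        parent = self._stack[-1] if self._stack else None
        self.records.append((name, parent))
        self._stack.append(name)
        try:
            yield name
        finally:
            self._stack.pop()


def process_item(tracer, task, evaluators, item):
    # The task runs inside the root span.
    with tracer.span("experiment-item-run"):
        output = task(item)

    # The evaluator loop sits one indentation level out, so any span an
    # evaluator opens is no longer a child of "experiment-item-run".
    evaluations = []
    for evaluator in evaluators:
        try:
            with tracer.span(f"eval:{evaluator.__name__}"):
                evaluations.append(evaluator(item, output))
        except Exception as exc:
            # A failing evaluator is logged and skipped, not re-raised.
            logger.error("Evaluator %s failed: %s", evaluator.__name__, exc)
    return output, evaluations


def exact_match(item, output):
    return output == item["expected"]


def broken(item, output):
    raise ValueError("boom")


tracer = ToyTracer()
output, evaluations = process_item(
    tracer,
    lambda it: it["input"].upper(),
    [exact_match, broken],
    {"input": "hi", "expected": "HI"},
)
```

In this sketch every recorded evaluator span has no parent, which is the nesting change the PR is after; had the loop stayed inside the `with` block, the parent would be "experiment-item-run".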

Disclaimer: Experimental PR review

Greptile Overview

Greptile Summary

Moved evaluator execution outside the experiment-item-run span context to prevent evaluation operations from being nested under the root experiment span.

Key changes:

- Evaluators now run after the experiment span context exits (unindented the evaluator loop by one level)
- Evaluations are still correctly associated with the span via observation_id=span.id
- Error handling remains intact - evaluators only run if task execution succeeds
- Variable scoping is preserved as Python retains variables defined within with blocks after exit

Confidence Score: 4/5

This PR is safe to merge with low risk - it's a straightforward refactoring that changes span nesting without affecting functionality
The change is architecturally sound: moving evaluators outside the span context prevents them from appearing as nested operations. All variable references remain valid (Python retains variables from with blocks), error handling is preserved (evaluators only run on success), and the span.id reference is valid after context exit since it's an instance attribute.
No files require special attention
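
The two Python behaviors the review relies on can be checked in isolation. The sketch below uses a hypothetical `ToySpan` class rather than a real Langfuse span; it assumes only that the span is a context manager with an `id` instance attribute and that it does not swallow exceptions.

```python
import uuid


class ToySpan:
    """Hypothetical span: a context manager with an `id` instance attribute."""

    def __init__(self):
        self.id = uuid.uuid4().hex
        self.ended = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.ended = True
        return False  # do not swallow exceptions raised in the block


def run(task):
    scores = []
    with ToySpan() as span:
        output = task()
    # `span` and `output` are still bound here: `with` does not open a new scope,
    # and `span.id` is a plain attribute that survives the context exit.
    scores.append({"observation_id": span.id, "value": len(output)})
    return span, scores


span, scores = run(lambda: "ok")


def failing():
    raise RuntimeError("task failed")


# If the task raises, the exception leaves the `with` block and the
# scoring code after it is never reached.
try:
    run(failing)
    reached_scoring = True
except RuntimeError:
    reached_scoring = False
```

The second call shows why moving the evaluator loop out of the `with` block does not by itself make evaluators run on task failure: the code after the block is skipped whenever the exception propagates.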

Important Files Changed

File Analysis

Filename	Score	Overview
langfuse/_client/client.py	4/5	Moved evaluator execution outside the experiment-item-run span context to prevent evaluations from being nested under the root span. The change maintains correct variable scoping and error handling.

Sequence Diagram

sequenceDiagram
    participant Client as Langfuse Client
    participant Span as Experiment Span
    participant Task as User Task
    participant Eval as Evaluators
    
    Client->>Span: start_as_current_span("experiment-item-run")
    activate Span
    
    Span->>Task: run task with item input
    Task-->>Span: return output
    
    Span->>Span: update span with input/output
    
    Client->>Span: exit span context
    deactivate Span
    
    Note over Client,Eval: Evaluators run OUTSIDE span context
    
    loop For each evaluator
        Client->>Eval: run_evaluator(input, output, expected_output)
        Eval-->>Client: evaluation results
        Client->>Client: create_score(trace_id, observation_id=span.id)
    end
    
    Client->>Client: return ExperimentItemResult

greptile-apps

1 file reviewed, no comments

thdesc · 2025-11-22T10:50:18Z

Hi @hassiebp, thanks for the update! I have a question regarding the new tracing structure.

Now that the evaluator (LLM-based in our case) runs outside of the root span, how can we easily understand or debug how an evaluation produced a given score for a task? Since the evaluation events now appear in a separate trace, it seems harder to connect the task run with the corresponding evaluation.

In our workflow, we upload a dataset to Langfuse, run an agent over all items using the experiment SDK (the task function), and then use another agent to generate scores for those runs. Since this update, I don’t see an easy way in the Langfuse UI to quickly navigate from the task’s trace to the associated evaluation trace.

Is there something we’re missing, or any recommended way to link them now?
Thank you!

fix(experiments): move evals out of root span

dad3bfe

greptile-apps bot reviewed Nov 12, 2025

hassiebp merged commit fd7e850 into main Nov 12, 2025
12 checks passed

hassiebp deleted the fix-evals-out-of-exp branch November 12, 2025 14:49

thdesc mentioned this pull request Dec 18, 2025

Evaluator traces not linked to dataset-item runs when using run_experiment langfuse/langfuse#11227

Closed
