Conversation

@GuittenyMartin (Collaborator) commented Sep 22, 2025

Setup first observability object with langfuse

Create traces for process-audio and summarize-transcription
Create span for stt-call
Declare generations for LLM calls
Add variables in env.d, config and pyproject.toml

This provides clear observability of transcription and summarization separately.

update dependencies to add langfuse
add keys and host in config for langfuse to prepare
for observability
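The instrumentation above can be sketched roughly as follows. This is a minimal illustration with a placeholder task body (process_audio stands in for the real Celery task, and the langfuse 3.x import layout is an assumption), not the PR's actual code; it degrades to no-ops when Langfuse is absent:

```python
# Hedged sketch: assumes the langfuse 3.x import layout; when the package is
# missing, a no-op observe decorator keeps the task working unchanged.
try:
    from langfuse import get_client, observe
    langfuse = get_client()
except ImportError:
    langfuse = None

    def observe(**_kwargs):
        def decorator(func):
            return func
        return decorator


@observe(name="process-audio", capture_input=False, capture_output=False)
def process_audio(audio_key: str) -> str:
    """Placeholder for the real task: transcribe, then summarize."""
    transcript = f"transcript of {audio_key}"  # an stt-call span would wrap this
    summary = f"summary of {transcript}"       # an LLM generation would wrap this
    if langfuse is not None:
        langfuse.flush()  # deliver traces before the task returns
    return summary
```

The guard around flush mirrors the graceful-degradation theme of the review below: observability failures should never break the task itself.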
@GuittenyMartin GuittenyMartin linked an issue Sep 25, 2025 that may be closed by this pull request
@GuittenyMartin GuittenyMartin force-pushed the setup-langfuse branch 2 times, most recently from 6c13edd to fa07881 on September 25, 2025 13:36
@lebaudantoine lebaudantoine marked this pull request as ready for review September 30, 2025 20:46
@lebaudantoine (Collaborator) commented:

@coderabbitai full review


coderabbitai bot commented Sep 30, 2025

✅ Actions performed

Full review triggered.


coderabbitai bot commented Sep 30, 2025

Walkthrough

  • Added three environment variables to env.d/development/summary.dist: LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_HOST.
  • Added dependency langfuse==3.4.0 in src/summary/pyproject.toml.
  • In src/summary/summary/core/celery_worker.py, integrated Langfuse: decorators for two Celery tasks, spans around MinIO retrieval, generation-context for LLM calls, runtime trace updates, and a final flush after webhook submission. Updated feature flag check to use a fixed distinct_id.
  • In src/summary/summary/core/config.py, added Settings fields: langfuse_secret_key, langfuse_public_key, langfuse_host (defaulting to https://cloud.langfuse.com).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 75.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)
  • Title Check ✅ Passed: the title “Setup langfuse” succinctly captures the primary change of integrating Langfuse observability by setting up environment variables, dependencies, and instrumentation.
  • Description Check ✅ Passed: the description clearly outlines the Langfuse integration (traces for process-audio and summarize-transcription, spans for STT calls, declared LLM generations, plus environment, configuration, and dependency additions) and maps directly to the changes in this pull request.


coderabbitai bot left a comment:

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/summary/summary/core/celery_worker.py (1)

208-460: Consider adding tests for Langfuse integration.

The Langfuse integration is well-implemented with proper error handling, but lacks test coverage for the observability paths. Consider adding unit tests to verify:

  • Decorator behavior when Langfuse is/isn't configured
  • Span and generation context creation
  • Trace metadata updates
  • Fallback to nullcontext when methods are unavailable

Example test structure:

def test_process_audio_with_langfuse_disabled(mocker):
    """Test task execution when Langfuse is not configured."""
    mocker.patch('summary.core.celery_worker.langfuse', None)
    # Task should still complete successfully
    
def test_process_audio_with_langfuse_error(mocker):
    """Test task execution when Langfuse operations fail."""
    mock_langfuse = mocker.patch('summary.core.celery_worker.langfuse')
    mock_langfuse.update_current_trace.side_effect = Exception("Langfuse error")
    # Task should log warning and continue
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3ca6f0 and fa07881.

📒 Files selected for processing (4)
  • env.d/development/summary.dist (1 hunks)
  • src/summary/pyproject.toml (1 hunks)
  • src/summary/summary/core/celery_worker.py (10 hunks)
  • src/summary/summary/core/config.py (1 hunks)
🔇 Additional comments (7)
env.d/development/summary.dist (1)

24-27: LGTM! Environment variables correctly configured.

The Langfuse environment variables are appropriately defined with placeholder values for development template and match the corresponding config fields in src/summary/summary/core/config.py.

src/summary/summary/core/celery_worker.py (6)

230-243: LGTM! Defensive error handling for observability.

The try/except pattern ensures Langfuse trace updates don't break the main workflow. The warning log provides visibility while the bare pass allows graceful degradation.


256-271: LGTM! Robust span instrumentation with fallback.

The span context correctly wraps the MinIO operation with appropriate metadata and uses nullcontext() as a safe fallback when Langfuse is unavailable.


409-435: LGTM! Consistent LLM call instrumentation.

All LLM calls have been successfully refactored to use the call_llm_gen helper with descriptive names for observability. The pattern is applied consistently across all generation steps.


455-459: LGTM! Proper telemetry flush with error handling.

The final langfuse.flush() call ensures traces and spans are sent before task completion. The error handling appropriately prevents observability failures from affecting the main workflow.


395-407: Verify the user prompt truncation limit.
Line 405 truncates user input to 2,000 characters; this is the only truncation in the service. Ensure this limit covers all prompt types (tldr, plan, part, next_steps, cleaning) without dropping critical context, or consider raising it or using a start/end-preserving strategy.
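A start/end-preserving strategy could look like the following sketch (truncate_middle is a hypothetical helper, not code from this PR):

```python
def truncate_middle(text: str, limit: int = 2000, marker: str = " [...] ") -> str:
    """Truncate text to at most `limit` characters, keeping both start and end."""
    if len(text) <= limit:
        return text
    keep = limit - len(marker)  # characters of the original to retain
    head = keep - keep // 2     # give the start the extra character on odd splits
    tail = keep // 2
    return text[:head] + marker + text[-tail:]
```

Compared with a plain text[:2000], this keeps the tail of long prompts, which in transcripts often carries next steps and conclusions.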


16-17: Verify Langfuse OpenAI wrapper usage
Instances at lines 118 and 291 call openai.OpenAI; ensure this class is the instrumented client provided by langfuse.openai (not the raw OpenAI SDK) and that your langfuse dependency exports it correctly.

"posthog==6.0.3",
"requests==2.32.4",
"sentry-sdk[fastapi, celery]==2.30.0",
"langfuse==3.4.0",

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

Verify the Langfuse version is current and secure.

The pinned version langfuse==3.4.0 should be validated to ensure it's up-to-date and free from known vulnerabilities.

Run the following script to check the latest version and security advisories:


🏁 Script executed:

#!/bin/bash
# Description: Verify langfuse package version and check for security advisories

# Check latest version on PyPI
echo "=== Latest langfuse version on PyPI ==="
curl -s https://pypi.org/pypi/langfuse/json | jq -r '.info.version'

# Check for security advisories
echo -e "\n=== Security advisories for langfuse ==="
gh api graphql -f query='
{
  securityVulnerabilities(first: 10, ecosystem: PIP, package: "langfuse") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Length of output: 137


Update langfuse to latest version (3.5.2)
In src/summary/pyproject.toml (line 18), change langfuse==3.4.0 to langfuse==3.5.2. No known vulnerabilities.

🤖 Prompt for AI Agents
In src/summary/pyproject.toml around line 18, the dependency pin for langfuse is
outdated; update the version string from langfuse==3.4.0 to langfuse==3.5.2 in
the file, save the change, and then run your dependency install/lock command
(e.g., pip-compile or poetry lock / pip install -r) to ensure the updated
version is recorded and tested.


celery.config_from_object("summary.core.celery_config")

langfuse = get_client()

⚠️ Potential issue | 🟠 Major

Langfuse client initialization may fail when not configured.

The get_client() call at module level will execute during import, before checking if Langfuse credentials are configured. If langfuse_secret_key and langfuse_public_key are None (which is the default), this could cause initialization errors or create a non-functional client.

Apply this diff to initialize the client conditionally or handle missing configuration:

-langfuse = get_client()
+def get_langfuse_client():
+    """Get Langfuse client if configured, otherwise return None."""
+    if settings.langfuse_secret_key and settings.langfuse_public_key:
+        return get_client()
+    return None
+
+langfuse = get_langfuse_client()

Then update all usages to check if langfuse is not None before calling methods, or ensure get_client() gracefully handles missing credentials.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-langfuse = get_client()
+def get_langfuse_client():
+    """Get Langfuse client if configured, otherwise return None."""
+    if settings.langfuse_secret_key and settings.langfuse_public_key:
+        return get_client()
+    return None
+
+langfuse = get_langfuse_client()
🤖 Prompt for AI Agents
In src/summary/summary/core/celery_worker.py around line 51, the module-level
call langfuse = get_client() runs at import time and can fail when langfuse
credentials are not configured; change initialization so get_client() is only
called when credentials exist or wrap it to return None on missing/invalid
config. Specifically, either move get_client() into a lazy initializer (call it
inside functions/tasks that need it) or modify get_client() to detect missing
langfuse_secret_key/langfuse_public_key and return None (or swallow errors)
instead of raising; then update all places that use langfuse to check if
langfuse is not None before invoking methods (or acquire the client via the lazy
initializer each time).

    max_retries=settings.celery_max_retries,
)
@observe(name="process-audio", capture_input=True, capture_output=False)
def process_audio_transcribe_summarize_v2(

⚠️ Potential issue | 🟠 Major

Review PII capture in observe decorator.

The @observe decorator with capture_input=True will capture all function arguments, including potentially sensitive data like email and sub (user identifiers).

Consider either:

  1. Set capture_input=False to avoid PII capture
  2. Use Langfuse's input redaction/filtering if available
  3. Apply decorator conditionally when Langfuse is configured

Example for conditional decoration:

+def conditional_observe(**kwargs):
+    """Apply observe decorator only if Langfuse is configured."""
+    def decorator(func):
+        if langfuse is not None:
+            return observe(**kwargs)(func)
+        return func
+    return decorator
+
 @celery.task(
     bind=True,
     autoretry_for=[exceptions.HTTPError],
     max_retries=settings.celery_max_retries,
 )
-@observe(name="process-audio", capture_input=True, capture_output=False)
+@conditional_observe(name="process-audio", capture_input=False, capture_output=False)
 def process_audio_transcribe_summarize_v2(

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/summary/summary/core/celery_worker.py around line 208, the @observe
decorator currently uses capture_input=True which will record all function
arguments (including PII like email and sub); change it to capture_input=False
to avoid capturing inputs, or if Langfuse redaction is available, configure the
decorator to redact sensitive fields (e.g., pass a redaction/filters list or use
the SDK’s input sanitization), or apply the decorator only when Langfuse is
enabled by conditionally wrapping the function with @observe based on the
app/config (e.g., check a LANGFUSE_ENABLED flag and decorate at runtime) so PII
is not recorded.
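The conditional-decoration option can be expressed library-agnostically; the sketch below is illustrative (conditional_decorator and trace are hypothetical names, and any observe-style decorator factory could be passed in):

```python
import functools


def conditional_decorator(enabled: bool, decorator):
    """Apply `decorator` only when `enabled`; otherwise return the function unchanged."""
    def wrapper(func):
        return decorator(func) if enabled else func
    return wrapper


def trace(func):
    """Stand-in for an observe-style decorator."""
    @functools.wraps(func)
    def inner(*args, **kwargs):
        result = func(*args, **kwargs)
        print(f"traced {func.__name__}")
        return result
    return inner


# Decoration is skipped entirely when observability is disabled,
# so no arguments (and hence no PII) are ever captured.
@conditional_decorator(enabled=False, decorator=trace)
def summarize(text: str) -> str:
    return text.upper()
```

Because the undecorated function is returned as-is when disabled, there is zero runtime overhead and no input capture in that path.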


     if (
-        analytics.is_feature_enabled("summary-enabled", distinct_id=sub)
+        analytics.is_feature_enabled("summary-enabled", distinct_id=1756993468466)

⚠️ Potential issue | 🔴 Critical

Hardcoded distinct_id breaks per-user feature flagging.

The distinct_id was changed from the user identifier (sub) to a hardcoded value 1756993468466. This means the feature flag check will return the same result for all users, rather than enabling/disabling summary on a per-user basis.

If this was unintentional, revert to the previous behavior:

-        analytics.is_feature_enabled("summary-enabled", distinct_id=1756993468466)
+        analytics.is_feature_enabled("summary-enabled", distinct_id=sub)

If this change is intentional (e.g., for testing or global rollout), please add a comment explaining the rationale.

📝 Committable suggestion

Suggested change
-        analytics.is_feature_enabled("summary-enabled", distinct_id=1756993468466)
+        analytics.is_feature_enabled("summary-enabled", distinct_id=sub)
🤖 Prompt for AI Agents
In src/summary/summary/core/celery_worker.py around line 353, the
analytics.is_feature_enabled call uses a hardcoded distinct_id (1756993468466)
which prevents per-user feature flagging; change the call to pass the actual
user's identifier (e.g., the user's sub/id variable from the request or task
context) so each user gets their own flag evaluation, or if the hardcoded id was
intentional for testing/global rollout, restore the original dynamic distinct_id
and add an inline comment explaining the intentional override and its scope (and
consider gating it behind a debug/test flag).

Comment on lines 78 to 81
# Langfuse
langfuse_secret_key: Optional[str] = None
langfuse_public_key: Optional[str] = None
langfuse_host: Optional[str] = "https://cloud.langfuse.com"

🧹 Nitpick | 🔵 Trivial

LGTM! Consider marking secrets as sensitive.

The Langfuse configuration fields are correctly defined and follow the existing pattern. The optional nature allows graceful degradation when Langfuse is not configured.

Consider using Pydantic's SecretStr type for langfuse_secret_key to prevent accidental logging of sensitive values:

+from pydantic import SecretStr
+
 class Settings(BaseSettings):
     # ...
-    langfuse_secret_key: Optional[str] = None
+    langfuse_secret_key: Optional[SecretStr] = None

Then access the value with .get_secret_value() when passing to the Langfuse client.

📝 Committable suggestion

Suggested change
+from pydantic import SecretStr

 # Langfuse
-langfuse_secret_key: Optional[str] = None
+langfuse_secret_key: Optional[SecretStr] = None
 langfuse_public_key: Optional[str] = None
 langfuse_host: Optional[str] = "https://cloud.langfuse.com"
🤖 Prompt for AI Agents
In src/summary/summary/core/config.py around lines 78 to 81, the Langfuse secret
field is currently a plain Optional[str] and should be treated as sensitive;
change langfuse_secret_key to Optional[SecretStr] (import SecretStr from
pydantic), keep langfuse_public_key and langfuse_host as-is, and update any code
that constructs the Langfuse client to call .get_secret_value() on the SecretStr
when not None (handling None appropriately) so the raw secret is not
accidentally logged or printed.

@lebaudantoine (Collaborator) left a comment:

Review in progress


LANGFUSE_SECRET_KEY="your-secret-key"
LANGFUSE_PUBLIC_KEY="your-public-key"
LANGFUSE_HOST="https://cloud.langfuse.com"
Collaborator:

nitpick: add a blank line at the end of your file.

Comment on lines 322 to 353
-        analytics.is_feature_enabled("summary-enabled", distinct_id=sub)
+        analytics.is_feature_enabled("summary-enabled", distinct_id=1756993468466)
Collaborator:

wrongly committed

Collaborator (Author):

Resolved

# Langfuse
langfuse_secret_key: Optional[str] = None
langfuse_public_key: Optional[str] = None
langfuse_host: Optional[str] = "https://cloud.langfuse.com"
Collaborator:
It might be better to keep these as None by default, so the SDK can fall back to its own defaults. Defining them here suggests they’re meant to override via environment variables, but since they’re not passed explicitly to the client, it’s a bit confusing.


try:
    langfuse.update_current_trace(
        user_id=sub or email,
Collaborator:

I guess sub is always set

Suggested change
-        user_id=sub or email,
+        user_id=sub,

The same applies to the subsequent calls.

Question: Should we use owner_id as the distinct identifier in Langfuse to stay consistent with how we handle it in PostHog?

add Langfuse observability instruments
add trace with @observe around
process_audio_transcribe_summarize_v2 and summarize_transcription
create span for minio
wrap LLM calls in generation spans
ensure trace delivery with langfuse.flush
Add a trace on the speech-to-text model
for better observability (time spent, errors,
model used, length of the audio used...)
Delete useless span for observability: minio get_object

sonarqubecloud bot commented Oct 3, 2025



Development

Successfully merging this pull request may close these issues.

Setup observability for summary

3 participants