Setup langfuse #716
Conversation
Update dependencies to add Langfuse. Add keys and host in config for Langfuse to prepare for observability.
Force-pushed from 6c13edd to fa07881
@coderabbitai full review
✅ Actions performed: Full review triggered.
Walkthrough
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/summary/summary/core/celery_worker.py (1)
208-460: Consider adding tests for Langfuse integration. The Langfuse integration is well-implemented with proper error handling, but lacks test coverage for the observability paths. Consider adding unit tests to verify:
- Decorator behavior when Langfuse is/isn't configured
- Span and generation context creation
- Trace metadata updates
- Fallback to nullcontext when methods are unavailable
Example test structure:
def test_process_audio_with_langfuse_disabled(mocker):
    """Test task execution when Langfuse is not configured."""
    mocker.patch('summary.core.celery_worker.langfuse', None)
    # Task should still complete successfully

def test_process_audio_with_langfuse_error(mocker):
    """Test task execution when Langfuse operations fail."""
    mock_langfuse = mocker.patch('summary.core.celery_worker.langfuse')
    mock_langfuse.update_current_trace.side_effect = Exception("Langfuse error")
    # Task should log warning and continue
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
env.d/development/summary.dist (1 hunks)
src/summary/pyproject.toml (1 hunks)
src/summary/summary/core/celery_worker.py (10 hunks)
src/summary/summary/core/config.py (1 hunks)
🔇 Additional comments (7)
env.d/development/summary.dist (1)
24-27: LGTM! Environment variables correctly configured. The Langfuse environment variables are appropriately defined with placeholder values for the development template and match the corresponding config fields in src/summary/summary/core/config.py.
src/summary/summary/core/celery_worker.py (6)
230-243: LGTM! Defensive error handling for observability. The try/except pattern ensures Langfuse trace updates don't break the main workflow. The warning log provides visibility while the bare pass allows graceful degradation.
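A minimal sketch of that pattern, assuming the module-level Langfuse client and task arguments used elsewhere in this diff (the helper name and metadata keys are illustrative, not the PR's exact code):

import logging

from langfuse import get_client

logger = logging.getLogger(__name__)
langfuse = get_client()

def tag_current_trace(user_id, room_name):
    """Attach user/room metadata to the active trace; never break the task."""
    try:
        langfuse.update_current_trace(
            user_id=user_id,
            metadata={"room": room_name},  # illustrative metadata keys
        )
    except Exception:  # observability failures are logged, then ignored
        logger.warning("Langfuse trace update failed", exc_info=True)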
256-271: LGTM! Robust span instrumentation with fallback. The span context correctly wraps the MinIO operation with appropriate metadata and uses nullcontext() as a safe fallback when Langfuse is unavailable.
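A hedged sketch of that fallback shape (function name and metadata keys are assumptions; the PR's actual helper may differ):

from contextlib import nullcontext

from langfuse import get_client

langfuse = get_client()

def storage_span(filename):
    """Return a Langfuse span context for the MinIO download, or a no-op."""
    if langfuse is None or not hasattr(langfuse, "start_as_current_span"):
        return nullcontext()
    return langfuse.start_as_current_span(
        name="minio-get-object",
        metadata={"filename": filename},  # illustrative metadata
    )

# Usage: the download happens whether or not Langfuse is available.
with storage_span("audio/room-recording.ogg"):
    pass  # minio_client.get_object(...) would go here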
409-435: LGTM! Consistent LLM call instrumentation. All LLM calls have been successfully refactored to use the call_llm_gen helper with descriptive names for observability. The pattern is applied consistently across all generation steps.
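The repository's call_llm_gen helper is not shown in this comment; a possible shape for such a helper, purely as an assumption, could look like this:

from contextlib import nullcontext

from langfuse import get_client

langfuse = get_client()

def call_llm_gen(client, name, model, system_prompt, user_prompt):
    """Run one chat completion inside a named Langfuse generation (sketch)."""
    ctx = (
        langfuse.start_as_current_generation(name=name, model=model, input=user_prompt)
        if langfuse is not None
        else nullcontext()
    )
    with ctx as gen:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
        )
        if gen is not None:
            gen.update(output=response.choices[0].message.content)
    return response.choices[0].message.content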
455-459: LGTM! Proper telemetry flush with error handling. The final langfuse.flush() call ensures traces and spans are sent before task completion. The error handling appropriately prevents observability failures from affecting the main workflow.
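For reference, the flush step described above can be as small as this sketch (not the PR's exact lines):

import logging

from langfuse import get_client

logger = logging.getLogger(__name__)
langfuse = get_client()

try:
    langfuse.flush()  # block briefly so buffered spans/generations are delivered
except Exception:
    logger.warning("Langfuse flush failed", exc_info=True)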
395-407: Verify user prompt truncation limit
Line 405 truncates user input to 2,000 characters; this is the only truncation in the service. Ensure this limit covers all prompt types (tldr, plan, part, next_steps, cleaning) without dropping critical context, or consider raising the limit or using a start/end-preserving strategy.
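One possible start/end-preserving strategy (a sketch, not a requested change):

def truncate_middle(text, limit=2000):
    """Keep the head and tail of a long prompt instead of dropping the tail."""
    if len(text) <= limit:
        return text
    half = (limit - 5) // 2
    return f"{text[:half]} ... {text[-half:]}"

# e.g. truncate_middle(transcript, 2000) keeps both the opening context
# and the closing remarks of a long transcript.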
16-17: Verify Langfuse OpenAI wrapper usage
Instances at lines 118 and 291 call openai.OpenAI; ensure this class is the instrumented client provided by langfuse.openai (not the raw OpenAI SDK) and that your langfuse dependency exports it correctly.
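For comparison, the drop-in wrapper documented by Langfuse is imported like this (the credentials and endpoint shown are placeholders):

# Importing openai from langfuse.openai makes openai.OpenAI(...) the traced
# client; importing the raw SDK ("import openai") would skip instrumentation.
from langfuse.openai import openai

client = openai.OpenAI(
    api_key="sk-placeholder",           # placeholder, not a real key
    base_url="https://llm.example/v1",  # placeholder endpoint
)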
| "posthog==6.0.3", | ||
| "requests==2.32.4", | ||
| "sentry-sdk[fastapi, celery]==2.30.0", | ||
| "langfuse==3.4.0", |
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
Verify the Langfuse version is current and secure.
The pinned version langfuse==3.4.0 should be validated to ensure it's up-to-date and free from known vulnerabilities.
Run the following script to check the latest version and security advisories:
🏁 Script executed:
#!/bin/bash
# Description: Verify langfuse package version and check for security advisories
# Check latest version on PyPI
echo "=== Latest langfuse version on PyPI ==="
curl -s https://pypi.org/pypi/langfuse/json | jq -r '.info.version'
# Check for security advisories
echo -e "\n=== Security advisories for langfuse ==="
gh api graphql -f query='
{
securityVulnerabilities(first: 10, ecosystem: PIP, package: "langfuse") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'
Length of output: 137
Update langfuse to latest version (3.5.2)
In src/summary/pyproject.toml (line 18), change langfuse==3.4.0 to langfuse==3.5.2. No known vulnerabilities.
🤖 Prompt for AI Agents
In src/summary/pyproject.toml around line 18, the dependency pin for langfuse is
outdated; update the version string from langfuse==3.4.0 to langfuse==3.5.2 in
the file, save the change, and then run your dependency install/lock command
(e.g., pip-compile or poetry lock / pip install -r) to ensure the updated
version is recorded and tested.
celery.config_from_object("summary.core.celery_config")
langfuse = get_client()
Langfuse client initialization may fail when not configured.
The get_client() call at module level will execute during import, before checking if Langfuse credentials are configured. If langfuse_secret_key and langfuse_public_key are None (which is the default), this could cause initialization errors or create a non-functional client.
Apply this diff to initialize the client conditionally or handle missing configuration:
-langfuse = get_client()
+def get_langfuse_client():
+ """Get Langfuse client if configured, otherwise return None."""
+ if settings.langfuse_secret_key and settings.langfuse_public_key:
+ return get_client()
+ return None
+
+langfuse = get_langfuse_client()
Then update all usages to check if langfuse is not None before calling methods, or ensure get_client() gracefully handles missing credentials.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def get_langfuse_client():
    """Get Langfuse client if configured, otherwise return None."""
    if settings.langfuse_secret_key and settings.langfuse_public_key:
        return get_client()
    return None

langfuse = get_langfuse_client()
🤖 Prompt for AI Agents
In src/summary/summary/core/celery_worker.py around line 51, the module-level
call langfuse = get_client() runs at import time and can fail when langfuse
credentials are not configured; change initialization so get_client() is only
called when credentials exist or wrap it to return None on missing/invalid
config. Specifically, either move get_client() into a lazy initializer (call it
inside functions/tasks that need it) or modify get_client() to detect missing
langfuse_secret_key/langfuse_public_key and return None (or swallow errors)
instead of raising; then update all places that use langfuse to check if
langfuse is not None before invoking methods (or acquire the client via the lazy
initializer each time).
    max_retries=settings.celery_max_retries,
)
@observe(name="process-audio", capture_input=True, capture_output=False)
def process_audio_transcribe_summarize_v2(
Review PII capture in observe decorator.
The @observe decorator with capture_input=True will capture all function arguments, including potentially sensitive data like email and sub (user identifiers).
Consider one of the following:
- Set capture_input=False to avoid PII capture
- Use Langfuse's input redaction/filtering if available
- Apply the decorator conditionally when Langfuse is configured
Example for conditional decoration:
+def conditional_observe(**kwargs):
+ """Apply observe decorator only if Langfuse is configured."""
+ def decorator(func):
+ if langfuse is not None:
+ return observe(**kwargs)(func)
+ return func
+ return decorator
+
@celery.task(
bind=True,
autoretry_for=[exceptions.HTTPError],
max_retries=settings.celery_max_retries,
)
-@observe(name="process-audio", capture_input=True, capture_output=False)
+@conditional_observe(name="process-audio", capture_input=False, capture_output=False)
 def process_audio_transcribe_summarize_v2(
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/summary/summary/core/celery_worker.py around line 208, the @observe
decorator currently uses capture_input=True which will record all function
arguments (including PII like email and sub); change it to capture_input=False
to avoid capturing inputs, or if Langfuse redaction is available, configure the
decorator to redact sensitive fields (e.g., pass a redaction/filters list or use
the SDK’s input sanitization), or apply the decorator only when Langfuse is
enabled by conditionally wrapping the function with @observe based on the
app/config (e.g., check a LANGFUSE_ENABLED flag and decorate at runtime) so PII
is not recorded.
 if (
-    analytics.is_feature_enabled("summary-enabled", distinct_id=sub)
+    analytics.is_feature_enabled("summary-enabled", distinct_id=1756993468466)
Hardcoded distinct_id breaks per-user feature flagging.
The distinct_id was changed from the user identifier (sub) to a hardcoded value 1756993468466. This means the feature flag check will return the same result for all users, rather than enabling/disabling summary on a per-user basis.
If this was unintentional, revert to the previous behavior:
- analytics.is_feature_enabled("summary-enabled", distinct_id=1756993468466)
+ analytics.is_feature_enabled("summary-enabled", distinct_id=sub)If this change is intentional (e.g., for testing or global rollout), please add a comment explaining the rationale.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- analytics.is_feature_enabled("summary-enabled", distinct_id=1756993468466)
+ analytics.is_feature_enabled("summary-enabled", distinct_id=sub)
🤖 Prompt for AI Agents
In src/summary/summary/core/celery_worker.py around line 353, the
analytics.is_feature_enabled call uses a hardcoded distinct_id (1756993468466)
which prevents per-user feature flagging; change the call to pass the actual
user's identifier (e.g., the user's sub/id variable from the request or task
context) so each user gets their own flag evaluation, or if the hardcoded id was
intentional for testing/global rollout, restore the original dynamic distinct_id
and add an inline comment explaining the intentional override and its scope (and
consider gating it behind a debug/test flag).
src/summary/summary/core/config.py
Outdated
# Langfuse
langfuse_secret_key: Optional[str] = None
langfuse_public_key: Optional[str] = None
langfuse_host: Optional[str] = "https://cloud.langfuse.com"
🧹 Nitpick | 🔵 Trivial
LGTM! Consider marking secrets as sensitive.
The Langfuse configuration fields are correctly defined and follow the existing pattern. The optional nature allows graceful degradation when Langfuse is not configured.
Consider using Pydantic's SecretStr type for langfuse_secret_key to prevent accidental logging of sensitive values:
+from pydantic import SecretStr
+
class Settings(BaseSettings):
# ...
- langfuse_secret_key: Optional[str] = None
+ langfuse_secret_key: Optional[SecretStr] = None
Then access the value with .get_secret_value() when passing to the Langfuse client.
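A sketch of what that could look like if the client were built explicitly from settings; the PR currently relies on get_client() and environment variables, so the explicit construction and settings class below are assumptions:

from typing import Optional

from langfuse import Langfuse
from pydantic import SecretStr
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    langfuse_secret_key: Optional[SecretStr] = None
    langfuse_public_key: Optional[str] = None
    langfuse_host: str = "https://cloud.langfuse.com"

settings = Settings()

langfuse = None
if settings.langfuse_secret_key and settings.langfuse_public_key:
    langfuse = Langfuse(
        secret_key=settings.langfuse_secret_key.get_secret_value(),
        public_key=settings.langfuse_public_key,
        host=settings.langfuse_host,
    )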
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from pydantic import SecretStr

class Settings(BaseSettings):
    # Langfuse
    langfuse_secret_key: Optional[SecretStr] = None
    langfuse_public_key: Optional[str] = None
    langfuse_host: Optional[str] = "https://cloud.langfuse.com"
🤖 Prompt for AI Agents
In src/summary/summary/core/config.py around lines 78 to 81, the Langfuse secret
field is currently a plain Optional[str] and should be treated as sensitive;
change langfuse_secret_key to Optional[SecretStr] (import SecretStr from
pydantic), keep langfuse_public_key and langfuse_host as-is, and update any code
that constructs the Langfuse client to call .get_secret_value() on the SecretStr
when not None (handling None appropriately) so the raw secret is not
accidentally logged or printed.
lebaudantoine left a comment
Review in progress
env.d/development/summary.dist
Outdated
LANGFUSE_SECRET_KEY="your-secret-key"
LANGFUSE_PUBLIC_KEY="your-public-key"
LANGFUSE_HOST="https://cloud.langfuse.com"
nitpick: add a blank line at the end of your file.
- analytics.is_feature_enabled("summary-enabled", distinct_id=sub)
+ analytics.is_feature_enabled("summary-enabled", distinct_id=1756993468466)
wrongly committed
Resolved
src/summary/summary/core/config.py
Outdated
# Langfuse
langfuse_secret_key: Optional[str] = None
langfuse_public_key: Optional[str] = None
langfuse_host: Optional[str] = "https://cloud.langfuse.com"
It might be better to keep these as None by default, so the SDK can fall back to its own defaults. Defining them here suggests they’re meant to override via environment variables, but since they’re not passed explicitly to the client, it’s a bit confusing.
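A sketch of that alternative, assuming the Langfuse SDK's documented environment-variable fallback (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST) and the project's pydantic settings class:

from typing import Optional

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # No hard-coded host: when LANGFUSE_HOST is absent, get_client() falls
    # back to the SDK's own default instead of a value duplicated here.
    langfuse_secret_key: Optional[str] = None
    langfuse_public_key: Optional[str] = None
    langfuse_host: Optional[str] = None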
try:
    langfuse.update_current_trace(
        user_id=sub or email,
I guess sub is always set
-        user_id=sub or email,
+        user_id=sub,
The same applies to the following calls.
Question: Should we use owner_id as the distinct identifier in Langfuse to stay consistent with how we handle it in PostHog?
Add Langfuse observability instruments:
- add trace with @observe around process_audio_transcribe_summarize_v2 and summarize_transcription
- create span for MinIO
- wrap LLM calls in generation spans
- ensure trace delivery with langfuse.flush
Force-pushed from fa07881 to 0f73bcc
Add a trace on the speech-to-text model for better observability (time spent, errors, model used, length of the audio used, ...).
Force-pushed from 345e71a to 0d51cdc
Create observability class to be more maintainable. Change LLM calling from langfuse to openai to hide prompts.
Force-pushed from ab0a6c4 to a9e9cff
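The observability class itself is not shown in this thread; one possible shape, entirely as an assumption, is a thin wrapper so tasks never call Langfuse directly and prompts can be withheld from traces:

from contextlib import nullcontext

from langfuse import get_client

class Observability:
    """Thin wrapper around the Langfuse client (hypothetical sketch)."""

    def __init__(self):
        self._client = get_client()

    def generation(self, name, model):
        """Context manager for one LLM call; the prompt is deliberately not captured."""
        if self._client is None:
            return nullcontext()
        return self._client.start_as_current_generation(name=name, model=model)

    def flush(self):
        try:
            self._client.flush()
        except Exception:
            pass  # telemetry must never break the task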
Delete useless observability span for MinIO get_object.
Setup first observability object with langfuse
Create traces for process-audio and summarize-transcription
Create span for stt-call
Declare generations for LLM calls
Add variables in env.d, config and pyproject.toml
This provides clear observability of transcription and summarization separately.