More aggressively mask user code errors when masking enabled #27183

benpankow · 2025-01-16T23:21:29Z

Summary

Test Plan

More unit tests.


alangenfeld

I think a summary is warranted for this PR given its complexity

alangenfeld · 2025-01-17T20:03:50Z

python_modules/dagster/dagster/_utils/error.py

+error_id_by_exception: ContextVar[Mapping[int, str]] = ContextVar(
+    "error_id_by_exception", default={}
+)
+
+
+@contextlib.contextmanager
+def redact_user_stacktrace_if_enabled():
+    """Context manager which, if a user has enabled redacting user code errors, logs exceptions raised from within,
+    and clears the stacktrace from the exception. It also marks the exception to be redacted if it was to be persisted
+    or otherwise serialized to be sent to Dagster Plus. This is useful for preventing sensitive information from
+    being leaked in error messages.
+    """
+    if not _should_redact_user_code_error():
+        yield
+    else:
+        try:
+            yield
+        except BaseException as e:
+            exc_info = sys.exc_info()
+
+            # Generate a unique error ID for this error, or re-use an existing one
+            # if this error has already been seen
+            existing_error_id = error_id_by_exception.get().get(id(e))
+
+            if not existing_error_id:
+                error_id = str(uuid.uuid4())
+
+                # Track the error ID for this exception so we can redact it later
+                error_id_by_exception.set({**error_id_by_exception.get(), id(e): error_id})


I think the way you are using the context var here is equivalent to just having a process-global dict. What exactly is the intention here, and do any of the existing tests validate it?
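To make the question concrete, here is a small standalone sketch (not code from this PR) comparing the two storage options. `seen_global`, `seen_ctx`, `record`, and `read_both` are hypothetical names. It assumes the same copy-and-`set()` pattern as the diff, with no reset token.

```python
import contextvars
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical stand-ins for the two storage options under discussion.
seen_global: Dict[int, str] = {}
seen_ctx: contextvars.ContextVar[Mapping[int, str]] = contextvars.ContextVar(
    "seen_ctx", default={}
)


def record(key: int, error_id: str) -> None:
    """Store the ID in both places, copying the mapping as the diff does."""
    seen_global[key] = error_id
    seen_ctx.set({**seen_ctx.get(), key: error_id})


def read_both() -> Tuple[Optional[str], Optional[str]]:
    """Look up key 1 in the global dict and in the context variable."""
    return (seen_global.get(1), seen_ctx.get().get(1))


record(1, "abc")

# Same context: the ContextVar was set and never reset, so it acts like a global.
same_context = read_both()

# Fresh, empty context: the ContextVar falls back to its default.
fresh_context = contextvars.Context().run(read_both)

print(same_context)   # ('abc', 'abc')
print(fresh_context)  # ('abc', None)
```

Within a single context the two behave alike. They diverge only for code running in a different context, which is typically what a newly started thread gets.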


gibsondan

I'll be out next week, so I may lean on you, Alex, to do the final accept once you have full context, but this broadly makes sense to me.

gibsondan · 2025-01-18T05:17:41Z

python_modules/dagster/dagster/_utils/error.py

+error_id_by_exception: ContextVar[Mapping[int, str]] = ContextVar(
+    "error_id_by_exception", default={}
+)


the name here should probably have 'redacted' in it: redacted_user_code_error_id_by_exception?

gibsondan · 2025-01-18T05:21:18Z

python_modules/dagster/dagster/_utils/error.py

+error_id_by_exception: ContextVar[Mapping[int, str]] = ContextVar(
+    "error_id_by_exception", default={}
+)
+
+
+@contextlib.contextmanager
+def redact_user_stacktrace_if_enabled():
+    """Context manager which, if a user has enabled redacting user code errors, logs exceptions raised from within,
+    and clears the stacktrace from the exception. It also marks the exception to be redacted if it was to be persisted
+    or otherwise serialized to be sent to Dagster Plus. This is useful for preventing sensitive information from
+    being leaked in error messages.
+    """
+    if not _should_redact_user_code_error():
+        yield
+    else:
+        try:
+            yield
+        except BaseException as e:
+            exc_info = sys.exc_info()
+
+            # Generate a unique error ID for this error, or re-use an existing one
+            # if this error has already been seen
+            existing_error_id = error_id_by_exception.get().get(id(e))
+
+            if not existing_error_id:
+                error_id = str(uuid.uuid4())
+
+                # Track the error ID for this exception so we can redact it later
+                error_id_by_exception.set({**error_id_by_exception.get(), id(e): error_id})


I believe the goal is to increase the set of cases that this logic can handle. All kinds of exceptions besides DagsterUserCodeExecutionError can be emitted within user code, like KeyboardInterrupt, SystemExit, or other DagsterError subclasses. The redaction should still only trigger if the exception was actually raised within an op_execution_error_boundary or user_code_error_boundary. I don't have a strong opinion about global dict vs. contextvar.
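A minimal standalone sketch of the idea described here, not the PR's implementation: a boundary context manager catches `BaseException`, so exceptions such as `KeyboardInterrupt` are tagged too, but only when they are raised inside the boundary. `user_code_boundary`, `redacted_error_ids`, and `error_id_for` are hypothetical names, and a plain dict keyed by `id(exc)` stands in for whichever storage is chosen.

```python
import contextlib
import uuid
from typing import Dict, List, Optional

# Hypothetical registry mapping id(exception) -> error ID.
redacted_error_ids: Dict[int, str] = {}


@contextlib.contextmanager
def user_code_boundary():
    """Tag any exception raised inside the block, then re-raise it unchanged."""
    try:
        yield
    except BaseException as e:  # also covers KeyboardInterrupt and SystemExit
        # Re-use the existing ID if another boundary already tagged this exception.
        redacted_error_ids.setdefault(id(e), str(uuid.uuid4()))
        raise


def error_id_for(exc: BaseException) -> Optional[str]:
    """Return the error ID if the exception passed through a boundary, else None."""
    return redacted_error_ids.get(id(exc))


# Keep the exceptions alive: id() values are only unique among live objects.
kept_alive: List[BaseException] = []

try:
    with user_code_boundary():
        raise KeyboardInterrupt()
except KeyboardInterrupt as inside:
    kept_alive.append(inside)
    inside_id = error_id_for(inside)

try:
    raise ValueError("raised outside any boundary")
except ValueError as outside:
    kept_alive.append(outside)
    outside_id = error_id_for(outside)
```

Here `inside_id` is a UUID string and `outside_id` is `None`. Because `id()` values can be reused once an object is garbage collected, a registry keyed this way relies on the exception staying alive until it is looked up.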

gibsondan · 2025-01-18T05:22:31Z

python_modules/dagster/dagster/_utils/error.py

+        if isinstance(e, DagsterUserCodeExecutionError):
+            return SerializableErrorInfo(
+                message=(
+                    f"Error occurred during user code execution, error ID {err_id}. "
+                    "The error has been masked to prevent leaking sensitive information. "
+                    "Search in logs for this error ID for more details."
+                ),
+                stack=[],
+                cls_name="DagsterRedactedUserCodeError",
+                cause=None,
+                context=None,
+            )
+        else:
+            tb_exc = traceback.TracebackException(exc_type, e, tb)


Maybe worth explaining the difference between these two cases: with user code errors, you don't even want to show the message, but with other errors (framework errors, interrupts, or Failure / RetryRequested raised within the error boundary), the message is not sensitive and can be displayed for clarity, while the traceback is still sensitive.
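The two cases could be sketched as follows. This is a standalone illustration, not the PR's code: `UserCodeError`, `ErrorInfo`, and `redact` are hypothetical stand-ins for `DagsterUserCodeExecutionError`, `SerializableErrorInfo`, and the serialization helper in the diff.

```python
from dataclasses import dataclass, field
from typing import List


class UserCodeError(Exception):
    """Hypothetical stand-in for an error raised by user code."""


@dataclass
class ErrorInfo:
    """Hypothetical stand-in for a serialized error."""

    message: str
    cls_name: str
    stack: List[str] = field(default_factory=list)


def redact(exc: BaseException, err_id: str) -> ErrorInfo:
    if isinstance(exc, UserCodeError):
        # User code error: the message itself may be sensitive, so replace it.
        return ErrorInfo(
            message=f"Error occurred during user code execution, error ID {err_id}.",
            cls_name="RedactedUserCodeError",
        )
    # Framework error, interrupt, etc.: the message is safe to show,
    # but the traceback may expose user code, so the stack stays empty.
    return ErrorInfo(message=str(exc), cls_name=type(exc).__name__)


masked = redact(UserCodeError("password=hunter2"), "1234")
kept = redact(KeyboardInterrupt("stopped by user"), "1234")

print(masked.message)  # Error occurred during user code execution, error ID 1234.
print(kept.message)    # stopped by user
```

In both branches the stack is empty. The only difference is whether the original message and class name survive.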

benpankow requested a review from gibsondan January 16, 2025 23:39

benpankow added 7 commits January 17, 2025 11:41

aggressively mask user code errors

76ec48d

tests

7f47b4d

more tests

057c5e1

update

9da71cf

special case exception types

67576ee

errid

e8ff613

comments

c6aef4f

benpankow changed the title ~~aggressively mask user code errors~~ More aggressively mask user code errors when masking enabled Jan 17, 2025

benpankow marked this pull request as ready for review January 17, 2025 19:48

benpankow force-pushed the benpankow/user-code-errors-mask-aggressively branch from 333ee23 to c6aef4f Compare January 17, 2025 19:49

gibsondan requested a review from alangenfeld January 17, 2025 19:50

alangenfeld reviewed Jan 17, 2025


gibsondan reviewed Jan 18, 2025
