Add Interruptible Query Execution in Jupyter via KeyboardInterrupt Support #1141

Merged: 56 commits into apache:main on Jun 16, 2025

kosiew (Contributor) commented Jun 4, 2025

Which issue does this PR close?

Closes #1136

Rationale for this change

Currently, when a long-running query is executed in a Jupyter notebook using DataFusion, the interrupt mechanism (Ctrl+C) does not work as expected. Users are forced to restart the kernel, which disrupts their workflow and loses data and context. This PR introduces a graceful interrupt mechanism, allowing faster development iterations and a better notebook UX.

What changes are included in this PR?

  • Introduces test_collect_interrupted to validate interruptibility via KeyboardInterrupt on long-running queries.
  • Adds polling and signaling logic to safely trigger interrupts in the main thread.
  • Replaces all uses of wait_for_future(py, fut)? with wait_for_future(py, fut)?? to properly propagate nested Result types.
  • Updates the wait_for_future utility to periodically check for Python signals using py.check_signals(), so that interrupts can be handled while a future is pending.
  • Enhances resilience and correctness of wait_for_future and associated async/await error propagation.
  • Ensures consistent behavior across contexts like SQL queries, file reads, and execution plans.
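
The polling idea behind the wait_for_future change can be sketched in pure Python. This is only an analogue under stated assumptions, not the PR's Rust code: the `check_signals` callback is a hypothetical stand-in for py.check_signals(), and `run_demo` with its `fake_check_signals` helper is an invented harness that simulates an interrupt on the third poll instead of using a real signal.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor, wait


def wait_for_future(fut, check_signals, poll_interval=0.01):
    """Block until `fut` finishes, calling `check_signals` between polls."""
    while True:
        done, _ = wait([fut], timeout=poll_interval)
        if done:
            # Returns the task's value or re-raises the task's own exception.
            return fut.result()
        # Hypothetical stand-in for py.check_signals(): it may raise
        # (for example KeyboardInterrupt), which ends the wait early.
        check_signals()


def run_demo():
    """Start a slow task, then abandon the wait after a simulated interrupt."""
    stop = threading.Event()
    polls = 0

    def fake_check_signals():
        # Assumption for the demo: pretend an interrupt arrives on poll 3.
        nonlocal polls
        polls += 1
        if polls >= 3:
            raise KeyboardInterrupt

    with ThreadPoolExecutor(max_workers=1) as pool:
        # stop.wait(5.0) stands in for a long-running query.
        fut = pool.submit(stop.wait, 5.0)
        try:
            wait_for_future(fut, fake_check_signals)
            return "finished"
        except KeyboardInterrupt:
            return "interrupted"
        finally:
            # Let the worker exit so the pool can shut down promptly.
            stop.set()
```

The design point is that the waiting thread never blocks for longer than one poll interval, so a pending interrupt is noticed quickly instead of only after the query completes.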

Note that in Jupyter, you click the Stop button to interrupt a query:

[Video attachment: Jun-04-2025.17-23-59.mp4]

Are these changes tested?

✅ Yes

  • A dedicated test test_collect_interrupted ensures interrupt behavior works reliably.
  • Additional tests continue to pass after modifications to async handling logic.
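
A test of this kind needs some way to raise KeyboardInterrupt in the thread that is waiting on the query. One possible approach on CPython is sketched below; it is an assumption for illustration and not necessarily how test_collect_interrupted is written. The names `run_until_interrupted` and `trigger` are hypothetical, and the sleep loop stands in for a long-running collect call.

```python
import ctypes
import threading
import time


def run_until_interrupted(timeout=5.0):
    """Return True if a helper thread interrupts the calling thread in time."""
    target_id = threading.get_ident()
    started = threading.Event()

    def trigger():
        # Wait until the "query" is running, then schedule KeyboardInterrupt
        # in the target thread (CPython-specific C API, reached via ctypes).
        started.wait()
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_ulong(target_id), ctypes.py_object(KeyboardInterrupt)
        )

    helper = threading.Thread(target=trigger, daemon=True)
    helper.start()

    interrupted = False
    try:
        started.set()
        deadline = time.monotonic() + timeout
        # Stand-in for a long-running query that yields to the interpreter
        # regularly, which is what lets the pending exception be raised.
        while time.monotonic() < deadline:
            time.sleep(0.01)
    except KeyboardInterrupt:
        interrupted = True
    helper.join()
    return interrupted
```

The exception is only delivered when the target thread next runs Python bytecode, which is why the waiting code has to return to the interpreter periodically for such a test to pass.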

Are there any user-facing changes?

✅ Yes

  • Users can now interrupt long-running DataFusion queries in Jupyter notebooks without terminating the entire kernel session.
  • Graceful recovery is possible: queries can be modified and re-executed after an interrupt.
  • Improved error messages and behavior for missing tables (e.g., PyKeyError for "No table named").

kosiew added 30 commits June 3, 2025 11:30
This reverts commit b8ce3e4.
kosiew added 3 commits June 4, 2025 18:18
- Simplified async handling by removing unnecessary cloning of strings and context in various methods.
- Streamlined the use of `wait_for_future` to directly handle futures without intermediate variables.
- Improved error handling by directly propagating results from futures.
- Enhanced readability by reducing boilerplate code in methods related to reading and writing data.
- Updated the `wait_for_future` function to improve signal checking and future handling.
kosiew changed the title from "Improve async Python integration with better error handling, signal checks, and KeyboardInterrupt support in query execution" to "Add Interruptible Query Execution in Jupyter via KeyboardInterrupt Support" on Jun 5, 2025

timsaucer (Contributor) left a comment

This is amazing!

One suggestion in two places to avoid having a panic.

kosiew (Contributor, Author) commented Jun 13, 2025

@timsaucer
Thanks for the review and nipping the panic.

kosiew marked this pull request as draft on June 13, 2025 16:33
kosiew marked this pull request as ready for review on June 14, 2025 06:40

timsaucer (Contributor) left a comment

Thank you!

timsaucer merged commit dc0d35a into apache:main on Jun 16, 2025
17 checks passed
Development

Successfully merging this pull request may close these issues.

Interuptable queries in jupyter notebooks
2 participants