-
Notifications
You must be signed in to change notification settings - Fork 1
Workflow control widget #610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Implements a new collapsible card widget for displaying and controlling workflow status. Each workflow shows: - Header with title, status badge, timing info, and reset/stop buttons - Configuration section with toolbars grouped by identical configs - Output chips showing available workflow outputs - Commit button when staged config differs from active config Key features: - Expand/collapse via button in header - Gear button opens config modal, remove button unstages sources - Yellow border indicates modified configs, yellow background for unconfigured - Incremental status/timing updates to avoid flicker Also adds get_workflow_registry() method to JobOrchestrator for widget access. Prompt: Please think through #563. job-list-widget.png shows the existing JobStatus(List)Widget which this work will eventually replace. We should keep working ideas and design (buttons should follow other toolbars, as mentioned in issue). Please come up with a suggestion (drawing, or simple html mockup?). As me one question at a time about unclear things. Avoid going into any details that are not relevant for an initial implementation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
The configuration modal was committing workflows immediately instead of staging changes for review. This caused two issues: - Widget didn't rebuild to show staged config changes - Removing/adding sources via modal had no visible effect until manual commit Changes: - Add WorkflowController.stage_workflow() to stage configs without committing - Add WorkflowStatusListWidget.rebuild_widget() to trigger display updates - Update create_workflow_adapter() to use stage_workflow in callback - Modal now stages changes and rebuilds widget on success - User reviews staged changes and clicks "Commit & Restart" to apply This restores the intended staging/review workflow: 1. Configure sources via modal → staged (not committed) 2. Review changes in widget (yellow border shows modifications) 3. Commit when ready via "Commit & Restart" button Original prompt: Please have a look at the latest commit, we just added WorkflowStatusWidget. screenshot-latest.png shows the current state. A couple of things are not working yet: - Removing a source name from an existing config does not update the row, nor do we get a row with unconfigured source names. - Configuring unconfigured sources does not add a new row. Essentially, configuring does not seem to have any effect. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Add ability to stop active workflows from the UI, addressing the issue where workflows appear active but backend is not running. Users can now manually clear false active states and regain control. Changes: - Add JobOrchestrator.stop_workflow() to send stop commands and clear local state - Add WorkflowController.stop_workflow() wrapper for controller layer - Implement WorkflowStatusWidget._on_stop_click() to handle stop button clicks - Add comprehensive test suite (6 tests) covering stop workflow behavior - Preserve staged configuration after stop for easy restart - Persist stopped state to config store The stop button: - Sends stop commands to backend (works even if backend is down) - Clears local active job state immediately - Preserves staged config for restart - Updates UI to show stopped state This unblocks stuck workflows where JobOrchestrator believes job is active but no backend status messages arrive (e.g., backend not running). Phase 2 will address the root cause by changing state determination to rely on JobStatus from backend rather than local JobOrchestrator state. Original prompt: "Help me understand how WorkflowStatusWidget determines the job state. Currently I see cases where it displays ACTIVE | Starting... for job where the backend is not up. My guess is that of JobOrchestrator has a "current" configured job it is considered as ACTIVE, even if we do not have a status heartbeat from the backend? And there is no mechanism that auto-starts if the backend is available." Follow-up: "Yes, please implement phase 1. I support JobOrchestrator also needs to notify PlotOrchestrator that the workflow has stopped (so it can remove the plot)?" Decision: Keep current behavior (plots remain visible after stop) for now.
…ruth (Phase 2) Introduce SCHEDULED state to distinguish between workflows where commands have been sent vs. workflows actually running on the backend. This fixes the issue where workflows showed "ACTIVE | Starting..." indefinitely when the backend was not running. State determination now follows this logic: - STOPPED: No job configured in orchestrator - SCHEDULED: Job configured but no backend status messages received - ACTIVE/ERROR/WARNING/PAUSED: Based on backend JobStatus messages Changes: - Add SCHEDULED state to WorkflowWidgetStyles.STATUS_COLORS (blue) - Modify _get_workflow_status() to check backend status as source of truth - Update _get_timing_text() to show "Waiting for backend..." for SCHEDULED - Add 5 tests covering SCHEDULED state transitions and behavior Benefits: - Accurate state: UI reflects actual backend state, not frontend beliefs - No false active state: Backend down = SCHEDULED, not ACTIVE - Clear feedback: User knows workflow is waiting for backend - Unblocks restart: Stop button works to clear SCHEDULED state This addresses the root cause identified in Phase 1 where JobOrchestrator's local state (.current) was treated as the source of truth even when no backend was running. Phase 2.5 will add heartbeat timeout mechanism to auto-transition ACTIVE to SCHEDULED when backend stops responding. Original issue: "I see cases where it displays ACTIVE | Starting... for job where the backend is not up. My guess is that JobOrchestrator has a 'current' configured job it is considered as ACTIVE, even if we do not have a status heartbeat from the backend?"
Implement automatic detection of stale backend status to handle cases where the backend stops responding (crashes, network issues, etc.). Workflows now automatically transition from ACTIVE to SCHEDULED when status heartbeats exceed the timeout threshold (60 seconds). Changes: - Add timestamp tracking in JobService.status_updated() for each status update - Add JobService.is_status_stale() method to check if status is older than timeout - Modify WorkflowStatusWidget to check staleness when determining workflow state - Add STATUS_HEARTBEAT_TIMEOUT_NS constant (60 seconds) - Add 3 tests covering heartbeat timeout behavior Behavior: - Fresh status (< 60s old): Treated as valid, shows ACTIVE/ERROR/WARNING/PAUSED - Stale status (> 60s old): Treated as missing, shows SCHEDULED - Missing status: Treated as SCHEDULED This completes the state management improvements: - Phase 1: Stop button to manually clear false active states - Phase 2: Use backend status as source of truth (SCHEDULED state) - Phase 2.5: Auto-detect stale heartbeats and transition to SCHEDULED Benefits: - Automatic recovery from backend crashes/restarts - Clear feedback when backend becomes unresponsive - No manual intervention needed to detect backend failures - Configurable timeout threshold for different deployment scenarios Original request: "Yes, please implement phase 1. [...] Then summarize our plan for phase 2 to remind me." Follow-up: "Sounds good, please implement. Commit when tested, then proceed with phase 2.5."
Change workflow state from SCHEDULED to PENDING to more accurately reflect
what's actually happening: we send commands to Kafka and wait for backend
response, but we don't actively reschedule or retry.
Rationale:
- SCHEDULED implies active rescheduling mechanism (which we don't have)
- PENDING accurately describes "command sent, waiting for response"
- Backend execution depends on Kafka message persistence and consumer config
- We rely on Kafka's queue, not our own scheduling logic
Changes:
- Rename 'scheduled' to 'pending' in STATUS_COLORS (blue color unchanged)
- Update all docstrings: SCHEDULED → PENDING
- Update all test names and assertions
- Update comments mentioning "scheduled"
No behavioral changes - purely semantic clarification.
The state flow remains:
STOPPED → PENDING → ACTIVE
(command sent, (backend
waiting) confirmed)
Discussion context: "I wonder if SCHEDULED is correct when backend is not
reachable. After all, we never try to re-schedule, right? So should the UI
change? Or should we keep this and (as a future improvement) add a mechanism
that tries to re-schedule every once in a while?"
Decision: Rename to PENDING as it's more accurate. Future enhancement could
add auto-retry mechanism and introduce true SCHEDULED state if needed.
Add "Expand all" and "Collapse all" buttons to the workflow status list header for quickly expanding or collapsing all workflow cards at once. - Add set_expanded() method to WorkflowStatusWidget for programmatic control - Add header row with expand/collapse all buttons aligned to the right - Follow existing button styling patterns (light type, border, hover effect) - Add tests for expand/collapse all functionality Prompt: Please enhance WorkflowStatusListWidget by adding "Expand all" and "Collapse all" buttons. Follow existing styling. 🤖 Generated with [Claude Code](https://claude.ai/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace full widget rebuilds with targeted property updates for smoother expand/collapse transitions: - Add _update_expand_state() to update only button, visibility, and border - Store references to _expand_btn, _header, _body for direct updates - Use pn.io.hold() in expand/collapse all to batch DOM changes Prompt: This (and also open/close of individual cards) is a bit "janky". Can we update in a smarter way? Or would a pn.io.hold help? 🤖 Generated with [Claude Code](https://claude.ai/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Remove WorkflowController dependency from WorkflowStatusWidget and WorkflowStatusListWidget. The widget now uses JobOrchestrator directly for stop/commit/config operations. Changes: - Revert workflow_controller.py to main (remove stage_workflow and stop_workflow methods added in this branch) - Remove workflow_controller parameter from widget constructors - Widget now calls orchestrator.stop_workflow() directly - Add TODO comments where _build_widget() calls remain (to be replaced with lifecycle subscription callbacks in Phase 2) - Update reduction.py and test fixtures Phase 2 (TODO): Add lifecycle subscriptions to JobOrchestrator for cross-session synchronization. Prompt: "Please look through @docs/developer/plans/workflow-status-widget-architecture-fix.md - please perform the revert, and also strip all "bad"/"offending" code (e.g. in the widget) that needs to replaced (do not implement new bits, leave a TODO), or methods that will no longer be needed." 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Implement the widget lifecycle subscription mechanism in JobOrchestrator so that WorkflowStatusWidget instances across different browser sessions stay synchronized when workflow state changes. Changes: - Add WidgetLifecycleCallbacks dataclass with on_staged_changed, on_workflow_committed, and on_workflow_stopped callbacks - Add subscribe_to_widget_lifecycle() and unsubscribe_from_widget_lifecycle() methods to JobOrchestrator - Add notification calls in stage_config(), clear_staged_configs(), commit_workflow(), and stop_workflow() - Update WorkflowStatusWidget to subscribe to lifecycle events and rebuild via callback instead of direct _build_widget() calls after actions - Add 10 tests for the new subscription mechanism This completes Phase 2 of the WorkflowStatusWidget architecture fix as described in docs/developer/plans/workflow-status-widget-architecture-fix.md Prompt: Please think through @docs/developer/plans/workflow-status-widget-architecture-fix.md and perform the remaining work. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
The WorkflowStatusWidget already rebuilds automatically via its subscription to the JobOrchestrator's lifecycle events. When the modal's start_callback invokes start_workflow() → commit_workflow(), all subscribed widgets are notified and rebuild. The manual rebuild_widget() call was redundant and caused a double rebuild. This change: - Removes the manual rebuild_widget(workflow_id) call - Replaces with explanatory comment about automatic subscription-based rebuild - Improves performance by eliminating the duplicate rebuild Related to workflow-status-widget-architecture-fix.md Phase 2.
Adds a cleanup() method to properly unsubscribe from JobOrchestrator lifecycle events when the widget is destroyed. This prevents memory leaks from lingering subscriptions in dynamic widget scenarios. Changes: - Add cleanup() method to WorkflowStatusWidget - Unsubscribes from orchestrator widget lifecycle events - Documents that JobService cleanup isn't available (pre-existing limitation) - Add test to verify cleanup unsubscribes correctly The cleanup() method should be called when the widget is removed from the UI. In typical usage (widgets live entire session), this isn't critical, but it's good practice for proper resource management. Note: JobService lacks an unregister mechanism, so that subscription cannot be cleaned up. This is a pre-existing issue affecting all widgets using JobService. Related to workflow-status-widget-architecture-fix.md review feedback.
Adds JobOrchestrator.replace_staged_configs() to replace all staged configs in a single operation with a single notification to subscribers. This significantly reduces UI rebuilds during workflow start. Before this change, start_workflow triggered N+2 widget rebuilds: - clear_staged_configs() → rebuild 1 - stage_config() for source 1 → rebuild 2 - stage_config() for source 2 → rebuild 3 - ... - stage_config() for source N → rebuild N+1 - commit_workflow() → rebuild N+2 After this change, it triggers only 2 rebuilds: - replace_staged_configs() → rebuild 1 (all sources) - commit_workflow() → rebuild 2 Changes: - Add replace_staged_configs() method to JobOrchestrator - Update WorkflowController.start_workflow() to use replace_staged_configs() - Add 3 tests verifying single notification, config replacement, and copying This improves performance and reduces UI flicker when starting workflows with multiple sources. Related to workflow-status-widget-architecture-fix.md review feedback.
…nism Adds staging_transaction() context manager to JobOrchestrator to batch multiple staging operations with a single notification. This replaces the bespoke replace_staged_configs() method with a more general, future-proof pattern. Benefits: - Handles any combination of staging operations, not just specific use cases - New operations automatically benefit from batched notifications - Follows standard Python context manager pattern - Cleaner API with fewer methods to maintain Changes: - Add staging_transaction() context manager to JobOrchestrator - Support nested transactions for the same workflow - Reject attempts to nest transactions for different workflows - Update _notify_staged_changed() to defer notifications during transactions - Update WorkflowController.start_workflow() to use transaction - Update WorkflowStatusWidget._on_remove_click() to use transaction - Remove deprecated replace_staged_configs() method - Replace 3 replace_staged_configs tests with 4 comprehensive transaction tests This fixes the N+1 notification problem in both start_workflow (N+2 → 2 notifications) and _on_remove_click (N+1 → 1 notification), and prevents similar issues in future code. Original prompt: Have a look at the latest commit and related code. Think about whether JobOrchestrator should have a transaction mechanism instead of some bespoke methods. Follow-up: Yes, please make a todo list and implement! 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
d6f5183 to
cb5c6fc
Compare
- Add create_workflow_adapter() to JobOrchestrator for creating workflow configuration adapters directly, eliminating need for WorkflowController in the WorkflowStatusWidget flow - WorkflowStatusWidget now handles configuration modal internally: - Creates adapter via orchestrator.create_workflow_adapter() - Manages its own modal container (following PlotGridTabs pattern) - Removes dependency on injected on_configure callback - Simplify reduction.py by removing: - on_configure_workflow callback - _workflow_config_modal_container - _cleanup_workflow_config_modal method - Update tests to reflect new widget API This change aligns WorkflowStatusWidget with the JobOrchestrator-centric architecture, making the widget more self-contained and removing an unnecessary layer of indirection through WorkflowController. Prompt: on_configure_workflow in reduction.py seems misplaced. Can we move this functionality into JobOrchestrator, without using WorkflowController at all? 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
The test assumed widgets start expanded, but commit ff55f1a changed this to start collapsed. Update test to first expand widgets before testing that collapse_all() works. Prompt: "I think the lest commit broke tests/dashboard/widgets/workflow_status_widget_test.py::TestWorkflowStatusListWidget::test_collapse_all_collapses_all_widgets"
Update ConfigurationState schema to store per-source params and aux_source_names instead of a single shared config. This enables WorkflowStatusWidget to correctly manage different configurations for disjoint subsets of sources. Changes: - Add JobConfigState model for per-source configuration - Update ConfigurationState to use jobs: dict[str, JobConfigState] - Add set_selected_sources() method to ConfigurationAdapter for scoping adapters to source subsets - Update initial_source_names, initial_parameter_values, and initial_aux_source_names to respect scoped sources - Update JobOrchestrator._load_configs_from_store to load per-source configs from new schema - Update _save_workflow_config and _get_config_state to use new schema - Update WorkflowController.get_workflow_config similarly - Remove hasattr check in WorkflowStatusWidget._on_gear_click The "expand on load" mechanism (applying single params to all sources) is no longer needed as each source now has its own config stored. Prompt: Please think through JobOrchestrator.create_workflow_adapter - find it suspicious that it uses 'first_job_config = next(iter( staged_jobs.values()))'. WorkflowStatusWidget allows for managing different configs for disjoint subsets of source names. How is it maintaining/getting the correctly configured adapter/ConfigurationWidget for each? 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
The start_callback in create_workflow_adapter was clearing ALL staged configs before staging the selected sources. This caused independently configured sources to lose their configs when another source was configured via a scoped adapter. Remove the clear_staged_configs() call so that stage_config() only updates the selected sources, preserving configs for other sources. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Wrap _build_widget in pn.io.hold() to batch DOM updates into a single browser transaction. Previously, the clear() followed by extend() caused a visible empty state between operations. Prompt: In workflow_status_widget.py there is significant "flicker" when updating the config of a workflow. Any idea why? 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Wrap the Apply button handler in pn.io.hold() to batch all DOM updates: config staging, modal close, and container cleanup into a single browser transaction. Prompt: Better, but there is still some odd flicker when closing the modal. No flicker when "cancelling", but when "applying" it seems to whole screen flickers. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Inverts the data model so ConfigurationState represents a single source's
configuration (what JobConfigState was), and JobOrchestrator manages the
collection. This removes the mismatch where ConfigurationAdapter could only
work with one source's config but received a bundle of all sources.
Key changes:
- ConfigurationState now has params/aux_source_names (single source)
- Removed JobConfigState class (merged into ConfigurationState)
- Removed set_selected_sources() - replaced by initial_source_names param
- create_workflow_adapter() accepts selected_sources parameter directly
- Orchestrator/controller fall back to staged sources for initial selection
The persistence format remains unchanged ({"jobs": {...}}) - only the
Pydantic model semantics changed.
Prompt: "Please help me think through the changes in configuration_adapter.py
relating to JobOrchestrator in this branch. I wonder about the reasoning
behind making ConfigurationState wrap the config of multiple jobs."
Follow-up: User suggested inverting the design so JobOrchestrator stores the
collection and ConfigurationAdapter works with single-source config directly.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Member
Author
|
We want to deploy this today and will thus merge, but review is still needed! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Implements a workflow status and control widget that replaces the old job-based UI with a workflow-centric interface. Each workflow is displayed as a collapsible card showing status, timing, controls, and configuration staging area.
Closes #563
Example
Notes
This will probably fully replace the dropdown-based workflow creation widget in the sidebar and likely also the
JobStatusListWidget(first tab).Key Features
WorkflowStatusWidget - Per-workflow control panel:
WorkflowStatusListWidget - Manages multiple workflow widgets:
Architecture Enhancements
Widget Lifecycle Subscriptions
Added
WidgetLifecycleCallbackstoJobOrchestratorfor cross-session synchronization. Widgets subscribe to events (on_staged_changed,on_workflow_committed,on_workflow_stopped) so all browser sessions stay synchronized when any session modifies state.Staging Transactions
Added
staging_transaction()context manager to batch multiple staging operations (e.g., clearing and re-staging configs) into a single notification. This prevents N+1 UI rebuilds when replacing all staged configs.Backend Status as Source of Truth
Widget uses
JobServicestatus as authoritative state, rather than maintaining parallel frontend state. This ensures consistency across sessions and prevents drift between UI and backend.Documentation
Added
.claude/rules/dashboard-widgets.mddocumenting the cross-session synchronization pattern to prevent future widgets from breaking multi-session support.