Skip to content

Keyboard recording and display#1648

Draft
bytemines wants to merge 3 commits intoCapSoftware:mainfrom
bytemines:feat/keyboard-recording
Draft

Keyboard recording and display#1648
bytemines wants to merge 3 commits intoCapSoftware:mainfrom
bytemines:feat/keyboard-recording

Conversation

@bytemines
Copy link

@bytemines bytemines commented Mar 6, 2026

Summary

Keyboard keystroke recording and display for the video editor. Extracted from #1615 (keyboard-only — no captions timeline changes).

What it does:

  • Captures keystrokes during screen recording (key down/up with timestamps)
  • Groups rapid typing into display segments with configurable thresholds
  • Renders keyboard overlays via WGPU with fade/bounce animations
  • Provides interactive timeline track (drag, resize, split, delete)
  • Settings panel for font family, weight, size, colors, animation timing
  • Auto-regenerates segments when grouping settings change

Changes from #1615

Extracted keyboard-only code and applied fixes during extraction:

Correctness

  • Fixed O(n²) modifier tracking → O(n) sliding window
  • Shift no longer breaks segments — capitalization stays in same group
  • Added CapsLock toggle tracking with Shift XOR logic
  • Added Shift+symbol mapping (1→!, 2→@, etc. — full US QWERTY)
  • Fixed RMeta (right Command) not recognized as modifier
  • Fixed Space key mismatch (" " vs "Space") — special-key logic now works
  • Backspace works even when show_special_keys is disabled
  • Fixed stale current_keys in modifier group transitions
  • Events sorted defensively inside group_key_events
  • Fixed seconds/ms unit mismatch in display duration (was passed as 0.8s to Rust expecting ms, making all segments minimal length)
  • Fixed split logic unit mismatch (timeOffsetMs compared against seconds)
  • Uppercase keys in Shift combos (⌘+⇧+L not ⌘+⇧+l)
  • Hotkey combos use + separator (⌘+⇧+L instead of ⌘⇧L)

Data integrity

  • SingleSegment recordings now load keyboard data (was silently empty)
  • Crash recovery restores keyboard.json (was hardcoded to None)
  • Split segments properly partition displayText and keys at split point
  • Override fields use skip_serializing_if to reduce JSON size
  • Segment IDs use crypto.randomUUID() instead of Date.now()

Code quality

  • Fixed Clippy violations (collapsible_if, unnecessary_lazy_evaluations, needless_borrow, unnecessary_cast)
  • FromStr for KeyboardPosition returns Err for unknowns + #[derive(Default)]
  • Extracted rendering magic numbers to named constants
  • Removed redundant set_scissor_rect GPU call
  • eprintln!tracing::warn!
  • Unit-suffixed field names throughout: fadeDurationSecs, displayDurationSecs, timeOffsetMs, groupingThresholdMs
  • Renamed "linger duration" → "display duration" for clarity
  • No code comments (project convention)

Frontend (SolidJS)

  • Fixed getSetting type safety (removed Record<string, unknown> cast)
  • selectedSegment wrapped in createMemo (proper reactivity)
  • Added toast notifications on segment generation success/failure
  • Extracted shared MIN_KEYBOARD_SEGMENT_SECS constant
  • Font family dropdown (Monospace default, Sans-Serif, Serif)
  • Font weight dropdown (Normal, Medium, Bold)
  • Auto-regenerate segments when grouping/display settings change
  • Fixed missing Input import in ConfigSidebar (crash when clicking segment)
  • Semantic icons per settings field

Scope

  • Zero captions code — fully independent from captions timeline migration

Architecture

Recording → cursor.rs polls device_query every ~16ms
         → key events flushed to keyboard.json per segment

Editor   → generate_keyboard_segments groups events by threshold
         → KeyboardTrack provides drag/resize/split/delete
         → KeyboardTab configures appearance and behavior

Render   → KeyboardLayer (WGPU) renders text overlay with fade/bounce
         → Included in both editor preview and MP4/GIF export

Follow-up

Mechanical keyboard sounds + click sounds will be a separate PR, using these keystroke timestamps and existing CursorClickEvent data mixed into the audio pipeline.

Test plan

  • cargo test -p cap-project — 17 tests pass
  • cargo check -p cap-desktop — full build passes
  • cargo clippy — clean
  • pnpm typecheck — only pre-existing web app errors
  • Unit audit: all time values verified (seconds vs ms) across full pipeline
  • Manual: record with keyboard enabled, verify overlay in editor
  • Manual: export MP4, verify keyboard overlay in output
  • Manual: split/delete/drag keyboard segments in timeline
  • Manual: test Shift+symbols and CapsLock display
  • Manual: change grouping/display settings and verify auto-regeneration

Surgically extracted from CapSoftware#1615 (keyboard-only, no captions changes)
with bug fixes applied during extraction.

Captures keystrokes during recording via device_query polling,
groups them into segments, renders as WGPU overlays with
fade/bounce animations, and provides interactive timeline editing.
- Modifier-only presses no longer create standalone segments
- Rapid consecutive hotkeys (Cmd+C, Cmd+V, Cmd+Z) merge into
  a single segment instead of creating overlapping tiny ones
- All segments enforce 300ms minimum visible duration
- Added tests for rapid hotkey merging and minimum duration
…tion

- Fix seconds/ms mismatch: displayDuration was passed as seconds to Rust
  expecting ms, causing segments to always have minimal length
- Fix split logic: timeOffset (ms) was compared against split time (seconds)
- Add font family dropdown (Monospace default, Sans-Serif, Serif)
- Auto-regenerate segments when grouping settings change
- Use + separator for hotkey combos (⌘+⇧+L instead of ⌘⇧L)
- Uppercase keys in Shift combos (⌘+⇧+L not ⌘+⇧+l)
- Rename linger -> display throughout codebase for clarity
- Add unit suffixes to field names (fadeDurationSecs, displayDurationSecs,
  timeOffsetMs, groupingThresholdMs) to prevent future mismatches
- Fix missing Input import in ConfigSidebar causing crash on segment click
- Default font changed to System Monospace
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant