Add rough-cut skill #14
Skill for turning raw long-take Camtasia recordings into a tight rough cut via camkit: transcribe on-timeline sources with Whisper, detect silences, then cut dead air, filler, false starts, and losing takes. Includes three stdlib-only Python helpers (takes.py, range.py, dump.py) that parse the word-level transcript JSON locally so the full word dump never has to be loaded into model context.
RichardBray left a comment
Nice skill - the workflow is detailed and the hard rules (silences, dry-run, bin-only) are exactly the right things to pin down. Overall I think this is close; the comments below are mostly one real bug plus a few robustness/convention nits.
Theme: the helper scripts want to be TypeScript in @camkit/core, not Python. CONTRIBUTING.md:44 says the project is TypeScript throughout, the workspace is Bun (so bun is a guaranteed runtime; python3 isn't), and CONTRIBUTING.md:48 asks for unit tests on transcript parsing. The take-segmentation in takes.py is exactly that kind of parsing logic, and the bug flagged below is the kind a test would have caught. Suggest moving segmentation + degenerate filtering + range query into core behind tests and exposing them as small camkit subcommands (or a single camkit takes / camkit words). Not a hard blocker, but I'd want at least the segmentation in core before relying on it.
Minor: SKILL.md says silences output is silence START-END (DUR), but the actual format (camkit.ts:463) is silence START.00-END.00s (DUR.00s) - has an s suffix on the timestamps. If an agent regex-parses this strictly it'll miss; worth quoting the real format.
Minor: the --force lock-file guidance in step 6 is reasonable given the camkit docs gate, but it's teaching a pattern that's easy to misapply. Maybe call out explicitly that --force should never be scripted/automated - only run after a human-readable camkit docs shows no open docs.
| if cur:takes.append(cur) | ||
| for t in takes: | ||
| s=t[0]['start'];e=t[-1]['end'] | ||
| # drop degenerate tail words with zero-length identical stamps |
Bug: the degenerate tail words are not actually dropped. The comment says "drop degenerate tail words with zero-length identical stamps", but:
- txt = ' '.join(y['word'] for y in t) includes every word, degenerate or not;
- e = t[-1]['end'] is the degenerate words' shared timestamp, not where real speech ends;
- len(t) counts them.
SKILL.md (the Whisper pads clip ends... note) explicitly warns about exactly this - "20 words all at 223.78" - and tells the agent to end the last keep range before they start. The one helper that's supposed to make takes readable is instead inflating every affected take's reported duration and word count, and pasting the repeated word into the text the model reads.
Fix: strip trailing words whose end - start < epsilon (say 0.05s) before computing e, txt, and len. Better, also detect the frozen-stamp run (N trailing words sharing one start/end). A unit test with a synthetic degenerate tail would lock this in.
Fixed. Segmentation moved to segmentTakes() in @camkit/core (transcript.ts), which strips degenerate words (end-start < 0.05s) before computing start/end/text/word-count. Covered by unit tests including a synthetic degenerate-tail case. Python script deleted; exposed as camkit takes.
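A minimal sketch of the segmentation described in this reply, assuming the transcript word shape {word, start, end} quoted from SKILL.md, the 0.05s threshold from the review, and the 1.2s default gap from the old takes.py. The names segmentTakes and isDegenerate come from this thread, but the bodies below are hypothetical and are not the code in @camkit/core.

```typescript
// Hypothetical sketch: names mirror the thread, bodies are illustrative only.
interface Word {
  word: string;
  start: number; // seconds
  end: number; // seconds
}

interface Take {
  start: number;
  end: number;
  text: string;
  wordCount: number;
}

const EPSILON = 0.05; // threshold quoted in the review

// A word is treated as degenerate when it has (almost) no duration.
function isDegenerate(w: Word): boolean {
  return w.end - w.start < EPSILON;
}

// Drop degenerate words first, then start a new take whenever the pause
// between two consecutive real words exceeds `gap` seconds.
function segmentTakes(words: Word[], gap = 1.2): Take[] {
  const real = words.filter((w) => !isDegenerate(w));
  const groups: Word[][] = [];
  for (const w of real) {
    const current = groups[groups.length - 1];
    const last = current?.[current.length - 1];
    if (current && last && w.start - last.end <= gap) {
      current.push(w);
    } else {
      groups.push([w]);
    }
  }
  // Start, end, text, and word count are all computed from real words only.
  return groups.map((g) => ({
    start: g[0].start,
    end: g[g.length - 1].end,
    text: g.map((w) => w.word).join(" "),
    wordCount: g.length,
  }));
}
```

Filtering before grouping means a frozen-stamp tail can neither extend the last take nor form a micro-take of its own.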
| ### 3. Transcribe + detect silences for each on-timeline source | ||
| For every on-timeline source (run these in parallel — they're independent): | ||
| ```sh | ||
| camkit transcribe "<trec>" --out /tmp/rc/srcN.json # word-level Whisper (OpenAI whisper-1) |
/tmp/rc/ is never created - camkit transcribe --out /tmp/rc/srcN.json will fail with ENOENT on a clean run. Add mkdir -p /tmp/rc (or pick the dir once at the top and reuse $RC_DIR).
Also: /tmp is cleared on reboot and is shared across users on the same machine (collision risk if two people rough-cut simultaneously). A project-local .camkit/rc/ (added to .gitignore) survives reboots and scopes the scratch to the project; transcripts/silences are reusable for recuts per the Recutting section, so durability matters.
Fixed. Changed to a project-local $P/.camkit/rc/ with mkdir -p. Survives reboots, scoped to the project, and transcripts stay reusable for recuts.
| camkit status # confirms Camtasia is running + which doc is open | ||
| camkit docs # the open .cmproj name | ||
| ``` | ||
| Resolve its full path (e.g. `find ~ -maxdepth 5 -name "<doc>.cmproj"`). Use it as `--project` for every command, or rely on the read-command fallback to the open project. Keep the path in a shell var. |
find ~ -maxdepth 5 -name "<doc>.cmproj" scans the whole home dir - slow on a dev machine, and it's working around a CLI gap that's already solved in the library layer.
camtasiaDocPaths() in packages/darwin/src/index.ts:53 already returns the full POSIX path of every open document ({name, path}). camkit docs just doesn't surface it - cmdDocs (camkit.ts:516) calls camtasiaDocs(), names only. Cleaner to switch camkit docs to camtasiaDocPaths() (or add --paths) so the skill gets the path directly from the running app instead of filesystem-scanning for it. Happy to do that in a follow-up PR if you want to keep this one doc-only.
Fixed. camkit docs now uses camtasiaDocPaths() and prints <name>\t<full path>. The skill captures the path directly from it - no more find ~ scan.
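A minimal sketch of how a caller might read that output, assuming one name<TAB>path pair per line as stated in this reply. parseDocsOutput and OpenDoc are hypothetical names for illustration and are not part of camkit.

```typescript
// Hypothetical helper: parses "name<TAB>full path" lines.
interface OpenDoc {
  name: string;
  path: string;
}

function parseDocsOutput(stdout: string): OpenDoc[] {
  const docs: OpenDoc[] = [];
  for (const line of stdout.split("\n")) {
    const tab = line.indexOf("\t");
    // Lines without a tab (blank lines, status messages) are skipped.
    if (tab === -1) continue;
    const name = line.slice(0, tab).trim();
    const path = line.slice(tab + 1).trim();
    if (name && path) docs.push({ name, path });
  }
  return docs;
}
```

Splitting on the first tab only keeps paths that contain spaces intact.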
| These recordings are **heavy retake material**: the presenter says each beat many times, restarting, until the last pass is clean. The keeper for a beat is almost always the **final complete clean delivery**; everything before it is false starts to cut. | ||
| Reading 3000+ raw words per source into context is wasteful. Three helper scripts in `scripts/` (run from wherever the `srcN.json` transcripts live, e.g. `python3 <skill>/scripts/takes.py 5`) make it tractable: |
<skill> is never resolved to a concrete path. After the SKILLS.md symlink step the scripts live at .claude/skills/rough-cut/scripts/; without it they're at skills/rough-cut/scripts/. An agent running this will have to guess, and the takes.py bug above means a wrong path is a silent failure mode. Either hardcode skills/rough-cut/scripts/ (the canonical repo location) or define <skill> once at the top of this file.
Fixed. The Python scripts are gone, replaced by camkit takes and camkit words subcommands. No <skill> path to resolve.
| @@ -0,0 +1,16 @@ | |||
| import json,sys | |||
Convention note: these three scripts are Python in a repo that CONTRIBUTING.md:44 says is TypeScript throughout, and they carry real parsing logic (segmentation, the degenerate-word handling above, range query) with no tests - CONTRIBUTING.md:48 asks for unit tests on transcript parsing.
If they stay as scripts, TS under scripts/ (run by the existing Bun runtime) matches the repo and removes a runtime dependency. If the logic matters to the workflow (the dead-air trap suggests it does), it belongs in @camkit/core behind tests and exposed as camkit subcommands - then the skill just calls camkit takes <src>, camkit words <src> A B, and the parsing can't drift from the transcript shape camkit transcribe produces.
Fixed. Segmentation, degenerate filtering, and range query are now in @camkit/core as TypeScript behind unit tests, exposed as camkit takes <file> [gap] and camkit words <file> <start> <end>. The Python scripts are deleted.
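A minimal sketch of the range query behind camkit words <file> <start> <end>, assuming overlap semantics (a word is returned if any part of it falls inside the window). The thread does not spell out the boundary rule, so that choice and the name wordsInRange are assumptions.

```typescript
// Hypothetical sketch of a word range query; boundary semantics are assumed.
interface Word {
  word: string;
  start: number; // seconds
  end: number; // seconds
}

// Return every word that overlaps the half-open window [start, end).
function wordsInRange(words: Word[], start: number, end: number): Word[] {
  return words.filter((w) => w.end > start && w.start < end);
}

// Join the matching words into a single readable line.
function textInRange(words: Word[], start: number, end: number): string {
  return wordsInRange(words, start, end)
    .map((w) => w.word)
    .join(" ");
}
```

Returning only the words in the window is what keeps the full 3000+ word dump out of model context.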
| @@ -0,0 +1,5 @@ | |||
| import json,sys | |||
| n,a,b=sys.argv[1],float(sys.argv[2]),float(sys.argv[3]) | |||
| w=json.load(open(f'src{n}.json'))['words'] | |||
Minor: if srcN.json is missing or malformed, json.load(open(...)) throws an opaque traceback (FileNotFoundError / KeyError on words). A one-line guard (if not exists: sys.exit('src%s.json not found - run camkit transcribe --out src%s.json' % (n,n))) would save an agent a confused detour. Same applies to takes.py and dump.py.
Fixed. Scripts deleted. The replacement camkit takes/camkit words subcommands check existsSync and throw clear messages (No such file: <path>, has no word-level "words" array).
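A minimal sketch of the shape check, assuming the transcript JSON is {text, words:[{word,start,end}], segments} as SKILL.md states. The "words" message wording is quoted from this reply; the invalid-JSON message and the name parseTranscriptWords are assumptions. The existsSync check is left out so the sketch stays free of filesystem access.

```typescript
// Hypothetical sketch: validates transcript text that has already been read.
interface Word {
  word: string;
  start: number;
  end: number;
}

function parseTranscriptWords(jsonText: string, path: string): Word[] {
  let data: unknown;
  try {
    data = JSON.parse(jsonText);
  } catch {
    // Assumed wording; the thread does not quote a message for this case.
    throw new Error(`${path} is not valid JSON`);
  }
  const words = (data as { words?: unknown } | null)?.words;
  if (!Array.isArray(words)) {
    throw new Error(`${path} has no word-level "words" array`);
  }
  return words as Word[];
}
```

A named failure replaces the opaque KeyError traceback the Python helpers produced.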
RichardBray left a comment
Review of the rough-cut skill. The SKILL.md is well-structured and the silence/dry-run safety guidance is thorough. A few issues below - the main one is in takes.py, where a promised filter is missing and corrupts the reported take boundaries.
| s=t[0]['start'];e=t[-1]['end'] | ||
| # drop degenerate tail words with zero-length identical stamps | ||
| txt=' '.join(y['word'] for y in t) |
The comment on line 14 says "drop degenerate tail words with zero-length identical stamps", but nothing here actually drops them - txt joins every word in t, and e=t[-1]['end'] takes the last word's end, degenerate or not.
This matters: SKILL.md (line 63) explicitly warns Whisper pads clip ends with degenerate zero-length words at a frozen timestamp (e.g. 20 words all at 223.78). When those cluster into the final take, e picks up that frozen stamp and the reported (e-s) duration is wrong. When they land as their own micro-take (gap > gap from the real tail), they print as noise.
Suggested fix - strip them before computing s/e/txt:
t = [y for y in t if y['end'] - y['start'] > 0]
if not t: continue
That makes the comment true and the printed durations match the audible take.
Fixed. segmentTakes() in core now filters isDegenerate(w) (end-start < 0.05s) before computing s, e, txt, and len. Tested with a synthetic 20-word frozen-stamp tail - the take's reported end matches the last real word, and a pure-degenerate cluster is dropped entirely.
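The first review comment also suggested detecting the frozen-stamp run directly (N trailing words sharing one start/end). A minimal sketch of that check, assuming the frozen words carry exactly identical numbers in the JSON. frozenTailLength and stripFrozenTail are hypothetical names, not functions in @camkit/core.

```typescript
// Hypothetical sketch of frozen-stamp tail detection.
interface Word {
  word: string;
  start: number;
  end: number;
}

// Count trailing words that share one identical, zero-length timestamp.
function frozenTailLength(words: Word[]): number {
  if (words.length === 0) return 0;
  const last = words[words.length - 1];
  // A real final word has duration, so it is never treated as frozen.
  if (last.start !== last.end) return 0;
  let n = 0;
  for (let i = words.length - 1; i >= 0; i--) {
    const w = words[i];
    if (w.start === last.start && w.end === last.end) n++;
    else break;
  }
  return n;
}

// Return the words with any frozen tail removed.
function stripFrozenTail(words: Word[]): Word[] {
  return words.slice(0, words.length - frozenTailLength(words));
}
```

Exact equality is used because the run is defined by a shared stamp rather than by short duration, so it complements the epsilon filter.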
| Loop over them in one backgrounded batch and `wait`; ~45 min across 8 sources finishes in a couple of minutes. | ||
| - `--db` / `--min` tune sensitivity. Start `-35 dB`, `0.4 s`. Adjust if needed (quieter mic → `-30`; only long pauses → `--min 0.8`). | ||
| - The transcript JSON is `{text, words:[{word,start,end}], segments}`. Use word times for content boundaries; use `silences` for pauses. | ||
| - **`silences` output format** is `silence START-END (DUR)` per line (camkit reformats ffmpeg). Parse those, not raw `silence_start:` lines. |
The documented format doesn't match the actual CLI output. cmdSilences in packages/cli/src/camkit.ts:463 prints:
silence START-ENDs  (DURs)
with an s suffix on both the end time and the duration, and a double space before the parens. The doc shows silence START-END (DUR) (no s, single space).
An agent writing a regex off this doc to parse the ranges would mismatch. Worth correcting to the real format, or just saying "parse the two float timestamps on each silence line" without implying an exact literal.
Fixed. Updated to silence START-ENDs (DURs) matching cmdSilences at camkit.ts:463. Added a concrete example line too.
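A minimal sketch of a parser that takes the "parse the two float timestamps" route suggested in the review, so it does not depend on the exact spacing or on the s suffixes. The sample line in the comment is made up to match the described format, and parseSilences is a hypothetical name.

```typescript
// Hypothetical sketch: tolerant parser for `silence START-ENDs  (DURs)` lines.
interface Silence {
  start: number;
  end: number;
  duration: number;
}

// Optional "s" suffixes, any amount of whitespace before the parentheses.
// Example of a line this is meant to accept (invented): silence 12.40-13.10s  (0.70s)
const SILENCE_LINE =
  /^silence\s+(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)s?\s*\((\d+(?:\.\d+)?)s?\)\s*$/;

function parseSilences(output: string): Silence[] {
  const result: Silence[] = [];
  for (const line of output.split("\n")) {
    const m = SILENCE_LINE.exec(line.trim());
    if (!m) continue; // ignore anything that is not a silence line
    result.push({
      start: Number(m[1]),
      end: Number(m[2]),
      duration: Number(m[3]),
    });
  }
  return result;
}
```

Accepting both the old documented form and the real form means a later whitespace change in the CLI would not silently drop ranges.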
| camkit status # confirms Camtasia is running + which doc is open | ||
| camkit docs # the open .cmproj name | ||
| ``` | ||
| Resolve its full path (e.g. `find ~ -maxdepth 5 -name "<doc>.cmproj"`). Use it as `--project` for every command, or rely on the read-command fallback to the open project. Keep the path in a shell var. |
find ~ -maxdepth 5 -name "<doc>.cmproj" walks the entire home directory and can match more than one project (copies, backups, .bak dirs). Slow and ambiguous. Consider scoping to the common Camtasia project root, e.g. find ~/Documents/Camtasia -maxdepth 3 -name "<doc>.cmproj", or noting that camkit docs already returns the full path on macOS so the find is only a fallback.
Fixed. camkit docs returns full paths now, so the find ~ fallback is gone from the skill entirely.
| import json,sys | ||
| n=sys.argv[1] | ||
| gap=float(sys.argv[2]) if len(sys.argv)>2 else 1.2 | ||
| d=json.load(open(f'src{n}.json')) |
Minor: json.load(open(...)) leaks the file handle (fine for a one-shot script, but with open(...) as f: is cleaner). Also no arg guard - running takes.py with no source number throws an ugly IndexError rather than a usage line. Same applies to range.py/dump.py. Low priority since these are internal helpers.
Fixed. Scripts deleted. The TS subcommands use readFileSync (no handle leak) and have arg guards (Usage: messages on missing positionals).
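A minimal sketch of such an argument guard for camkit takes <file> [gap], assuming the 1.2s default gap carried over from takes.py. The exact message text and the name parseTakesArgs are assumptions, not the CLI's actual output.

```typescript
// Hypothetical sketch of positional-argument validation for `camkit takes`.
interface TakesArgs {
  file: string;
  gap: number; // seconds of pause that separates two takes
}

function parseTakesArgs(argv: string[]): TakesArgs {
  const [file, gapArg] = argv;
  if (!file) {
    throw new Error("Usage: camkit takes <file> [gap]");
  }
  // Default gap assumed from the old Python helper.
  const gap = gapArg === undefined ? 1.2 : Number(gapArg);
  if (!Number.isFinite(gap) || gap <= 0) {
    throw new Error(`gap must be a positive number of seconds, got "${gapArg}"`);
  }
  return { file, gap };
}
```

Rejecting a non-numeric gap up front avoids the NaN comparison that would otherwise make every word its own take.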
- Move take segmentation + range query into @camkit/core (transcript.ts) with degenerate-word filtering and unit tests. The Python takes.py had a bug where degenerate tail words (Whisper padding) were counted in the take's duration, word count, and text despite the comment saying they were dropped. Now handled correctly with tests.
- Add camkit takes and camkit words CLI subcommands replacing the Python helper scripts. Segmentation logic lives in core behind tests, matching CONTRIBUTING.md's TS-throughout convention.
- Fix camkit docs to use camtasiaDocPaths() so it returns full paths, not just document names. The skill no longer needs to filesystem-scan.
- SKILL.md fixes:
  - Replace Python scripts with camkit takes/words subcommands
  - /tmp/rc/ → project-local .camkit/rc/ (survives reboots, mkdir -p)
  - find ~ -maxdepth 5 → camkit docs (returns full paths now)
  - Silences format corrected to START-ENDs (DURs) matching actual output
  - --force guidance: never script or automate
  - Remove unresolved <skill> path placeholder (no more scripts)
- Delete skills/rough-cut/scripts/ (takes.py, range.py, dump.py)
What
Adds a rough-cut skill under skills/ for turning raw long-take Camtasia recordings into a tight rough cut using camkit.
How it works
camkit rebuild (dry-run first) to lay kept ranges in order.
Helper scripts
Three stdlib-only Python helpers (takes.py, range.py, dump.py) parse the word-level transcript JSON locally, so the full 3000+ word dump never loads into model context. No external deps, no uv needed.
🤖 Generated with Claude Code