Add rough-cut skill #14

Open
RichardBray wants to merge 2 commits into main from add-rough-cut-skill

Conversation

@RichardBray

Member

What

Adds a rough-cut skill under skills/ for turning raw long-take Camtasia recordings into a tight rough cut using camkit.

How it works

  1. Find the open Camtasia project, inspect on-timeline sources.
  2. Transcribe each source with Whisper + detect silences via ffmpeg.
  3. Pick the clean final take per beat, drop retakes/filler/dead air.
  4. camkit rebuild (dry-run first) to lay kept ranges in order.

Helper scripts

Three stdlib-only Python helpers (takes.py, range.py, dump.py) parse the word-level transcript JSON locally, so the full 3000+ word dump never loads into model context. No external deps, no uv needed.

🤖 Generated with Claude Code

Skill for turning raw long-take Camtasia recordings into a tight rough
cut via camkit: transcribe on-timeline sources with Whisper, detect
silences, then cut dead air, filler, false starts, and losing takes.

Includes three stdlib-only Python helpers (takes.py, range.py, dump.py)
that parse the word-level transcript JSON locally so the full word dump
never has to be loaded into model context.

@RichardBray RichardBray left a comment

Member Author

Nice skill - the workflow is detailed and the hard rules (silences, dry-run, bin-only) are exactly the right things to pin down. Overall I think this is close; the comments below are mostly one real bug plus a few robustness/convention nits.

Theme: the helper scripts want to be TypeScript in @camkit/core, not Python. CONTRIBUTING.md:44 says the project is TypeScript throughout, the workspace is Bun (so bun is a guaranteed runtime; python3 isn't), and CONTRIBUTING.md:48 asks for unit tests on transcript parsing. The take-segmentation in takes.py is exactly that kind of parsing logic, and the bug flagged below is the kind a test would have caught. Suggest moving segmentation + degenerate filtering + range query into core behind tests and exposing them as small camkit subcommands (or a single camkit takes / camkit words). Not a hard blocker, but I'd want at least the segmentation in core before relying on it.

Minor: SKILL.md says silences output is silence START-END (DUR), but the actual format (camkit.ts:463) is silence START.00-END.00s (DUR.00s) - has an s suffix on the timestamps. If an agent regex-parses this strictly it'll miss; worth quoting the real format.

Minor: the --force lock-file guidance in step 6 is reasonable given the camkit docs gate, but it's teaching a pattern that's easy to misapply. Maybe call out explicitly that --force should never be scripted/automated - only run after a human-readable camkit docs shows no open docs.

Comment thread skills/rough-cut/scripts/takes.py Outdated
if cur:takes.append(cur)
for t in takes:
s=t[0]['start'];e=t[-1]['end']
# drop degenerate tail words with zero-length identical stamps

Member Author

Bug: the degenerate tail words are not actually dropped. The comment says "drop degenerate tail words with zero-length identical stamps", but:

  • txt = ' '.join(y['word'] for y in t) includes every word, degenerate or not;
  • e = t[-1]['end'] is the degenerate words' shared timestamp, not where real speech ends;
  • len(t) counts them.

SKILL.md (the Whisper pads clip ends... note) explicitly warns about exactly this - "20 words all at 223.78" - and tells the agent to end the last keep range before they start. The one helper that's supposed to make takes readable instead inflates every affected take's reported duration and word count, and pastes the repeated word into the text the model reads.

Fix: strip trailing words whose end - start < epsilon (say 0.05s) before computing e, txt, and len. Better, also detect the frozen-stamp run (N trailing words sharing one start/end). A unit test with a synthetic degenerate tail would lock this in.

Member Author

Fixed. Segmentation moved to segmentTakes() in @camkit/core (transcript.ts), which strips degenerate words (end-start < 0.05s) before computing start/end/text/word-count. Covered by unit tests including a synthetic degenerate-tail case. Python script deleted; exposed as camkit takes.
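
For illustration, the filtering described in this reply could look roughly like the sketch below. It is not the actual `segmentTakes()` from `@camkit/core`: the `Word` and `Take` shapes, the `EPSILON` constant, and the rule of splitting on the pause between one word's end and the next word's start are all assumptions. Only the 0.05 s threshold and the 1.2 s default gap come from this thread.

```typescript
// Hypothetical sketch; not the real @camkit/core implementation.
interface Word {
  word: string;
  start: number; // seconds
  end: number; // seconds
}

interface Take {
  start: number;
  end: number;
  text: string;
  wordCount: number;
}

// Threshold quoted in the review thread (end - start < 0.05s).
const EPSILON = 0.05;

// A word is degenerate when its span is shorter than EPSILON,
// e.g. Whisper padding frozen at a single timestamp.
function isDegenerate(w: Word): boolean {
  return w.end - w.start < EPSILON;
}

// Drop degenerate words first, then split into takes wherever the pause
// between consecutive real words exceeds `gap` seconds.
function segmentTakes(words: Word[], gap = 1.2): Take[] {
  const real = words.filter((w) => !isDegenerate(w));
  const groups: Word[][] = [];
  let current: Word[] = [];
  for (const w of real) {
    const prev = current[current.length - 1];
    if (prev !== undefined && w.start - prev.end > gap) {
      groups.push(current);
      current = [];
    }
    current.push(w);
  }
  if (current.length > 0) groups.push(current);
  return groups.map((g) => ({
    start: g[0].start,
    end: g[g.length - 1].end,
    text: g.map((w) => w.word.trim()).join(" "),
    wordCount: g.length,
  }));
}
```

Filtering before grouping means a frozen-stamp tail can neither extend the final take nor form a micro-take of its own, which covers both failure modes raised in the review.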

Comment thread skills/rough-cut/SKILL.md Outdated
### 3. Transcribe + detect silences for each on-timeline source
For every on-timeline source (run these in parallel — they're independent):
```sh
camkit transcribe "<trec>" --out /tmp/rc/srcN.json # word-level Whisper (OpenAI whisper-1)

Member Author

/tmp/rc/ is never created - camkit transcribe --out /tmp/rc/srcN.json will fail with ENOENT on a clean run. Add mkdir -p /tmp/rc (or pick the dir once at the top and reuse $RC_DIR).

Also: /tmp is cleared on reboot and is shared across users on the same machine (collision risk if two people rough-cut simultaneously). A project-local .camkit/rc/ (added to .gitignore) survives reboots and scopes the scratch to the project; transcripts/silences are reusable for recuts per the Recutting section, so durability matters.

Member Author

Fixed. Changed to a project-local $P/.camkit/rc/ with mkdir -p. Survives reboots, scoped to the project, and transcripts stay reusable for recuts.

Comment thread skills/rough-cut/SKILL.md Outdated
camkit status # confirms Camtasia is running + which doc is open
camkit docs # the open .cmproj name
```
Resolve its full path (e.g. `find ~ -maxdepth 5 -name "<doc>.cmproj"`). Use it as `--project` for every command, or rely on the read-command fallback to the open project. Keep the path in a shell var.

Member Author

find ~ -maxdepth 5 -name "<doc>.cmproj" scans the whole home dir - slow on a dev machine, and it's working around a CLI gap that's already solved in the library layer.

camtasiaDocPaths() in packages/darwin/src/index.ts:53 already returns the full POSIX path of every open document ({name, path}). camkit docs just doesn't surface it - cmdDocs (camkit.ts:516) calls camtasiaDocs(), names only. Cleaner to switch camkit docs to camtasiaDocPaths() (or add --paths) so the skill gets the path directly from the running app instead of filesystem-scanning for it. Happy to do that in a follow-up PR if you want to keep this one doc-only.

Member Author

Fixed. camkit docs now uses camtasiaDocPaths() and prints <name>\t<full path>. The skill captures the path directly from it - no more find ~ scan.
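
A consumer of that output might split each line on the tab, as in the sketch below. The `OpenDoc` type and the `parseDocsOutput` name are hypothetical, and the sketch assumes one open document per line with exactly the `<name>\t<full path>` shape mentioned above; lines without a tab are skipped rather than guessed at.

```typescript
// Hypothetical helper for consuming `camkit docs` output; names are illustrative.
interface OpenDoc {
  name: string;
  path: string;
}

// Parse lines shaped like "<name>\t<full path>". Anything without a tab
// (blank lines, status messages) is ignored.
function parseDocsOutput(stdout: string): OpenDoc[] {
  const docs: OpenDoc[] = [];
  for (const line of stdout.split("\n")) {
    const tab = line.indexOf("\t");
    if (tab === -1) continue;
    const name = line.slice(0, tab).trim();
    const path = line.slice(tab + 1).trim();
    if (name !== "" && path !== "") docs.push({ name, path });
  }
  return docs;
}
```

Splitting on the first tab only keeps paths that contain spaces intact, which matters for project folders with spaces in their names.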

Comment thread skills/rough-cut/SKILL.md Outdated

These recordings are **heavy retake material**: the presenter says each beat many times, restarting, until the last pass is clean. The keeper for a beat is almost always the **final complete clean delivery**; everything before it is false starts to cut.

Reading 3000+ raw words per source into context is wasteful. Three helper scripts in `scripts/` (run from wherever the `srcN.json` transcripts live, e.g. `python3 <skill>/scripts/takes.py 5`) make it tractable:

Member Author

<skill> is never resolved to a concrete path. After the SKILLS.md symlink step the scripts live at .claude/skills/rough-cut/scripts/; without it they're at skills/rough-cut/scripts/. An agent running this will have to guess, and the takes.py bug above means a wrong path is a silent failure mode. Either hardcode skills/rough-cut/scripts/ (the canonical repo location) or define <skill> once at the top of this file.

Member Author

Fixed. The Python scripts are gone, replaced by camkit takes and camkit words subcommands. No <skill> path to resolve.

Comment thread skills/rough-cut/scripts/takes.py Outdated
@@ -0,0 +1,16 @@
import json,sys

Member Author

Convention note: these three scripts are Python in a repo that CONTRIBUTING.md:44 says is TypeScript throughout, and they carry real parsing logic (segmentation, the degenerate-word handling above, range query) with no tests - CONTRIBUTING.md:48 asks for unit tests on transcript parsing.

If they stay as scripts, TS under scripts/ (run by the existing Bun runtime) matches the repo and removes a runtime dependency. If the logic matters to the workflow (the dead-air trap suggests it does), it belongs in @camkit/core behind tests and exposed as camkit subcommands - then the skill just calls camkit takes <src>, camkit words <src> A B, and the parsing can't drift from the transcript shape camkit transcribe produces.

Member Author

Fixed. Segmentation, degenerate filtering, and range query are now in @camkit/core as TypeScript behind unit tests, exposed as camkit takes <file> [gap] and camkit words <file> <start> <end>. The Python scripts are deleted.
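
As a rough picture of the range query behind `camkit words <file> <start> <end>`, the sketch below returns the words that overlap a time window. The thread does not state the selection rule, so the overlap semantics, the `wordsInRange` name, and the `Word` shape are assumptions rather than the real core API.

```typescript
// Hypothetical sketch of a range query over word-level timestamps.
interface Word {
  word: string;
  start: number; // seconds
  end: number; // seconds
}

// Return every word whose span overlaps the window [from, to].
// Assumed semantics: a word counts if any part of it falls inside the window.
function wordsInRange(words: Word[], from: number, to: number): Word[] {
  if (from > to) {
    throw new RangeError(`start (${from}) must not exceed end (${to})`);
  }
  return words.filter((w) => w.start < to && w.end > from);
}
```

An overlap rule avoids silently dropping a word that straddles a boundary; a stricter "fully contained" rule would be an equally plausible choice.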

Comment thread skills/rough-cut/scripts/range.py Outdated
@@ -0,0 +1,5 @@
import json,sys
n,a,b=sys.argv[1],float(sys.argv[2]),float(sys.argv[3])
w=json.load(open(f'src{n}.json'))['words']

Member Author

Minor: if srcN.json is missing or malformed, json.load(open(...)) throws an opaque traceback (FileNotFoundError / KeyError on words). A one-line guard (if not exists: sys.exit('src%s.json not found - run camkit transcribe --out src%s.json' % (n,n))) would save an agent a confused detour. Same applies to takes.py and dump.py.

Member Author

Fixed. Scripts deleted. The replacement camkit takes/camkit words subcommands check existsSync and throw clear messages (No such file: <path>, has no word-level "words" array).
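
The guards described here might look something like the following sketch. The function names `loadWords` and `parseWords` are hypothetical, and the "is not valid JSON" message is an addition for illustration; only the `No such file: <path>` and `has no word-level "words" array` messages are quoted from the reply.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Hypothetical sketch; function names are illustrative, not the camkit API.
interface Word {
  word: string;
  start: number;
  end: number;
}

// Validate raw transcript JSON and return its word list.
// `label` is only used to make error messages point at the right file.
function parseWords(raw: string, label: string): Word[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    throw new Error(`${label} is not valid JSON: ${(err as Error).message}`);
  }
  const words = (parsed as { words?: unknown } | null)?.words;
  if (!Array.isArray(words)) {
    throw new Error(`${label} has no word-level "words" array`);
  }
  return words as Word[];
}

// Fail early with a readable message instead of an opaque stack trace.
function loadWords(path: string): Word[] {
  if (!existsSync(path)) {
    throw new Error(`No such file: ${path}`);
  }
  return parseWords(readFileSync(path, "utf8"), path);
}
```

Keeping the parsing separate from the file read makes the error paths easy to unit test without touching the filesystem.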

@RichardBray RichardBray left a comment

Member Author

Review of the rough-cut skill. The SKILL.md is well-structured and the silence/dry-run safety guidance is thorough. A few issues below - the main one is in takes.py, where a promised filter is missing and corrupts the reported take boundaries.

Comment thread skills/rough-cut/scripts/takes.py Outdated
Comment on lines +13 to +15
s=t[0]['start'];e=t[-1]['end']
# drop degenerate tail words with zero-length identical stamps
txt=' '.join(y['word'] for y in t)

Member Author

The comment on line 14 says "drop degenerate tail words with zero-length identical stamps", but nothing here actually drops them - txt joins every word in t, and e=t[-1]['end'] takes the last word's end, degenerate or not.

This matters: SKILL.md (line 63) explicitly warns that Whisper pads clip ends with degenerate zero-length words at a frozen timestamp (e.g. 20 words all at 223.78). When those cluster into the final take, e picks up that frozen stamp and the reported (e-s) duration is wrong. When they land as their own micro-take (more than gap seconds after the real tail), they print as noise.

Suggested fix - strip them before computing s/e/txt:

t = [y for y in t if y['end'] - y['start'] > 0]
if not t: continue

That makes the comment true and the printed durations match the audible take.

Member Author

Fixed. segmentTakes() in core now filters isDegenerate(w) (end-start < 0.05s) before computing s, e, txt, and len. Tested with a synthetic 20-word frozen-stamp tail - the take's reported end matches the last real word, and a pure-degenerate cluster is dropped entirely.

Comment thread skills/rough-cut/SKILL.md Outdated
Loop over them in one backgrounded batch and `wait`; ~45 min across 8 sources finishes in a couple of minutes.
- `--db` / `--min` tune sensitivity. Start `-35 dB`, `0.4 s`. Adjust if needed (quieter mic → `-30`; only long pauses → `--min 0.8`).
- The transcript JSON is `{text, words:[{word,start,end}], segments}`. Use word times for content boundaries; use `silences` for pauses.
- **`silences` output format** is `silence START-END (DUR)` per line (camkit reformats ffmpeg). Parse those, not raw `silence_start:` lines.

Member Author

The documented format doesn't match the actual CLI output. cmdSilences in packages/cli/src/camkit.ts:463 prints:

silence  START-ENDs  (DURs)

with an s suffix on both the end time and the duration, and a double space before the parens. The doc shows silence START-END (DUR) (no s, single space).

An agent writing a regex off this doc to parse the ranges would mismatch. Worth correcting to the real format, or just saying "parse the two float timestamps on each silence line" without implying an exact literal.

Member Author

Fixed. Updated to silence START-ENDs (DURs) matching cmdSilences at camkit.ts:463. Added a concrete example line too.
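
Following the earlier suggestion to parse the timestamps rather than match an exact literal, a tolerant parser could be sketched as below. The `Silence` type, the `parseSilences` name, and the sample lines in the usage note are hypothetical; the pattern accepts any run of whitespace and treats the `s` suffix as optional, so it matches both the old documented form and the form described for `cmdSilences`.

```typescript
// Hypothetical sketch of a tolerant parser for `camkit silences` output.
interface Silence {
  start: number; // seconds
  end: number; // seconds
  duration: number; // seconds
}

// Accepts "silence  START-ENDs  (DURs)" as well as the suffix-less,
// single-space variant; whitespace runs and the "s" suffix are optional.
const SILENCE_LINE =
  /^silence\s+(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)s?\s+\((\d+(?:\.\d+)?)s?\)\s*$/;

function parseSilences(stdout: string): Silence[] {
  const silences: Silence[] = [];
  for (const line of stdout.split("\n")) {
    const m = SILENCE_LINE.exec(line.trim());
    if (m === null) continue; // ignore anything that is not a silence line
    silences.push({
      start: Number(m[1]),
      end: Number(m[2]),
      duration: Number(m[3]),
    });
  }
  return silences;
}
```

For example, a made-up line such as `silence  12.40-13.10s  (0.70s)` would yield a start of 12.4, an end of 13.1, and a duration of 0.7.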

Comment thread skills/rough-cut/SKILL.md Outdated
camkit status # confirms Camtasia is running + which doc is open
camkit docs # the open .cmproj name
```
Resolve its full path (e.g. `find ~ -maxdepth 5 -name "<doc>.cmproj"`). Use it as `--project` for every command, or rely on the read-command fallback to the open project. Keep the path in a shell var.

Member Author

find ~ -maxdepth 5 -name "<doc>.cmproj" walks the entire home directory and can match more than one project (copies, backups, .bak dirs). Slow and ambiguous. Consider scoping to the common Camtasia project root, e.g. find ~/Documents/Camtasia -maxdepth 3 -name "<doc>.cmproj", or noting that camkit docs already returns the full path on macOS so the find is only a fallback.

Member Author

Fixed. camkit docs returns full paths now, so the find ~ fallback is gone from the skill entirely.

Comment thread skills/rough-cut/scripts/takes.py Outdated
import json,sys
n=sys.argv[1]
gap=float(sys.argv[2]) if len(sys.argv)>2 else 1.2
d=json.load(open(f'src{n}.json'))

Member Author

Minor: json.load(open(...)) leaks the file handle (fine for a one-shot script, but with open(...) as f: is cleaner). Also no arg guard - running takes.py with no source number throws an ugly IndexError rather than a usage line. Same applies to range.py/dump.py. Low priority since these are internal helpers.

Member Author

Fixed. Scripts deleted. The TS subcommands use readFileSync (no handle leak) and have arg guards (Usage: messages on missing positionals).

- Move take segmentation + range query into @camkit/core (transcript.ts)
  with degenerate-word filtering and unit tests. The Python takes.py had a
  bug where degenerate tail words (Whisper padding) were counted in the
  take's duration, word count, and text despite the comment saying they
  were dropped. Now handled correctly with tests.

- Add camkit takes and camkit words CLI subcommands replacing the
  Python helper scripts. Segmentation logic lives in core behind tests,
  matching CONTRIBUTING.md's TS-throughout convention.

- Fix camkit docs to use camtasiaDocPaths() so it returns full paths,
  not just document names. The skill no longer needs to filesystem-scan.

- SKILL.md fixes:
  - Replace Python scripts with camkit takes/words subcommands
  - /tmp/rc/ → project-local .camkit/rc/ (survives reboots, mkdir -p)
  - find ~ -maxdepth 5 → camkit docs (returns full paths now)
  - Silences format corrected to START-ENDs (DURs) matching actual output
  - --force guidance: never script or automate
  - Remove unresolved <skill> path placeholder (no more scripts)

- Delete skills/rough-cut/scripts/ (takes.py, range.py, dump.py)