Add Intel Mac detection and model validation utilities #210
VAD Benchmark Results
Performance Comparison
Dataset Details
✅: Average F1-Score above 70%
Offline VBx Pipeline Results
Speaker Diarization Performance (VBx Batch Mode)
Optimal clustering with Hungarian algorithm for maximum accuracy
Offline VBx Pipeline Timing Breakdown
Time spent in each stage of batch diarization
Speaker Diarization Research Comparison
Offline VBx achieves competitive accuracy with batch processing
Pipeline Details:
🎯 Offline VBx Test • AMI Corpus ES2004a • 1049.0s meeting audio • 344.7s processing • Test runtime: 5m 45s • 12/13/2025, 09:40 PM EST
Force-pushed from 100a538 to 48b6043
Speaker Diarization Benchmark Results
Speaker Diarization Performance
Evaluating "who spoke when" detection accuracy
Diarization Pipeline Timing Breakdown
Time spent in each stage of speaker diarization
Speaker Diarization Research Comparison
Research baselines typically achieve 18-30% DER on standard datasets
Note: the RTFx shown above is from a GitHub Actions runner. On Apple Silicon with ANE:
🎯 Speaker Diarization Test • AMI Corpus ES2004a • 1049.0s meeting audio • 73.1s diarization time • Test runtime: 1m 55s • 12/13/2025, 09:38 PM EST
ASR Benchmark Results ✅
Status: All benchmarks passed
Parakeet v3 (multilingual)
Parakeet v2 (English-optimized)
Streaming (v3)
Streaming (v2)
Streaming tests use 5 files with 0.5s chunks to simulate real-time audio streaming.
25 files per dataset • Test runtime: 7m 44s • 12/13/2025, 09:43 PM EST
RTFx = Real-Time Factor (higher is better) • Calculated as: Total audio duration ÷ Total processing time
Expected RTFx Performance on Physical M1 Hardware:
• M1 Mac: ~28x (clean), ~25x (other)
Testing methodology follows HuggingFace Open ASR Leaderboard
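As a worked example of the RTFx definition, applying it to the audio and processing times reported in the two diarization test footers (1049.0 s of audio; 73.1 s and 344.7 s of processing) gives roughly the following. These are end-to-end figures from GitHub Actions runners and may not match the RTFx values in the benchmark tables, which are not reproduced here.

$$
\text{RTFx} = \frac{\text{total audio duration}}{\text{total processing time}}, \qquad
\frac{1049.0\ \text{s}}{73.1\ \text{s}} \approx 14.4\times, \qquad
\frac{1049.0\ \text{s}}{344.7\ \text{s}} \approx 3.0\times
$$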
Force-pushed from 0086c05 to a5b3ddf
Changes `TdtDecoderState.update()` to copy data into existing arrays instead of replacing array references.

- Before: `hiddenState = newArray` (orphans old array, memory accumulates)
- After: `hiddenState.copyData(from: newArray)` (reuses same array)

This prevents `MLMultiArray` instances from accumulating over long transcription sessions, which could cause progressive slowdown.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
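A minimal sketch of the copy-into-place idea, written in plain Swift so it runs without CoreML. `FloatBuffer` is a hypothetical reference type standing in for `MLMultiArray`, and `DecoderStateSketch` / `update(hidden:cell:)` are illustrative names, not FluidAudio's actual API. The point is that `update` writes into the buffers the state already owns, so their object identities stay the same across decoder steps, while a missing output keeps the old values (mirroring the `?? hiddenState` fallback).

```swift
/// Hypothetical stand-in for MLMultiArray: a reference type owning a flat Float buffer.
final class FloatBuffer {
    var values: [Float]

    init(count: Int, repeating value: Float = 0) {
        values = Array(repeating: value, count: count)
    }

    /// Copies `source` element by element into this buffer's existing storage.
    func copyData(from source: FloatBuffer) {
        precondition(source.values.count == values.count, "shape mismatch")
        for i in 0..<values.count {
            values[i] = source.values[i]
        }
    }
}

/// Illustrative decoder state; not the real TdtDecoderState.
struct DecoderStateSketch {
    let hiddenState: FloatBuffer
    let cellState: FloatBuffer

    init(size: Int) {
        hiddenState = FloatBuffer(count: size)
        cellState = FloatBuffer(count: size)
    }

    /// Old approach: `hiddenState = newHidden` swapped the reference on every step.
    /// This approach: copy values into the buffers this state already owns.
    /// A nil output leaves the previous values in place.
    func update(hidden newHidden: FloatBuffer?, cell newCell: FloatBuffer?) {
        if let h = newHidden { hiddenState.copyData(from: h) }
        if let c = newCell { cellState.copyData(from: c) }
    }
}
```

This is also why the follow-up test commit checks that the state's arrays are reused and hold the new values, rather than checking identity with the decoder's output arrays.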
Update tests to verify that `TdtDecoderState.update()` reuses existing arrays and copies values into them, rather than checking for object identity with the new arrays from decoder output.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The previous `memcpy`-based copy assumed a contiguous memory layout, but CoreML output arrays may have different strides than our ANE-aligned arrays. This could cause incorrect data copying and affect WER.

The copy now checks whether both arrays are contiguous before using the fast `memcpy` path; otherwise it falls back to an element-by-element copy that respects strides.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
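The stride problem can be illustrated without CoreML. In the sketch below, the hypothetical `StridedArray` stores elements in a flat buffer and maps a multi-dimensional index to a storage offset through `strides`, analogous to the `shape` and `strides` that `MLMultiArray` exposes. The bulk-copy fast path is only taken when both arrays are densely packed row-major with the same shape; otherwise each logical element is copied through its own offset. This is an illustration of the technique, not the PR's implementation.

```swift
/// Hypothetical strided tensor: element (i0, i1, ...) lives at storage[sum(i_d * strides[d])].
struct StridedArray {
    let shape: [Int]
    let strides: [Int]
    var storage: [Float]

    /// Densely packed row-major strides for `shape`, e.g. [2, 3] -> [3, 1].
    static func contiguousStrides(for shape: [Int]) -> [Int] {
        var result = Array(repeating: 1, count: shape.count)
        var running = 1
        for d in stride(from: shape.count - 1, through: 0, by: -1) {
            result[d] = running
            running *= shape[d]
        }
        return result
    }

    var isContiguous: Bool { strides == StridedArray.contiguousStrides(for: shape) }

    /// Number of logical elements (product of the shape).
    var count: Int { shape.reduce(1, *) }

    /// Storage offset of a multi-dimensional index.
    func offset(of index: [Int]) -> Int {
        zip(index, strides).reduce(0) { $0 + $1.0 * $1.1 }
    }

    /// Copies every logical element of `source` into `self`, honoring both arrays' strides.
    mutating func copyData(from source: StridedArray) {
        precondition(source.shape == shape, "shape mismatch")
        let n = count
        if isContiguous && source.isContiguous {
            // Fast path: identical dense layouts, so a bulk copy is safe.
            storage.replaceSubrange(0..<n, with: source.storage[0..<n])
            return
        }
        // Slow path: visit logical indices in row-major order and translate each one.
        var index = Array(repeating: 0, count: shape.count)
        for _ in 0..<n {
            let dst = offset(of: index)
            let src = source.offset(of: index)
            storage[dst] = source.storage[src]
            // Advance the multi-index like an odometer (last dimension fastest).
            var d = shape.count - 1
            while d >= 0 {
                index[d] += 1
                if index[d] < shape[d] { break }
                index[d] = 0
                d -= 1
            }
        }
    }
}
```

For a 2x3 source stored column-major (`strides: [1, 2]`, storage `[1, 4, 2, 5, 3, 6]`), a raw `memcpy` into a row-major destination would yield `[1, 4, 2, 5, 3, 6]`, silently reordering values; the strided copy yields `[1, 2, 3, 4, 5, 6]`. The same walk also handles padded (ANE-aligned style) destinations whose row stride exceeds the row length.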
Force-pushed from a5b3ddf to 4cca377
```diff
 mutating func update(from decoderOutput: MLFeatureProvider) {
-    hiddenState = decoderOutput.featureValue(for: "h_out")?.multiArrayValue ?? hiddenState
-    cellState = decoderOutput.featureValue(for: "c_out")?.multiArrayValue ?? cellState
+    // Copy data into existing arrays instead of replacing them to avoid memory leaks.
```
should we just remove the memory optimization and see how much worse the perf is? iirc it's not a huge performance difference
what about RTFx and latency?
this should not affect WER
RTFx is about the same on main vs. this branch after running 1000 files
- Add `SystemInfo.isAppleSilicon` and `SystemInfo.isIntelMac` for architecture detection
- Add `AsrModels.isModelValid()` to validate Parakeet models can load
  - Returns false on Intel Macs (no ANE support)
  - Validates all 4 model components (Preprocessor, Encoder, Decoder, Joint)
  - Uses CPU-only loading to avoid triggering ANE compilation during validation

These utilities help apps guard UI for Intel Mac users and validate model integrity.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
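A rough sketch of both utilities, assuming compile-time architecture checks and CoreML's `MLModelConfiguration.computeUnits = .cpuOnly` for validation. `SystemInfoSketch` and `isModelSetValid(_:)` (with its array-of-URLs parameter) are hypothetical names, not FluidAudio's actual `SystemInfo` / `AsrModels` API. One caveat: `#if arch(...)` reports the architecture the binary was compiled for, so an x86_64 build running under Rosetta 2 on Apple Silicon would be classified as Intel; a runtime check (Apple documents the `sysctl.proc_translated` key for detecting translation) may be preferable if that case matters.

```swift
import Foundation
#if canImport(CoreML)
import CoreML
#endif

/// Hypothetical platform checks based on the compile-time target architecture.
enum SystemInfoSketch {
    static var isAppleSilicon: Bool {
        #if os(macOS) && arch(arm64)
        return true
        #else
        return false
        #endif
    }

    static var isIntelMac: Bool {
        #if os(macOS) && arch(x86_64)
        return true
        #else
        return false
        #endif
    }
}

/// Hypothetical validator: true only if every compiled model (.mlmodelc) URL loads.
/// Loading with `.cpuOnly` avoids kicking off ANE compilation just to check the files.
func isModelSetValid(_ modelURLs: [URL]) -> Bool {
    // Mirrors the stated behavior: treat Parakeet as unsupported on Intel Macs.
    if SystemInfoSketch.isIntelMac { return false }
    guard !modelURLs.isEmpty else { return false }
    #if canImport(CoreML)
    let config = MLModelConfiguration()
    config.computeUnits = .cpuOnly
    for url in modelURLs {
        do {
            _ = try MLModel(contentsOf: url, configuration: config)
        } catch {
            return false  // missing or corrupt component
        }
    }
    return true
    #else
    return false  // CoreML is unavailable on this platform
    #endif
}
```

In this sketch the caller would pass the four component URLs (Preprocessor, Encoder, Decoder, Joint); a single failed load marks the whole set invalid, which matches the "re-download if invalid" guidance in the PR summary.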
Force-pushed from 87cdff9 to faeab88
Replace vDSP/memcpy-based implementations with simple loops:

- Remove `import Accelerate`
- Simplify `resetData(to:)` to use basic loop
- Simplify `copyData(from:)` to use basic loop
- Remove `isContiguousLayout()` helper

Benchmarks show the simple implementation is ~8% faster and uses 84% less memory (179 MB vs 1.14 GB peak) than the optimized version.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
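A quick check of the memory figure, assuming 1 GB = 1000 MB (with 1 GB = 1024 MB the reduction comes out to about 85%):

$$
1 - \frac{179\ \text{MB}}{1140\ \text{MB}} \approx 1 - 0.157 = 0.843 \approx 84\%
$$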
BrandonWeng left a comment:
lgtm, but please remove the json files
## Summary

- `SystemInfo.isAppleSilicon` and `SystemInfo.isIntelMac` to detect platform
- `AsrModels.isModelValid()` validates all 4 Parakeet components (Preprocessor, Encoder, Decoder, Joint) can load without corruption
- Reuse decoder state arrays to prevent memory accumulation during streaming
- Handle non-contiguous strides in `copyData`

### What VoiceInk should do (app side)

| Issue | VoiceInk Fix |
|-------|-------------|
| Intel Mac users selecting Parakeet | Use `SystemInfo.isIntelMac` to hide/disable Parakeet models in UI |
| Infinite "Transcribing" hang | Add timeout to transcription calls with user-facing error |
| 20-30s delay after sleep | Show "Loading model..." UI during model load (ANE recompilation is handled by Apple's `anecompilerservice` and cannot be sped up) |
| Model corruption | Use `AsrModels.isModelValid()` before transcription, prompt re-download if invalid |
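On the app side, the first and last rows of the table reduce to a small decision before offering Parakeet. The sketch below uses hypothetical names (`ParakeetAvailability`, `parakeetAvailability(isIntelMac:modelIsValid:)`); in a real app the inputs would presumably come from `SystemInfo.isIntelMac` and `AsrModels.isModelValid()`, whose exact signatures are not shown here.

```swift
/// Hypothetical app-side states for the Parakeet model picker.
enum ParakeetAvailability: Equatable {
    case unsupportedHardware  // Intel Mac: hide or disable Parakeet in the UI
    case needsRedownload      // validation failed: prompt the user to re-download
    case ready                // safe to start transcription
}

/// Decides what the UI should show. `modelIsValid` is a closure so the
/// (potentially slow) validation only runs on supported hardware.
func parakeetAvailability(isIntelMac: Bool, modelIsValid: () -> Bool) -> ParakeetAvailability {
    if isIntelMac { return .unsupportedHardware }
    return modelIsValid() ? .ready : .needsRedownload
}
```

Checking the hardware first keeps the two failure modes distinct in the UI, even though the PR notes that `isModelValid()` would also return false on Intel Macs.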