Add Intel Mac detection and model validation utilities #210

Alex-Wengg · 2025-12-08T17:48:08Z

Summary

- `SystemInfo.isAppleSilicon` and `SystemInfo.isIntelMac` to detect platform
- `AsrModels.isModelValid()` validates that all 4 Parakeet components (Preprocessor, Encoder, Decoder, Joint) can load without corruption
- Reuse decoder state arrays to prevent memory accumulation during streaming
- Handle non-contiguous strides in `copyData`

What VoiceInk should do (app side)

| Issue | VoiceInk Fix |
|-------|--------------|
| Intel Mac users selecting Parakeet | Use `SystemInfo.isIntelMac` to hide/disable Parakeet models in UI |
| Infinite "Transcribing" hang | Add timeout to transcription calls with user-facing error |
| 20-30s delay after sleep | Show "Loading model..." UI during model load (ANE recompilation is Apple's `anecompilerservice`, cannot be sped up) |
| Model corruption | Use `AsrModels.isModelValid()` before transcription, prompt re-download if invalid |
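The timeout recommendation in the table could be approached on the app side roughly as follows. This is a minimal sketch using Swift structured concurrency, not code from this PR: `withTimeout` and `TranscriptionTimeoutError` are hypothetical names, and the closure stands in for whatever transcription call the app makes.

```swift
/// Hypothetical error type for illustration; not part of FluidAudio.
struct TranscriptionTimeoutError: Error {}

/// Races `operation` against a sleeping task and throws if the sleep finishes first.
func withTimeout<T: Sendable>(
    seconds: Double,
    operation: @escaping @Sendable () async throws -> T
) async throws -> T {
    try await withThrowingTaskGroup(of: T.self, returning: T.self) { group in
        // Child 1: the real work (for example, a transcription call).
        group.addTask { try await operation() }
        // Child 2: a timer that throws once the deadline passes.
        group.addTask {
            try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            throw TranscriptionTimeoutError()
        }
        // Whichever child finishes first decides the outcome.
        guard let result = try await group.next() else {
            throw TranscriptionTimeoutError()
        }
        group.cancelAll()
        return result
    }
}
```

One caveat with this pattern: a task group waits for its children before returning, so the timeout only bounds the wait if the wrapped operation responds to cancellation. Work that blocks without checking for cancellation would need a different strategy.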

github-actions · 2025-12-08T17:52:44Z

VAD Benchmark Results

Performance Comparison

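| Dataset | Accuracy | Precision | Recall | F1-Score | RTFx | Files |
|---------|----------|-----------|--------|----------|------|-------|
| MUSAN | 92.0% | 86.2% | 100.0% | 92.6% | 728.9x faster | 50 |
| VOiCES | 92.0% | 86.2% | 100.0% | 92.6% | 806.5x faster | 50 |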

Dataset Details

MUSAN: Music, Speech, and Noise dataset - standard VAD evaluation
VOiCES: Voices Obscured in Complex Environmental Settings - tests robustness in real-world conditions

✅: Average F1-Score above 70%

github-actions · 2025-12-08T17:57:20Z

Offline VBx Pipeline Results

Speaker Diarization Performance (VBx Batch Mode)

Optimal clustering with Hungarian algorithm for maximum accuracy

| Metric | Value | Target | Status | Description |
|--------|-------|--------|--------|-------------|
| DER | 14.5% | <20% | ✅ | Diarization Error Rate (lower is better) |
| RTFx | 3.36x | >1.0x | ✅ | Real-Time Factor (higher is faster) |

Offline VBx Pipeline Timing Breakdown

Time spent in each stage of batch diarization

| Stage | Time (s) | % | Description |
|-------|----------|---|-------------|
| Model Download | 15.506 | 5.0 | Fetching diarization models |
| Model Compile | 6.645 | 2.1 | CoreML compilation |
| Audio Load | 0.084 | 0.0 | Loading audio file |
| Segmentation | 32.750 | 10.5 | VAD + speech detection |
| Embedding | 309.125 | 98.9 | Speaker embedding extraction |
| Clustering (VBx) | 2.839 | 0.9 | Hungarian algorithm + VBx clustering |
| Total | 312.526 | 100 | Full VBx pipeline |

Speaker Diarization Research Comparison

Offline VBx achieves competitive accuracy with batch processing

| Method | DER | Mode | Description |
|--------|-----|------|-------------|
| FluidAudio (Offline) | 14.5% | VBx Batch | On-device CoreML with optimal clustering |
| FluidAudio (Streaming) | 17.7% | Chunk-based | First-occurrence speaker mapping |
| Research baseline | 18-30% | Various | Standard dataset performance |

Pipeline Details:

Mode: Offline VBx with Hungarian algorithm for optimal speaker-to-cluster assignment
Segmentation: VAD-based voice activity detection
Embeddings: WeSpeaker-compatible speaker embeddings
Clustering: PowerSet with VBx refinement
Accuracy: Higher than streaming due to optimal post-hoc mapping

🎯 Offline VBx Test • AMI Corpus ES2004a • 1049.0s meeting audio • 344.7s processing • Test runtime: 5m 45s • 12/13/2025, 09:40 PM EST

github-actions · 2025-12-08T17:57:42Z

Speaker Diarization Benchmark Results

Speaker Diarization Performance

Evaluating "who spoke when" detection accuracy

| Metric | Value | Target | Status | Description |
|--------|-------|--------|--------|-------------|
| DER | 15.1% | <30% | ✅ | Diarization Error Rate (lower is better) |
| JER | 24.9% | <25% | ✅ | Jaccard Error Rate |
| RTFx | 14.34x | >1.0x | ✅ | Real-Time Factor (higher is faster) |

Diarization Pipeline Timing Breakdown

Time spent in each stage of speaker diarization

| Stage | Time (s) | % | Description |
|-------|----------|---|-------------|
| Model Download | 8.572 | 11.7 | Fetching diarization models |
| Model Compile | 3.674 | 5.0 | CoreML compilation |
| Audio Load | 0.101 | 0.1 | Loading audio file |
| Segmentation | 21.938 | 30.0 | Detecting speech regions |
| Embedding | 36.563 | 50.0 | Extracting speaker voices |
| Clustering | 14.625 | 20.0 | Grouping same speakers |
| Total | 73.198 | 100 | Full pipeline |

Speaker Diarization Research Comparison

Research baselines typically achieve 18-30% DER on standard datasets

| Method | DER | Notes |
|--------|-----|-------|
| FluidAudio | 15.1% | On-device CoreML |
| Research baseline | 18-30% | Standard dataset performance |

Note: The RTFx shown above was measured on a GitHub Actions runner. On Apple Silicon with the ANE:

- M2 MacBook Air (2022): runs at 150x RTFx
- Performance scales with Apple Neural Engine capabilities

🎯 Speaker Diarization Test • AMI Corpus ES2004a • 1049.0s meeting audio • 73.1s diarization time • Test runtime: 1m 55s • 12/13/2025, 09:38 PM EST

github-actions · 2025-12-08T18:09:51Z

ASR Benchmark Results ✅

Status: All benchmarks passed

Parakeet v3 (multilingual)

| Dataset | WER Avg | WER Med | RTFx | Status |
|---------|---------|---------|------|--------|
| test-clean | 0.57% | 0.00% | 3.42x | ✅ |
| test-other | 1.35% | 0.00% | 2.49x | ✅ |

Parakeet v2 (English-optimized)

| Dataset | WER Avg | WER Med | RTFx | Status |
|---------|---------|---------|------|--------|
| test-clean | 0.40% | 0.00% | 3.52x | ✅ |
| test-other | 1.00% | 0.00% | 2.43x | ✅ |

Streaming (v3)

| Metric | Value | Description |
|--------|-------|-------------|
| WER | 0.00% | Word Error Rate in streaming mode |
| RTFx | 0.40x | Streaming real-time factor |
| Avg Chunk Time | 2.186s | Average time to process each chunk |
| Max Chunk Time | 2.993s | Maximum chunk processing time |
| First Token | 2.619s | Latency to first transcription token |
| Total Chunks | 31 | Number of chunks processed |

Streaming (v2)

| Metric | Value | Description |
|--------|-------|-------------|
| WER | 0.00% | Word Error Rate in streaming mode |
| RTFx | 0.39x | Streaming real-time factor |
| Avg Chunk Time | 2.257s | Average time to process each chunk |
| Max Chunk Time | 2.953s | Maximum chunk processing time |
| First Token | 2.322s | Latency to first transcription token |
| Total Chunks | 31 | Number of chunks processed |

Streaming tests use 5 files with 0.5s chunks to simulate real-time audio streaming

25 files per dataset • Test runtime: 7m44s • 12/13/2025, 09:43 PM EST

RTFx = Real-Time Factor (higher is better) • Calculated as: Total audio duration ÷ Total processing time
Processing time includes: Model inference on Apple Neural Engine, audio preprocessing, state resets between files, token-to-text conversion, and file I/O
Example: RTFx of 2.0x means 10 seconds of audio processed in 5 seconds (2x faster than real-time)
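The RTFx definition in the note above can be written as a one-line calculation. The function name `realTimeFactor` is made up for illustration and is not part of the benchmark tooling.

```swift
/// RTFx = total audio duration ÷ total processing time (both in seconds).
func realTimeFactor(audioSeconds: Double, processingSeconds: Double) -> Double {
    precondition(processingSeconds > 0, "processing time must be positive")
    return audioSeconds / processingSeconds
}

// The example from the note: 10 seconds of audio processed in 5 seconds.
let example = realTimeFactor(audioSeconds: 10, processingSeconds: 5)
print(example) // 2.0
```

By the same definition, a value below 1.0x (such as the streaming RTFx figures reported above) means processing took longer than the audio duration.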

Expected RTFx Performance on Physical M1 Hardware:

• M1 Mac: ~28x (clean), ~25x (other)
• CI shows ~0.5-3x due to virtualization limitations

Testing methodology follows the HuggingFace Open ASR Leaderboard

Sources/FluidAudio/ASR/AsrModelWarmup.swift

Sources/FluidAudio/DownloadUtils.swift

Sources/FluidAudio/ASR/AsrManager.swift

Changes TdtDecoderState.update() to copy data into existing arrays instead of replacing array references.

Before: `hiddenState = newArray` (orphans old array, memory accumulates)
After: `hiddenState.copyData(from: newArray)` (reuses same array)

This prevents MLMultiArray instances from accumulating over long transcription sessions, which could cause progressive slowdown.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5
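The difference between replacing a reference and copying into an existing buffer can be illustrated without CoreML. In this sketch, `StateBuffer` is a hypothetical stand-in for a reference-type tensor such as MLMultiArray, and `DecoderStateSketch` is not the real `TdtDecoderState`; only the before/after pattern from the commit message is taken from the source.

```swift
/// Hypothetical stand-in for a reference-type tensor (assumption for illustration).
final class StateBuffer {
    var values: [Float]
    init(_ values: [Float]) { self.values = values }

    /// Overwrites this buffer's contents in place with `other`'s values.
    func copyData(from other: StateBuffer) {
        precondition(values.count == other.values.count, "buffer sizes must match")
        for i in values.indices { values[i] = other.values[i] }
    }
}

/// Simplified decoder state holding two long-lived buffers.
struct DecoderStateSketch {
    let hiddenState: StateBuffer
    let cellState: StateBuffer

    /// Copies new values into the existing buffers instead of swapping references,
    /// so the same two allocations are reused for every decoding step.
    func update(hidden: StateBuffer, cell: StateBuffer) {
        hiddenState.copyData(from: hidden)
        cellState.copyData(from: cell)
    }
}
```

After `update`, the state still points at its original buffer objects, which now hold the new values. This is also the property the updated tests are described as checking: values are compared rather than object identity with the decoder output.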

Update tests to verify that TdtDecoderState.update() reuses existing arrays and copies values into them, rather than checking for object identity with the new arrays from decoder output.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5

The previous memcpy-based copy assumed contiguous memory layout, but CoreML output arrays may have different strides than our ANE-aligned arrays. This could cause incorrect data copying and affect WER.

Now checks if both arrays are contiguous before using fast memcpy, otherwise falls back to element-by-element copy that respects strides.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5
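A stride-aware copy of the kind described in this commit message can be sketched with plain Swift arrays. `StridedBuffer` and `copyElements` are hypothetical names used only for illustration. The real code operates on MLMultiArray, and the contiguity check here is deliberately conservative (it requires exact row-major strides).

```swift
/// Hypothetical strided tensor over flat storage (assumption for illustration).
struct StridedBuffer {
    var storage: [Float]
    let shape: [Int]
    let strides: [Int]

    /// True when the strides match a dense row-major layout for `shape`.
    var isContiguous: Bool {
        var expected = 1
        for axis in stride(from: shape.count - 1, through: 0, by: -1) {
            if strides[axis] != expected { return false }
            expected *= shape[axis]
        }
        return true
    }
}

/// Copies logical elements from `src` into `dst`, honoring each buffer's strides.
func copyElements(from src: StridedBuffer, into dst: inout StridedBuffer) {
    precondition(src.shape == dst.shape, "shapes must match")
    let count = src.shape.reduce(1, *)

    // Fast path: both layouts are dense, so a flat block copy is safe.
    if src.isContiguous && dst.isContiguous {
        dst.storage.replaceSubrange(0..<count, with: src.storage[0..<count])
        return
    }

    // Slow path: walk every logical index and translate it through the strides.
    var index = [Int](repeating: 0, count: src.shape.count)
    for _ in 0..<count {
        var srcOffset = 0
        var dstOffset = 0
        for axis in index.indices {
            srcOffset += index[axis] * src.strides[axis]
            dstOffset += index[axis] * dst.strides[axis]
        }
        dst.storage[dstOffset] = src.storage[srcOffset]

        // Advance the multi-dimensional index like an odometer (last axis fastest).
        var axis = index.count - 1
        while axis >= 0 {
            index[axis] += 1
            if index[axis] < src.shape[axis] { break }
            index[axis] = 0
            axis -= 1
        }
    }
}
```

For example, a 2×3 source whose rows are padded to 4 elements (strides `[4, 1]`) copies correctly into a dense 2×3 destination (strides `[3, 1]`), whereas a flat block copy would drag the padding element into the second row.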

Sources/FluidAudio/Shared/SystemInfo.swift

Sources/FluidAudio/ASR/AsrModels.swift

BrandonWeng · 2025-12-13T21:54:35Z

Sources/FluidAudio/ASR/TDT/TdtDecoderState.swift

    mutating func update(from decoderOutput: MLFeatureProvider) {
-        hiddenState = decoderOutput.featureValue(for: "h_out")?.multiArrayValue ?? hiddenState
-        cellState = decoderOutput.featureValue(for: "c_out")?.multiArrayValue ?? cellState
+        // Copy data into existing arrays instead of replacing them to avoid memory leaks.


should we just remove the memory optimization and see how much worse the perf is? iirc it's not a huge performance

what about RTFx? latency

this should not affect WER

RTFx is about the same after running 1000 files on the main vs this branch

- Add SystemInfo.isAppleSilicon and SystemInfo.isIntelMac for architecture detection
- Add AsrModels.isModelValid() to validate Parakeet models can load
  - Returns false on Intel Macs (no ANE support)
  - Validates all 4 model components (Preprocessor, Encoder, Decoder, Joint)
  - Uses CPU-only loading to avoid triggering ANE compilation during validation

These utilities help apps guard UI for Intel Mac users and validate model integrity.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude
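One common way to implement this kind of architecture detection is with Swift's compile-time `#if arch(...)` conditions. The sketch below is an assumption about how such properties might look, not the actual `SystemInfo` source, and `ArchitectureSketch` is a made-up name.

```swift
/// Hypothetical architecture checks; the real SystemInfo may be implemented differently.
enum ArchitectureSketch {
    /// True when this code was compiled for arm64.
    static var isAppleSilicon: Bool {
        #if arch(arm64)
        return true
        #else
        return false
        #endif
    }

    /// True when this code was compiled for x86_64 on macOS.
    static var isIntelMac: Bool {
        #if arch(x86_64) && os(macOS)
        return true
        #else
        return false
        #endif
    }
}

// Example: decide whether an app should offer ANE-dependent models at all.
let shouldOfferParakeet = !ArchitectureSketch.isIntelMac
print("Offer Parakeet models:", shouldOfferParakeet)
```

Note that compile-time checks describe the architecture slice that is executing, so an x86_64 build running under Rosetta 2 on Apple Silicon hardware would report itself as Intel with this approach.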

Replace vDSP/memcpy-based implementations with simple loops:

- Remove import Accelerate
- Simplify resetData(to:) to use basic loop
- Simplify copyData(from:) to use basic loop
- Remove isContiguousLayout() helper

Benchmarks show the simple implementation is ~8% faster and uses 84% less memory (179 MB vs 1.14 GB peak) than the optimized version.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude

BrandonWeng

lgtm but remove json files pls


Alex-Wengg force-pushed the fix/stalling-issues branch from 100a538 to 48b6043 December 8, 2025 17:57

BrandonWeng reviewed Dec 8, 2025

Sources/FluidAudio/ASR/AsrModelWarmup.swift (outdated, resolved)

BrandonWeng reviewed Dec 8, 2025

Sources/FluidAudio/ASR/AsrModelWarmup.swift (outdated, resolved)

BrandonWeng reviewed Dec 8, 2025

Sources/FluidAudio/DownloadUtils.swift (outdated, resolved)

Alex-Wengg force-pushed the fix/stalling-issues branch from 0086c05 to a5b3ddf December 9, 2025 02:05

Alex-Wengg requested a review from BrandonWeng December 9, 2025 02:23

BrandonWeng reviewed Dec 9, 2025

Sources/FluidAudio/ASR/AsrManager.swift (outdated, resolved)

Alex-Wengg and others added 3 commits December 9, 2025 22:43

Alex-Wengg force-pushed the fix/stalling-issues branch from a5b3ddf to 4cca377 December 10, 2025 22:22

Alex-Wengg changed the title ~~Fix: Add timeout support and stalling prevention mechanisms~~ Add Intel Mac detection and model validation utilities Dec 13, 2025

Alex-Wengg requested a review from BrandonWeng December 13, 2025 19:15

BrandonWeng reviewed Dec 13, 2025

Sources/FluidAudio/Shared/SystemInfo.swift (outdated, resolved)

BrandonWeng reviewed Dec 13, 2025

Sources/FluidAudio/ASR/AsrModels.swift (outdated, resolved)

BrandonWeng reviewed Dec 13, 2025

Alex-Wengg force-pushed the fix/stalling-issues branch from 87cdff9 to faeab88 December 13, 2025 22:08

Address PR feedback: throw on Intel Mac, simplify decoder state

efac4d1

Alex-Wengg requested a review from BrandonWeng December 14, 2025 00:34

Alex-Wengg and others added 2 commits December 13, 2025 21:14

Address PR feedback: throw on Intel Mac, simplify decoder state

6be9a4d

BrandonWeng approved these changes Dec 14, 2025


remove useless json

8a65768

Alex-Wengg merged commit 6c352d8 into main Dec 14, 2025
9 checks passed

Alex-Wengg deleted the fix/stalling-issues branch December 14, 2025 02:47

Alex-Wengg mentioned this pull request Dec 14, 2025

feat: integrate official swift-huggingface SDK for model downloads #215

Merged
