Sample: /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav
Duration: 28.80s
Created: 2026-04-30T23:59:00
- **Overall winner:** `whisper small`
- **Fastest usable model:** `whisper tiny`
- **Best memory efficiency:** `whisper tiny`
- **Best accuracy/transcript quality:** `mlx-community/Qwen3-ASR-1.7B-8bit`
- **Best balanced default:** `whisper small`
- **Avoid:**
  - `mlx-community/Voxtral-Mini-4B-Realtime-2602-4bit`: too slow at `11.2s` and highest memory at `3.2 GB` for no meaningful quality gain.
  - `tiny.en`: drops the opening “Hey,” changes “things” to “something,” and turns “discern” into “decipher”; worse than `tiny` with similar speed/memory.
  - `moonshine-base` / `moonshine-tiny`: both hallucinate the key phrase as “disseminate” / “dishearten,” which is bad for command-like dictation.

**Rationale:** `whisper small` is as fast as `base.en` at about `1.1s`, uses a reasonable `826 MB`, and gives one of the cleanest transcripts, with correct “hard to discern” and “you’ll pass it through.” `Qwen3-ASR-1.7B-8bit` has the most polished transcript, but at `2.36s` and `2.5 GB` it is too heavy as the default. `whisper tiny` is the best speed/memory choice at `0.57s` and `229 MB`, but “hard to discerning” makes it less reliable for transcript quality. Memory figures are peak RSS in the same binary units as the table below (1 MB = 1,048,576 bytes).
| Status | Backend | Model | Run | Wall | time real | RSS | WER |
|---|---|---|---|---|---|---|---|
| ok | whisper | base.en | 1 | 1.09s | 1.08s | 353.4 MB | — |
| ok | whisper | large-v3-turbo | 1 | 1.58s | 1.56s | 1.8 GB | — |
| ok | whisper | small | 1 | 1.10s | 1.08s | 826.2 MB | — |
| ok | whisper | small.en | 1 | 1.09s | 1.08s | 810.0 MB | — |
| ok | whisper | tiny | 1 | 0.57s | 0.56s | 228.8 MB | — |
| ok | whisper | tiny.en | 1 | 0.58s | 0.57s | 230.8 MB | — |
| ok | parakeet-rs | mlx-community/parakeet-tdt-0.6b-v3 | 1 | 1.60s | 1.59s | 1.3 GB | — |
| ok | mlx-audio | UsefulSensors/moonshine-base | 1 | 2.33s | 2.31s | 406.1 MB | — |
| ok | mlx-audio | UsefulSensors/moonshine-tiny | 1 | 1.53s | 1.52s | 272.5 MB | — |
| ok | mlx-audio | distil-whisper/distil-large-v3 | 1 | 1.95s | 1.93s | 1.0 GB | — |
| ok | mlx-audio | mlx-community/Qwen3-ASR-0.6B-8bit | 1 | 1.96s | 1.95s | 1.1 GB | — |
| ok | mlx-audio | mlx-community/Qwen3-ASR-1.7B-8bit | 1 | 2.36s | 2.34s | 2.5 GB | — |
| ok | mlx-audio | mlx-community/Voxtral-Mini-4B-Realtime-2602-4bit | 1 | 11.22s | 11.19s | 3.2 GB | — |
| ok | mlx-audio | mlx-community/parakeet-tdt-0.6b-v3 | 1 | 1.74s | 1.72s | 2.4 GB | — |
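
The WER column is empty for every row in this run. A minimal sketch of how it could be filled in, assuming a hand-checked reference transcript is available (none is included in this report). The helper names `normalize` and `wer` are hypothetical and not part of the battle harness, and the normalization (lowercasing, stripping punctuation) is an assumption that will affect the scores.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, keep letters/digits/apostrophes, split into words."""
    text = text.lower().replace("’", "'")
    return re.sub(r"[^a-z0-9' ]+", " ", text).split()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                            # deletion
                curr[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),   # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative fragment only, not the full reference for this recording.
reference = "some things that could be hard to discern"
print(wer(reference, "some things that could be hard to disseminate"))
```

With a full reference, the same function could be applied to each run's transcript to populate the column. Filler words such as "um" would count as insertions unless they are removed during normalization.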
Backend: whisper · Model: base.en · Status: ok · Wall: 1.09s · RSS: 353.4 MB
Hey, so this is a longer message, and I'm trying to say some things that could be hard to discern, I guess. So I'll keep talking for a bit, and then eventually we should stop and you'll pass it through the models and we'll get a result. So maybe, yeah, that's cool.
/Users/mikker/dev/Tuna-worktrees/one/build/stt-battle-whisper/bin/whisper-cli -m /Users/mikker/Library/Application Support/Tuna/models/whisper/base-en/ggml-base.en.bin -f /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav -otxt -of /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/base.en/run-1/transcript -np
output_txt: saving output to '/Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/base.en/run-1/transcript.txt'
real 1.08
user 0.37
sys 0.11
370606080 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
24755 page reclaims
11 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
11 voluntary context switches
1792 involuntary context switches
4749303339 instructions retired
1479661392 cycles elapsed
347538344 peak memory footprint
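
The RSS value in each status line can be derived from the raw resource report that follows each command. The sketch below assumes the report has the shape shown above (it resembles what macOS `/usr/bin/time -l -p` prints, though the harness's exact invocation is not shown) and that "maximum resident set size" is in bytes. `parse_time_output` and `format_rss` are hypothetical helpers, and the binary-unit rounding is inferred from the table values rather than taken from the harness.

```python
import re


def parse_time_output(text: str) -> dict[str, float]:
    """Extract 'real' seconds and peak RSS (bytes) from a resource report."""
    real = re.search(r"^\s*real\s+([0-9.]+)\s*$", text, re.MULTILINE)
    rss = re.search(r"^\s*(\d+)\s+maximum resident set size\s*$", text, re.MULTILINE)
    if real is None or rss is None:
        raise ValueError("report is missing 'real' or 'maximum resident set size'")
    return {"real_s": float(real.group(1)), "rss_bytes": int(rss.group(1))}


def format_rss(rss_bytes: int) -> str:
    """Format bytes in binary units, labelled MB/GB as in the results table."""
    mib = rss_bytes / 2**20
    return f"{mib / 1024:.1f} GB" if mib >= 1024 else f"{mib:.1f} MB"


# Excerpt of the base.en report above.
report = """real 1.08
user 0.37
sys 0.11
370606080 maximum resident set size
"""
stats = parse_time_output(report)
print(stats["real_s"], format_rss(int(stats["rss_bytes"])))
```

Dividing by 2**20 rather than 10**6 is what makes 370606080 bytes come out as 353.4 MB here; in decimal units the same figure would read roughly 371 MB.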
Backend: whisper · Model: large-v3-turbo · Status: ok · Wall: 1.58s · RSS: 1.8 GB
hey so this is a longer message and i'm trying to say some things that could be um hard to discern i guess so i'll keep talking for a bit and then eventually we should stop and you'll pass it through the models and we'll get a result so maybe yeah let's go
/Users/mikker/dev/Tuna-worktrees/one/build/stt-battle-whisper/bin/whisper-cli -m /Users/mikker/Library/Application Support/Tuna/models/whisper/large-v3-turbo/ggml-large-v3-turbo.bin -f /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav -otxt -of /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/large-v3-turbo/run-1/transcript -np
output_txt: saving output to '/Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/large-v3-turbo/run-1/transcript.txt'
real 1.56
user 0.45
sys 0.50
1973075968 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
122511 page reclaims
7 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
63 voluntary context switches
1352 involuntary context switches
8593063728 instructions retired
2980688391 cycles elapsed
1950894096 peak memory footprint
Backend: whisper · Model: small · Status: ok · Wall: 1.10s · RSS: 826.2 MB
Hey, so this is a longer message, and I'm trying to say some things that could be hard to discern, I guess. So I'll keep talking for a bit, and then eventually we should stop, and you'll pass it through the models, and we'll get a result. So maybe, yeah, let's go.
/Users/mikker/dev/Tuna-worktrees/one/build/stt-battle-whisper/bin/whisper-cli -m /Users/mikker/Library/Application Support/Tuna/models/whisper/small/ggml-small.bin -f /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav -otxt -of /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/small/run-1/transcript -np
output_txt: saving output to '/Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/small/run-1/transcript.txt'
real 1.08
user 0.44
sys 0.22
866369536 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
55701 page reclaims
10 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
10 voluntary context switches
1886 involuntary context switches
6225475621 instructions retired
2002615476 cycles elapsed
843449616 peak memory footprint
Backend: whisper · Model: small.en · Status: ok · Wall: 1.09s · RSS: 810.0 MB
hey so this is a longer message and I'm trying to say some things that could be hard to discern I guess so I'll keep talking for a bit and then eventually we should stop and you'll pass it through the models and we'll get a result so maybe yeah let's go
/Users/mikker/dev/Tuna-worktrees/one/build/stt-battle-whisper/bin/whisper-cli -m /Users/mikker/Library/Application Support/Tuna/models/whisper/small-en/ggml-small.en.bin -f /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav -otxt -of /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/small.en/run-1/transcript -np
output_txt: saving output to '/Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/small.en/run-1/transcript.txt'
real 1.08
user 0.37
sys 0.21
849297408 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
54389 page reclaims
7 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
33 voluntary context switches
1344 involuntary context switches
5402821704 instructions retired
1732674548 cycles elapsed
826557664 peak memory footprint
Backend: whisper · Model: tiny · Status: ok · Wall: 0.57s · RSS: 228.8 MB
Hey, so this is a longer message and I'm trying to say some things that could be hard to discerning I guess so I'll keep talking for a bit and then eventually we should stop and you'll pass it through the models and we'll get a result so maybe yeah, let's go.
/Users/mikker/dev/Tuna-worktrees/one/build/stt-battle-whisper/bin/whisper-cli -m /Users/mikker/Library/Application Support/Tuna/models/whisper/tiny/ggml-tiny.bin -f /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav -otxt -of /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/tiny/run-1/transcript -np
output_txt: saving output to '/Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/tiny/run-1/transcript.txt'
real 0.56
user 0.32
sys 0.09
239910912 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
16316 page reclaims
7 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
4 voluntary context switches
1861 involuntary context switches
3777757217 instructions retired
1195731908 cycles elapsed
217006944 peak memory footprint
Backend: whisper · Model: tiny.en · Status: ok · Wall: 0.58s · RSS: 230.8 MB
So this is a longer message and I'm trying to say something that could be hard to decipher and I guess. So I'll keep talking for a bit and then eventually we should stop and you pass it through the models and we'll get a result. So maybe, yeah, let's go.
/Users/mikker/dev/Tuna-worktrees/one/build/stt-battle-whisper/bin/whisper-cli -m /Users/mikker/Library/Application Support/Tuna/models/whisper/tiny-en/ggml-tiny.en.bin -f /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav -otxt -of /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/tiny.en/run-1/transcript -np
output_txt: saving output to '/Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/tiny.en/run-1/transcript.txt'
real 0.57
user 0.30
sys 0.08
242057216 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
16372 page reclaims
7 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
1825 involuntary context switches
3629426674 instructions retired
1153533865 cycles elapsed
219169608 peak memory footprint
Backend: parakeet-rs · Model: mlx-community/parakeet-tdt-0.6b-v3 · Status: ok · Wall: 1.60s · RSS: 1.3 GB
Hey, so this is a longer message and I'm trying to say some things that could be um hard to discern I guess. So I'll keep talking for a bit and then eventually we should stop and you pass it through the models and we'll get a result. So maybe Yeah, let's go.
/Users/mikker/dev/Tuna-worktrees/one/app/Tuna/Resources/tuna-parakeet-helper --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --model mlx-community/parakeet-tdt-0.6b-v3 --cache-dir /Users/mikker/Library/Application Support/Tuna/models/parakeet-rs --output-dir /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/parakeet-rs/mlx-community-parakeet-tdt-0.6b-v3/run-1 --output-format txt
real 1.59
user 5.55
sys 0.22
1422196736 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
87150 page reclaims
652 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
13 voluntary context switches
2036 involuntary context switches
64738024896 instructions retired
17591916368 cycles elapsed
1408845816 peak memory footprint
Backend: mlx-audio · Model: UsefulSensors/moonshine-base · Status: ok · Wall: 2.33s · RSS: 406.1 MB
Hey, so this is a longer message and I'm trying to say some things that could be hard to disseminate. So I'll keep talking for a bit and then eventually we should stop and you'll pass it through the models and we'll get a result. So maybe yeah, let's go.
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model UsefulSensors/moonshine-base --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/UsefulSensors-moonshine-base/run-1/transcript --format json
Fetching 5 files: 0%| | 0/5 [00:00<?, ?it/s]
Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 13148.29it/s]
real 2.31
user 0.99
sys 0.49
425869312 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
33613 page reclaims
1641 page faults
0 swaps
0 block input operations
0 block output operations
80 messages sent
104 messages received
1 signals received
3588 voluntary context switches
4923 involuntary context switches
1267310049 instructions retired
450649042 cycles elapsed
73564808 peak memory footprint
Backend: mlx-audio · Model: UsefulSensors/moonshine-tiny · Status: ok · Wall: 1.53s · RSS: 272.5 MB
Hey, so this is a longer message and I'm trying to say some things that could be hard to dishearten, I guess. So I'll keep talking for a bit and then eventually we should stop and you pass it through the models and we'll get a result. So maybe, yeah, let's go.
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model UsefulSensors/moonshine-tiny --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/UsefulSensors-moonshine-tiny/run-1/transcript --format json
Fetching 5 files: 0%| | 0/5 [00:00<?, ?it/s]
Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 10402.54it/s]
real 1.52
user 0.88
sys 0.31
285720576 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
26131 page reclaims
4 page faults
0 swaps
0 block input operations
0 block output operations
23 messages sent
58 messages received
1 signals received
175 voluntary context switches
2835 involuntary context switches
646004177 instructions retired
224773606 cycles elapsed
67715672 peak memory footprint
Backend: mlx-audio · Model: distil-whisper/distil-large-v3 · Status: ok · Wall: 1.95s · RSS: 1.0 GB
Hey, so this is a longer message and I'm trying to say some things that could be um hard to discern I guess so I'll keep talking for a bit and then eventually we should stop and you'll pass it through the models and we'll get a result so maybe yeah let's go
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model distil-whisper/distil-large-v3 --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/distil-whisper-distil-large-v3/run-1/transcript --format json
Fetching 13 files: 0%| | 0/13 [00:00<?, ?it/s]
Fetching 13 files: 100%|██████████| 13/13 [00:00<00:00, 27664.11it/s]
0%| | 0/2880 [00:00<?, ?frames/s][transformers] Ignoring clean_up_tokenization_spaces=True for BPE tokenizer WhisperTokenizer. The clean_up_tokenization post-processing step is designed for WordPiece tokenizers and is destructive for BPE (it strips spaces before punctuation). Set clean_up_tokenization_spaces=False to suppress this warning, or set clean_up_tokenization_spaces_for_bpe_even_though_it_will_corrupt_output=True to force cleanup anyway.
100%|██████████| 2880/2880 [00:00<00:00, 6059.79frames/s]
100%|██████████| 2880/2880 [00:00<00:00, 6057.19frames/s]
real 1.93
user 0.91
sys 1.05
1115095040 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
80511 page reclaims
29 page faults
0 swaps
0 block input operations
0 block output operations
23 messages sent
61 messages received
1 signals received
402 voluntary context switches
4210 involuntary context switches
635453865 instructions retired
207129567 cycles elapsed
60473920 peak memory footprint
Backend: mlx-audio · Model: mlx-community/Qwen3-ASR-0.6B-8bit · Status: ok · Wall: 1.96s · RSS: 1.1 GB
Hey, so this is a longer message, and I'm trying to say some things that could be. Um, hard to discern, I guess. So, I'll keep talking for a bit, and then eventually, we should stop, and you'll pass it through the models, and we'll get a result. So. Maybe. Yeah, let's go.
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model mlx-community/Qwen3-ASR-0.6B-8bit --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/mlx-community-Qwen3-ASR-0.6B-8bit/run-1/transcript --format json
Fetching 9 files: 0%| | 0/9 [00:00<?, ?it/s]
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 27858.85it/s]
real 1.95
user 1.12
sys 0.69
1220739072 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
87227 page reclaims
15 page faults
0 swaps
0 block input operations
0 block output operations
23 messages sent
61 messages received
1 signals received
878 voluntary context switches
7631 involuntary context switches
642267219 instructions retired
212799152 cycles elapsed
58950208 peak memory footprint
Backend: mlx-audio · Model: mlx-community/Qwen3-ASR-1.7B-8bit · Status: ok · Wall: 2.36s · RSS: 2.5 GB
Hey, so this is a longer message, and I'm trying to say some things that could be hard to discern, I guess. So I'll keep talking for a bit, and then eventually we should stop, and you'll pass it through the models, and we'll get a result. So maybe, yeah, let's go.
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model mlx-community/Qwen3-ASR-1.7B-8bit --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/mlx-community-Qwen3-ASR-1.7B-8bit/run-1/transcript --format json
Fetching 9 files: 0%| | 0/9 [00:00<?, ?it/s]
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 21290.88it/s]
real 2.34
user 1.21
sys 1.05
2679914496 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
176746 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
23 messages sent
59 messages received
1 signals received
142 voluntary context switches
6040 involuntary context switches
634627631 instructions retired
210676088 cycles elapsed
63259200 peak memory footprint
Backend: mlx-audio · Model: mlx-community/Voxtral-Mini-4B-Realtime-2602-4bit · Status: ok · Wall: 11.22s · RSS: 3.2 GB
Hey, so this is a longer message and I'm trying to say some things that could be hard to discern, I guess. So I'll keep talking for a bit and then eventually we should stop and you'll... Pass it through the models and we'll get a result. So maybe yeah let's go
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model mlx-community/Voxtral-Mini-4B-Realtime-2602-4bit --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/mlx-community-Voxtral-Mini-4B-Realtime-2602-4bit/run-1/transcript --format json
Fetching 4 files: 0%| | 0/4 [00:00<?, ?it/s]
Fetching 4 files: 100%|██████████| 4/4 [00:00<00:00, 22580.37it/s]
real 11.19
user 2.89
sys 2.12
3399991296 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
214489 page reclaims
9 page faults
0 swaps
0 block input operations
0 block output operations
23 messages sent
61 messages received
1 signals received
1613 voluntary context switches
38175 involuntary context switches
635609047 instructions retired
207822264 cycles elapsed
62538304 peak memory footprint
Backend: mlx-audio · Model: mlx-community/parakeet-tdt-0.6b-v3 · Status: ok · Wall: 1.74s · RSS: 2.4 GB
Hey, so this is a longer message and I'm trying to say some things that could be um hard to discern I guess. So I'll keep talking for a bit and then eventually we should stop and you'll pass it through the models and we'll get a result. So maybe yeah, let's go.
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model mlx-community/parakeet-tdt-0.6b-v3 --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/mlx-community-parakeet-tdt-0.6b-v3/run-1/transcript --format json
Fetching 4 files: 0%| | 0/4 [00:00<?, ?it/s]
Fetching 4 files: 100%|██████████| 4/4 [00:00<00:00, 5036.69it/s]
real 1.72
user 0.76
sys 0.84
2625142784 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
167192 page reclaims
6 page faults
0 swaps
0 block input operations
0 block output operations
23 messages sent
59 messages received
1 signals received
481 voluntary context switches
4153 involuntary context switches
641281023 instructions retired
224068206 cycles elapsed
63717952 peak memory footprint