Tuna STT Battle Report

Sample: /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav
Duration: 28.80s
Created: 2026-04-30T23:59:00

Agent Review

- **Overall winner:** `whisper small`
- **Fastest usable model:** `whisper tiny`
- **Best memory efficiency:** `whisper tiny`
- **Best accuracy/transcript quality:** `mlx-community/Qwen3-ASR-1.7B-8bit`
- **Best balanced default:** `whisper small`
- **Avoid:**
  - `mlx-community/Voxtral-Mini-4B-Realtime-2602-4bit`: too slow at `11.22s` and the highest memory use at `3.2 GB`, for no meaningful quality gain.
  - `tiny.en`: drops the opening “Hey,” turns “some things” into “something,” and turns “discern” into “decipher”; worse than `tiny` at similar speed and memory.
  - `moonshine-base` / `moonshine-tiny`: both mistranscribe “discern” (as “disseminate” and “dishearten,” respectively), which is bad for command-like dictation.

**Rationale:** `whisper small` is nearly as fast as `base.en` (`1.10s` versus `1.09s` wall time), uses a reasonable `826 MB`, and gives one of the cleanest transcripts, with correct “hard to discern” and “you’ll pass it through.” `Qwen3-ASR-1.7B-8bit` has the most polished transcript, but at `2.36s` and `2.5 GB` it is too heavy as the default. `whisper tiny` is the best speed/memory choice at `0.57s` and `229 MB`, but “hard to discerning” makes it less reliable for transcript quality.
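
The review above compares transcripts by eye, and the report records no reference transcript. As a rough sketch of how the same comparison could be quantified, the snippet below computes a word error rate (word-level edit distance divided by reference length). The helper names `normalize` and `wer` are illustrative, and the choice of reference text is an assumption: any transcript used as the reference here is a stand-in, not ground truth.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, unify apostrophes, and split into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower().replace("\u2019", "'"))


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # Word-level Levenshtein distance, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


# Phrases taken from the transcripts in this report; the first argument is
# treated as the reference purely for illustration.
print(wer("hard to discern, I guess", "hard to discerning I guess"))
print(wer("Hey, so this is a longer message", "So this is a longer message"))
```

Because punctuation and casing are stripped before comparison, this measure rewards getting the words right but says nothing about the punctuation quality that the review also weighs.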

| Status | Backend | Model | Run | Wall | real | RSS |
| --- | --- | --- | --- | --- | --- | --- |
| ok | whisper | base.en | 1 | 1.09s | 1.08s | 353.4 MB |
| ok | whisper | large-v3-turbo | 1 | 1.58s | 1.56s | 1.8 GB |
| ok | whisper | small | 1 | 1.10s | 1.08s | 826.2 MB |
| ok | whisper | small.en | 1 | 1.09s | 1.08s | 810.0 MB |
| ok | whisper | tiny | 1 | 0.57s | 0.56s | 228.8 MB |
| ok | whisper | tiny.en | 1 | 0.58s | 0.57s | 230.8 MB |
| ok | parakeet-rs | mlx-community/parakeet-tdt-0.6b-v3 | 1 | 1.60s | 1.59s | 1.3 GB |
| ok | mlx-audio | UsefulSensors/moonshine-base | 1 | 2.33s | 2.31s | 406.1 MB |
| ok | mlx-audio | UsefulSensors/moonshine-tiny | 1 | 1.53s | 1.52s | 272.5 MB |
| ok | mlx-audio | distil-whisper/distil-large-v3 | 1 | 1.95s | 1.93s | 1.0 GB |
| ok | mlx-audio | mlx-community/Qwen3-ASR-0.6B-8bit | 1 | 1.96s | 1.95s | 1.1 GB |
| ok | mlx-audio | mlx-community/Qwen3-ASR-1.7B-8bit | 1 | 2.36s | 2.34s | 2.5 GB |
| ok | mlx-audio | mlx-community/Voxtral-Mini-4B-Realtime-2602-4bit | 1 | 11.22s | 11.19s | 3.2 GB |
| ok | mlx-audio | mlx-community/parakeet-tdt-0.6b-v3 | 1 | 1.74s | 1.72s | 2.4 GB |
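
The memory figures in the rows above appear to be the `maximum resident set size` byte counts from the per-run logs, divided by powers of 1024 and labelled MB/GB. The sketch below reproduces that conversion; `format_rss` is a hypothetical helper, and the switch to GB at 1024 MB is an assumption inferred from the values shown, not taken from the tool's source.

```python
def format_rss(num_bytes: int) -> str:
    """Format a byte count in binary units, labelled MB/GB as in this report."""
    mib = num_bytes / 1024**2
    if mib < 1024:  # assumed threshold for switching units
        return f"{mib:.1f} MB"
    return f"{mib / 1024:.1f} GB"


# Byte counts copied from the `maximum resident set size` log lines.
for model, rss_bytes in [
    ("whisper base.en", 370606080),
    ("whisper small", 866369536),
    ("Voxtral-Mini-4B-Realtime-2602-4bit", 3399991296),
]:
    print(f"{model}: {format_rss(rss_bytes)}")
```

Dividing by 10^6 instead would give larger figures for the same runs (about 866 for `whisper small`), so memory numbers are only comparable when they use the same convention.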

whisper: base.en run 1

Status: ok · Wall: 1.09s · RSS: 353.4 MB

Hey, so this is a longer message, and I'm trying to say some things that could be hard
 to discern, I guess.
 So I'll keep talking for a bit, and then eventually we should stop and you'll pass it
 through the models and we'll get a result.
 So maybe, yeah, that's cool.
Command / logs
/Users/mikker/dev/Tuna-worktrees/one/build/stt-battle-whisper/bin/whisper-cli -m /Users/mikker/Library/Application Support/Tuna/models/whisper/base-en/ggml-base.en.bin -f /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav -otxt -of /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/base.en/run-1/transcript -np

output_txt: saving output to '/Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/base.en/run-1/transcript.txt'
real 1.08
user 0.37
sys 0.11
           370606080  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               24755  page reclaims
                  11  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                  11  voluntary context switches
                1792  involuntary context switches
          4749303339  instructions retired
          1479661392  cycles elapsed
           347538344  peak memory footprint
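
Each run's log ends with timing and resource lines in the same two shapes: `label seconds` for `real`/`user`/`sys`, and `integer  description` for the counters. A minimal sketch of pulling those into a dictionary follows; `parse_time_output` is an illustrative name, and the parser assumes only the line shapes visible in this report.

```python
import re


def parse_time_output(log: str) -> dict[str, float]:
    """Extract timing and resource counters from a captured run log."""
    stats: dict[str, float] = {}
    for raw in log.splitlines():
        line = raw.strip()
        # Timing lines: "real 1.08", "user 0.37", "sys 0.11".
        m = re.fullmatch(r"(real|user|sys)\s+([0-9.]+)", line)
        if m:
            stats[m.group(1)] = float(m.group(2))
            continue
        # Counter lines: "370606080  maximum resident set size".
        m = re.fullmatch(r"(\d+)\s+(.+)", line)
        if m:
            stats[m.group(2)] = int(m.group(1))
    return stats


sample = """\
output_txt: saving output to 'transcript.txt'
real 1.08
user 0.37
sys 0.11
           370606080  maximum resident set size
                   0  swaps
           347538344  peak memory footprint
"""
stats = parse_time_output(sample)
print(stats["real"], stats["maximum resident set size"])
```

Lines that match neither shape, such as the `output_txt:` message or download progress bars, are skipped rather than treated as errors.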

whisper: large-v3-turbo run 1

Status: ok · Wall: 1.58s · RSS: 1.8 GB

hey so this is a longer message and i'm trying to say some things that could be
 um hard to discern i guess so i'll keep talking for a bit and then eventually
 we should stop and you'll pass it through the models and we'll get a result so
 maybe yeah let's go
Command / logs
/Users/mikker/dev/Tuna-worktrees/one/build/stt-battle-whisper/bin/whisper-cli -m /Users/mikker/Library/Application Support/Tuna/models/whisper/large-v3-turbo/ggml-large-v3-turbo.bin -f /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav -otxt -of /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/large-v3-turbo/run-1/transcript -np

output_txt: saving output to '/Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/large-v3-turbo/run-1/transcript.txt'
real 1.56
user 0.45
sys 0.50
          1973075968  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              122511  page reclaims
                   7  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                  63  voluntary context switches
                1352  involuntary context switches
          8593063728  instructions retired
          2980688391  cycles elapsed
          1950894096  peak memory footprint
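
The logged commands are printed without shell quoting, although the model path contains a space (`Application Support`), so pasting one into a shell as-is would split that argument. The sketch below builds the same invocation as an argument list and prints a safely quoted form. It uses only the flags that appear in the logged command; `whisper_cli_argv` is a hypothetical helper, and the binary, audio, and output paths are shortened here for illustration.

```python
import shlex


def whisper_cli_argv(binary: str, model: str, audio: str, out_prefix: str) -> list[str]:
    """Build the argument list with the flags seen in the logged command."""
    return [binary, "-m", model, "-f", audio, "-otxt", "-of", out_prefix, "-np"]


argv = whisper_cli_argv(
    "build/stt-battle-whisper/bin/whisper-cli",  # shortened for illustration
    "/Users/mikker/Library/Application Support/Tuna/models/whisper/"
    "large-v3-turbo/ggml-large-v3-turbo.bin",
    "recording.wav",                             # shortened for illustration
    "transcript",                                # shortened for illustration
)

# shlex.join quotes any argument containing whitespace, so the printed
# command can be pasted into a POSIX shell unchanged.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` avoids the quoting question entirely, since no shell parses the string.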

whisper: small run 1

Status: ok · Wall: 1.10s · RSS: 826.2 MB

Hey, so this is a longer message, and I'm trying to say some things that could be hard to discern, I guess.
 So I'll keep talking for a bit, and then eventually we should stop, and you'll pass it through the models, and we'll get a result.
 So maybe, yeah, let's go.
Command / logs
/Users/mikker/dev/Tuna-worktrees/one/build/stt-battle-whisper/bin/whisper-cli -m /Users/mikker/Library/Application Support/Tuna/models/whisper/small/ggml-small.bin -f /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav -otxt -of /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/small/run-1/transcript -np

output_txt: saving output to '/Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/small/run-1/transcript.txt'
real 1.08
user 0.44
sys 0.22
           866369536  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               55701  page reclaims
                  10  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                  10  voluntary context switches
                1886  involuntary context switches
          6225475621  instructions retired
          2002615476  cycles elapsed
           843449616  peak memory footprint

whisper: small.en run 1

Status: ok · Wall: 1.09s · RSS: 810.0 MB

hey so this is a longer message and I'm trying to say some things that could be
 hard to discern I guess so I'll keep talking for a bit and then eventually we
 should stop and you'll pass it through the models and we'll get a result so
 maybe yeah let's go
Command / logs
/Users/mikker/dev/Tuna-worktrees/one/build/stt-battle-whisper/bin/whisper-cli -m /Users/mikker/Library/Application Support/Tuna/models/whisper/small-en/ggml-small.en.bin -f /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav -otxt -of /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/small.en/run-1/transcript -np

output_txt: saving output to '/Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/small.en/run-1/transcript.txt'
real 1.08
user 0.37
sys 0.21
           849297408  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               54389  page reclaims
                   7  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                  33  voluntary context switches
                1344  involuntary context switches
          5402821704  instructions retired
          1732674548  cycles elapsed
           826557664  peak memory footprint

whisper: tiny run 1

Status: ok · Wall: 0.57s · RSS: 228.8 MB

Hey, so this is a longer message and I'm trying to say some things that could be hard to discerning
 I guess so I'll keep talking for a bit and then eventually we should stop and you'll pass
 it through the models and we'll get a result so maybe yeah, let's go.
Command / logs
/Users/mikker/dev/Tuna-worktrees/one/build/stt-battle-whisper/bin/whisper-cli -m /Users/mikker/Library/Application Support/Tuna/models/whisper/tiny/ggml-tiny.bin -f /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav -otxt -of /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/tiny/run-1/transcript -np

output_txt: saving output to '/Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/tiny/run-1/transcript.txt'
real 0.56
user 0.32
sys 0.09
           239910912  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               16316  page reclaims
                   7  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   4  voluntary context switches
                1861  involuntary context switches
          3777757217  instructions retired
          1195731908  cycles elapsed
           217006944  peak memory footprint

whisper: tiny.en run 1

Status: ok · Wall: 0.58s · RSS: 230.8 MB

So this is a longer message and I'm trying to say something that could be hard to
 decipher and I guess. So I'll keep talking for a bit and then eventually we should
 stop and you pass it through the models and we'll get a result. So maybe, yeah, let's go.
Command / logs
/Users/mikker/dev/Tuna-worktrees/one/build/stt-battle-whisper/bin/whisper-cli -m /Users/mikker/Library/Application Support/Tuna/models/whisper/tiny-en/ggml-tiny.en.bin -f /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav -otxt -of /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/tiny.en/run-1/transcript -np

output_txt: saving output to '/Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/whisper/tiny.en/run-1/transcript.txt'
real 0.57
user 0.30
sys 0.08
           242057216  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               16372  page reclaims
                   7  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                1825  involuntary context switches
          3629426674  instructions retired
          1153533865  cycles elapsed
           219169608  peak memory footprint

parakeet-rs: mlx-community/parakeet-tdt-0.6b-v3 run 1

Status: ok · Wall: 1.60s · RSS: 1.3 GB

Hey, so this is a longer message and I'm trying to say some things that could be um hard to discern I guess. So I'll keep talking for a bit and then eventually we should stop and you pass it through the models and we'll get a result. So maybe Yeah, let's go.
Command / logs
/Users/mikker/dev/Tuna-worktrees/one/app/Tuna/Resources/tuna-parakeet-helper --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --model mlx-community/parakeet-tdt-0.6b-v3 --cache-dir /Users/mikker/Library/Application Support/Tuna/models/parakeet-rs --output-dir /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/parakeet-rs/mlx-community-parakeet-tdt-0.6b-v3/run-1 --output-format txt

real 1.59
user 5.55
sys 0.22
          1422196736  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               87150  page reclaims
                 652  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                  13  voluntary context switches
                2036  involuntary context switches
         64738024896  instructions retired
         17591916368  cycles elapsed
          1408845816  peak memory footprint

mlx-audio: UsefulSensors/moonshine-base run 1

Status: ok · Wall: 2.33s · RSS: 406.1 MB

Hey, so this is a longer message and I'm trying to say some things that could be hard to disseminate. So I'll keep talking for a bit and then eventually we should stop and you'll pass it through the models and we'll get a result. So maybe yeah, let's go.
Command / logs
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model UsefulSensors/moonshine-base --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/UsefulSensors-moonshine-base/run-1/transcript --format json


Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]
Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 13148.29it/s]
real 2.31
user 0.99
sys 0.49
           425869312  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               33613  page reclaims
                1641  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                  80  messages sent
                 104  messages received
                   1  signals received
                3588  voluntary context switches
                4923  involuntary context switches
          1267310049  instructions retired
           450649042  cycles elapsed
            73564808  peak memory footprint

mlx-audio: UsefulSensors/moonshine-tiny run 1

Status: ok · Wall: 1.53s · RSS: 272.5 MB

Hey, so this is a longer message and I'm trying to say some things that could be hard to dishearten, I guess. So I'll keep talking for a bit and then eventually we should stop and you pass it through the models and we'll get a result. So maybe, yeah, let's go.
Command / logs
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model UsefulSensors/moonshine-tiny --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/UsefulSensors-moonshine-tiny/run-1/transcript --format json


Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]
Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 10402.54it/s]
real 1.52
user 0.88
sys 0.31
           285720576  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               26131  page reclaims
                   4  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                  23  messages sent
                  58  messages received
                   1  signals received
                 175  voluntary context switches
                2835  involuntary context switches
           646004177  instructions retired
           224773606  cycles elapsed
            67715672  peak memory footprint

mlx-audio: distil-whisper/distil-large-v3 run 1

Status: ok · Wall: 1.95s · RSS: 1.0 GB

Hey, so this is a longer message and I'm trying to say some things that could be um hard to discern I guess so I'll keep talking for a bit and then eventually we should stop and you'll pass it through the models and we'll get a result so maybe yeah let's go
Command / logs
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model distil-whisper/distil-large-v3 --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/distil-whisper-distil-large-v3/run-1/transcript --format json


Fetching 13 files:   0%|          | 0/13 [00:00<?, ?it/s]
Fetching 13 files: 100%|██████████| 13/13 [00:00<00:00, 27664.11it/s]

  0%|          | 0/2880 [00:00<?, ?frames/s][transformers] Ignoring clean_up_tokenization_spaces=True for BPE tokenizer WhisperTokenizer. The clean_up_tokenization post-processing step is designed for WordPiece tokenizers and is destructive for BPE (it strips spaces before punctuation). Set clean_up_tokenization_spaces=False to suppress this warning, or set clean_up_tokenization_spaces_for_bpe_even_though_it_will_corrupt_output=True to force cleanup anyway.

100%|██████████| 2880/2880 [00:00<00:00, 6059.79frames/s]
100%|██████████| 2880/2880 [00:00<00:00, 6057.19frames/s]
real 1.93
user 0.91
sys 1.05
          1115095040  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               80511  page reclaims
                  29  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                  23  messages sent
                  61  messages received
                   1  signals received
                 402  voluntary context switches
                4210  involuntary context switches
           635453865  instructions retired
           207129567  cycles elapsed
            60473920  peak memory footprint

mlx-audio: mlx-community/Qwen3-ASR-0.6B-8bit run 1

Status: ok · Wall: 1.96s · RSS: 1.1 GB

Hey, so this is a longer message, and I'm trying to say some things that could be. Um, hard to discern, I guess. So, I'll keep talking for a bit, and then eventually, we should stop, and you'll pass it through the models, and we'll get a result. So. Maybe. Yeah, let's go.
Command / logs
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model mlx-community/Qwen3-ASR-0.6B-8bit --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/mlx-community-Qwen3-ASR-0.6B-8bit/run-1/transcript --format json


Fetching 9 files:   0%|          | 0/9 [00:00<?, ?it/s]
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 27858.85it/s]
real 1.95
user 1.12
sys 0.69
          1220739072  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               87227  page reclaims
                  15  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                  23  messages sent
                  61  messages received
                   1  signals received
                 878  voluntary context switches
                7631  involuntary context switches
           642267219  instructions retired
           212799152  cycles elapsed
            58950208  peak memory footprint

mlx-audio: mlx-community/Qwen3-ASR-1.7B-8bit run 1

Status: ok · Wall: 2.36s · RSS: 2.5 GB

Hey, so this is a longer message, and I'm trying to say some things that could be hard to discern, I guess. So I'll keep talking for a bit, and then eventually we should stop, and you'll pass it through the models, and we'll get a result. So maybe, yeah, let's go.
Command / logs
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model mlx-community/Qwen3-ASR-1.7B-8bit --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/mlx-community-Qwen3-ASR-1.7B-8bit/run-1/transcript --format json


Fetching 9 files:   0%|          | 0/9 [00:00<?, ?it/s]
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 21290.88it/s]
real 2.34
user 1.21
sys 1.05
          2679914496  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              176746  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                  23  messages sent
                  59  messages received
                   1  signals received
                 142  voluntary context switches
                6040  involuntary context switches
           634627631  instructions retired
           210676088  cycles elapsed
            63259200  peak memory footprint

mlx-audio: mlx-community/Voxtral-Mini-4B-Realtime-2602-4bit run 1

Status: ok · Wall: 11.22s · RSS: 3.2 GB

Hey, so this is a longer message and I'm trying to say some things that could be hard to discern, I guess. So I'll keep talking for a bit and then eventually we should stop and you'll... Pass it through the models and we'll get a result. So maybe yeah let's go
Command / logs
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model mlx-community/Voxtral-Mini-4B-Realtime-2602-4bit --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/mlx-community-Voxtral-Mini-4B-Realtime-2602-4bit/run-1/transcript --format json


Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]
Fetching 4 files: 100%|██████████| 4/4 [00:00<00:00, 22580.37it/s]
real 11.19
user 2.89
sys 2.12
          3399991296  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              214489  page reclaims
                   9  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                  23  messages sent
                  61  messages received
                   1  signals received
                1613  voluntary context switches
               38175  involuntary context switches
           635609047  instructions retired
           207822264  cycles elapsed
            62538304  peak memory footprint

mlx-audio: mlx-community/parakeet-tdt-0.6b-v3 run 1

Status: ok · Wall: 1.74s · RSS: 2.4 GB

Hey, so this is a longer message and I'm trying to say some things that could be um hard to discern I guess. So I'll keep talking for a bit and then eventually we should stop and you'll pass it through the models and we'll get a result. So maybe yeah, let's go.
Command / logs
/opt/homebrew/bin/uvx --from mlx-audio --prerelease allow python -m mlx_audio.stt.generate --model mlx-community/parakeet-tdt-0.6b-v3 --audio /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/recording.wav --output-path /Users/mikker/dev/Tuna-worktrees/one/tmp/stt-battle/20260430-235718/runs/mlx-audio/mlx-community-parakeet-tdt-0.6b-v3/run-1/transcript --format json


Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]
Fetching 4 files: 100%|██████████| 4/4 [00:00<00:00, 5036.69it/s]
real 1.72
user 0.76
sys 0.84
          2625142784  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              167192  page reclaims
                   6  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                  23  messages sent
                  59  messages received
                   1  signals received
                 481  voluntary context switches
                4153  involuntary context switches
           641281023  instructions retired
           224068206  cycles elapsed
            63717952  peak memory footprint