Tristan Dunn

Simple and Free Speech-to-Text on macOS

After seeing Wispr Flow mentioned a few times, I was curious about using speech-to-text to interact with Claude Code. Paying $12/month and potentially sharing all my prompts didn’t appeal to me, so I decided to see what Claude Code could help me build.

With a bit of experimentation, I settled on sox for audio recording, since it seemed to be the best at silence detection, and parakeet-mlx for transcription. I tried various tweaks to the silence detection, but found that turning up my input volume helped the most.

With three commands in a script, I have a decent local speech-to-text solution:

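#!/bin/bash
RECORDING_FILE="/tmp/record-recording.wav"
TRANSCRIPT_FILE="/tmp/record-recording.txt"

# Remove leftovers so a failed run can't print a stale transcript.
rm -f "${RECORDING_FILE}" "${TRANSCRIPT_FILE}"

# Record until silence or 60 seconds.
# Starts at the first sound above 1% and stops after 1.0 s below 1%.
rec -q "${RECORDING_FILE}" rate 16k pad 0.2 0 silence 1 0.05 1% 1 1.0 1% trim 0 60

# Transcribe into /tmp, then print the transcript if one was written.
if [ -f "${RECORDING_FILE}" ]; then
  parakeet-mlx "${RECORDING_FILE}" --output-format txt --output-dir /tmp --chunk-duration 30 >/dev/null 2>&1
  if [ -f "${TRANSCRIPT_FILE}" ]; then
    cat "${TRANSCRIPT_FILE}"
  fi
fi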
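
The rec effect chain is the part most likely to need tuning, so here is a rough sketch that names each value. The variable names and the REC_CMD override are assumptions made for illustration and are not part of the original script; the values are the ones used in the script above.

```shell
# Hypothetical names; the values mirror the rec command in the script above.
SAMPLE_RATE="16k"       # rate: resample to 16 kHz
LEAD_PAD="0.2"          # pad: seconds of silence added at the start (0 at the end)
START_DURATION="0.05"   # silence: seconds of sound above THRESHOLD that start the recording
STOP_DURATION="1.0"     # silence: seconds below THRESHOLD that end the recording
THRESHOLD="1%"          # silence: level separating sound from silence
MAX_SECONDS="60"        # trim: keep at most this many seconds

# record FILE: run rec with the named values.
# REC_CMD is an assumed override; REC_CMD=echo prints the arguments instead of recording.
record() {
  "${REC_CMD:-rec}" -q "$1" \
    rate "${SAMPLE_RATE}" \
    pad "${LEAD_PAD}" 0 \
    silence 1 "${START_DURATION}" "${THRESHOLD}" 1 "${STOP_DURATION}" "${THRESHOLD}" \
    trim 0 "${MAX_SECONDS}"
}
```

Calling record /tmp/record-recording.wav would behave like the original rec line. A louder input clears the 1% threshold more easily, which is consistent with the input volume mattering more than the other values.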

The real script is a little more verbose, but this demonstrates the core of it. I use Hammerspoon to trigger the script and type the transcript for me. I also have it show a status indicator in the menu bar while recording and transcribing.

My next step is to decide what to use for transcription on Linux so I can use it on my other machine. I might also create a second script that pipes the results through an LLM for use outside of prompts.
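
As a rough idea of what that second script could look like, here is a minimal sketch of a filter that wraps the transcript in an instruction and hands it to an LLM command. Everything in it is an assumption: polish_transcript and LLM_CMD are hypothetical names, the instruction text is only an example, and LLM_CMD stands for whatever command-line tool reads a prompt on stdin and writes the reply to stdout.

```shell
# polish_transcript: read a raw transcript on stdin, print the LLM's cleaned-up version.
# LLM_CMD is a hypothetical placeholder for a command that reads a prompt on
# stdin and writes the model's reply to stdout.
polish_transcript() {
  if [ -z "${LLM_CMD:-}" ]; then
    echo "LLM_CMD is not set" >&2
    return 1
  fi
  {
    # Example instruction, followed by a blank line and the transcript itself.
    echo "Rewrite this dictated text with correct punctuation and no filler words:"
    echo
    cat
  } | sh -c "${LLM_CMD}"
}
```

The recording script's output could then be piped straight into polish_transcript, keeping the transcription step unchanged and the LLM step optional.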

at 10:00 PM
ai, speech-to-text