After seeing Wispr Flow mentioned a few times, I was curious about using speech-to-text to interact with Claude Code. The idea of paying $12/month and potentially sharing all my prompts wasn’t ideal, so I decided to see what Claude Code could help me build.
With a bit of experimentation, I settled on using sox for audio recording, since it seemed the best at silence detection, and parakeet-mlx for transcribing. I tried various improvements to silence detection, but found turning up my input volume helped the most.
With three commands in a script, I have a decent local speech-to-text solution:
#!/bin/bash

RECORDING_FILE="/tmp/record-recording.wav"
TRANSCRIPT_FILE="/tmp/record-recording.txt"

# Record until silence or 60 seconds.
rec -q "${RECORDING_FILE}" rate 16k pad 0.2 0 silence 1 0.05 1% 1 1.0 1% trim 0 60

# Transcribe and output.
if [ -f "${RECORDING_FILE}" ]; then
  parakeet-mlx "${RECORDING_FILE}" --output-format txt --output-dir /tmp --chunk-duration 30 >/dev/null 2>&1

  if [ -f "${TRANSCRIPT_FILE}" ]; then
    cat "${TRANSCRIPT_FILE}"
  fi
fi

The real script is a little more verbose, but this demonstrates the core of it. I use Hammerspoon to trigger the script and type the response for me. I also have it display a recording and transcribing status indicator in the menu bar.
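The rec arguments in the script are dense, so here is a rough sketch that spells out what each number does, based on my reading of the sox effects documentation. The variable names and the rec_effects helper are made up for illustration and are not part of the real script; the values are the ones from the command above. It only prints the command, so it runs without sox or a microphone.

```shell
#!/bin/bash
# Hypothetical names; the values are copied from the rec command in the script.
SAMPLE_RATE="16k"        # "rate": resample to 16 kHz
LEAD_PAD="0.2"           # "pad": seconds of silence added at the start (0 at the end)
START_DURATION="0.05"    # "silence": sound must last this long (seconds) before audio is kept
START_THRESHOLD="1%"     # "silence": level above which audio counts as sound
STOP_DURATION="1.0"      # "silence": seconds of quiet that end the recording
STOP_THRESHOLD="1%"      # "silence": level below which audio counts as quiet
MAX_SECONDS="60"         # "trim": upper bound on the kept audio

# Assemble the sox effect chain in the same order as the original command.
rec_effects() {
  echo "rate ${SAMPLE_RATE}" \
       "pad ${LEAD_PAD} 0" \
       "silence 1 ${START_DURATION} ${START_THRESHOLD} 1 ${STOP_DURATION} ${STOP_THRESHOLD}" \
       "trim 0 ${MAX_SECONDS}"
}

# Print the command instead of running it.
echo "rec -q /tmp/record-recording.wav $(rec_effects)"
```

With the values split out like this, the obvious knobs are STOP_DURATION (raise it to tolerate longer pauses mid-sentence) and the two thresholds (raise them if background noise keeps the recording from stopping).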
My next step is to decide what to use for transcription on Linux so I can use it on my other machine. I might also create a second script that pipes the results through an LLM for use outside of prompts.