Entries tagged "speech-to-text"

Voxtype (cache)

Push-to-talk voice-to-text optimized for Wayland, works on any Linux desktop. Hold a key, speak, release. Your words appear at the cursor.

This looks like the perfect Linux alternative to my macOS solution.

January 10th at 10:00 PM — open-source, speech-to-text

Simple and Free Speech-to-Text on macOS

After seeing Wispr Flow mentioned a few times, I was curious about using speech-to-text to interact with Claude Code. The idea of paying $12/month and potentially sharing all my prompts wasn’t ideal, so I decided to see what Claude Code could help me build.

With a bit of experimentation, I settled on using sox for audio recording, since it seemed the best at silence detection, and parakeet-mlx for transcribing. I tried various improvements to silence detection, but found turning up my input volume helped the most.

With three commands in a script, I have a decent local speech-to-text solution:

#!/bin/bash

RECORDING_FILE="/tmp/record-recording.wav"
TRANSCRIPT_FILE="/tmp/record-recording.txt"

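# Remove leftovers from a previous run so a failed recording or
# transcription never prints stale text.
rm -f "${RECORDING_FILE}" "${TRANSCRIPT_FILE}"

# Record at 16 kHz: start on sound, stop after 1 second of silence, cap at 60 seconds.
rec -q "${RECORDING_FILE}" rate 16k pad 0.2 0 silence 1 0.05 1% 1 1.0 1% trim 0 60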

# Transcribe and output.
if [ -f "${RECORDING_FILE}" ]; then
  parakeet-mlx "${RECORDING_FILE}" --output-format txt --output-dir /tmp --chunk-duration 30 >/dev/null 2>&1

  if [ -f "${TRANSCRIPT_FILE}" ]; then
    cat "${TRANSCRIPT_FILE}"
  fi
fi
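
The effects chain passed to rec is the part most likely to need tuning, so it can help to spell it out with named values. The following is only a sketch, not the author's real script: the function name build_rec_effects, the variable names, and the --record flag are assumptions made for illustration. The values match the rec command in the script, and the comments follow the SoX documentation for each effect.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): prints the SoX
# effects chain used in the script, assembled from named parameters.
build_rec_effects() {
  local sample_rate="16k"       # rate: resample to 16 kHz
  local lead_pad="0.2"          # pad: seconds of silence added at the start
  local tail_pad="0"            # pad: seconds of silence added at the end
  local threshold="1%"          # silence: level below which audio counts as silence
  local start_duration="0.05"   # silence: sound must last this long to start
  local stop_duration="1.0"     # silence: this much silence ends the recording
  local max_seconds="60"        # trim: keep at most this many seconds

  echo "rate ${sample_rate}" \
    "pad ${lead_pad} ${tail_pad}" \
    "silence 1 ${start_duration} ${threshold} 1 ${stop_duration} ${threshold}" \
    "trim 0 ${max_seconds}"
}

# Record only when asked to and when SoX's rec command is installed;
# otherwise just print the chain for inspection.
if [ "${1:-}" = "--record" ] && command -v rec >/dev/null 2>&1; then
  # Word splitting of the effects string is intentional here.
  # shellcheck disable=SC2046
  rec -q "/tmp/record-recording.wav" $(build_rec_effects)
else
  build_rec_effects
fi
```

Run without arguments, the sketch prints the chain so it can be compared against the one-line version. Raising threshold makes more of the signal count as silence, which may help in a noisy room, though turning up the input volume, as noted above, worked better in practice.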

The real script is a little more verbose, but this demonstrates the core of it. I use Hammerspoon to trigger the script and type the response for me. I also have it display a recording and transcribing status indicator in the menu bar.

My next step is to decide what to use for transcription on Linux so I can use it on my other machine. I might also create a second script that pipes the results through an LLM for use outside of prompts.

December 8th, 2025 at 10:00 PM — ai, speech-to-text

Tristan Dunn

Found 2 entries tagged "speech-to-text".
