Chapter 5 Local Power: Voice, Vision & File System Mastery

Voice Input with Whisper & FFmpeg

6 min read Lesson 27 / 65 Preview

Voice: the highest-bandwidth input you have

Typing is slow. Speaking to your agent is typically two to three times faster, and on a phone it is often the only comfortable option. We use Whisper for speech-to-text and FFmpeg for the audio pipeline.

Two flavors of Whisper

  • whisper.cpp: a C/C++ port that runs on the CPU (GPU acceleration is optional), 100% local, free
  • OpenAI Whisper API: cloud, fast, paid, often more accurate on noisy audio than the smaller local models

For privacy and zero ongoing cost, start with whisper.cpp. We will fall back to the API for hard cases.
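The fallback idea can be sketched as two small shell functions. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names are hypothetical, "hard case" is reduced here to "the local transcript came back empty or as whisper.cpp's [BLANK_AUDIO] marker", and the API key is assumed to live in the OPENAI_API_KEY environment variable.

```shell
# Return 0 (true) when a local transcript looks unusable.
# Assumption: "unusable" means empty, whitespace-only, or the [BLANK_AUDIO] marker.
needs_fallback() {
  local text
  # Strip all whitespace so a transcript of only spaces or newlines counts as empty.
  text="$(printf '%s' "$1" | tr -d '[:space:]')"
  [ -z "$text" ] || [ "$text" = "[BLANK_AUDIO]" ]
}

# Send the audio file in $1 to OpenAI's hosted transcription endpoint
# and print the transcript as plain text. Requires OPENAI_API_KEY.
transcribe_api() {
  curl -sS https://api.openai.com/v1/audio/transcriptions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -F "file=@$1" \
    -F model=whisper-1 \
    -F response_format=text
}
```

A caller would run the local transcription first, pass its output to needs_fallback, and call transcribe_api only when that check succeeds, so audio leaves the machine only for the clips the local model could not handle.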

Install whisper.cpp

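# Clone the source and build the command-line tool
git clone https://github.com/ggerganov/whisper.cpp ~/Code/whisper
cd ~/Code/whisper && make
# Download the English-only base model into ./models
./models/download-ggml-model.sh base.en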
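Before wiring anything up, it helps to confirm that the build produced a binary and that the model file is where you expect it. The helpers below are a hedged sketch: the function names are hypothetical, and the two candidate binary locations (build/bin/whisper-cli in newer checkouts, ./main in older ones) are assumptions about the repository layout that may differ for your version.

```shell
# Assumed checkout location from the install step.
WHISPER_DIR="${WHISPER_DIR:-$HOME/Code/whisper}"

# Print the path of the first whisper.cpp CLI binary found under directory $1.
# Returns non-zero when neither candidate exists or is executable.
find_whisper_bin() {
  local dir="$1" candidate
  for candidate in "$dir/build/bin/whisper-cli" "$dir/main"; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Print the path of a downloaded ggml model under directory $1, e.g. base.en.
model_path() {
  printf '%s/models/ggml-%s.bin\n' "$1" "$2"
}
```

As a smoke test, you could run the binary returned by find_whisper_bin with -m pointing at the model path and -f pointing at samples/jfk.wav, the sample clip shipped in the repository.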

Wire it into OpenClaw

Add a Skill or Tool entry that:

  1. Records audio from the mic with FFmpeg into a temp .wav
  2. Passes the file to whisper.cpp to produce text
  3. Sends the text into OpenClaw as if it were a typed message
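The three steps above can be sketched as one script. Treat it as a sketch under assumptions rather than the lesson's official wiring: the paths, variable names, and function names are hypothetical, the binary location depends on your whisper.cpp version, and step 3 is left as "print the transcript on stdout" because how a Skill or Tool hands text to OpenClaw depends on your setup.

```shell
# All paths below are assumptions; adjust them to your machine.
WHISPER_DIR="${WHISPER_DIR:-$HOME/Code/whisper}"
WHISPER_BIN="${WHISPER_BIN:-$WHISPER_DIR/build/bin/whisper-cli}"
WHISPER_MODEL="${WHISPER_MODEL:-$WHISPER_DIR/models/ggml-base.en.bin}"
FFMPEG_BIN="${FFMPEG_BIN:-ffmpeg}"

# Step 1: record $2 seconds of 16 kHz mono audio from macOS audio device 0 into $1.
record_wav() {
  "$FFMPEG_BIN" -hide_banner -loglevel error -y \
    -f avfoundation -i ":0" -t "$2" -ar 16000 -ac 1 "$1"
}

# Step 2: transcribe $1. -nt drops timestamps; log output on stderr is discarded.
transcribe_wav() {
  "$WHISPER_BIN" -m "$WHISPER_MODEL" -f "$1" -nt 2>/dev/null
}

# Collapse runs of whitespace and trim both ends so the text reads like a typed line.
clean_text() {
  tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//'
}

# Record, transcribe, and print the cleaned transcript (step 3 is up to the caller).
voice_to_text() {
  local seconds="${1:-30}" dir wav status=0
  dir="$(mktemp -d)" || return 1
  wav="$dir/voice.wav"
  record_wav "$wav" "$seconds" && transcribe_wav "$wav" | clean_text || status=$?
  rm -rf "$dir"
  return "$status"
}
```

Calling voice_to_text 30 records for 30 seconds and prints one line of text. Recording into a fresh temporary directory avoids clashes between takes and keeps old audio from lingering on disk.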

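Example FFmpeg recorder line (macOS). It records up to 30 seconds from audio device 0 as 16 kHz mono WAV, the format whisper.cpp expects; -y overwrites the previous recording instead of stopping to ask:

ffmpeg -y -f avfoundation -i ":0" -t 30 -ar 16000 -ac 1 /tmp/voice.wav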
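If you also work on Linux, the recorder can pick its input flags from the operating system. This is a hedged sketch: the function names are hypothetical, and the device choices (avfoundation device 0 on macOS, the PulseAudio default source on Linux) are common defaults rather than guarantees for your hardware.

```shell
# Print FFmpeg input flags for the default microphone on the OS named in $1
# (the value reported by `uname -s`). Returns non-zero for unknown systems.
mic_input_args() {
  case "$1" in
    Darwin) printf '%s\n' '-f avfoundation -i :0' ;;
    Linux)  printf '%s\n' '-f pulse -i default' ;;
    *)      return 1 ;;
  esac
}

# Record $2 seconds into $1 as 16 kHz mono WAV, overwriting any earlier take.
record_mic() {
  local args
  args="$(mic_input_args "$(uname -s)")" || { echo "unsupported OS" >&2; return 1; }
  # Word splitting of $args is intentional: it holds several separate flags.
  # shellcheck disable=SC2086
  ffmpeg -hide_banner -loglevel error -y $args -t "$2" -ar 16000 -ac 1 "$1"
}
```

On macOS, running ffmpeg -f avfoundation -list_devices true -i "" prints the available audio devices, which is useful when index 0 is not the microphone you want.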

When to choose which model

  • tiny.en: good enough for keyword spotting, fastest
  • base.en: daily driver, ~1× realtime on a modern laptop
  • small.en / medium.en: meeting transcription, slower but more accurate
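The list above can be encoded as a small lookup so a script picks the model by task. The use-case labels (keywords, dictation, meeting) and the function name are hypothetical, chosen only for illustration; defaulting to base.en for anything else mirrors the "daily driver" advice.

```shell
# Map a use-case label in $1 to a Whisper model name.
# Unknown labels fall back to base.en.
pick_model() {
  case "$1" in
    keywords)  echo tiny.en ;;
    dictation) echo base.en ;;
    meeting)   echo small.en ;;
    *)         echo base.en ;;
  esac
}
```

From the whisper.cpp checkout, ./models/download-ggml-model.sh "$(pick_model meeting)" would then fetch the matching model file.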

Try it

Record yourself dictating a one-paragraph task. Confirm OpenClaw receives the transcribed text and acts on it.

Engr Mejba Ahmed
