Speech-to-text transcription using whisper.cpp, the C/C++ port of OpenAI's Whisper model
Installing and Loading
INSTALL whisper FROM community;
LOAD whisper;
Example
-- Transcribe an audio file
D SELECT whisper_transcribe('audio.wav', 'tiny.en');
┌──────────────────────────────────────────────────┐
│ whisper_transcribe(...) │
│ varchar │
├──────────────────────────────────────────────────┤
│ Hello, this is a test of the whisper extension. │
└──────────────────────────────────────────────────┘
-- Get detailed transcription segments with timestamps
D SELECT * FROM whisper_transcribe_segments('audio.wav', 'tiny.en');
┌────────────┬────────────┬──────────┬────────────────────┬────────────┬──────────┐
│ segment_id │ start_time │ end_time │ text │ confidence │ language │
│ int32 │ double │ double │ varchar │ double │ varchar │
├────────────┼────────────┼──────────┼────────────────────┼────────────┼──────────┤
│ 0 │ 0.00 │ 2.50 │ Hello, this is a │ 0.95 │ en │
│ 1 │ 2.50 │ 4.00 │ test of whisper. │ 0.92 │ en │
└────────────┴────────────┴──────────┴────────────────────┴────────────┴──────────┘
-- List available models
D SELECT model_name, size_mb, is_downloaded FROM whisper_list_models() LIMIT 3;
┌────────────┬─────────┬───────────────┐
│ model_name │ size_mb │ is_downloaded │
│ varchar │ int64 │ boolean │
├────────────┼─────────┼───────────────┤
│ tiny │ 75 │ false │
│ tiny.en │ 75 │ true │
│ base │ 142 │ false │
└────────────┴─────────┴───────────────┘
About whisper
A DuckDB extension for speech-to-text transcription using whisper.cpp, the C/C++ port of OpenAI's Whisper model.
Transcribe audio files directly from SQL queries in DuckDB, making it easy to process and analyze audio data alongside your other data.
Features
- Transcribe audio files (WAV, MP3, FLAC, OGG, and more)
- Live recording and transcription from microphone
- Voice-to-SQL: speak natural language questions, get query results
- Support for all Whisper models (tiny, base, small, medium, large)
- Detailed transcription segments with timestamps and confidence scores
- Automatic language detection or specify target language
- Works with file paths, BLOB data, or remote URLs
Quick Start
Download a Model
Models must be downloaded before use. They are stored in ~/.duckdb/whisper/models/.
mkdir -p ~/.duckdb/whisper/models
curl -L -o ~/.duckdb/whisper/models/ggml-tiny.en.bin \
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin
Check available models and download status:
SELECT * FROM whisper_list_models();
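To see only the models that are ready to use, the listing can be filtered on its download flag. A minimal sketch, assuming the result exposes the model_name, size_mb, and is_downloaded columns shown in the example output earlier on this page:

```sql
-- Models already present in the model directory, smallest first
SELECT model_name, size_mb
FROM whisper_list_models()
WHERE is_downloaded
ORDER BY size_mb;
```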
Transcribe an Audio File
-- Simple transcription
SELECT whisper_transcribe('audio.wav', 'tiny.en');
-- Get detailed segments with timestamps
SELECT * FROM whisper_transcribe_segments('audio.wav', 'tiny.en');
-- Translate foreign audio to English
SELECT whisper_translate('german_speech.mp3', 'small');
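Because whisper_transcribe_segments is a table function, its output can be aggregated like any other relation. The sketch below summarizes one file, assuming the start_time, end_time, and confidence columns from the example output earlier on this page; the output column names are illustrative:

```sql
-- One-row summary of a transcription
SELECT
    count(*)        AS segments,
    max(end_time)   AS last_segment_end_seconds,
    avg(confidence) AS mean_confidence
FROM whisper_transcribe_segments('audio.wav', 'tiny.en');
```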
Example Use Cases
Transcribe Remote Audio
INSTALL httpfs;
LOAD httpfs;
SELECT whisper_transcribe(content, 'tiny.en')
FROM read_blob('https://example.com/audio.mp3');
Batch Transcribe Multiple Files
SELECT file, whisper_transcribe(file, 'tiny.en') as transcript
FROM glob('audio/*.wav');
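Transcription is the expensive step, so it may be worth storing the results once and querying the stored text afterwards. A sketch under that assumption; the table name transcripts and the search term are hypothetical:

```sql
-- Transcribe each file once and keep the result
CREATE TABLE transcripts AS
SELECT file, whisper_transcribe(file, 'tiny.en') AS transcript
FROM glob('audio/*.wav');

-- Later queries read the stored text instead of re-running the model
SELECT file
FROM transcripts
WHERE transcript ILIKE '%invoice%';
```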
Search Within Transcriptions
SELECT * FROM whisper_transcribe_segments('meeting.wav', 'base.en')
WHERE text ILIKE '%action item%';
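The same pattern can flag segments that may need manual review. This sketch assumes the confidence column shown in the example output earlier on this page; the 0.8 cutoff is an arbitrary illustrative value, not a recommendation from the extension:

```sql
-- Segments the model was least sure about, lowest confidence first
SELECT segment_id, start_time, end_time, text, confidence
FROM whisper_transcribe_segments('meeting.wav', 'base.en')
WHERE confidence < 0.8   -- illustrative threshold
ORDER BY confidence;
```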
Generate Subtitles (SRT Format)
SELECT
    segment_id + 1 as id,
    -- floor() truncates each field; a bare ::int cast would round to nearest
    printf('%02d:%02d:%02d,%03d',
        floor(start_time/3600)::int, floor((start_time%3600)/60)::int,
        floor(start_time%60)::int, floor((start_time - floor(start_time)) * 1000)::int
    ) || ' --> ' ||
    printf('%02d:%02d:%02d,%03d',
        floor(end_time/3600)::int, floor((end_time%3600)/60)::int,
        floor(end_time%60)::int, floor((end_time - floor(end_time)) * 1000)::int
    ) as timestamp,
    trim(text) as text
FROM whisper_transcribe_segments('video.mp4', 'small.en');
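The timestamp arithmetic can be checked without any audio by running it on literal values. The sketch below uses floor() so that each field truncates, because DuckDB rounds to the nearest integer when casting DOUBLE to INTEGER. The sample values are hypothetical:

```sql
-- Hypothetical segment boundaries in seconds; no audio or extension needed
WITH t AS (
    SELECT unnest([0.0, 2.5, 59.75, 3661.25])::DOUBLE AS seconds
)
SELECT
    seconds,
    printf('%02d:%02d:%02d,%03d',
        floor(seconds / 3600)::int,                    -- hours
        floor((seconds % 3600) / 60)::int,             -- minutes
        floor(seconds % 60)::int,                      -- seconds
        floor((seconds - floor(seconds)) * 1000)::int  -- milliseconds
    ) AS srt_time
FROM t;
-- 0.0     -> 00:00:00,000
-- 2.5     -> 00:00:02,500
-- 59.75   -> 00:00:59,750
-- 3661.25 -> 01:01:01,250
```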
Recording (requires microphone)
Setup Audio Device
Before using microphone recording or voice query features:
-- List all available audio input devices
SELECT * FROM whisper_list_devices();
-- Set the device ID (use a device_id from the list above)
SET whisper_device_id = 0;
-- Verify your microphone is working
SELECT whisper_mic_level(3);
Record and Transcribe
-- Record for 5 seconds
SELECT whisper_record(5, 'tiny.en');
-- Record until silence (max 30 seconds)
SELECT whisper_record_auto(30);
-- Record and translate to English
SELECT whisper_record_translate(5, 'small');
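The function reference lists optional arguments for whisper_record_auto: silence_seconds, model, threshold, and device_id. A sketch, assuming these can be passed positionally in the listed order; this has not been verified against the extension, so treat the argument order as an assumption:

```sql
-- Record for at most 30 seconds, stop after 2 seconds of silence,
-- and transcribe with the tiny.en model (positional order assumed)
SELECT whisper_record_auto(30, 2, 'tiny.en');
```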
Voice-to-SQL (Experimental)
Speak natural language questions about your data and receive SQL query results.
Requires a text-to-sql-proxy service running locally, reachable at the URL configured in whisper_text_to_sql_url.
-- Create test data
CREATE TABLE customers (id INT, name VARCHAR, revenue DECIMAL);
INSERT INTO customers VALUES (1, 'Acme', 100000), (2, 'Beta', 50000);
-- Get SQL from voice (doesn't execute)
SELECT whisper_voice_to_sql();
-- Execute voice query directly
FROM whisper_voice_query();
-- Include generated SQL in results
FROM whisper_voice_query_with_sql();
Available Models
| Model | Size | Description |
|---|---|---|
| tiny/tiny.en | ~75MB | Fastest |
| base/base.en | ~142MB | Fast |
| small/small.en | ~466MB | Good balance |
| medium/medium.en | ~1.5GB | High quality |
| large-v1/v2/v3 | ~2.9GB | Best quality |
| large-v3-turbo | ~1.6GB | Fast + accurate |
Models with the .en suffix are English-only and optimized for English.
Supported Audio Formats
The extension uses FFmpeg for audio decoding, so it supports WAV, MP3, FLAC, OGG/Vorbis, AAC/M4A, and many more formats. Audio is automatically converted to 16 kHz mono, as required by Whisper.
Function Reference
Transcription Functions
- whisper_transcribe(audio, [model]) - Transcribes audio and returns the full text
- whisper_translate(audio, [model]) - Translates audio from any language to English
- whisper_transcribe_segments(audio, [model], [language], [translate]) - Returns table of segments with timestamps
Recording Functions
- whisper_list_devices() - Lists available audio input devices
- whisper_record(duration_seconds, [model], [device_id]) - Records and transcribes
- whisper_record_auto(max_seconds, [silence_seconds], [model], [threshold], [device_id]) - Records until silence
- whisper_record_translate(duration_seconds, [model], [device_id]) - Records and translates to English
- whisper_mic_level(duration_seconds, [device_id]) - Check microphone amplitude levels
Voice-to-SQL Functions
- whisper_voice_to_sql([model], [device_id]) - Records voice and returns generated SQL
- whisper_voice_query([model], [device_id]) - Records voice, generates SQL, executes it
- whisper_voice_query_with_sql([model], [device_id]) - Same as whisper_voice_query, with the generated SQL included in the results
Model Management Functions
- whisper_list_models() - Lists all available models and download status
- whisper_download_model(model_name) - Returns download instructions
Utility Functions
- whisper_version() - Returns extension and whisper.cpp version info
- whisper_check_audio(file_path) - Validates that an audio file can be read
- whisper_audio_info(file_path) - Returns audio file metadata
- whisper_get_config() - Returns current whisper configuration settings
Configuration
Configure settings using standard SET statements:
-- Model settings
SET whisper_model = 'small.en';
SET whisper_model_path = '/custom/path/models';
SET whisper_language = 'en';
SET whisper_threads = 4;
-- Recording settings
SET whisper_device_id = 0;
SET whisper_max_duration = 30;
SET whisper_silence_duration = 2;
SET whisper_silence_threshold = 0.005;
-- Voice query settings
SET whisper_text_to_sql_url = 'http://localhost:8080/generate-sql';
SET whisper_text_to_sql_timeout = 60;
SET whisper_voice_query_show_sql = true;
-- View all whisper settings
SELECT * FROM duckdb_settings() WHERE name LIKE 'whisper_%';
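A single setting can also be read back or restored with DuckDB's built-in current_setting function and RESET statement. A short sketch, assuming the extension is loaded so that whisper_model is registered:

```sql
-- Change the default model for this session
SET whisper_model = 'small.en';

-- Read back the value currently in effect
SELECT current_setting('whisper_model') AS active_model;

-- Restore the extension's default
RESET whisper_model;
```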
See the GitHub repository for full documentation.