whisper

Speech-to-text transcription using whisper.cpp, the C/C++ port of OpenAI's Whisper model

Maintainer(s): tobilg

Installing and Loading

INSTALL whisper FROM community;
LOAD whisper;

Example

-- Transcribe an audio file
D SELECT whisper_transcribe('audio.wav', 'tiny.en');
┌──────────────────────────────────────────────────┐
│           whisper_transcribe(...)                │
│                   varchar                        │
├──────────────────────────────────────────────────┤
│ Hello, this is a test of the whisper extension.  │
└──────────────────────────────────────────────────┘

-- Get detailed transcription segments with timestamps
D SELECT * FROM whisper_transcribe_segments('audio.wav', 'tiny.en');
┌────────────┬────────────┬──────────┬────────────────────┬────────────┬──────────┐
│ segment_id │ start_time │ end_time │        text        │ confidence │ language │
│   int32    │   double   │  double  │      varchar       │   double   │ varchar  │
├────────────┼────────────┼──────────┼────────────────────┼────────────┼──────────┤
│          0 │       0.00 │     2.50 │ Hello, this is a   │       0.95 │ en       │
│          1 │       2.50 │     4.00 │ test of whisper.   │       0.92 │ en       │
└────────────┴────────────┴──────────┴────────────────────┴────────────┴──────────┘

-- List available models
D SELECT model_name, size_mb, is_downloaded FROM whisper_list_models() LIMIT 3;
┌────────────┬─────────┬───────────────┐
│ model_name │ size_mb │ is_downloaded │
│  varchar   │  int64  │    boolean    │
├────────────┼─────────┼───────────────┤
│ tiny       │      75 │ false         │
│ tiny.en    │      75 │ true          │
│ base       │     142 │ false         │
└────────────┴─────────┴───────────────┘

About whisper

A DuckDB extension for speech-to-text transcription using whisper.cpp, the C/C++ port of OpenAI's Whisper model.

Transcribe audio files directly from SQL queries in DuckDB, making it easy to process and analyze audio data alongside your other data.

Features

  • Transcribe audio files (WAV, MP3, FLAC, OGG, and more)
  • Live recording and transcription from microphone
  • Voice-to-SQL: speak natural language questions, get query results
  • Support for all Whisper models (tiny, base, small, medium, large)
  • Detailed transcription segments with timestamps and confidence scores
  • Automatic language detection, or an explicitly specified source language
  • Works with file paths, BLOB data, or remote URLs

Quick Start

Download a Model

Models must be downloaded before use. They are stored in ~/.duckdb/whisper/models/.

mkdir -p ~/.duckdb/whisper/models
curl -L -o ~/.duckdb/whisper/models/ggml-tiny.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin

Check available models and download status:

SELECT * FROM whisper_list_models();

Transcribe an Audio File

-- Simple transcription
SELECT whisper_transcribe('audio.wav', 'tiny.en');

-- Get detailed segments with timestamps
SELECT * FROM whisper_transcribe_segments('audio.wav', 'tiny.en');

-- Translate foreign audio to English
SELECT whisper_translate('german_speech.mp3', 'small');

Example Use Cases

Transcribe Remote Audio

INSTALL httpfs;
LOAD httpfs;

SELECT whisper_transcribe(content, 'tiny.en')
FROM read_blob('https://example.com/audio.mp3');

Batch Transcribe Multiple Files

SELECT file, whisper_transcribe(file, 'tiny.en') as transcript
FROM glob('audio/*.wav');
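
Because every call to whisper_transcribe re-runs the model, it can help to store batch results once and query the stored text afterwards. A minimal sketch, assuming a hypothetical table name (transcripts) and a hypothetical search term; the function and the glob pattern are the ones used above:

```sql
-- Materialize the transcripts once (table name "transcripts" is illustrative).
CREATE TABLE transcripts AS
SELECT file, whisper_transcribe(file, 'tiny.en') AS transcript
FROM glob('audio/*.wav');

-- Later queries read the stored text and do not re-run transcription.
SELECT file
FROM transcripts
WHERE transcript ILIKE '%invoice%';
```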

Search Within Transcriptions

SELECT * FROM whisper_transcribe_segments('meeting.wav', 'base.en')
WHERE text ILIKE '%action item%';

Generate Subtitles (SRT Format)

-- DuckDB rounds when casting DOUBLE to INT, so floor() is applied before each cast
SELECT
    segment_id + 1 AS id,
    printf('%02d:%02d:%02d,%03d',
        floor(start_time / 3600)::int, floor((start_time % 3600) / 60)::int,
        floor(start_time % 60)::int, floor((start_time - floor(start_time)) * 1000)::int
    ) || ' --> ' ||
    printf('%02d:%02d:%02d,%03d',
        floor(end_time / 3600)::int, floor((end_time % 3600) / 60)::int,
        floor(end_time % 60)::int, floor((end_time - floor(end_time)) * 1000)::int
    ) AS timestamp,
    trim(text) AS text
FROM whisper_transcribe_segments('video.mp4', 'small.en');
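
The timestamp formatting is written out twice in the query above, once for the start and once for the end of each segment. One way to avoid the repetition is a scalar macro. The sketch below assumes a hypothetical macro name (srt_time); it applies floor() before each cast because DuckDB rounds when casting a DOUBLE to an integer type:

```sql
-- srt_time is an illustrative helper, not part of the extension.
-- It formats a number of seconds as an SRT timestamp (HH:MM:SS,mmm).
CREATE OR REPLACE MACRO srt_time(t) AS
    printf('%02d:%02d:%02d,%03d',
        floor(t / 3600)::INT,
        floor((t % 3600) / 60)::INT,
        floor(t % 60)::INT,
        floor((t - floor(t)) * 1000)::INT);

-- Quick check without any audio: 3725.5 seconds is 1 h, 2 min, 5.5 s.
SELECT srt_time(3725.5::DOUBLE) AS ts;  -- 01:02:05,500

-- The same subtitle query, using the macro.
SELECT
    segment_id + 1 AS id,
    srt_time(start_time) || ' --> ' || srt_time(end_time) AS timestamp,
    trim(text) AS text
FROM whisper_transcribe_segments('video.mp4', 'small.en');
```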

Recording (requires microphone)

Setup Audio Device

Before using the microphone recording or voice query features, select and test an input device:

-- List all available audio input devices
SELECT * FROM whisper_list_devices();

-- Set the device ID (use a device_id from the list above)
SET whisper_device_id = 0;

-- Verify your microphone is working
SELECT whisper_mic_level(3);

Record and Transcribe

-- Record for 5 seconds
SELECT whisper_record(5, 'tiny.en');

-- Record until silence (max 30 seconds)
SELECT whisper_record_auto(30);

-- Record and translate to English
SELECT whisper_record_translate(5, 'small');

Voice-to-SQL (Experimental)

Speak natural language questions about your data and receive SQL query results.

Requires a text-to-sql-proxy instance running locally.

-- Create test data
CREATE TABLE customers (id INT, name VARCHAR, revenue DECIMAL);
INSERT INTO customers VALUES (1, 'Acme', 100000), (2, 'Beta', 50000);

-- Get SQL from voice (doesn't execute)
SELECT whisper_voice_to_sql();

-- Execute voice query directly
FROM whisper_voice_query();

-- Include generated SQL in results
FROM whisper_voice_query_with_sql();

Available Models

Model               Size      Description
tiny/tiny.en        ~75MB     Fastest
base/base.en        ~142MB    Fast
small/small.en      ~466MB    Good balance
medium/medium.en    ~1.5GB    High quality
large-v1/v2/v3      ~2.9GB    Best quality
large-v3-turbo      ~1.6GB    Fast + accurate

Models with the .en suffix are English-only variants optimized for English audio.
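
To see which of these models are already usable locally, the output of whisper_list_models() can be filtered. This sketch relies only on the columns shown in the example output earlier (model_name, size_mb, is_downloaded):

```sql
-- Show downloaded models, smallest first.
SELECT model_name, size_mb
FROM whisper_list_models()
WHERE is_downloaded
ORDER BY size_mb;
```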

Supported Audio Formats

The extension uses FFmpeg for audio decoding, so it reads WAV, MP3, FLAC, OGG/Vorbis, AAC/M4A, and many other formats. Audio is automatically converted to 16 kHz mono, as required by Whisper.

Function Reference

Transcription Functions

  • whisper_transcribe(audio, [model]) - Transcribes audio and returns the full text
  • whisper_translate(audio, [model]) - Translates audio from any language to English
  • whisper_transcribe_segments(audio, [model], [language], [translate]) - Returns table of segments with timestamps
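
The optional arguments of whisper_transcribe_segments are listed above without their types. The sketch below is therefore an assumption rather than confirmed usage: it passes them positionally, treats language as a language code such as 'de', and treats translate as a BOOLEAN. The file name is a placeholder.

```sql
-- Assumption: language is a code such as 'de'; translate is a BOOLEAN.
-- 'interview_de.mp3' is a placeholder file name.
SELECT segment_id, start_time, end_time, text
FROM whisper_transcribe_segments('interview_de.mp3', 'small', 'de', false);

-- Same audio, requesting English output instead of a German transcript.
SELECT segment_id, start_time, end_time, text
FROM whisper_transcribe_segments('interview_de.mp3', 'small', 'de', true);
```

A multilingual model (here small, not small.en) is used because the .en models target English audio.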

Recording Functions

  • whisper_list_devices() - Lists available audio input devices
  • whisper_record(duration_seconds, [model], [device_id]) - Records and transcribes
  • whisper_record_auto(max_seconds, [silence_seconds], [model], [threshold], [device_id]) - Records until silence
  • whisper_record_translate(duration_seconds, [model], [device_id]) - Records and translates to English
  • whisper_mic_level(duration_seconds, [device_id]) - Check microphone amplitude levels
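
As an illustration of the optional arguments of whisper_record_auto, the sketch below passes all of them positionally in the order listed above. The values mirror the settings shown in the Configuration section (2 seconds of silence, threshold 0.005, device 0); whether positional passing matches the extension's actual signature is an assumption.

```sql
-- Assumption: optional arguments are accepted positionally in the listed order:
-- max_seconds, silence_seconds, model, threshold, device_id.
SELECT whisper_record_auto(30, 2, 'tiny.en', 0.005, 0);
```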

Voice-to-SQL Functions

  • whisper_voice_to_sql([model], [device_id]) - Records voice and returns generated SQL
  • whisper_voice_query([model], [device_id]) - Records voice, generates SQL, executes it
  • whisper_voice_query_with_sql([model], [device_id]) - Same as whisper_voice_query, with the generated SQL included in the result columns

Model Management Functions

  • whisper_list_models() - Lists all available models and download status
  • whisper_download_model(model_name) - Returns download instructions

Utility Functions

  • whisper_version() - Returns extension and whisper.cpp version info
  • whisper_check_audio(file_path) - Validates that an audio file can be read
  • whisper_audio_info(file_path) - Returns audio file metadata
  • whisper_get_config() - Returns current whisper configuration settings

Configuration

Configure settings using standard SET statements:

-- Model settings
SET whisper_model = 'small.en';
SET whisper_model_path = '/custom/path/models';
SET whisper_language = 'en';
SET whisper_threads = 4;

-- Recording settings
SET whisper_device_id = 0;
SET whisper_max_duration = 30;
SET whisper_silence_duration = 2;
SET whisper_silence_threshold = 0.005;

-- Voice query settings
SET whisper_text_to_sql_url = 'http://localhost:8080/generate-sql';
SET whisper_text_to_sql_timeout = 60;
SET whisper_voice_query_show_sql = true;

-- View all whisper settings
SELECT * FROM duckdb_settings() WHERE name LIKE 'whisper_%';
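
Since the model argument is optional in the function reference, these settings presumably act as defaults when it is omitted. A minimal sketch under that assumption (the fallback behavior is not stated explicitly on this page):

```sql
-- Set session defaults once.
SET whisper_model = 'tiny.en';
SET whisper_language = 'en';

-- Assumption: with no model argument, the function falls back to whisper_model.
SELECT whisper_transcribe('audio.wav');

-- Return the settings to their default values.
RESET whisper_model;
RESET whisper_language;
```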

See the GitHub repository for full documentation.