whisper

Speech-to-text transcription using whisper.cpp, the C/C++ port of OpenAI's Whisper model

Maintainer(s): tobilg

Installing and Loading

INSTALL whisper FROM community;
LOAD whisper;

Example

-- Transcribe an audio file
D SELECT whisper_transcribe('audio.wav', 'tiny.en');
┌──────────────────────────────────────────────────┐
│             whisper_transcribe(...)              │
│                     varchar                      │
├──────────────────────────────────────────────────┤
│ Hello, this is a test of the whisper extension.  │
└──────────────────────────────────────────────────┘

-- Get detailed transcription segments with timestamps
D SELECT * FROM whisper_transcribe_segments('audio.wav', 'tiny.en');
┌────────────┬────────────┬──────────┬────────────────────┬────────────┬──────────┐
│ segment_id │ start_time │ end_time │        text        │ confidence │ language │
│   int32    │   double   │  double  │      varchar       │   double   │ varchar  │
├────────────┼────────────┼──────────┼────────────────────┼────────────┼──────────┤
│          0 │       0.00 │     2.50 │ Hello, this is a   │       0.95 │ en       │
│          1 │       2.50 │     4.00 │ test of whisper.   │       0.92 │ en       │
└────────────┴────────────┴──────────┴────────────────────┴────────────┴──────────┘

-- List available models
D SELECT model_name, size_mb, is_downloaded FROM whisper_list_models() LIMIT 3;
┌────────────┬─────────┬───────────────┐
│ model_name │ size_mb │ is_downloaded │
│  varchar   │  int64  │    boolean    │
├────────────┼─────────┼───────────────┤
│ tiny       │      75 │ false         │
│ tiny.en    │      75 │ true          │
│ base       │     142 │ false         │
└────────────┴─────────┴───────────────┘

About whisper

A DuckDB extension for speech-to-text transcription using whisper.cpp, the C/C++ port of OpenAI's Whisper model.

Transcribe audio files directly from SQL queries in DuckDB, making it easy to process and analyze audio data alongside your other data.

Features

  • Transcribe audio files (WAV, MP3, FLAC, OGG, and more)
  • Live recording and transcription from microphone
  • Voice-to-SQL: speak natural language questions, get query results
  • Support for all Whisper models (tiny, base, small, medium, large)
  • Detailed transcription segments with timestamps and confidence scores
  • Automatic language detection, or a language you specify
  • Works with file paths, BLOB data, or remote URLs

Quick Start

Download a Model

Models must be downloaded before use. They are stored in ~/.duckdb/whisper/models/.

mkdir -p ~/.duckdb/whisper/models
curl -L -o ~/.duckdb/whisper/models/ggml-tiny.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin

Check available models and download status:

SELECT * FROM whisper_list_models();
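
To see which models still need to be fetched, the listing can be filtered like any other table. This is a minimal sketch that assumes whisper_list_models() exposes the model_name, size_mb, and is_downloaded columns shown in the example output earlier on this page:

```sql
-- Models that are not yet in the models directory, smallest first
SELECT model_name, size_mb
FROM whisper_list_models()
WHERE NOT is_downloaded
ORDER BY size_mb;
```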

Transcribe an Audio File

-- Simple transcription
SELECT whisper_transcribe('audio.wav', 'tiny.en');

-- Get detailed segments with timestamps
SELECT * FROM whisper_transcribe_segments('audio.wav', 'tiny.en');

-- Translate foreign audio to English
SELECT whisper_translate('german_speech.mp3', 'small');

Example Use Cases

Transcribe Remote Audio

INSTALL httpfs;
LOAD httpfs;

SELECT whisper_transcribe(content, 'tiny.en')
FROM read_blob('https://example.com/audio.mp3');

Batch Transcribe Multiple Files

SELECT file, whisper_transcribe(file, 'tiny.en') as transcript
FROM glob('audio/*.wav');
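
Transcription is the expensive step, so for larger batches it may be worth storing the results once and querying the stored text afterwards. A sketch of that pattern follows; the table name transcripts and the search term are illustrative and not part of the extension:

```sql
-- Transcribe each file once and keep the results in a table
CREATE OR REPLACE TABLE transcripts AS
SELECT file, whisper_transcribe(file, 'tiny.en') AS transcript
FROM glob('audio/*.wav');

-- Later queries read the stored text instead of re-running the model
SELECT file
FROM transcripts
WHERE transcript ILIKE '%invoice%';
```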

Search Within Transcriptions

SELECT * FROM whisper_transcribe_segments('meeting.wav', 'base.en')
WHERE text ILIKE '%action item%';
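
Because segments come back as ordinary rows, they can also be filtered on the other columns. The sketch below lists segments the model was least sure about, assuming the confidence column shown in the example output is a score between 0 and 1; the 0.8 cutoff is an arbitrary illustration:

```sql
-- Segments worth a manual review, least confident first
SELECT start_time, end_time, text, confidence
FROM whisper_transcribe_segments('meeting.wav', 'base.en')
WHERE confidence < 0.8
ORDER BY confidence;
```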

Generate Subtitles (SRT Format)

-- floor() truncates each component; a bare ::int cast rounds to the nearest
-- integer, which can produce values such as 60 seconds or negative milliseconds
SELECT
    segment_id + 1 as id,
    printf('%02d:%02d:%02d,%03d',
        floor(start_time/3600)::int, floor((start_time%3600)/60)::int,
        floor(start_time%60)::int, floor((start_time - floor(start_time)) * 1000)::int
    ) || ' --> ' ||
    printf('%02d:%02d:%02d,%03d',
        floor(end_time/3600)::int, floor((end_time%3600)/60)::int,
        floor(end_time%60)::int, floor((end_time - floor(end_time)) * 1000)::int
    ) as timestamp,
    trim(text) as text
FROM whisper_transcribe_segments('video.mp4', 'small.en');
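
The timestamp arithmetic can be checked without any audio by running it over literal rows. The sketch below wraps the formatting in a macro and converts to whole milliseconds first, so all later arithmetic is integer-only. The macro name srt_time and the sample rows are hypothetical stand-ins, not part of the extension:

```sql
-- Hypothetical helper: format seconds as an SRT timestamp (HH:MM:SS,mmm).
-- Rounding to whole milliseconds first keeps the remaining math in integers.
CREATE OR REPLACE MACRO srt_time(t) AS
    printf('%02d:%02d:%02d,%03d',
        round(t * 1000)::BIGINT // 3600000,        -- hours
        (round(t * 1000)::BIGINT // 60000) % 60,   -- minutes
        (round(t * 1000)::BIGINT // 1000) % 60,    -- seconds
        round(t * 1000)::BIGINT % 1000);           -- milliseconds

-- Literal rows standing in for whisper_transcribe_segments output
SELECT
    segment_id + 1 AS id,
    srt_time(start_time) || ' --> ' || srt_time(end_time) AS timestamp,
    trim(text) AS text
FROM (VALUES
    (0, 0.0::DOUBLE, 2.5::DOUBLE, ' Hello, this is a'),
    (1, 3725.25, 3727.0, ' test of whisper.')
) AS segments(segment_id, start_time, end_time, text);
-- Row 1 timestamp: 00:00:00,000 --> 00:00:02,500
-- Row 2 timestamp: 01:02:05,250 --> 01:02:07,000
```

Replacing the VALUES list with a call to whisper_transcribe_segments applies the same macro to real segments.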

Recording (requires microphone)

Setup Audio Device

Before using microphone recording or voice query features:

-- List all available audio input devices
SELECT * FROM whisper_list_devices();

-- Set the device ID (use a device_id from the list above)
SET whisper_device_id = 0;

-- Verify your microphone is working
SELECT whisper_mic_level(3);

Record and Transcribe

-- Record for 5 seconds
SELECT whisper_record(5, 'tiny.en');

-- Record until silence (max 30 seconds)
SELECT whisper_record_auto(30);

-- Record and translate to English
SELECT whisper_record_translate(5, 'small');
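
Recorded text can be stored like any other query result. The following sketch tunes the silence settings listed under Configuration and appends each dictated note to a table. It assumes whisper_record_auto returns the transcript as text; the voice_notes table is hypothetical:

```sql
-- Silence settings taken from the Configuration section
SET whisper_silence_duration = 2;
SET whisper_silence_threshold = 0.005;

-- Hypothetical table for dictated notes
CREATE TABLE IF NOT EXISTS voice_notes (recorded_at TIMESTAMPTZ, note VARCHAR);

-- Record until silence (max 30 seconds) and store the transcript
INSERT INTO voice_notes
SELECT now(), whisper_record_auto(30);
```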

Voice-to-SQL (Experimental)

Speak natural language questions about your data and receive SQL query results.

Requires text-to-sql-proxy to be running locally.

-- Create test data
CREATE TABLE customers (id INT, name VARCHAR, revenue DECIMAL);
INSERT INTO customers VALUES (1, 'Acme', 100000), (2, 'Beta', 50000);

-- Get SQL from voice (doesn't execute)
SELECT whisper_voice_to_sql();

-- Execute voice query directly
FROM whisper_voice_query();

-- Include generated SQL in results
FROM whisper_voice_query_with_sql();

Available Models

Model              Size      Description
tiny/tiny.en       ~75 MB    Fastest
base/base.en       ~142 MB   Fast
small/small.en     ~466 MB   Good balance
medium/medium.en   ~1.5 GB   High quality
large-v1/v2/v3     ~2.9 GB   Best quality
large-v3-turbo     ~1.6 GB   Fast + accurate

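Models with the .en suffix are English-only variants optimized for English audio. Use a multilingual model (one without the suffix) for other languages and for translation.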

Supported Audio Formats

The extension uses FFmpeg for audio decoding and supports WAV, MP3, FLAC, OGG/Vorbis, AAC/M4A, and many other formats. Audio is automatically converted to 16 kHz mono, as Whisper requires.

Function Reference

Transcription Functions

  • whisper_transcribe(audio, [model]) - Transcribes audio and returns the full text
  • whisper_translate(audio, [model]) - Translates audio from any language to English
  • whisper_transcribe_segments(audio, [model], [language], [translate]) - Returns table of segments with timestamps
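
As an illustration of the optional arguments, the sketch below requests translated segments for German audio. It assumes the optional arguments can be passed positionally in the order listed above and that language takes a code such as 'de' (the Configuration section uses 'en'); neither is confirmed here, so check the GitHub repository if the call is rejected:

```sql
-- Assumption: optional arguments are positional, in the documented order
-- (audio, model, language, translate)
SELECT start_time, end_time, text
FROM whisper_transcribe_segments('german_speech.mp3', 'small', 'de', true);
```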

Recording Functions

  • whisper_list_devices() - Lists available audio input devices
  • whisper_record(duration_seconds, [model], [device_id]) - Records and transcribes
  • whisper_record_auto(max_seconds, [silence_seconds], [model], [threshold], [device_id]) - Records until silence
  • whisper_record_translate(duration_seconds, [model], [device_id]) - Records and translates to English
  • whisper_mic_level(duration_seconds, [device_id]) - Checks microphone amplitude levels

Voice-to-SQL Functions

  • whisper_voice_to_sql([model], [device_id]) - Records voice and returns generated SQL
  • whisper_voice_query([model], [device_id]) - Records voice, generates SQL, executes it
  • whisper_voice_query_with_sql([model], [device_id]) - Same as whisper_voice_query, with the generated SQL included in the results

Model Management Functions

  • whisper_list_models() - Lists all available models and download status
  • whisper_download_model(model_name) - Returns download instructions

Utility Functions

  • whisper_version() - Returns extension and whisper.cpp version info
  • whisper_check_audio(file_path) - Validates that an audio file can be read
  • whisper_audio_info(file_path) - Returns audio file metadata
  • whisper_get_config() - Returns current whisper configuration settings

Configuration

Configure settings using standard SET statements:

-- Model settings
SET whisper_model = 'small.en';
SET whisper_model_path = '/custom/path/models';
SET whisper_language = 'en';
SET whisper_threads = 4;

-- Recording settings
SET whisper_device_id = 0;
SET whisper_max_duration = 30;
SET whisper_silence_duration = 2;
SET whisper_silence_threshold = 0.005;

-- Voice query settings
SET whisper_text_to_sql_url = 'http://localhost:8080/generate-sql';
SET whisper_text_to_sql_timeout = 60;
SET whisper_voice_query_show_sql = true;

-- View all whisper settings
SELECT * FROM duckdb_settings() WHERE name LIKE 'whisper_%';
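
Individual settings can be inspected and reset with DuckDB's standard current_setting function and RESET statement. The sketch below assumes that whisper_model supplies the model whenever the optional model argument is omitted, which the function signatures suggest but this page does not state outright:

```sql
-- Override the model for the current session and confirm the value
SET whisper_model = 'base.en';
SELECT current_setting('whisper_model');

-- Assumption: calls without a model argument use whisper_model
SELECT whisper_transcribe('audio.wav');

-- Restore the extension's default
RESET whisper_model;
```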

See the GitHub repository for full documentation.
