The Duck Lineage extension automatically captures query lineage events and sends them to an OpenLineage backend.
Maintainer(s):
thijs-s
Installing and Loading
INSTALL duck_lineage FROM community;
LOAD duck_lineage;
Example
-- Point to a running OpenLineage backend
SET duck_lineage_url = 'http://localhost:5000/api/v1/lineage';
-- Run some queries
CREATE TABLE greetings (id INTEGER, message VARCHAR);
INSERT INTO greetings VALUES (1, 'Hello'), (2, 'World');
SELECT * FROM greetings;
-- Lineage events for the queries above are sent to the OpenLineage backend automatically.
About duck_lineage
The duck_lineage extension automatically captures data lineage from every DuckDB query and emits OpenLineage events to any compatible backend (e.g., Marquez, Atlan, DataHub).
Features:
- Automatic lineage capture — no query modification required
- OpenLineage START/COMPLETE/FAIL event lifecycle for every query
- Input and output dataset extraction from logical query plans
- Schema facets with column names and types for all tracked datasets
- SQL query facet attached to every event
- Output statistics facet (row count) on COMPLETE events
- Lifecycle state change tracking (CREATE, DROP, ALTER, OVERWRITE, RENAME, TRUNCATE)
- Symlinks facet for dataset identity resolution
- File-based source tracking (read_csv, read_parquet, COPY TO)
- DuckLake catalog support with automatic namespace resolution from DATA_PATH
- Asynchronous event delivery via background worker thread
- Exponential backoff retry with configurable max retries
- Configurable event queue with overflow protection
- API key authentication for OpenLineage backends
- Parent run facet via OPENLINEAGE_PARENT_* environment variables
- Debug logging mode
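The DuckLake support listed above might be exercised roughly as follows. This is only a sketch: it assumes the separate ducklake extension can be installed, and the catalog file `metadata.ducklake`, the alias `lake`, and the `data_files/` directory are hypothetical names chosen for illustration. The namespace that duck_lineage derives from DATA_PATH is not shown, since its exact form is not described here.

```sql
-- Assumes duck_lineage is already loaded and duck_lineage_url is set.
INSTALL ducklake;
LOAD ducklake;

-- Hypothetical catalog file and data directory.
ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'data_files/');

-- Ordinary statements against the attached catalog; lineage for these
-- should be captured like any other query.
CREATE TABLE lake.events (id INTEGER, payload VARCHAR);
INSERT INTO lake.events VALUES (1, 'created');
SELECT * FROM lake.events;
```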
Tracked operations:
- INSERT, UPDATE, DELETE, MERGE
- CREATE TABLE, CREATE TABLE AS, CREATE VIEW, CREATE INDEX
- DROP, ALTER
- COPY TO
- SELECT (read-only lineage)
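As an illustration of the operations listed above, the following short script touches several of them in turn. It is a sketch rather than a reference: the table names and the `orders.csv` file are hypothetical, and the exact events emitted for each statement depend on the extension and the backend.

```sql
-- Assumes duck_lineage is loaded and duck_lineage_url points at a backend.

-- CREATE TABLE and INSERT: orders is an output dataset.
CREATE TABLE orders (id INTEGER, amount DECIMAL(10, 2));
INSERT INTO orders VALUES (1, 19.99), (2, 5.00);

-- CREATE TABLE AS: orders is an input, big_orders an output.
CREATE TABLE big_orders AS
    SELECT * FROM orders WHERE amount > 10;

-- UPDATE and DELETE modify an existing dataset.
UPDATE orders SET amount = amount * 2 WHERE id = 2;
DELETE FROM big_orders WHERE id = 1;

-- COPY TO writes a file-based output dataset.
COPY orders TO 'orders.csv' (HEADER, DELIMITER ',');

-- Reading the file back is a file-based input (read-only lineage).
SELECT count(*) FROM read_csv('orders.csv');

-- DROP is tracked as a lifecycle state change.
DROP TABLE big_orders;
```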
Configuration (via SET statements):
- duck_lineage_url — OpenLineage backend endpoint
- duck_lineage_namespace — default dataset namespace
- duck_lineage_api_key — authentication key
- duck_lineage_debug — enable debug logging
- duck_lineage_max_retries — retry attempts for failed HTTP requests (default: 3)
- duck_lineage_max_queue_size — max queued events before dropping (default: 10000)
- duck_lineage_timeout — HTTP request timeout in seconds (default: 10)
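Taken together, the settings above could be applied at the start of a session along these lines. The setting names come from the list above; the values are illustrative assumptions, with `analytics_dev` a hypothetical namespace and `REPLACE_ME` a placeholder for a real key.

```sql
LOAD duck_lineage;

SET duck_lineage_url = 'http://localhost:5000/api/v1/lineage';
SET duck_lineage_namespace = 'analytics_dev';  -- hypothetical namespace
SET duck_lineage_api_key = 'REPLACE_ME';       -- placeholder, not a real key
SET duck_lineage_debug = true;                 -- verbose logging while testing
SET duck_lineage_max_retries = 5;              -- default is 3
SET duck_lineage_max_queue_size = 50000;       -- default is 10000
SET duck_lineage_timeout = 30;                 -- seconds; default is 10
```

Raising the queue size and retry count trades memory and delivery latency for fewer dropped events when the backend is slow or briefly unavailable.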
Limitations:
- Lineage captured from prepared statements is less detailed
- Requires an external OpenLineage-compatible backend for event storage
Added Functions
This extension does not add any functions.
Overloaded Functions
This extension does not add any function overloads.
Added Types
This extension does not add any types.
Added Settings
| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| duck_lineage_api_key | API Key for OpenLineage backend | VARCHAR | GLOBAL | [] |
| duck_lineage_debug | Enable debug logging for OpenLineage events | BOOLEAN | GLOBAL | [] |
| duck_lineage_max_queue_size | Maximum number of events to queue before dropping | BIGINT | GLOBAL | [] |
| duck_lineage_max_retries | Maximum retry attempts for failed HTTP requests | BIGINT | GLOBAL | [] |
| duck_lineage_namespace | Namespace for OpenLineage events | VARCHAR | GLOBAL | [] |
| duck_lineage_timeout | HTTP request timeout in seconds | BIGINT | GLOBAL | [] |
| duck_lineage_url | URL of the OpenLineage backend | VARCHAR | GLOBAL | [] |
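To check which values are currently in effect, the settings in the table can be read back with DuckDB's built-in `duckdb_settings()` table function and `current_setting()`. This is a small sketch that assumes the extension has already been loaded in the session.

```sql
-- List every duck_lineage setting with its current value.
SELECT name, value, input_type, scope
FROM duckdb_settings()
WHERE name LIKE 'duck_lineage%'
ORDER BY name;

-- Read a single setting.
SELECT current_setting('duck_lineage_timeout') AS timeout_seconds;
```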