A DuckDB extension to read and write RDF
Installing and Loading
INSTALL rdf FROM community;
LOAD rdf;
Examples
-- 0. Assuming the extension is already installed and loaded
-- 1. Count the N-Triples statements in all .nt files in a directory
SELECT COUNT(*) FROM read_rdf('data/shards/*.nt');
-- 2. Get the subjects and predicates of a Turtle file
SELECT subject, predicate FROM read_rdf('test/rdf/tests.ttl');
-- 3. Write a query result as N-Triples RDF using an inside-out R2RML mapping
COPY (SELECT empno, ename, deptno FROM emp)
TO 'output.nt'
(FORMAT r2rml, mapping 'mapping.ttl');
-- 4. Execute a full R2RML mapping (with embedded queries) to write RDF
COPY (SELECT 1) TO 'output.nt' (FORMAT r2rml, mapping 'mapping.ttl');
-- 5. Check if an R2RML mapping is valid
SELECT is_valid_r2rml('mapping.ttl');
-- 6. Pivot RDF to a wide table
SELECT * FROM pivot_rdf('data.ttl');
-- 7. Query a remote SPARQL endpoint
SELECT * FROM read_sparql(
'https://query.wikidata.org/sparql',
'SELECT (COUNT(*) AS ?count) WHERE { ?item wdt:P31 wd:Q5 .}'
);
About rdf
The duck_rdf extension enables DuckDB to read and write RDF (Resource Description Framework) data directly, using the Serd library for parsing and serialization.
Supported Formats
Read: Turtle (.ttl), N-Triples (.nt), N-Quads (.nq), TriG (.trig), and RDF/XML (.rdf/.xml; experimental, read-only).
Write: N-Triples, Turtle, and N-Quads (via an R2RML mapping).
Reading RDF
read_rdf() returns six columns: subject, predicate, and object, which are always populated, plus graph, language_tag, and datatype, which are nullable. It accepts a file path or a glob pattern; when a glob matches several files, they are scanned in parallel.
SELECT subject, predicate FROM read_rdf('data.ttl');
SELECT COUNT(*) FROM read_rdf('data/shards/*.nt');
SELECT * FROM read_rdf('data/*.dat', file_type = 'ttl', strict_parsing = false);
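To make the six-column layout concrete, here is a minimal Python sketch that splits a single N-Triples or N-Quads statement into those columns. It is not the extension's parser (the extension uses Serd) and handles only a simplified subset of the grammar. Assumptions: IRIs are stored without angle brackets, blank nodes keep their _: label, literal escape sequences are left undecoded, and datatype stays NULL for literals without an explicit ^^ datatype; the extension's actual representation may differ. RdfRow, TERM, term_text, and parse_nquad are hypothetical names.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional

# One term: <iri>, _:blank, or "literal" with an optional @lang or ^^<datatype>.
TERM = re.compile(
    r'\s*(?:'
    r'<(?P<iri>[^>]*)>'
    r'|(?P<bnode>_:[^\s.]+)'
    r'|"(?P<lit>(?:[^"\\]|\\.)*)"'
    r'(?:@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)|\^\^<(?P<dt>[^>]*)>)?'
    r')'
)


class RdfRow(NamedTuple):
    subject: str
    predicate: str
    object: str
    graph: Optional[str]         # NULL for statements in the default graph
    language_tag: Optional[str]  # NULL unless the object is a language-tagged literal
    datatype: Optional[str]      # NULL unless the object literal has ^^<datatype>


def term_text(m: re.Match) -> str:
    """Return a term's value without its N-Triples delimiters."""
    for group in ("iri", "bnode", "lit"):
        if m.group(group) is not None:
            return m.group(group)
    raise ValueError("unmatched term")


def parse_nquad(line: str) -> RdfRow:
    """Split one N-Triples/N-Quads statement into read_rdf-style columns."""
    terms: list[re.Match] = []
    pos = 0
    # Consume terms until only the terminating "." is left.
    while not line[pos:].strip().startswith("."):
        m = TERM.match(line, pos)
        if m is None or len(terms) == 4:
            raise ValueError(f"cannot parse statement: {line!r}")
        terms.append(m)
        pos = m.end()
    if len(terms) < 3:
        raise ValueError(f"expected 3 or 4 terms in: {line!r}")
    subj, pred, obj = terms[:3]
    graph = terms[3] if len(terms) == 4 else None
    return RdfRow(
        subject=term_text(subj),
        predicate=term_text(pred),
        object=term_text(obj),
        graph=term_text(graph) if graph is not None else None,
        language_tag=obj.group("lang"),
        datatype=obj.group("dt"),
    )


quad = parse_nquad('<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> '
                   '"Alice"@en <http://example.org/g1> .')
triple = parse_nquad('_:b0 <http://example.org/age> '
                     '"42"^^<http://www.w3.org/2001/XMLSchema#integer> .')
```

The quad fills graph and language_tag; the plain triple leaves graph and language_tag as None and carries only a datatype.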
Optional parameters:
| Parameter | Default | Description |
|---|---|---|
| strict_parsing | true | Set to false to allow malformed URIs |
| prefix_expansion | false | Expand CURIE-form URIs to full URIs (Turtle/TriG only) |
| file_type | auto-detected | Override the detected format: ttl, nt, nq, trig, rdf/xml |
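The prefix_expansion row implies that, by default, URIs written in CURIE form (such as foaf:name) are returned as written. The following Python sketch shows what expanding them involves, assuming the prefix map comes from Turtle @prefix declarations. read_prefixes and expand_curie are hypothetical helpers, not the extension's code, and the SPARQL-style PREFIX keyword and @base are not handled.

```python
import re

# Matches "@prefix ex: <http://example.org/> ." (the prefix name may be empty).
PREFIX_DECL = re.compile(r"@prefix\s+([A-Za-z][\w.-]*)?:\s*<([^>]*)>\s*\.")


def read_prefixes(turtle: str) -> dict[str, str]:
    """Collect @prefix declarations as {prefix: namespace IRI}."""
    return {m.group(1) or "": m.group(2) for m in PREFIX_DECL.finditer(turtle)}


def expand_curie(value: str, prefixes: dict[str, str]) -> str:
    """Expand 'prefix:local' when the prefix is declared; otherwise return it unchanged."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return value


doc = """@prefix ex: <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://example.org/default#> .
ex:alice foaf:name "Alice" ."""
prefixes = read_prefixes(doc)
```

Values whose prefix is undeclared, such as full IRIs (http:...) or blank nodes (_:b0), pass through untouched.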
pivot_rdf() takes the same path/glob argument as read_rdf() and returns a wide table with one column per predicate and at least one row per subject. So that it can handle files of any size, a subject may appear in more than one row if its triples are not contiguous in the input. The same pivot can be written in SQL, but that approach is bounded by available memory; pivot_rdf() avoids this by making two passes over the RDF, the first of which profiles the shape of the data with profile_rdf().
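The two-pass idea can be sketched in plain Python. This is an illustration under stated assumptions, not the extension's implementation: triples arrive as (subject, predicate, object) tuples, pass 1 (standing in for profile_rdf()) only collects the distinct predicates, and when a subject repeats a predicate the last value wins. profile_predicates and pivot are hypothetical names. Because pass 2 keeps only the current subject's row in memory, the sketch also shows why a subject can appear in more than one row.

```python
from __future__ import annotations

from typing import Iterable, Iterator, Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)
Row = dict[str, Optional[str]]


def profile_predicates(triples: Iterable[Triple]) -> list[str]:
    """Pass 1: collect the distinct predicates (the output columns) in first-seen order."""
    columns: dict[str, None] = {}
    for _, predicate, _ in triples:
        columns.setdefault(predicate, None)
    return list(columns)


def pivot(triples: Iterable[Triple], columns: list[str]) -> Iterator[Row]:
    """Pass 2: stream the triples again, emitting a row whenever the subject changes.

    Only the current subject's row is held in memory, so a subject whose triples
    are not contiguous in the input is emitted as more than one row.
    """
    current: Optional[Row] = None
    for subject, predicate, obj in triples:
        if current is None or current["subject"] != subject:
            if current is not None:
                yield current
            current = {"subject": subject, **dict.fromkeys(columns)}
        # Assumption: if a predicate repeats for one subject, the last value wins.
        current[predicate] = obj
    if current is not None:
        yield current


data = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:age", "30"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:alice", "ex:email", "alice@example.org"),  # alice again, out of sequence
]
columns = profile_predicates(data)
rows = list(pivot(data, columns))
```

Here ex:alice produces two rows, because her last triple arrives after ex:bob's.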
The experimental read_sparql(endpoint, query) function sends a SPARQL SELECT query to a remote endpoint and returns the result set as a DuckDB table. Column names are taken from the SPARQL variable names, every column is VARCHAR, and unbound variables are returned as empty strings.
-- Count the number of humans in Wikidata
SELECT * FROM read_sparql(
'https://query.wikidata.org/sparql',
'SELECT (COUNT(*) AS ?count) WHERE { ?item wdt:P31 wd:Q5 .}'
);
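One way to picture the result mapping is through the W3C SPARQL 1.1 Query Results JSON format, in which each solution lists only its bound variables. Whether read_sparql requests JSON or another result format is an assumption here, sparql_json_to_rows is a hypothetical helper, and the sample payload is made up. The sketch reproduces the documented behavior: columns named after the variables, every value a string, and unbound variables as empty strings.

```python
import json


def sparql_json_to_rows(payload: str) -> tuple[list[str], list[list[str]]]:
    """Convert a SPARQL 1.1 JSON result document into (column names, string rows)."""
    document = json.loads(payload)
    columns = document["head"]["vars"]
    rows = []
    for binding in document["results"]["bindings"]:
        # A variable absent from a binding is unbound for that solution: use "".
        rows.append([binding.get(var, {}).get("value", "") for var in columns])
    return columns, rows


sample = """{
  "head": {"vars": ["item", "label"]},
  "results": {"bindings": [
    {"item": {"type": "uri", "value": "http://example.org/a"},
     "label": {"type": "literal", "value": "A", "xml:lang": "en"}},
    {"item": {"type": "uri", "value": "http://example.org/b"}}
  ]}
}"""
columns, rows = sparql_json_to_rows(sample)
```

Note that the term type and language tag are dropped, consistent with every column being VARCHAR.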
Writing RDF (R2RML)
Write RDF using R2RML mapping files with DuckDB's COPY TO syntax. Two modes are supported:
Inside-out mode. The mapping has no rr:logicalTable; the query in the COPY statement supplies the rows:
COPY (SELECT empno, ename, deptno FROM emp)
TO 'output.nt' (FORMAT r2rml, mapping 'mapping.ttl');
Full R2RML mode. The mapping defines its own queries through rr:logicalTable, so the query in the COPY statement serves only as a placeholder:
COPY (SELECT 1) TO 'output.nt' (FORMAT r2rml, mapping 'mapping.ttl');
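Both modes ultimately turn rows into triples through R2RML term maps. The sketch below shows the core of an rr:template term map as the W3C R2RML Recommendation defines it: {column} placeholders are filled with percent-encoded column values, and a NULL in any referenced column means no term (and so no triple) is generated. The mapping itself is hypothetical (the http://data.example.com/ IRIs and the ns#name and ns#department predicates are invented for the emp example), urllib.parse.quote only approximates R2RML's IRI-safe encoding (it also escapes non-ASCII characters), and none of this is the extension's code.

```python
import re
from typing import Optional
from urllib.parse import quote

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")
EX = "http://data.example.com/"  # hypothetical base IRI for the example mapping


def expand_template(template: str, row: dict) -> Optional[str]:
    """Fill rr:template-style {column} placeholders with percent-encoded values.

    Returns None when a referenced column is NULL, so no term is generated.
    """
    if any(row.get(name) is None for name in PLACEHOLDER.findall(template)):
        return None
    return PLACEHOLDER.sub(lambda m: quote(str(row[m.group(1)]), safe=""), template)


def nt_literal(value: object) -> str:
    """Serialize a plain N-Triples literal, escaping backslash, quote, LF and CR."""
    text = str(value)
    for raw, escaped in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, escaped)
    return f'"{text}"'


def emp_row_to_ntriples(row: dict) -> list[str]:
    """Apply a hypothetical one-TriplesMap mapping to an (empno, ename, deptno) row."""
    subject = expand_template(EX + "employee/{empno}", row)
    if subject is None:
        return []  # no subject, so no triples for this row
    lines = []
    if row.get("ename") is not None:
        lines.append(f"<{subject}> <{EX}ns#name> {nt_literal(row['ename'])} .")
    department = expand_template(EX + "department/{deptno}", row)
    if department is not None:
        lines.append(f"<{subject}> <{EX}ns#department> <{department}> .")
    return lines


for line in emp_row_to_ntriples({"empno": 7369, "ename": "SMITH", "deptno": 20}):
    print(line)
```

Leaving deptno NULL drops only the department triple, while a NULL empno drops every triple for the row, because the subject cannot be built.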
Write options:
| Option | Required | Default | Description |
|---|---|---|---|
| mapping | Yes | (none) | Path to the R2RML mapping file (.ttl) |
| rdf_format | No | ntriples | Output format: ntriples, turtle, or nquads |
| ignore_non_fatal_errors | No | true | Set to false to raise an exception on the first parse error |
Validation Helpers
SELECT is_valid_r2rml('mapping.ttl'); -- validate an R2RML mapping file
SELECT can_call_inside_out('mapping.ttl'); -- check if inside-out mode is supported
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| can_call_inside_out | scalar | NULL | NULL | |
| is_valid_r2rml | scalar | NULL | NULL | |
| pivot_rdf | table | NULL | NULL | |
| profile_rdf | table | NULL | NULL | |
| read_rdf | table | NULL | NULL | |
| read_sparql | table | NULL | NULL | |
Overloaded Functions
This extension does not add any function overloads.
Added Types
This extension does not add any types.
Added Settings
This extension does not add any settings.