rdf

A DuckDB extension to read and write RDF

Maintainer(s): nonodename

Installing and Loading

INSTALL rdf FROM community;
LOAD rdf;

Example

-- 0. Assuming the extension is already installed and loaded

-- 1. Count the triples across all N-Triples files in a directory
SELECT COUNT(*) FROM read_rdf('data/shards/*.nt');

-- 2. Get subjects and predicates of a turtle file
SELECT subject, predicate FROM read_rdf('test/rdf/tests.ttl');

-- 3. Write a query result as N-Triples RDF, using an R2RML mapping
COPY (SELECT empno, ename, deptno FROM emp)
TO 'output.nt'
(FORMAT r2rml, mapping 'mapping.ttl');

-- 4. Execute a full R2RML mapping (with embedded queries) to write RDF
COPY (SELECT 1) TO 'output.nt' (FORMAT r2rml, mapping 'mapping.ttl');

-- 5. Check if an R2RML mapping is valid
SELECT is_valid_r2rml('mapping.ttl');

-- 6. Pivot RDF to a wide table
SELECT * FROM pivot_rdf('data.ttl');

-- 7. Read a SPARQL endpoint
SELECT * FROM read_sparql(
         'https://query.wikidata.org/sparql',
         'SELECT (COUNT(*) AS ?count) WHERE { ?item wdt:P31 wd:Q5 .}'
     );

About rdf

The duck_rdf extension enables DuckDB to read and write RDF (Resource Description Framework) data directly, using the SERD library for parsing and serialization.

Supported Formats

Read: Turtle (.ttl), NTriples (.nt), NQuads (.nq), TriG (.trig), and RDF/XML (.rdf/.xml, experimental — read-only).

Write: NTriples, Turtle, NQuads (via R2RML mapping).

Reading RDF

read_rdf() returns six columns: subject, predicate, object (always populated), and graph, language_tag, datatype (nullable). It accepts a file path or glob pattern; multiple matched files are scanned in parallel.

SELECT subject, predicate FROM read_rdf('data.ttl');
SELECT COUNT(*) FROM read_rdf('data/shards/*.nt');
SELECT * FROM read_rdf('data/*.dat', file_type = 'ttl', strict_parsing = false);
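Because read_rdf() returns an ordinary table, the six columns can be aggregated and filtered like any other relation. The queries below are a minimal sketch that uses only the column names listed above; the file names data.ttl and data.nq are placeholders, not files shipped with the extension.

```sql
-- Count how often each predicate occurs, most frequent first.
SELECT predicate, COUNT(*) AS triple_count
FROM read_rdf('data.ttl')
GROUP BY predicate
ORDER BY triple_count DESC;

-- Keep only rows whose nullable graph column is populated.
-- 'data.nq' is a hypothetical N-Quads file.
SELECT graph, COUNT(*) AS quad_count
FROM read_rdf('data.nq')
WHERE graph IS NOT NULL
GROUP BY graph;
```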

Optional parameters:

| Parameter | Default | Description |
|---|---|---|
| strict_parsing | true | Set to false to allow malformed URIs |
| prefix_expansion | false | Expand CURIE-form URIs to full URIs (Turtle/TriG only) |
| file_type | auto-detected | Override format: ttl, nt, nq, trig, rdf/xml |
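As a small illustration of the optional parameters, the query below turns on prefix_expansion for a Turtle file. It follows the named-parameter style used in the example above; data.ttl is a placeholder path.

```sql
-- Ask the reader to expand CURIE-form URIs to full URIs
-- (documented as applying to Turtle and TriG input only).
SELECT subject, predicate, object
FROM read_rdf('data.ttl', prefix_expansion = true)
LIMIT 10;
```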

pivot_rdf() takes the same path/glob argument as read_rdf() and returns a pivoted table with one column per predicate and at least one row per subject. To handle files of arbitrary size, a subject may appear in more than one row if its triples are encountered out of sequence. A pivot is also possible in plain SQL, but it is subject to memory limits; this function aims to avoid them by making two passes over the RDF, the first of which profiles the shape of the data using profile_rdf().
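For comparison, the SQL-domain pivot mentioned above might look like the following sketch, which uses DuckDB's built-in PIVOT statement on the output of read_rdf(). The table name triples and the file data.ttl are illustrative. Note that first(object) keeps only one value when a subject has several objects for the same predicate, so this is not guaranteed to match what pivot_rdf() returns.

```sql
-- Materialize the triples first; this is the step that can run into
-- memory limits on large inputs.
CREATE TEMP TABLE triples AS
SELECT subject, predicate, object
FROM read_rdf('data.ttl');

-- One row per subject, one column per distinct predicate.
PIVOT triples
ON predicate
USING first(object)
GROUP BY subject;
```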

The experimental read_sparql(endpoint, query) sends a SPARQL SELECT query to a remote endpoint and returns the result set as a DuckDB table. Column names are derived from the SPARQL variable names; all columns are VARCHAR. Unbound variables are returned as empty strings.

-- Count number of humans in wikidata
SELECT * FROM read_sparql(
            'https://query.wikidata.org/sparql',
            'SELECT (COUNT(*) AS ?count) WHERE {   ?item wdt:P31 wd:Q5 .}'
        );
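Since every column comes back as VARCHAR and unbound variables arrive as empty strings, numeric results usually need an explicit conversion. The sketch below assumes the result column is named count (derived from the ?count variable); NULLIF and TRY_CAST are standard DuckDB functions.

```sql
-- Turn empty strings into NULL, then convert the text to an integer.
-- TRY_CAST yields NULL rather than an error if the value is not numeric.
SELECT TRY_CAST(NULLIF("count", '') AS BIGINT) AS human_count
FROM read_sparql(
    'https://query.wikidata.org/sparql',
    'SELECT (COUNT(*) AS ?count) WHERE { ?item wdt:P31 wd:Q5 . }'
);
```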

Writing RDF (R2RML)

Write RDF using R2RML mapping files with DuckDB's COPY TO syntax. Two modes are supported:

Inside-out mode — DuckDB drives the query; the mapping has no rr:logicalTable:

COPY (SELECT empno, ename, deptno FROM emp)
TO 'output.nt' (FORMAT r2rml, mapping 'mapping.ttl');

Full R2RML mode — the mapping defines its own queries:

COPY (SELECT 1) TO 'output.nt' (FORMAT r2rml, mapping 'mapping.ttl');

Write options:

| Option | Required | Default | Description |
|---|---|---|---|
| mapping | Yes | | Path to R2RML mapping file (.ttl) |
| rdf_format | No | ntriples | Output format: ntriples, turtle, or nquads |
| ignore_non_fatal_errors | No | true | Raise an exception on the first parse error when false |
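The write options can be combined in a single COPY statement. The following sketch writes Turtle instead of the default N-Triples and then reads the file back as a sanity check. It assumes that rdf_format takes a quoted string and that ignore_non_fatal_errors takes a boolean literal; output.ttl is a placeholder path, and emp and mapping.ttl are the same names used in the examples above.

```sql
-- Write Turtle and stop at the first error instead of skipping it.
COPY (SELECT empno, ename, deptno FROM emp)
TO 'output.ttl'
(FORMAT r2rml,
 mapping 'mapping.ttl',
 rdf_format 'turtle',
 ignore_non_fatal_errors false);

-- Round-trip check: count the triples that were written.
SELECT COUNT(*) AS triples_written
FROM read_rdf('output.ttl');
```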

Validation Helpers

SELECT is_valid_r2rml('mapping.ttl');      -- validate an R2RML mapping file
SELECT can_call_inside_out('mapping.ttl'); -- check if inside-out mode is supported

Added Functions

| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| can_call_inside_out | scalar | NULL | NULL | |
| is_valid_r2rml | scalar | NULL | NULL | |
| pivot_rdf | table | NULL | NULL | |
| profile_rdf | table | NULL | NULL | |
| read_rdf | table | NULL | NULL | |
| read_sparql | table | NULL | NULL | |

Overloaded Functions

This extension does not add any function overloads.

Added Types

This extension does not add any types.

Added Settings

This extension does not add any settings.