hnsw_acorn

HNSW vector search with ACORN-1 filtered search, RaBitQ quantization, metadata joins, and grouped nearest neighbors

Maintainer(s): cigrainger

Installing and Loading

INSTALL hnsw_acorn FROM community;
LOAD hnsw_acorn;

Example

-- Create a table with vectors and categories
CREATE TABLE items AS
SELECT i AS id,
    array_value(random(), random(), random())::FLOAT[3] AS vec,
    (i % 5) AS category
FROM range(1000) t(i);

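-- Create an HNSW index (in-memory database; for a persistent database, first run SET hnsw_enable_experimental_persistence = true;)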
CREATE INDEX idx ON items USING HNSW (vec);

-- Filtered vector search: returns exactly 5 results matching the filter
SELECT id, category, array_distance(vec, [0.5, 0.5, 0.5]::FLOAT[3]) AS dist
FROM items
WHERE category = 1
ORDER BY dist
LIMIT 5;

-- Per-group nearest neighbors: the 3 closest ids in each category, returned as a list
SELECT category,
    min_by(id, array_distance(vec, [0.5, 0.5, 0.5]::FLOAT[3]), 3) AS closest
FROM items GROUP BY category;

-- Metadata join (vectors + metadata in separate tables)
CREATE TABLE metadata AS
SELECT i AS id, (i % 5) AS category, 'item_' || i AS name
FROM range(1000) t(i);

SELECT m.name
FROM items JOIN metadata m ON items.id = m.id
WHERE m.category = 2
ORDER BY array_distance(vec, [0.5, 0.5, 0.5]::FLOAT[3])
LIMIT 5;

-- RaBitQ quantization (~21x memory reduction at 128 dimensions)
CREATE INDEX idx_rq ON items USING HNSW (vec) WITH (quantization = 'rabitq');

About hnsw_acorn

A fork of duckdb-vss that adds ACORN-1 filtered HNSW search (arXiv:2403.04871), RaBitQ binary quantization, metadata join optimization, and per-group nearest neighbor search.

ACORN-1 Filtered Search: The upstream extension applies WHERE clauses after the index returns its results, so a filtered query can return fewer rows than its LIMIT whenever too few of those results pass the filter. This extension pushes filter predicates into the HNSW graph traversal itself, so filtered queries return the correct number of rows with high recall.
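Why the traversal itself has to change can be seen in a small Python sketch of filtered best-first search with ACORN-1-style two-hop expansion. Only nodes that pass the predicate are queued or returned, but each neighbor's own neighbors are also examined, so a filtered-out neighbor does not block the path behind it. This is a simplified single-layer illustration under stated assumptions, not the extension's implementation; the graph, the `ef` beam width, and every function name are hypothetical.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, the same quantity array_distance computes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_search(
    graph: Dict[int, List[int]],
    vectors: Dict[int, Sequence[float]],
    query: Sequence[float],
    passes: Callable[[int], bool],
    entry: int,
    k: int,
    ef: int = 16,
    two_hop: bool = True,
) -> List[Tuple[float, int]]:
    """Best-first search over one graph layer with the filter applied during
    traversal. With two_hop=True, neighbors of neighbors are examined too."""
    visited = {entry}
    d0 = l2(vectors[entry], query)
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    results = [(-d0, entry)] if passes(entry) else []  # max-heap (negated), size <= ef

    while candidates:
        d_c, c = heapq.heappop(candidates)
        if len(results) >= ef and d_c > -results[0][0]:
            break  # nearest unexpanded node is farther than every kept result
        for v in graph[c]:
            hop = [v, *graph[v]] if two_hop else [v]
            for w in hop:
                if w in visited:
                    continue
                visited.add(w)
                if not passes(w):
                    continue  # filtered-out nodes are never queued or returned
                d = l2(vectors[w], query)
                if len(results) < ef or d < -results[0][0]:
                    heapq.heappush(candidates, (d, w))
                    heapq.heappush(results, (-d, w))
                    if len(results) > ef:
                        heapq.heappop(results)  # evict the current worst result
    return sorted((-nd, n) for nd, n in results)[:k]


# Ten points on a line; each node links only to its immediate neighbors.
vectors = {i: (float(i), 0.0) for i in range(10)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}


def is_even(node: int) -> bool:
    return node % 2 == 0  # stands in for a WHERE clause


print(filtered_search(graph, vectors, (9.0, 0.0), is_even, entry=0, k=3))
# [(1.0, 8), (3.0, 6), (5.0, 4)]
print(filtered_search(graph, vectors, (9.0, 0.0), is_even, entry=0, k=3, two_hop=False))
# [(9.0, 0)]
```

With one-hop expansion the search stalls at node 0, because its only neighbor fails the filter; two-hop expansion steps over it and reaches the true nearest matches. The real index runs on the full multi-layer HNSW graph, whereas this sketch uses one layer and no neighbor-list limits.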

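RaBitQ Quantization: Compresses vectors to 1 bit per dimension, then rescores an oversampled set of candidates (see hnsw_rabitq_oversample) to preserve result quality. This gives roughly 21x memory reduction at 128 dimensions. Enable it when creating the index:

CREATE INDEX idx ON items USING HNSW (vec) WITH (quantization = 'rabitq');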
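The compress-then-rescore idea can be sketched in Python. This is deliberately simplified: plain sign quantization around a centroid with Hamming-distance ranking, whereas actual RaBitQ also applies a random rotation and an unbiased distance estimator, both omitted here. Brute-force Hamming ranking stands in for the graph traversal, the oversample factor of 3 mirrors the hnsw_rabitq_oversample default listed under Added Settings, and all names are hypothetical. The byte arithmetic shows that 1-bit codes alone are 32x smaller than float32 at 128 dimensions; the ~21x figure quoted above is lower, presumably because the index also keeps some per-vector data this sketch does not model.

```python
import math
import random
from typing import List, Sequence


def quantize(vec: Sequence[float], centroid: Sequence[float]) -> int:
    """Pack one bit per dimension: bit i is 1 if vec[i] lies above the centroid."""
    code = 0
    for i, (x, c) in enumerate(zip(vec, centroid)):
        if x > c:
            code |= 1 << i
    return code


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_with_rescore(vectors, codes, centroid, query, k, oversample=3) -> List[int]:
    q_code = quantize(query, centroid)
    # Phase 1: rank all codes by Hamming distance to the query code (cheap, approximate).
    ranked = sorted(range(len(codes)), key=lambda i: bin(codes[i] ^ q_code).count("1"))
    shortlist = ranked[: k * oversample]
    # Phase 2: rescore only the shortlist with exact distances on the original vectors.
    return sorted(shortlist, key=lambda i: l2(vectors[i], query))[:k]


def bytes_per_vector(dims: int):
    """float32 storage vs. packed 1-bit codes, ignoring any per-vector extras."""
    return dims * 4, math.ceil(dims / 8)


rng = random.Random(0)
dims = 16
vectors = [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(500)]
centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
codes = [quantize(v, centroid) for v in vectors]
query = [rng.gauss(0, 1) for _ in range(dims)]

exact = sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))[:5]
approx = search_with_rescore(vectors, codes, centroid, query, k=5)
print("recall@5:", len(set(approx) & set(exact)) / 5)

full, packed = bytes_per_vector(128)
print(full, packed, full / packed)  # 512 16 32.0
```

Raising the oversample factor trades query time for recall: with a shortlist as large as the table, the rescored result equals the exact top-k.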

Metadata Joins: When vectors and metadata live in separate tables, the optimizer rewrites a JOIN + WHERE + ORDER BY + LIMIT query into a single ACORN-1 filtered search. No special syntax is needed; in this example, query stands for the query vector (for instance an array literal or a prepared-statement parameter):

SELECT m.title
FROM embeddings e JOIN metadata m ON e.id = m.id
WHERE m.genre = 'sci-fi'
ORDER BY array_distance(e.vec, query)
LIMIT 10;
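The rewrite is only valid because both plans return the same rows. The Python sketch below, with made-up table contents and brute force standing in for the index, checks that equivalence: the rewritten plan evaluates the metadata predicate once to get a set of allowed ids, then runs a single filtered top-k over the embeddings. It does not model zone map pruning or how the optimizer recognizes the query pattern; the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def join_then_sort(embeddings, metadata, query, genre, limit) -> List[str]:
    """Literal plan: JOIN, then WHERE, then ORDER BY distance, then LIMIT."""
    joined = [
        (l2(vec, query), metadata[i]["title"])
        for i, vec in embeddings.items()
        if i in metadata and metadata[i]["genre"] == genre
    ]
    return [title for _, title in sorted(joined)[:limit]]


def filtered_search_plan(embeddings, metadata, query, genre, limit) -> List[str]:
    """Rewritten plan: predicate -> allowed ids -> one filtered top-k search."""
    allowed = {i for i, row in metadata.items() if row["genre"] == genre}
    # Brute force stands in for the ACORN-1 index search restricted to `allowed`.
    hits = sorted((l2(vec, query), i) for i, vec in embeddings.items() if i in allowed)
    # Only the survivors need their metadata columns fetched.
    return [metadata[i]["title"] for _, i in hits[:limit]]


embeddings: Dict[int, Sequence[float]] = {i: (float(i), 0.0) for i in range(20)}
metadata = {
    i: {"title": f"item_{i}", "genre": "sci-fi" if i % 3 == 0 else "drama"}
    for i in range(20)
}

q = (10.3, 0.0)
print(filtered_search_plan(embeddings, metadata, q, "sci-fi", 3))
# ['item_9', 'item_12', 'item_6']
```

Because ids without a matching metadata row never enter the allowed set, the rewritten plan keeps inner-join semantics.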

Grouped Nearest Neighbors: Per-group top-K search using standard SQL aggregation. For each distinct group value, the extension runs a separate ACORN-1 filtered search with exact per-group recall (query is the query vector):

SELECT category, min_by(id, array_distance(vec, query), 5)
FROM items
GROUP BY category;
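The difference between one global search and a search per group shows up in a small Python sketch, with brute force standing in for ACORN-1 and made-up rows shaped like the items table. The per-group lists are what min_by(id, distance, k) is meant to compute for each category; the naive variant shows how a single unfiltered top-k leaves some groups short or missing entirely. Function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, int, Sequence[float]]  # (id, category, vec)


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def global_then_group(rows: List[Row], query, k: int) -> Dict[int, List[int]]:
    """Naive plan: one unfiltered top-k, then split by group."""
    top = sorted(rows, key=lambda r: l2(r[2], query))[:k]
    out: Dict[int, List[int]] = defaultdict(list)
    for id_, group, _ in top:
        out[group].append(id_)
    return dict(out)


def per_group_search(rows: List[Row], query, k: int) -> Dict[int, List[int]]:
    """One filtered top-k search per distinct group value, nearest first."""
    result = {}
    for g in sorted({group for _, group, _ in rows}):
        # Brute force stands in for an ACORN-1 search with predicate category = g.
        hits = sorted((l2(vec, query), id_) for id_, group, vec in rows if group == g)
        result[g] = [id_ for _, id_ in hits[:k]]
    return result


rows = [(i, i % 5, (float(i), 0.0)) for i in range(20)]
print(per_group_search(rows, (0.0, 0.0), 3))
# {0: [0, 5, 10], 1: [1, 6, 11], 2: [2, 7, 12], 3: [3, 8, 13], 4: [4, 9, 14]}
print(global_then_group(rows, (0.0, 0.0), 3))
# {0: [0], 1: [1], 2: [2]}
```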

Features:

  • ACORN-1 two-hop expansion for graph connectivity under filtering
  • Selectivity-based strategy switching (post-filter / ACORN-1 / brute-force)
  • RaBitQ binary quantization (~21x memory reduction at 128 dims)
  • Metadata join optimization with zone map pruning
  • Per-group ACORN-1 search for grouped aggregations
  • All three distance metrics: L2, cosine, inner product
  • Prepared statement support for parameterized query vectors
  • Configurable thresholds: hnsw_acorn_threshold, hnsw_bruteforce_threshold
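The metric an index is built for changes which vector counts as nearest. The Python sketch below defines the three metrics in the "smaller is closer" form used by DuckDB's array_distance, array_cosine_distance and array_negative_inner_product functions. This page does not say which query functions the extension's index matches for each metric, so that mapping is an assumption. The toy points show that L2 and the angle-based metrics can disagree.

```python
import math
from typing import Callable, Dict, Sequence

Vec = Sequence[float]


def l2_distance(a: Vec, b: Vec) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vec, b: Vec) -> float:
    # 1 - cosine similarity: 0 for vectors pointing the same way.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def negative_inner_product(a: Vec, b: Vec) -> float:
    # Negated so that, like the other two, smaller means closer.
    return -sum(x * y for x, y in zip(a, b))


def nearest(metric: Callable[[Vec, Vec], float], query: Vec, points: Dict[str, Vec]) -> str:
    return min(points, key=lambda name: metric(query, points[name]))


points = {"p": (3.0, 0.0), "r": (0.5, 0.5)}
query = (1.0, 0.0)
for metric in (l2_distance, cosine_distance, negative_inner_product):
    print(metric.__name__, nearest(metric, query, points))
# l2_distance r
# cosine_distance p
# negative_inner_product p
```

Point p lies in exactly the query's direction but far away, so the angle-based metrics prefer it while L2 prefers the nearby point r.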

Added Functions

function_name | function_type
hnsw_compact_index | pragma
hnsw_index_scan | table
pragma_hnsw_index_info | table
vss_join | table_macro
vss_match | table_macro

Overloaded Functions

This extension does not add any function overloads.

Added Types

This extension does not add any types.

Added Settings

name | input_type | scope | description
hnsw_acorn_threshold | FLOAT | GLOBAL | selectivity above which ACORN-1 is skipped (standard HNSW + post-filter is used instead)
hnsw_bruteforce_threshold | FLOAT | GLOBAL | selectivity below which a brute-force exact scan is used instead of ACORN-1
hnsw_ef_search | BIGINT | GLOBAL | experimental: override the ef_search parameter when scanning HNSW indexes
hnsw_enable_experimental_persistence | BOOLEAN | GLOBAL | experimental: enable creating HNSW indexes in persistent databases
hnsw_rabitq_oversample | BIGINT | GLOBAL | rescore oversample factor for RaBitQ-quantized HNSW indexes (default 3)
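Read together, the two threshold settings describe a three-way choice keyed on filter selectivity, the fraction of rows that pass the WHERE clause. A minimal Python sketch of that decision rule follows. The threshold values are made-up placeholders, since this page does not list the defaults; how the extension estimates selectivity is not described either, so the exact count used here is an assumption, as is the behavior exactly at a threshold.

```python
def choose_strategy(selectivity: float, acorn_threshold: float, bruteforce_threshold: float) -> str:
    """Pick a filtered-search strategy from the fraction of rows passing the filter."""
    if selectivity < bruteforce_threshold:
        return "brute_force"  # few matching rows: scanning them exactly is cheap
    if selectivity > acorn_threshold:
        return "post_filter"  # most rows match: standard HNSW + post-filter
    return "acorn1"


# Illustrative thresholds only; they are NOT the extension's defaults.
ACORN_THRESHOLD = 0.5
BRUTEFORCE_THRESHOLD = 0.01

categories = [i % 5 for i in range(1000)]  # same shape as the items example
selectivity = sum(1 for c in categories if c == 1) / len(categories)
print(selectivity, choose_strategy(selectivity, ACORN_THRESHOLD, BRUTEFORCE_THRESHOLD))
# 0.2 acorn1
```

Both thresholds are GLOBAL settings, so in SQL they would be adjusted with DuckDB's ordinary SET statement, for example SET hnsw_acorn_threshold = 0.5; (the value here is again only a placeholder).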