HNSW vector search with ACORN-1 filtered search, RaBitQ quantization, metadata joins, and grouped nearest neighbors
Installing and Loading
```sql
INSTALL hnsw_acorn FROM community;
LOAD hnsw_acorn;
```
Example
```sql
-- Create a table with vectors and categories
CREATE TABLE items AS
SELECT i AS id,
       array_value(random(), random(), random())::FLOAT[3] AS vec,
       (i % 5) AS category
FROM range(1000) t(i);

-- Create HNSW index
CREATE INDEX idx ON items USING HNSW (vec);

-- Filtered vector search: returns exactly 5 results matching the filter
SELECT id, category, array_distance(vec, [0.5, 0.5, 0.5]::FLOAT[3]) AS dist
FROM items
WHERE category = 1
ORDER BY dist
LIMIT 5;

-- Per-group nearest neighbors
SELECT category,
       min_by(id, array_distance(vec, [0.5, 0.5, 0.5]::FLOAT[3]), 3) AS closest
FROM items
GROUP BY category;

-- Metadata join (vectors + metadata in separate tables)
CREATE TABLE metadata AS
SELECT i AS id, (i % 5) AS category, 'item_' || i AS name
FROM range(1000) t(i);

SELECT m.name
FROM items JOIN metadata m ON items.id = m.id
WHERE m.category = 2
ORDER BY array_distance(vec, [0.5, 0.5, 0.5]::FLOAT[3])
LIMIT 5;

-- RaBitQ quantization (~21x memory reduction)
CREATE INDEX idx_rq ON items USING HNSW (vec) WITH (quantization = 'rabitq');
```
About hnsw_acorn
Fork of duckdb-vss that adds ACORN-1 filtered HNSW search (arXiv:2403.04871), RaBitQ binary quantization, metadata join optimization, and per-group nearest neighbor search.
ACORN-1 Filtered Search: The upstream extension applies WHERE clauses after the index returns results, so filtered queries often return fewer rows than LIMIT. This extension pushes filter predicates into the HNSW graph traversal, so a filtered query returns the full LIMIT whenever enough rows match the filter, while keeping recall high.
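A quick way to check the result-count behavior described above is to count the rows a filtered top-5 query returns, and to look at its plan. This is a minimal sketch that reuses the `items` table and `idx` index from the Example section. The operator names that `EXPLAIN` prints are not documented here, so none are shown.

```sql
-- Count the rows returned by a filtered top-5 search.
-- With filter pushdown this should be 5, since 200 of the 1000 rows have category = 1.
SELECT count(*) AS returned_rows
FROM (
    SELECT id
    FROM items
    WHERE category = 1
    ORDER BY array_distance(vec, [0.5, 0.5, 0.5]::FLOAT[3])
    LIMIT 5
);

-- Inspect the plan to see whether the HNSW index is used for the filtered query.
EXPLAIN
SELECT id
FROM items
WHERE category = 1
ORDER BY array_distance(vec, [0.5, 0.5, 0.5]::FLOAT[3])
LIMIT 5;
```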
RaBitQ Quantization: Compresses vectors to 1 bit per dimension with a rescore phase that preserves result quality. ~21x memory reduction at 128 dims.

```sql
CREATE INDEX idx ON items USING HNSW (vec) WITH (quantization = 'rabitq');
```
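The arithmetic behind the memory figure can be sketched as follows, assuming the unquantized index stores 4-byte FLOAT components. One bit per dimension alone gives a 32x reduction at 128 dims. The quoted ~21x works out to roughly 24 bytes per vector. The source does not say what accounts for the difference. Per-vector correction data is one plausible explanation, stated here only as an assumption.

```sql
SELECT
    128 * 4                     AS float32_bytes,            -- 512 bytes per raw vector
    128 // 8                    AS one_bit_code_bytes,       -- 16 bytes at 1 bit per dimension
    (128 * 4) / (128 // 8)      AS raw_ratio,                -- 32.0
    round((128 * 4) / 21.0, 1)  AS implied_bytes_per_vector; -- 24.4 at the quoted ~21x
```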
Metadata Joins: When vectors and metadata live in separate tables, the optimizer rewrites JOIN + WHERE + ORDER BY + LIMIT into a single ACORN-1 filtered search. No special syntax needed.

```sql
SELECT m.title
FROM embeddings e JOIN metadata m ON e.id = m.id
WHERE m.genre = 'sci-fi'
ORDER BY array_distance(e.vec, query)
LIMIT 10;
```
Grouped Nearest Neighbors: Per-group top-K search using standard SQL aggregation. For each distinct group value, runs a separate ACORN-1 filtered search with exact per-group recall.

```sql
SELECT category, min_by(id, array_distance(vec, query), 5)
FROM items
GROUP BY category;
```
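With a third argument, `min_by` returns a list per group, so the grouped query yields one row per category. If one row per neighbor is more convenient, the list can be flattened with `unnest`. The sketch below reuses the `items` table from the Example section and keeps the aggregation in a subquery. Whether the optimizer still recognizes the grouped pattern when it is wrapped this way is an assumption to verify with `EXPLAIN`.

```sql
-- One row per (category, neighbor id) instead of one list per category.
SELECT category, unnest(closest) AS id
FROM (
    SELECT category,
           min_by(id, array_distance(vec, [0.5, 0.5, 0.5]::FLOAT[3]), 3) AS closest
    FROM items
    GROUP BY category
);
```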
Features:
- ACORN-1 two-hop expansion for graph connectivity under filtering
- Selectivity-based strategy switching (post-filter / ACORN-1 / brute-force)
- RaBitQ binary quantization (~21x memory reduction at 128 dims)
- Metadata join optimization with zone map pruning
- Per-group ACORN-1 search for grouped aggregations
- All three distance metrics: L2, cosine, inner product
- Prepared statement support for parameterized query vectors
- Configurable thresholds: hnsw_acorn_threshold, hnsw_bruteforce_threshold
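The metric and prepared-statement features listed above might be used as in the sketch below. It assumes the fork keeps upstream duckdb-vss's `metric` index option (with values such as `'cosine'`) and pairs it with DuckDB's `array_cosine_distance`. The index name `idx_cos` and the statement name `nearest_in_category_1` are made up for illustration.

```sql
-- Assumption: the `metric` option and its values follow upstream duckdb-vss.
CREATE INDEX idx_cos ON items USING HNSW (vec) WITH (metric = 'cosine');

-- Parameterize the query vector; the filter stays a literal here.
PREPARE nearest_in_category_1 AS
    SELECT id, category
    FROM items
    WHERE category = 1
    ORDER BY array_cosine_distance(vec, $1::FLOAT[3])
    LIMIT 5;

EXECUTE nearest_in_category_1([0.5, 0.5, 0.5]);
```

The distance function in ORDER BY should match the metric the index was built with. Otherwise the index is unlikely to be used for the query.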
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| hnsw_compact_index | pragma | NULL | NULL | |
| hnsw_index_scan | table | NULL | NULL | |
| pragma_hnsw_index_info | table | NULL | NULL | |
| vss_join | table_macro | NULL | NULL | |
| vss_match | table_macro | NULL | NULL | |
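The table above carries no descriptions, so the calls below are a sketch that assumes the functions keep the signatures they have in upstream duckdb-vss: `pragma_hnsw_index_info` taking no arguments and `hnsw_compact_index` taking an index name. The index name `idx` comes from the Example section.

```sql
-- List the HNSW indexes known to the extension, with their metadata.
SELECT * FROM pragma_hnsw_index_info();

-- Compact an index, for example after many deletes (assumed upstream behavior).
PRAGMA hnsw_compact_index('idx');
```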
Overloaded Functions
This extension does not add any function overloads.
Added Types
This extension does not add any types.
Added Settings
| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| hnsw_acorn_threshold | selectivity above which ACORN-1 is skipped (standard HNSW + post-filter used instead) | FLOAT | GLOBAL | [] |
| hnsw_bruteforce_threshold | selectivity below which brute-force exact scan is used instead of ACORN-1 | FLOAT | GLOBAL | [] |
| hnsw_ef_search | experimental: override the ef_search parameter when scanning HNSW indexes | BIGINT | GLOBAL | [] |
| hnsw_enable_experimental_persistence | experimental: enable creating HNSW indexes in persistent databases | BOOLEAN | GLOBAL | [] |
| hnsw_rabitq_oversample | rescore oversample factor for RaBitQ-quantized HNSW indexes (default 3) | BIGINT | GLOBAL | [] |
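These settings can be changed with DuckDB's standard `SET` statement. The values below are illustrative only, not recommended defaults, and they assume selectivity is expressed as a fraction between 0 and 1.

```sql
-- Strategy switching: illustrative thresholds, not recommendations.
SET hnsw_acorn_threshold = 0.6;        -- above this selectivity: standard HNSW + post-filter
SET hnsw_bruteforce_threshold = 0.01;  -- below this selectivity: brute-force exact scan

-- Search breadth and RaBitQ rescoring.
SET hnsw_ef_search = 100;
SET hnsw_rabitq_oversample = 5;

-- Needed before creating HNSW indexes in a persistent (on-disk) database.
SET hnsw_enable_experimental_persistence = true;

-- Read a value back, or restore its default.
SELECT current_setting('hnsw_rabitq_oversample');
RESET hnsw_ef_search;
```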