Query Google Cloud Firestore directly from DuckDB using SQL
Installing and Loading
INSTALL fire_duck_ext FROM community;
LOAD fire_duck_ext;
Example
LOAD fire_duck_ext;
-- Authenticate with a service account JSON file
CREATE SECRET my_firestore (
TYPE firestore,
PROJECT_ID 'my-gcp-project',
SERVICE_ACCOUNT_JSON '/path/to/service-account.json'
);
-- Or use inline JSON for the service account
CREATE SECRET my_firestore (
TYPE firestore,
PROJECT_ID 'my-gcp-project',
SERVICE_ACCOUNT_JSON '{
"type": "service_account",
"project_id": "my-gcp-project",
"private_key_id": "key123abc",
"private_key": "-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----\n",
"client_email": "sa@my-gcp-project.iam.gserviceaccount.com",
"client_id": "123456789",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token"
}'
);
-- Or authenticate with an API key (for dev/testing)
CREATE SECRET my_firestore (
TYPE firestore,
PROJECT_ID 'my-gcp-project',
API_KEY 'AIzaSyYourApiKeyHere'
);
-- Read documents
SELECT * FROM firestore_scan('users');
-- Filter documents
SELECT __document_id, name, email
FROM firestore_scan('users')
WHERE status = 'active';
-- Update documents
CALL firestore_update('users', 'user123', 'status', 'verified');
-- Batch update with DuckDB filtering
SET VARIABLE ids = (
SELECT list(__document_id)
FROM firestore_scan('users')
WHERE status = 'pending'
);
CALL firestore_update_batch('users', getvariable('ids'), 'status', 'reviewed', 'updated_at', now());
-- Insert documents from a subquery (auto-generated IDs)
CALL firestore_insert('users', (SELECT name, age FROM read_csv('new_users.csv')));
-- Insert with explicit document IDs from a column
CALL firestore_insert('users',
(SELECT * FROM read_csv('new_users.csv')),
document_id := 'user_id');
-- Insert with explicit document ID selected from a column
CALL firestore_insert('users',
(SELECT 'Alice' AS name, 30 AS age, 'alice_user_id' AS user_id),
document_id := 'user_id');
-- Target a specific named database
CREATE SECRET my_named_db (
TYPE firestore,
PROJECT_ID 'my-gcp-project',
SERVICE_ACCOUNT_JSON '/path/to/credentials.json',
DATABASE 'my-database'
);
-- Target multiple databases with a single secret
CREATE SECRET my_multi_db (
TYPE firestore,
PROJECT_ID 'my-gcp-project',
SERVICE_ACCOUNT_JSON '/path/to/credentials.json',
DATABASES ['(default)', 'analytics-db']
);
-- Target multiple databases with an API key
CREATE SECRET my_multi_db (
TYPE firestore,
PROJECT_ID 'my-gcp-project',
API_KEY 'AIzaSyYourApiKeyHere',
DATABASES ['(default)', 'analytics-db']
);
-- Connect to a specific database for the session
CALL firestore_connect('analytics-db');
-- Collection group queries
SELECT __document_id, name, email
FROM firestore_scan('~data_group')
WHERE status = 'active';
About fire_duck_ext
The fire_duck_ext extension enables direct SQL access to Google Cloud Firestore collections from DuckDB. It supports reading data with firestore_scan(), writing with firestore_update() and firestore_delete(), batch operations, array transforms, collection group queries, and DuckDB secret management for credential storage. Filter pushdown optimizes queries by sending supported filters directly to Firestore.
Authentication
Secrets require PROJECT_ID and one of SERVICE_ACCOUNT_JSON or
API_KEY.
SERVICE_ACCOUNT_JSON accepts either a file path or the full inline
JSON text of a Google Cloud service account key. The JSON must contain
at least type, project_id, private_key, and client_email.
API_KEY provides unauthenticated access suitable for development,
testing, or public databases. API key auth does not support
batchWrite, so batch operations fall back to individual requests.
If the GOOGLE_APPLICATION_CREDENTIALS environment variable is set,
the extension automatically creates an internal secret on startup that
matches all databases (DATABASE '*'), so no CREATE SECRET is needed.
Type Mapping
string→ VARCHARinteger→ BIGINTdouble→ DOUBLEboolean→ BOOLEANtimestamp→ TIMESTAMParray→ LISTmap→ VARCHAR (JSON string)vector→ ARRAY(DOUBLE, N) (embedding)null→ NULLgeoPoint→ STRUCT(latitude DOUBLE, longitude DOUBLE)reference→ VARCHARbytes→ BLOB
Schema Inference
When you call firestore_scan(), the extension automatically infers the
schema by sampling up to 100 documents from the collection. For each
field, it determines the DuckDB type using a voting system: the most
common non-null Firestore type across the sampled documents wins. Fields
that do not appear in every sampled document are marked nullable.
For array fields, element types are also inferred by sampling elements across documents and selecting the most common element type. If a field is only ever null across all samples, it defaults to VARCHAR.
A virtual __document_id column (VARCHAR) is always added as the first
column, containing the Firestore document ID (or the full document path
for collection group queries).
Documents with heterogeneous field types are handled gracefully at read time: values that cannot be converted to the inferred column type are returned as NULL.
Null Semantics
Both missing fields and explicit Firestore null values appear as NULL in DuckDB. There is no way to distinguish between the two on read.
Writing NULL to a field sets it to an explicit Firestore null value; it does not delete the field from the document.
WHERE field IS NULL is not pushed down to Firestore because Firestore's
IS_NULL operator only matches fields that exist and are explicitly null,
while DuckDB treats missing fields as NULL. WHERE field IS NOT NULL is
pushed down safely.
Batch Operations
Batch updates and deletes group writes into requests of up to 500 operations each using Firestore's batchWrite API. Note that batchWrite is not atomic: individual writes within the batch may succeed or fail independently. If batchWrite is unavailable (e.g., API key auth), the extension falls back to individual operations automatically.
Database Targeting
By default, secrets target the (default) Firestore database. Use
DATABASE 'name' for a single named database, DATABASE '*' as a
wildcard that matches any database, or DATABASES ['db1', 'db2'] to
allow a single secret to match multiple named databases. You cannot
specify both DATABASE and DATABASES in the same secret.
Use CALL firestore_connect('db-name') to set the session database.
All subsequent queries use this database until
CALL firestore_disconnect() is called. A per-query database :=
parameter always takes priority over the connected session database.
Collection Group Queries
Use the ~collection prefix to query across all subcollections with a
given name. For example, firestore_scan('~orders') reads all documents
from every subcollection named "orders" regardless of parent path.
Missing Documents
By default, firestore_scan() includes phantom/missing documents —
documents that have no fields but exist as parent paths for
subcollections. This matches Firebase Console behavior and is controlled
by show_missing (default: true). Set show_missing := false to
only return documents with fields. When a collection contains only
phantom documents, the result includes just the __document_id column.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| firestore_scan | table | Read all documents from a Firestore collection or collection group. | NULL | [SELECT * FROM firestore_scan('users');] |
| firestore_insert | table | Insert documents into a Firestore collection from a subquery. | NULL | [CALL firestore_insert('users', (SELECT name, age FROM read_csv('new_users.csv')));] |
| firestore_update | table | Update fields on a single Firestore document. | NULL | [CALL firestore_update('users', 'user123', 'status', 'verified');] |
| firestore_delete | table | Delete a single Firestore document. | NULL | [CALL firestore_delete('users', 'user123');] |
| firestore_update_batch | table | Batch update multiple Firestore documents by ID list. | NULL | [CALL firestore_update_batch('users', ['id1', 'id2'], 'status', 'reviewed');] |
| firestore_delete_batch | table | Batch delete multiple Firestore documents by ID list. | NULL | [CALL firestore_delete_batch('users', ['id1', 'id2']);] |
| firestore_array_union | table | Add elements to an array field without duplicates. | NULL | [CALL firestore_array_union('users', 'user123', 'tags', ['vip', 'active']);] |
| firestore_array_remove | table | Remove elements from an array field. | NULL | [CALL firestore_array_remove('users', 'user123', 'tags', ['inactive']);] |
| firestore_array_append | table | Append elements to an array field (allows duplicates). | NULL | [CALL firestore_array_append('users', 'user123', 'log', ['event1']);] |
| firestore_clear_cache | table | Clear the cached schema for all or a specific collection. | NULL | [CALL firestore_clear_cache();] |
| firestore_disconnect | table | NULL | NULL | NULL |
| firestore_connect | table | NULL | NULL | NULL |
Added Settings
| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| firestore_schema_cache_ttl | Schema cache TTL in seconds (0 to disable caching) | BIGINT | GLOBAL | [] |