Parse and analyze source code ASTs from 20+ programming languages with tree-sitter grammars
Installing and Loading
INSTALL sitting_duck FROM community;
LOAD sitting_duck;
Example
-- Parse Python code and extract function definitions
SELECT name, type, depth
FROM parse_ast('
def hello():
return "hello world"
', 'python')
WHERE type = 'function_definition';
-- Parse JavaScript from a file
SELECT type, name, start_row, end_row
FROM read_ast('src/**/*.js')
WHERE type IN ('function_declaration', 'class_declaration');
-- Analyze code complexity by counting AST nodes
SELECT file_path, COUNT(*) as node_count
FROM read_ast('src/**/*.py')
GROUP BY file_path
ORDER BY node_count DESC;
-- Find all function calls in TypeScript
SELECT name, file_path, start_row
FROM read_ast('app.ts', language := 'typescript')
WHERE type = 'call_expression';
-- Cross-language analysis
SELECT language, COUNT(*) as definitions
FROM read_ast(['src/**/*.py', 'src/**/*.js', 'src/**/*.ts', 'src/**/*.go'])
WHERE type LIKE '%definition%' OR type LIKE '%declaration%'
GROUP BY language;
About sitting_duck
Sitting Duck is a DuckDB extension for parsing source code into Abstract Syntax Trees (ASTs) using tree-sitter grammars. It provides a SQL interface to analyze code structure across 20+ programming languages.
Core Table Functions:
read_ast(file_pattern, language := NULL)- Parse source files into AST rowsparse_ast(content, language)- Parse source code strings into AST rows
All pasring is highly flexible via parameters (see: https://github.com/teaguesterling/sitting_duck/blob/main/docs/api/parameters.md)
Supported Languages (27): | Category | Languages | |———-|———–| | Web | JavaScript, TypeScript, HTML, CSS | | Systems | C, C++, Go, Rust, Zig | | Scripting | Python, Ruby, PHP, Lua, R, Bash | | Enterprise | Java, C#, Kotlin, Swift | | Mobile | Dart | | Infrastructure | HCL (Terraform), JSON, TOML, GraphQL | | Documentation | SQL, Markdown |
AST Schema Fields: See: https://github.com/teaguesterling/sitting_duck/blob/main/docs/api/output-schema.md
type- Node type (e.g., function_definition, class_declaration)name- Extracted identifier name when applicablefile_path- Source file pathlanguage- Detected or specified languagestart_line,start_column- Node start positionend_end,end_column- Node end positiondepth- Nesting depth in the ASTparent_type- Type of parent nodedescendant_count- Number of descendant nodespeek- Customizable preview of node sourcesemantic_type- Normalized node type across all langaugesqualified_name- Extracted name and available scope qualifierssignature_type- Extracted type string for variables, expressions, and functions as availableparameters- Extraacted function parameters, arguments, parent clases, etc. as availablemodifiers- Class, variable, parameter modifiers as availableannotations- Decorators and annotations as avaialble
See https://github.com/teaguesterling/sitting_duck/blob/main/docs/native_extraction_semantics.md for additional details on "native" extraction for each langauge. Note: this is an active area of development.
Key Features:
- Automatic language detection from file extensions
- Glob pattern support for multi-file analysis
- Semantic type normalization across languages (https://github.com/teaguesterling/sitting_duck/blob/main/docs/api/semantic-types.md)
- Native, locally aware annotation of key node types for easy handling (https://github.com/teaguesterling/sitting_duck/blob/main/docs/native_extraction_semantics.md)
- Efficient tree-sitter parsing with pre-generated grammars
- DuckDB native SQL integration for complex queries
Perfect for code analysis, refactoring tools, documentation generation, codebase exploration, and AI-assisted development workflows.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| ast_get_source | macro | NULL | NULL | |
| ast_get_source_numbered | macro | NULL | NULL | |
| ast_supported_languages | table | NULL | NULL | |
| get_kind | scalar | NULL | NULL | |
| get_line | macro | NULL | NULL | |
| get_lines_text | macro | NULL | NULL | |
| get_searchable_types | scalar | NULL | NULL | |
| get_super_kind | scalar | NULL | NULL | |
| is_arithmetic | macro | NULL | NULL | |
| is_assignment | macro | NULL | NULL | |
| is_block | macro | NULL | NULL | |
| is_boolean_literal | macro | NULL | NULL | |
| is_call | scalar | NULL | NULL | |
| is_class_definition | macro | NULL | NULL | |
| is_comparison | macro | NULL | NULL | |
| is_conditional | macro | NULL | NULL | |
| is_control_flow | scalar | NULL | NULL | |
| is_definition | scalar | NULL | NULL | |
| is_function_call | macro | NULL | NULL | |
| is_function_definition | macro | NULL | NULL | |
| is_identifier | scalar | NULL | NULL | |
| is_jump | macro | NULL | NULL | |
| is_kind | scalar | NULL | NULL | |
| is_list | macro | NULL | NULL | |
| is_literal | macro | NULL | NULL | |
| is_logical | macro | NULL | NULL | |
| is_loop | macro | NULL | NULL | |
| is_member_access | macro | NULL | NULL | |
| is_module_definition | macro | NULL | NULL | |
| is_number_literal | macro | NULL | NULL | |
| is_semantic_type | scalar | NULL | NULL | |
| is_string_literal | macro | NULL | NULL | |
| is_type_definition | macro | NULL | NULL | |
| is_variable_definition | macro | NULL | NULL | |
| kind_code | scalar | NULL | NULL | |
| parse_ast | table | NULL | NULL | |
| parse_ast_flat | table | NULL | NULL | |
| parse_ast_hierarchical | table | NULL | NULL | |
| read_ast | table | NULL | NULL | |
| read_ast_flat | table | NULL | NULL | |
| read_ast_hierarchical | table | NULL | NULL | |
| read_ast_hierarchical_new | table | NULL | NULL | |
| read_lines | table_macro | NULL | NULL | |
| read_lines_context | table_macro | NULL | NULL | |
| read_lines_range | table_macro | NULL | NULL | |
| semantic_type_code | scalar | NULL | NULL | |
| semantic_type_to_string | scalar | NULL | NULL |