Architecture Overview
db9 is a PostgreSQL-compatible distributed SQL database. Your data is stored in a distributed TiKV cluster, and each database (tenant) gets its own isolated keyspace.
Query Execution Pipeline
When you send a SQL query to db9, it follows this pipeline:
- Parse — SQL text is parsed into an abstract syntax tree
- Route — The statement is classified as DDL, DML, SELECT, transaction control, or settings
- Expand Views — View references are recursively inlined (up to 64 levels deep)
- Analyze — Semantic analysis resolves column names, infers types, and checks scope rules
- Optimize — Cost-based optimizer chooses join order, access methods (table scan vs. index scan), and physical operators
- Execute — A pull-based (Volcano) operator tree streams results one row at a time
- Return — Results are sent back over the PostgreSQL wire protocol
Wire Protocol Compatibility
db9 implements the PostgreSQL wire protocol (v3). You can connect with any PostgreSQL client:
psql- Any PostgreSQL ORM or driver (psycopg2, node-postgres, JDBC, etc.)
- Standard connection tools and GUIs
Both Simple Query and Extended Query (Parse/Bind/Describe/Execute) protocols are supported.
Query Optimization
The query optimizer uses table statistics from ANALYZE for cardinality estimation, cost-based join reordering (DPccp algorithm), predicate pushdown, subquery decorrelation, and index scan selection when beneficial.
Data Types
db9 supports 23 data types. All data is stored in UTF-8 encoding.
| SQL Type | Aliases | Description |
|---|---|---|
BOOLEAN | BOOL | True/false values |
INTEGER | INT, INT4 | 4-byte signed integer |
BIGINT | INT8 | 8-byte signed integer |
DOUBLE PRECISION | FLOAT8 | 8-byte IEEE 754 floating point |
NUMERIC | NUMERIC(p,s), DECIMAL | Arbitrary-precision number (up to 28-29 significant digits) |
TEXT | — | Variable-length UTF-8 string |
VARCHAR(n) | CHARACTER VARYING(n) | Length-limited text (max n characters) |
BYTEA | — | Raw byte array |
TIMESTAMP | TIMESTAMP WITHOUT TIME ZONE | Timestamp (millisecond precision) |
TIMESTAMPTZ | TIMESTAMP WITH TIME ZONE | Timestamp with timezone (millisecond precision) |
DATE | — | Calendar date (days) |
TIME | TIME WITHOUT TIME ZONE | Time of day (microsecond precision) |
INTERVAL | — | Time interval (months + milliseconds) |
UUID | — | 128-bit UUID |
JSON | — | JSON data (original text preserved) |
JSONB | — | JSON data in canonical binary form |
SERIAL | — | Auto-incrementing 4-byte integer |
BIGSERIAL | — | Auto-incrementing 8-byte integer |
type[] | — | Array of any supported type |
vector(N) | — | Dense float vector of N dimensions (pgvector-compatible) |
TSVECTOR | — | Full-text search document representation |
TSQUERY | — | Full-text search query |
NAME | — | 63-byte identifier (PostgreSQL system type) |
SMALLINT / INT2, REAL / FLOAT4, CHAR(n) (fixed-length). Use INTEGER, DOUBLE PRECISION, and VARCHAR(n) instead.
Interval Representation
Intervals store months and sub-month components separately, enabling calendar-aware arithmetic (e.g., adding 1 month to January 31 gives February 28/29). The sub-month component is stored in milliseconds.
Type Coercion
The type system uses two coercion strategies matching PostgreSQL behavior:
UNION / CASE / COALESCE / VALUES ("Text wins")
When mixing Text with a typed value, the result is Text. Example: SELECT 1 UNION SELECT 'a' resolves to TEXT.
Comparisons ("Non-Text wins")
When comparing Text with a typed value, the text literal is coerced to the typed side. Example: WHERE col = '42' coerces '42' to the column's type.
Numeric Promotion Hierarchy
Int32 → Int64 → Float64 → Numeric. When two numeric types are mixed, the higher-precedence type wins.
Temporal Promotion
Date+Timestamp=TimestampTimestamp+TimestampTz=TimestampTzDate+TimestampTz=TimestampTz
CAST Contexts
| Context | When Used | Behavior |
|---|---|---|
| Explicit | CAST(x AS type) or x::type | Most permissive. Allows rounding (Float to Int), Bool↔Int, VARCHAR truncation |
| Assignment | INSERT/UPDATE column coercion | Medium. Rejects fractional Float→Int, rejects Bool↔Int, errors on VARCHAR overflow |
| Implicit | Comparison coercion | Strictest. Similar to Assignment |
| Cast | Explicit | Assignment |
|---|---|---|
2.7::INTEGER | Rounds to 3 | Rejects (fraction) |
TRUE::INTEGER | Returns 1 | Rejects |
42::BOOLEAN | Returns true | Rejects |
'hello'::VARCHAR(3) | Truncates to "hel" | Errors if > 3 chars |
NULL to any type | Always succeeds | Always succeeds |
SQL Reference: DDL
Data Definition Language statements for creating and managing database objects.
CREATE TABLE
CREATE TABLE [IF NOT EXISTS] table_name (
column_name type [constraints],
...
[table_constraints]
);
-- Create from query result
CREATE TABLE new_table AS SELECT ... FROM source;
SELECT ... INTO new_table FROM source;
Column constraints: NOT NULL, DEFAULT, PRIMARY KEY, UNIQUE, REFERENCES (foreign key), CHECK
Table constraints: PRIMARY KEY (col, ...), UNIQUE (col, ...), FOREIGN KEY (col) REFERENCES table(col) with ON DELETE/ON UPDATE actions, CHECK (expr)
SERIAL and BIGSERIAL columns are supported for auto-incrementing IDs. Self-referential foreign keys are supported.
CREATE INDEX
-- B-tree index (default)
CREATE INDEX idx_name ON table (column);
-- Unique index
CREATE UNIQUE INDEX idx_name ON table (column);
-- GIN index (for JSONB, full-text search, arrays)
CREATE INDEX idx_name ON table USING GIN (column);
-- GiST index
CREATE INDEX idx_name ON table USING GIST (column);
-- Expression index
CREATE INDEX idx_name ON table (lower(column));
-- Partial index
CREATE INDEX idx_name ON table (column) WHERE active = true;
-- Non-blocking index creation
CREATE INDEX CONCURRENTLY idx_name ON table (column);
Index methods: btree (default), GIN, GiST. IF NOT EXISTS is supported.
CREATE VIEW
CREATE VIEW view_name AS SELECT ...;
CREATE OR REPLACE VIEW view_name AS SELECT ...;
CREATE OR REPLACE VIEW cannot drop or rename existing columns; only appending new columns is allowed.
CREATE MATERIALIZED VIEW
CREATE MATERIALIZED VIEW mv_name AS SELECT ...;
REFRESH MATERIALIZED VIEW mv_name;
ALTER TABLE
| Operation | Syntax |
|---|---|
| Add column | ALTER TABLE t ADD COLUMN col type |
| Drop column | ALTER TABLE t DROP COLUMN col |
| Rename column | ALTER TABLE t RENAME COLUMN old TO new |
| Rename table | ALTER TABLE t RENAME TO new_name |
| Set default | ALTER TABLE t ALTER COLUMN col SET DEFAULT val |
| Drop default | ALTER TABLE t ALTER COLUMN col DROP DEFAULT |
| Set not null | ALTER TABLE t ALTER COLUMN col SET NOT NULL |
| Drop not null | ALTER TABLE t ALTER COLUMN col DROP NOT NULL |
| Change type | ALTER TABLE t ALTER COLUMN col SET DATA TYPE type [USING expr] |
| Add constraint | ALTER TABLE t ADD CONSTRAINT name ... |
| Drop constraint | ALTER TABLE t DROP CONSTRAINT name |
| Rename constraint | ALTER TABLE t RENAME CONSTRAINT old TO new |
DROP Statements
DROP TABLE [IF EXISTS] table_name [CASCADE];
DROP INDEX [IF EXISTS] index_name;
DROP VIEW [IF EXISTS] view_name [CASCADE];
DROP MATERIALIZED VIEW [IF EXISTS] mv_name [CASCADE];
TRUNCATE TABLE table_name;
CASCADE transitively resolves and drops all dependent views and materialized views.
Other DDL
| Statement | Supported |
|---|---|
CREATE SCHEMA | Yes |
CREATE SEQUENCE / DROP SEQUENCE | Yes |
CREATE TYPE (enum, composite) / ALTER TYPE / DROP TYPE | Yes |
CREATE FUNCTION / DROP FUNCTION | Yes |
CREATE TRIGGER / DROP TRIGGER | Yes |
CREATE COLLATION / DROP COLLATION | Yes |
CREATE DATABASE / DROP DATABASE / ALTER DATABASE | Yes |
CREATE EXTENSION / DROP EXTENSION | Yes |
SQL Reference: DML
Data Manipulation Language statements for inserting, updating, and deleting data.
INSERT
-- Single row
INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');
-- Multi-row
INSERT INTO users (name, email) VALUES
('Alice', 'alice@example.com'),
('Bob', 'bob@example.com');
-- INSERT ... SELECT
INSERT INTO archive SELECT * FROM users WHERE active = false;
-- With RETURNING
INSERT INTO users (name) VALUES ('Alice') RETURNING id, name;
INSERT ON CONFLICT (Upsert)
-- Skip conflicting rows
INSERT INTO users (email, name) VALUES ('alice@example.com', 'Alice')
ON CONFLICT (email) DO NOTHING;
-- Upsert by column
INSERT INTO users (email, name) VALUES ('alice@example.com', 'Alice Updated')
ON CONFLICT (email) DO UPDATE SET name = excluded.name;
-- Upsert by constraint name
INSERT INTO users (email, name) VALUES ('alice@example.com', 'Alice')
ON CONFLICT ON CONSTRAINT users_email_key DO UPDATE SET name = excluded.name;
The excluded pseudo-table is available in DO UPDATE SET clauses, referencing the row that was proposed for insertion.
UPDATE
UPDATE users SET name = 'Bob' WHERE id = 1;
UPDATE users SET name = 'Bob' WHERE id = 1 RETURNING *;
DELETE
DELETE FROM users WHERE id = 1;
DELETE FROM users WHERE id = 1 RETURNING *;
Foreign Key Referential Actions
Supported actions for ON DELETE and ON UPDATE: CASCADE, SET NULL, SET DEFAULT, RESTRICT, NO ACTION. Recursive cascade operations include cycle detection. MATCH SIMPLE semantics (PostgreSQL default): if any referencing column is NULL, the FK check is skipped.
SQL Reference: Queries
SELECT query features, JOINs, window functions, CTEs, subqueries, and set operations.
JOINs
| Join Type | Supported |
|---|---|
INNER JOIN | Yes |
LEFT [OUTER] JOIN | Yes |
RIGHT [OUTER] JOIN | Yes |
FULL [OUTER] JOIN | Yes |
CROSS JOIN | Yes |
Semi join (via EXISTS) | Yes |
Anti join (via NOT EXISTS) | Yes |
Both hash join and nested-loop join implementations are available. The optimizer uses cost-based join reordering (DPccp algorithm).
Aggregation
GROUP BYwith hash-based aggregationHAVINGclauseDISTINCTaggregates:COUNT(DISTINCT col)FILTERclause:COUNT(*) FILTER (WHERE x > 0)GROUPING()function
Window Functions
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;
Supported ranking functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE, PERCENT_RANK, CUME_DIST
Supported access functions: LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE
All standard aggregate functions can also be used as window functions.
Common Table Expressions (CTEs)
-- Standard CTE
WITH active_users AS (
SELECT * FROM users WHERE active = true
)
SELECT * FROM active_users;
-- Recursive CTE
WITH RECURSIVE tree AS (
SELECT id, parent_id, name, 1 AS depth
FROM categories WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.parent_id, c.name, t.depth + 1
FROM categories c JOIN tree t ON c.parent_id = t.id
)
SELECT * FROM tree;
Subqueries
- Scalar subqueries
EXISTS/NOT EXISTSIN/NOT INwith subqueryANY/ALLwith subquery- Array subqueries
- Correlated subqueries (with subquery decorrelation optimization)
Set Operations
| Operation | Supported |
|---|---|
UNION / UNION ALL | Yes |
INTERSECT / INTERSECT ALL | Yes |
EXCEPT / EXCEPT ALL | Yes |
Other Query Features
SELECT DISTINCTandSELECT DISTINCT ON (expr)ORDER BYwithASC/DESCandNULLS FIRST/NULLS LASTLIMITandOFFSETEXPLAINandEXPLAIN ANALYZEfor query plansPREPARE name AS .../EXECUTE name (params)/DEALLOCATE namegenerate_series()table functionSHOW TABLES
SQL Reference: Transactions
Transaction control, savepoints, isolation semantics, and autocommit behavior.
Transaction Control
BEGIN;
-- ... your SQL statements ...
COMMIT; -- persist all changes atomically
ROLLBACK; -- discard all changes
Autocommit
Single statements outside an explicit BEGIN block run in implicit autocommit mode. The system begins a transaction, executes the statement, and commits (or rolls back on error). Retryable TiKV errors trigger automatic retry with backoff.
Savepoints
BEGIN;
INSERT INTO users (name) VALUES ('Alice');
SAVEPOINT sp1;
INSERT INTO users (name) VALUES ('Bob');
ROLLBACK TO SAVEPOINT sp1; -- Bob's insert is undone
RELEASE SAVEPOINT sp1; -- destroys savepoint, merges state
COMMIT; -- only Alice is committed
Savepoint behavior matches PostgreSQL: duplicate names are allowed (most recent is targeted), RELEASE destroys the named savepoint and all later ones, ROLLBACK TO re-establishes the savepoint for reuse.
Isolation Level
Default transaction mode is pessimistic with Snapshot Isolation provided by TiKV. Each transaction sees a consistent snapshot of the database at its start time.
Sequence Behavior
Sequence advances (nextval) are non-transactional — they survive transaction rollbacks, matching PostgreSQL behavior.
Failed Transaction State
When an error occurs in a transaction, it enters a failed state. Only ROLLBACK, COMMIT, and END are accepted. All other statements return an error. This matches PostgreSQL behavior.
SQL Reference: COPY
Bulk data import and export.
-- Import from file (via client)
COPY table_name FROM STDIN;
-- Import from Parquet (with parquet extension)
COPY table_name FROM '/data/file.parquet' FORMAT parquet;
Text, CSV, and Parquet formats are supported for COPY FROM.
Built-in Functions
Comprehensive reference of all built-in functions, organized by category.
UPPER, LOWER, LENGTH, CONCAT, CONCAT_WS, LEFT, RIGHT, TRIM, BTRIM, LTRIM, RTRIM, LPAD, RPAD, REPEAT, REPLACE, REVERSE, INITCAP, ASCII, CHR, STRPOS, SPLIT_PART, TRANSLATE, QUOTE_IDENT, QUOTE_LITERAL, QUOTE_NULLABLE, SUBSTRING, OVERLAY, POSITION, FORMATABS, CEIL/CEILING, FLOOR, ROUND, TRUNC, SQRT, CBRT, POWER, EXP, LN, LOG, MOD, SIGN, PI, DEGREES, RADIANS, SIN, COS, TAN, ASIN, ACOS, ATAN, ATAN2, DIV, WIDTH_BUCKET, HASHTEXTDATE_PART / EXTRACT, DATE_TRUNC, DATE, AGE, TO_CHAR, NOW, CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME, CLOCK_TIMESTAMP, TO_TIMESTAMP, MAKE_DATE, MAKE_TIME, MAKE_TIMESTAMP, TIMEZONECOUNT, SUM, AVG, MIN, MAX, STRING_AGG, ARRAY_AGG, BOOL_AND, BOOL_OR, EVERY, JSON_AGG, JSONB_AGGROW_NUMBER, RANK, DENSE_RANK, NTILE, PERCENT_RANK, CUME_DIST, LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUEAll aggregate functions can also be used as window functions.
JSONB_BUILD_OBJECT, JSONB_BUILD_ARRAY, JSONB_TYPEOF, JSONB_ARRAY_LENGTH, JSONB_EXISTS, JSONB_EXISTS_ANY, JSONB_EXISTS_ALL, JSONB_OBJECT_KEYS, JSONB_EXTRACT_PATH, JSONB_EXTRACT_PATH_TEXT, JSONB_PRETTY, JSONB_SET, JSONB_ARRAY_ELEMENTS, JSONB_ARRAY_ELEMENTS_TEXT, JSONB_EACH, JSONB_EACH_TEXT, TO_JSON, TO_JSONB, ROW_TO_JSON, JSON_BUILD_OBJECT, JSON_BUILD_ARRAY| Operator | Description | Example |
|---|---|---|
-> | Get JSON element by key/index | data->'name' |
->> | Get JSON element as text | data->>'name' |
#> | Get element by path | data#>'{addr,city}' |
#>> | Get element by path as text | data#>>'{addr,city}' |
#- | Delete key/path | data #- '{addr}' |
@> | Contains | data @> '{"a":1}' |
<@ | Contained by | '{"a":1}' <@ data |
? | Key exists | data ? 'email' |
?| | Any key exists | data ?| array['a','b'] |
?& | All keys exist | data ?& array['a','b'] |
ARRAY_LENGTH, ARRAY_UPPER, ARRAY_LOWER, CARDINALITY, ARRAY_POSITION, ARRAY_CAT, ARRAY_APPEND, ARRAY_PREPEND, ARRAY_REMOVE, ARRAY_TO_STRING, STRING_TO_ARRAY, UNNESTREGEXP_REPLACE, REGEXP_MATCHES, REGEXP_MATCH, REGEXP_SPLIT_TO_ARRAY, REGEXP_SPLIT_TO_TABLE, REGEXP_COUNT, REGEXP_LIKE, REGEXP_SUBSTR, REGEXP_INSTR| Operator | Description |
|---|---|
~ | Matches regex (case-sensitive) |
~* | Matches regex (case-insensitive) |
!~ | Does not match (case-sensitive) |
!~* | Does not match (case-insensitive) |
TO_TSVECTOR, TO_TSQUERY, PLAINTO_TSQUERYOperator: @@ (text search match)
GEN_RANDOM_UUID, UUID_GENERATE_V4, UUIDV7ENCODE, DECODE, MD5, SHA256, DIGESTL2_DISTANCE, INNER_PRODUCT, COSINE_DISTANCE, VECTOR_DIMS, VECTOR_NORMOperators: <-> (L2 distance), <=> (cosine distance), <#> (negative inner product)
NEXTVAL, CURRVAL, SETVALCOALESCE, NULLIF, GREATEST, LEAST, CASE WHEN ... THEN ... ELSE ... ENDPG_TYPEOF, VERSION, CURRENT_USER, CURRENT_DATABASE, CURRENT_SCHEMA, FORMAT_TYPE, PG_GET_EXPR, HAS_TABLE_PRIVILEGE, HAS_SCHEMA_PRIVILEGE, HAS_DATABASE_PRIVILEGE, PG_TOTAL_RELATION_SIZE, PG_RELATION_SIZE, PG_TABLE_SIZE, PG_SIZE_PRETTY, PG_ENCODING_TO_CHAR, PG_COLUMN_SIZE, OBJ_DESCRIPTION, COL_DESCRIPTION, PG_BACKEND_PID, GENERATE_SERIESPG_BACKGROUND_LAUNCH, PG_BACKGROUND_RESULTLaunch SQL statements asynchronously in background sessions.
Extensions
Available extensions that add capabilities to your database. Install with CREATE EXTENSION name;
| Extension | Description |
|---|---|
http | HTTP client functions for making web requests from SQL. Provides http_get(), http_post(), http_put(), http_delete(), http_head(), http_patch(). Returns status, content_type, headers (JSONB), and content. SSRF protection built in. Requires superuser. |
fs9 | Embedded filesystem stored in the database. Read, write, and manage files via SQL. Supports CSV, TSV, JSONL, Parquet, and plain text. Table functions and scalar functions (fs9_read, fs9_write, fs9_exists, fs9_size, fs9_mtime, fs9_remove). |
pg_cron | pg_cron-compatible scheduled job management. Schedule recurring SQL with cron expressions. Exposes cron.job, cron.job_run_details, and cron.running_jobs virtual tables. |
parquet | Parquet file import. Read Parquet files from HTTP URLs or the embedded filesystem via read_parquet() and COPY FROM ... FORMAT parquet. |
uuid-ossp | UUID generation functions. uuid_generate_v4() is also available built-in without the extension. |
hstore | Key-value type. Metadata-only for client compatibility. |
See the SDK Docs for detailed usage examples of the HTTP, fs9, and vector search extensions.
System Catalog
System views for introspecting database objects.
Detailed documentation for
pg_catalog and information_schema views is in progress.
Authentication & Roles
Connection format, TLS, and role-based access control.
Connecting
Connect using the standard PostgreSQL connection string format:
psql "postgresql://<db_id>.<user>:<password>@pg.staging.db9.ai:5433/postgres"
All connections use TLS (sslmode=require).
RBAC
CREATE ROLE/ALTER ROLEGRANT/REVOKEprivileges on tables and schemasSET ROLEfor session role switching- Privilege checks enforced on every table access
Detailed RBAC documentation with permission matrices and examples is in progress.
Advanced SQL
PL/pgSQL, sequences, triggers, and other advanced features.
Documentation for PL/pgSQL functions, sequences, triggers, custom types, and collations is in progress.
Session Parameters
Session-level configuration with SET/SHOW.
SET variable = value; -- session-scoped
SET LOCAL variable = value; -- transaction-scoped
SHOW variable;
RESET variable;
-- Functions
SELECT current_setting('variable_name');
SELECT set_config('variable_name', 'value', false);
Statement timeout enforcement is supported.
Complete list of supported GUC (Grand Unified Configuration) parameters is in progress.
Limits & Constraints
Resource limits and constraints across the db9 engine and extensions.
Engine Limits
| Resource | Limit |
|---|---|
| Statement timeout | 60s (default) |
| Idle transaction timeout | 60s (default) |
| View nesting depth | 64 levels |
| Pending portals per connection | 32 |
| Timestamp precision | Milliseconds (not microseconds) |
| Encoding | UTF-8 only |
HTTP Extension Limits
| Resource | Limit |
|---|---|
| Connect timeout | 1s |
| Total timeout | 5s |
| Max request body | 256 KiB |
| Max response body | 1 MiB |
| Max calls per statement | 100 |
| Max concurrent per tenant | 20 |
fs9 Extension Limits
| Resource | Limit |
|---|---|
| Max file size | 10 MB |
| Max glob total size | 100 MB |
| Max file traversal | 10,000 files |
| Max directory recursion | 10 levels |
pg_cron Limits
| Resource | Limit |
|---|---|
| Job execution timeout | 30 minutes (default) |
Unsupported PostgreSQL Features
SMALLINT/INT2type — useINTEGERREAL/FLOAT4type — useDOUBLE PRECISIONCHAR(n)fixed-length type — useVARCHAR(n)LISTEN/NOTIFYasync notifications- Microsecond timestamp precision (millisecond only)