Full-Text Search
DB9 supports full-text search with language-specific tokenizers.
Available Tokenizers
Section titled “Available Tokenizers”| Tokenizer | Aliases | Description |
|---|---|---|
jieba | chinese, zhparser | Chinese word segmentation (jieba-rs) |
chinese_ngram | zhparser_ngram | Chinese + bigram overlay for multi-char words |
simple | - | Whitespace tokenizer for English/Latin |
GIN Indexes for Full-Text Search
Section titled “GIN Indexes for Full-Text Search”GIN (Generalized Inverted Index) indexes are the standard way to accelerate full-text search queries. Create a GIN index on a tsvector column or expression to avoid sequential scans:
-- Index on a tsvector columnCREATE INDEX idx_tsv ON documents USING GIN (tsv);
-- Index on an expressionCREATE INDEX idx_content_fts ON documentsUSING GIN (to_tsvector('simple', content));The optimizer uses GIN indexes automatically when the @@ operator matches an indexed expression.
Chinese Text Search (jieba)
Section titled “Chinese Text Search (jieba)”-- Create indexCREATE INDEX idx_content_fts ON documentsUSING gin(to_tsvector('jieba', content));
-- SearchSELECT * FROM documentsWHERE to_tsvector('jieba', content) @@ plainto_tsquery('jieba', '关键词');English Text Search
Section titled “English Text Search”-- Create indexCREATE INDEX idx_content_fts ON documentsUSING gin(to_tsvector('simple', content));
-- SearchSELECT * FROM documentsWHERE to_tsvector('simple', content) @@ to_tsquery('simple', 'keyword');Ranking Results
Section titled “Ranking Results”SELECT content, ts_rank(to_tsvector('jieba', content), plainto_tsquery('jieba', '搜索词')) as rankFROM documentsWHERE to_tsvector('jieba', content) @@ plainto_tsquery('jieba', '搜索词')ORDER BY rank DESCLIMIT 10;Query Types
Section titled “Query Types”| Function | Description | Example |
|---|---|---|
plainto_tsquery | Simple phrase | plainto_tsquery('jieba', '人工智能') |
to_tsquery | Boolean operators | to_tsquery('simple', 'cat & dog') |
phraseto_tsquery | Exact phrase | phraseto_tsquery('simple', 'hello world') |
websearch_to_tsquery | Google-style | websearch_to_tsquery('simple', '"exact" -exclude') |
Boolean Search
Section titled “Boolean Search”-- AND: both terms must matchSELECT * FROM documentsWHERE tsv @@ to_tsquery('simple', 'database & performance');
-- OR: either term matchesSELECT * FROM documentsWHERE tsv @@ to_tsquery('simple', 'postgres | mysql');
-- NOT: exclude termSELECT * FROM documentsWHERE tsv @@ to_tsquery('simple', 'database & !oracle');
-- Prefix matchingSELECT * FROM documentsWHERE tsv @@ to_tsquery('simple', 'data:*');Highlight Search Results
Section titled “Highlight Search Results”SELECT ts_headline('jieba', content, plainto_tsquery('jieba', '数据库'), 'StartSel=<b>, StopSel=</b>, MaxWords=50' ) as highlightedFROM documentsWHERE to_tsvector('jieba', content) @@ plainto_tsquery('jieba', '数据库');Search with Weights
Section titled “Search with Weights”-- Prioritize title matches over bodyALTER TABLE articles ADD COLUMN tsv tsvector;UPDATE articles SET tsv = setweight(to_tsvector('jieba', title), 'A') || setweight(to_tsvector('jieba', body), 'B');
-- Search with weighted rankingSELECT title, ts_rank(tsv, q) as rankFROM articles, plainto_tsquery('jieba', '搜索词') qWHERE tsv @@ qORDER BY rank DESC;Limits
Section titled “Limits”| Limit | Value |
|---|---|
Max tsvector value size | 1 MB |
Max lexemes per tsvector | ~264,000 |
| Supported text search configurations | 3 (jieba, chinese_ngram, simple) |
| GIN index build strategy | Sequential (full table scan) |
See Limits and Quotas for the complete list.
Next Steps
Section titled “Next Steps”- RAG with Built-in Embeddings — Combine FTS with vector search for hybrid retrieval
- Vector Search — Embedding and HNSW indexes for semantic search
- Extensions Overview — All 9 built-in extensions
- Limits and Quotas — All operational limits