createIndex
Type Definition
```typescript
export function createIndex(
  index_class: IndexClass,
  contents: unknown[],
  opt: IndexOpt = {},
): StaticSeekIndex | StaticSeekError;
```
Parameters
- `index_class`: Specifies the search algorithm implementation.
  - `LinearIndex` (default)
    - Best for small to medium-sized content
    - Simple and reliable
    - Good balance of performance and accuracy
  - `GPULinearIndex`
    - WebGPU-accelerated fuzzy search
    - 2-10x faster for larger datasets
    - Gracefully falls back to `LinearIndex` when WebGPU is unavailable
  - `HybridTrieBigramInvertedIndex`
    - 10x-100x faster search than `LinearIndex`
    - Ideal for larger datasets
    - Trade-offs:
      - Slower index generation
      - Higher false positive rate for CJK-like languages
      - Less precise fuzzy search for CJK-like languages
      - Limited result metadata
- `contents`: Array of JavaScript objects to be indexed.
  - Supports string fields and string array fields
  - Nested arrays containing strings are excluded from search
- `opt`: Configuration options for indexing and searching (optional).

  ```typescript
  export type Path = string;
  export type IndexOpt = {
    key_fields?: Path[];
    search_targets?: Path[];
    distance?: number;
    weights?: [Path, number][];
  };
  ```

  - `key_fields`: Fields to include in search results
  - `search_targets`: Fields to index for searching
  - `distance`: Default edit distance for fuzzy search
  - `weights`: Field-specific weights for ranking
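
To make the `contents` rules concrete, here is a minimal sketch of how the searchable values of one object could be enumerated. String fields and arrays of strings are kept, and nested arrays are skipped. `collectIndexableFields` is a hypothetical helper that mirrors the rules listed above. It is not staticseek's internal code, and its handling of other value types, such as numbers, is an assumption.

```typescript
// Hypothetical helper: lists the leaf fields that the rules above would treat as searchable.
type Leaf = { path: string; value: string | string[] };

function collectIndexableFields(obj: unknown, prefix = ""): Leaf[] {
  const leaves: Leaf[] = [];
  if (obj === null || typeof obj !== "object" || Array.isArray(obj)) return leaves;
  for (const [key, value] of Object.entries(obj as Record<string, unknown>)) {
    const path = prefix === "" ? key : `${prefix}.${key}`;
    if (typeof value === "string") {
      // Plain string field: searchable.
      leaves.push({ path, value });
    } else if (Array.isArray(value)) {
      // Array of strings: searchable. Nested arrays (or other element types) are skipped in this sketch.
      if (value.every((v) => typeof v === "string")) {
        leaves.push({ path, value: value as string[] });
      }
    } else if (value !== null && typeof value === "object") {
      // Nested object: recurse so its leaves get dotted paths such as "data.title".
      leaves.push(...collectIndexableFields(value, path));
    }
    // Non-string values such as numbers are ignored in this sketch.
  }
  return leaves;
}

const article = {
  slug: "introduction-to-js",
  data: { title: "Introduction to JavaScript", tags: ["javascript", "web"], matrix: [["a", "b"]], views: 42 },
};
console.log(collectIndexableFields(article).map((leaf) => leaf.path));
// [ 'slug', 'data.title', 'data.tags' ]
```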
Return Value
The function returns either a `StaticSeekIndex` object or a `StaticSeekError` if validation fails.
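
Because the return type is a union, callers usually narrow it before using the index. The sketch below shows that pattern with small stand-in classes so that it runs on its own. `StandInError`, `FakeIndex`, `buildIndex`, and `unwrapIndex` are hypothetical. The `instanceof` check assumes that the real `StaticSeekError` is a class, which is suggested by the value import of `StaticSeekError` in the examples later on this page.

```typescript
// Stand-ins for illustration only; real code would use StaticSeekIndex / StaticSeekError from "staticseek".
class StandInError extends Error {}
class FakeIndex {
  readonly size: number;
  constructor(size: number) {
    this.size = size;
  }
}

// Hypothetical factory mirroring the createIndex contract: it returns an error value instead of throwing.
function buildIndex(contents: unknown): FakeIndex | StandInError {
  if (!Array.isArray(contents)) return new StandInError("contents must be an array");
  return new FakeIndex(contents.length);
}

// Narrow the union once; after the instanceof check, TypeScript treats `result` as FakeIndex.
function unwrapIndex(result: FakeIndex | StandInError): FakeIndex {
  if (result instanceof StandInError) throw result;
  return result;
}

const index = unwrapIndex(buildIndex([{ slug: "a" }, { slug: "b" }]));
console.log(index.size); // 2
```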
Configuring Indexing
When creating an index, you can control indexing and search behavior through the `opt` parameter. The examples below use the following data structure:
```typescript
const array_of_articles = [
  {
    slug: "introduction-to-js",
    content: "JavaScript is a versatile programming language widely used for web development...",
    data: {
      title: "Introduction to JavaScript",
      description: "Learn the basics of JavaScript, a powerful language for modern web applications.",
      tags: ["javascript", "web", "programming"],
    },
  },
  // ...
];
```
key_fields
To include specific fields in search results, use the `key_fields` option:
```typescript
const index = createIndex(LinearIndex, array_of_articles, {
  key_fields: ['slug', 'data.title'],
});
```
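
A dotted key such as `data.title` names a value nested inside each article. The minimal sketch below shows how such paths can be resolved and collected for one article. `getByPath` and `pickKeyFields` are hypothetical helpers. This page does not describe the exact shape of staticseek's search results, so the flat record built here is only an assumption for illustration.

```typescript
// Hypothetical helper: resolve a dotted path such as "data.title" against a nested object.
function getByPath(obj: unknown, path: string): unknown {
  let current: unknown = obj;
  for (const segment of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

// Collect the requested key fields of one article into a flat record keyed by path.
function pickKeyFields(obj: unknown, keyFields: string[]): Record<string, unknown> {
  const picked: Record<string, unknown> = {};
  for (const path of keyFields) picked[path] = getByPath(obj, path);
  return picked;
}

const article = { slug: "introduction-to-js", data: { title: "Introduction to JavaScript" } };
// Logs an object holding the article's "slug" and "data.title" values.
console.log(pickKeyFields(article, ["slug", "data.title"]));
```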
search_targets
You can limit which fields are indexed using the `search_targets` option:
```typescript
const index = createIndex(LinearIndex, array_of_articles, {
  search_targets: ['data.title', 'data.description', 'data.tags'],
});
```
distance
Use the `distance` option to specify the default edit distance for fuzzy searches. If the `distance` option is not specified, an edit distance of 1 (allowing up to one character mismatch) is used.
```typescript
const index = createIndex(LinearIndex, array_of_articles, {
  distance: 2, // Set default edit distance for all searches
});
```
If you set the edit distance to 0, fuzzy search is disabled and exact matching is performed instead.
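
To show what an edit-distance threshold means for a search term, here is a minimal sketch of approximate substring matching (Sellers' algorithm). It lets a query match anywhere in a field with up to `distance` single-character insertions, deletions, or substitutions. This sketch illustrates the concept only and is not staticseek's internal algorithm. `minSubstringEditDistance` and `fuzzyContains` are hypothetical names.

```typescript
// Smallest Levenshtein distance between `pattern` and any substring of `text`.
// Same DP as Levenshtein, except row 0 is all zeros so a match may start at any text position.
function minSubstringEditDistance(pattern: string, text: string): number {
  const m = pattern.length;
  // prev[i] = best cost of matching pattern[0..i) ending at the previous text position
  let prev: number[] = Array.from({ length: m + 1 }, (_, i) => i);
  let best = prev[m];
  for (let j = 0; j < text.length; j++) {
    const curr: number[] = new Array<number>(m + 1);
    curr[0] = 0; // empty pattern prefix matches for free at every position
    for (let i = 1; i <= m; i++) {
      const cost = pattern[i - 1] === text[j] ? 0 : 1;
      curr[i] = Math.min(
        prev[i - 1] + cost, // match or substitution
        prev[i] + 1, // extra character in the text
        curr[i - 1] + 1, // pattern character missing from the text
      );
    }
    best = Math.min(best, curr[m]);
    prev = curr;
  }
  return best;
}

// distance = 0 means only exact substrings match.
function fuzzyContains(text: string, pattern: string, distance = 1): boolean {
  return minSubstringEditDistance(pattern, text) <= distance;
}

const text = "JavaScript is a versatile programming language";
console.log(fuzzyContains(text, "versatile", 0)); // true  (exact substring)
console.log(fuzzyContains(text, "versetile", 0)); // false (one substitution needed)
console.log(fuzzyContains(text, "versetile", 1)); // true
```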
weights
To assign different weights to specific fields for ranking, use the `weights` option:
```typescript
const index = createIndex(LinearIndex, array_of_articles, {
  weights: [
    ['data.title', 2],         // Boost title field
    ['data.description', 0.5], // Lower weight for description field
  ],
});
```
The default weight is 1. Higher weights increase the relevance of the field in search results.
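
The sketch below shows one simple way that per-field weights can affect ranking. Each field's match score is multiplied by its weight, and fields not listed in `weights` use the default weight of 1. The additive scoring formula and the `weightFor` and `weightedScore` helpers are assumptions for illustration only. They do not reproduce staticseek's actual ranking.

```typescript
type Path = string;

// Look up a field's weight; fields not listed fall back to the default weight of 1.
function weightFor(path: Path, weights: [Path, number][]): number {
  const entry = weights.find(([p]) => p === path);
  return entry ? entry[1] : 1;
}

// Hypothetical scoring: sum of per-field match scores, each scaled by its field weight.
function weightedScore(fieldScores: Record<Path, number>, weights: [Path, number][]): number {
  let total = 0;
  for (const [path, score] of Object.entries(fieldScores)) {
    total += score * weightFor(path, weights);
  }
  return total;
}

const weights: [Path, number][] = [
  ["data.title", 2],
  ["data.description", 0.5],
];

// The same raw hit counts twice as much in the title as in an unweighted field.
console.log(weightedScore({ "data.title": 1 }, weights)); // 2
console.log(weightedScore({ "data.description": 1 }, weights)); // 0.5
console.log(weightedScore({ content: 1 }, weights)); // 1
```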
How to specify fields
There are two ways to specify fields:
- Full path: Use the full path to the field, starting from the root object:

  ```typescript
  const index = createIndex(LinearIndex, array_of_articles, {
    key_fields: ['slug', 'data.title'],
    search_targets: ['data.title', 'data.description', 'data.tags'],
    weights: [['data.title', 2]],
  });
  ```
- Field name only: If the field is a leaf, you can specify the field name directly:

  ```typescript
  const index = createIndex(LinearIndex, array_of_articles, {
    key_fields: ['slug', 'title'],
    search_targets: ['title', 'description', 'tags'],
    weights: [['title', 2]],
  });
  ```
You can also specify an intermediate node (for example, `data`) to select all leaf fields beneath it.
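
The following sketch shows how the three forms of field specification could be resolved against the leaf paths of the example article: an exact full path, a bare leaf name, or an intermediate node that selects every leaf underneath it. `resolveFieldSelector` is a hypothetical helper that mirrors the rules described above. It is not staticseek's implementation.

```typescript
// Leaf paths of the example article (string and string-array fields).
const leafPaths = ["slug", "content", "data.title", "data.description", "data.tags"];

// Hypothetical resolver mirroring the documented selector forms.
function resolveFieldSelector(selector: string, paths: string[]): string[] {
  return paths.filter((path) => {
    const leafName = path.slice(path.lastIndexOf(".") + 1);
    return (
      path === selector || // full path, e.g. "data.title"
      leafName === selector || // leaf name only, e.g. "title"
      path.startsWith(`${selector}.`) // intermediate node, e.g. "data"
    );
  });
}

console.log(resolveFieldSelector("data.title", leafPaths)); // [ 'data.title' ]
console.log(resolveFieldSelector("title", leafPaths)); // [ 'data.title' ]
console.log(resolveFieldSelector("data", leafPaths)); // [ 'data.title', 'data.description', 'data.tags' ]
```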
Selecting Another Index
The full-text search provided by `LinearIndex` is sufficient for most static site use cases. If you need faster search, you can switch to one of the alternative index types instead.
GPU Linear Index
```typescript
import { GPULinearIndex, createIndex, search, StaticSeekError } from "staticseek";

const index = createIndex(GPULinearIndex, array_of_articles);
```
`GPULinearIndex` offloads fuzzy searches to the GPU, which can make search several times faster than `LinearIndex` (roughly 2-10x on larger datasets). Usage is identical to `LinearIndex`, so you can switch between the two implementations without other code changes.
If WebGPU is not available in the execution environment, `GPULinearIndex` automatically falls back to `LinearIndex`, ensuring compatibility across different devices.
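
For context, the check below sketches how WebGPU availability is commonly detected in a browser. `navigator.gpu` must exist, and `requestAdapter()` must resolve to a non-null adapter. This is an assumption about what a fallback like this generally has to check, not a description of how `GPULinearIndex` decides. The `hasWebGPU` name and the `NavigatorLike` type are hypothetical. The navigator object is passed in as a parameter so that the sketch can run outside a browser.

```typescript
// Minimal structural types for the parts of the WebGPU API this check touches.
type GpuLike = { requestAdapter(): Promise<unknown> };
type NavigatorLike = { gpu?: GpuLike };

// Returns true only if the WebGPU entry point exists and an adapter can be obtained.
async function hasWebGPU(nav: NavigatorLike | undefined): Promise<boolean> {
  if (!nav || !nav.gpu) return false; // API not exposed (older browsers, Node.js, etc.)
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter != null; // a null adapter means no suitable GPU is available
  } catch {
    return false; // treat unexpected failures as "unavailable"
  }
}

// In a browser: const ok = await hasWebGPU(navigator as NavigatorLike);
```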
Hybrid Trie Bigram Inverted Index
```typescript
import { HybridTrieBigramInvertedIndex, createIndex, search, StaticSeekError } from "staticseek";

const index = createIndex(HybridTrieBigramInvertedIndex, array_of_articles);
```
`HybridTrieBigramInvertedIndex` offers a 10x-100x search speed improvement over `LinearIndex`. The API is the same, so integration is seamless.
However, this increased speed comes at a cost, introducing several trade-offs:
- Longer Indexing Time: Index creation is slower, taking approximately 2 seconds for 100 articles.
- Higher Search Noise for CJK Languages: False positives (irrelevant results appearing in search results) become more frequent in languages such as Chinese, Japanese, and Korean (CJK).
- Reduced Accuracy for CJK Languages: Fuzzy searches in CJK may produce noisier results, matching unintended terms.
- Limited Result Metadata: Some search result details, such as the exact match position (`pos`) and the surrounding text (`wordaround`), are unavailable.
- Incomplete TF-IDF Scoring: Currently, only term frequency (TF) is calculated and weights are not reflected, leading to less refined ranking.
Despite these drawbacks, `HybridTrieBigramInvertedIndex` provides fast search across all devices and a smooth user experience. If responsiveness is your top priority, this index type is a good choice.
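
To see why a bigram-based index can produce false positives for CJK text, which has no spaces between words, consider the simplified sketch below. Text is split into overlapping two-character bigrams, and a document matches when it contains every bigram of the query. The bigrams do not need to be adjacent, so a document can match without containing the query as a contiguous string. This is a deliberately simplified model for illustration, not the implementation of `HybridTrieBigramInvertedIndex`. All function names are hypothetical.

```typescript
// Split a string into overlapping two-character bigrams (iterating by code point).
function bigrams(text: string): string[] {
  const chars = Array.from(text);
  const result: string[] = [];
  for (let i = 0; i + 1 < chars.length; i++) result.push(chars[i] + chars[i + 1]);
  return result;
}

// Build a tiny inverted index: bigram -> set of document ids.
function buildBigramIndex(docs: string[]): Map<string, Set<number>> {
  const index = new Map<string, Set<number>>();
  docs.forEach((doc, id) => {
    for (const bg of bigrams(doc)) {
      if (!index.has(bg)) index.set(bg, new Set<number>());
      index.get(bg)!.add(id);
    }
  });
  return index;
}

// A document is returned if it contains every bigram of the query (posting-list intersection).
// Queries shorter than two characters produce no bigrams and return nothing in this sketch.
function searchBigrams(index: Map<string, Set<number>>, query: string): number[] {
  const postings = bigrams(query).map((bg) => index.get(bg) || new Set<number>());
  if (postings.length === 0) return [];
  const [first, ...rest] = postings;
  return Array.from(first)
    .filter((id) => rest.every((set) => set.has(id)))
    .sort((a, b) => a - b);
}

const docs = ["東京都に住んでいます", "東京と京都を旅行しました"];
const index = buildBigramIndex(docs);
// Both documents contain the bigrams "東京" and "京都", but only document 0 contains "東京都".
console.log(searchBigrams(index, "東京都")); // [ 0, 1 ]
```

Document 1 is a false positive. A post-filter such as `docs[id].includes(query)` would remove it, but at the cost of extra work for each candidate.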