Architecture
The pg_semantic_cache extension is implemented in pure C using the PostgreSQL extension API (PGXS). This implementation approach provides several benefits for performance and compatibility.
The extension provides the following benefits:
- The small binary size is approximately 100KB versus 2-5MB for Rust versions.
- Fast build times range from 10-30 seconds versus 2-5 minutes for Rust.
- Immediate compatibility works with new PostgreSQL versions immediately.
- Standard packaging is compatible with all PostgreSQL packaging tools.
How It Works
The following diagram illustrates the semantic cache workflow:
graph LR
A[Query] --> B[Generate Embedding]
B --> C{Cache Lookup}
C -->|Hit| D[Return Cached Result]
C -->|Miss| E[Execute Query]
E --> F[Store Result + Embedding]
F --> G[Return Result]
The semantic cache operates through the following workflow:
- The application generates an embedding by converting query text into a vector embedding using a preferred model (OpenAI, Cohere, etc.).
- The extension checks the cache by searching for semantically similar cached queries using cosine similarity.
- On a cache hit, if a similar query exists above the similarity threshold, the extension returns the cached result.
- On a cache miss, the extension executes the actual query and caches the result with the embedding for future use.
- Automatic maintenance evicts expired entries based on TTL and configured policies.
Getting Help
The following resources are available for assistance:
- Browse the documentation for detailed information.
- Report issues at GitHub Issues.
- See Use Cases for practical implementation examples.
- Check the FAQ for answers to common questions.