Version:

pg_semantic_cache

Welcome to pg_semantic_cache

Semantic query result caching for PostgreSQL using vector embeddings - making expensive queries fast through intelligent reuse.

Overview

pg_semantic_cache is a PostgreSQL extension that implements semantic query result caching using vector embeddings. Unlike traditional query caching that relies on exact string matching, pg_semantic_cache understands the meaning of queries through vector similarity, enabling cache hits even when queries are phrased differently.

This extension is particularly valuable for:

AI/LLM Applications: Cache expensive LLM API calls and RAG (Retrieval Augmented Generation) results
Analytics Workloads: Reuse results from complex analytical queries with similar parameters
External API Queries: Cache results from expensive external data sources
Database Query Optimization: Reduce load on expensive database operations

Key Features

Semantic Matching: Uses pgvector for similarity-based cache lookups
Flexible TTL: Per-entry time-to-live configuration
Tag-Based Management: Organize and invalidate cache entries by tags
Multiple Eviction Policies: LRU, LFU, and TTL-based automatic eviction
Cost Tracking: Monitor and report on query cost savings
Configurable Dimensions: Support for various embedding models (768, 1536, 3072+ dimensions)
Multiple Index Types: IVFFlat (fast) or HNSW (accurate) vector indexes
Comprehensive Monitoring: Built-in statistics, views, and health metrics

How It Works

graph LR
    A[Query] --> B[Generate Embedding]
    B --> C{Cache Lookup}
    C -->|Hit| D[Return Cached Result]
    C -->|Miss| E[Execute Query]
    E --> F[Store Result + Embedding]
    F --> G[Return Result]

Generate Embedding: Convert your query text into a vector embedding using your preferred model (OpenAI, Cohere, etc.)
Check Cache: Search for semantically similar cached queries using cosine similarity
Cache Hit: If a similar query exists above the similarity threshold, return the cached result
Cache Miss: Execute the actual query, cache the result with its embedding for future use
Automatic Maintenance: Expired entries are evicted based on TTL and configured policies

Quick Start

Prerequisites

PostgreSQL 14, 15, 16, 17, or 18
pgvector extension installed
C compiler (gcc or clang)
PostgreSQL development headers

Installation

# Clone the repository
git clone https://github.com/pgedge/pg_semantic_cache.git
cd pg_semantic_cache

# Build and install
make clean
make
sudo make install

Setup

-- Install required extensions
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_semantic_cache;

-- Verify installation
SELECT * FROM semantic_cache.cache_health;

Simple Example

-- Cache a query result with its embedding
SELECT semantic_cache.cache_query(
    query_text := 'What was our Q4 2024 revenue?',
    query_embedding := '[0.123, 0.456, ...]'::text,  -- From your embedding model
    result_data := '{"answer": "Q4 2024 revenue was $2.4M"}'::jsonb,
    ttl_seconds := 1800,  -- 30 minutes
    tags := ARRAY['llm', 'revenue']
);

-- Retrieve with a semantically similar query
SELECT * FROM semantic_cache.get_cached_result(
    query_embedding := '[0.124, 0.455, ...]'::text,  -- Slightly different query
    similarity_threshold := 0.95  -- 95% similarity required
);

Why Use pg_semantic_cache?

Traditional Caching vs Semantic Caching

Traditional Cache	Semantic Cache
"What was Q4 revenue?" ❌ Miss	"What was Q4 revenue?" ✅ Hit
"Show Q4 revenue" ❌ Miss	"Show Q4 revenue" ✅ Hit
"Q4 revenue please" ❌ Miss	"Q4 revenue please" ✅ Hit

Cost Savings Example

For an LLM application making 10,000 queries per day: - Without caching: $200/day (at $0.02 per query) - With 80% cache hit rate: $40/day - Savings: $160/day or $58,400/year

Architecture

pg_semantic_cache is implemented in pure C using the PostgreSQL extension API (PGXS), providing:

Small Binary Size: ~100KB vs 2-5MB for Rust-based extensions
Fast Build Times: 10-30 seconds vs 2-5 minutes
Immediate Compatibility: Works with new PostgreSQL versions immediately
Standard Packaging: Compatible with all PostgreSQL packaging tools

Performance

Lookup Time: < 5ms for most queries with IVFFlat index
Scalability: Handles 100K+ cached entries efficiently
Throughput: Thousands of cache lookups per second
Storage: Configurable cache size limits with automatic eviction

Getting Help

Documentation: Browse the sections in the navigation menu
Issues: Report bugs at GitHub Issues
Examples: See Use Cases for practical implementations
FAQ: Check the FAQ for common questions

Next Steps

Installation Guide - Detailed installation instructions
Configuration - Configure dimensions, indexes, and policies
Functions Reference - Complete function documentation
Use Cases - Practical examples and integration patterns
Monitoring - Track performance and optimize cache usage

Pro Tip

Start with the default IVFFlat index and 1536 dimensions (OpenAI ada-002). You can always reconfigure later with set_vector_dimension() and rebuild_index().