
Configuration

pg_semantic_cache provides flexible configuration options for vector dimensions, index types, and cache behavior.

Vector Dimensions

The extension supports configurable embedding dimensions to match your chosen embedding model.

Supported Dimensions

Dimension	Common Models
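768	BERT (base), Sentence Transformers (base)
1024	Sentence Transformers (large)
1536	OpenAI text-embedding-ada-002, text-embedding-3-small
3072	OpenAI text-embedding-3-large
Custom	Any dimension supported by your model

Note: pgvector's IVFFlat and HNSW indexes accept at most 2,000 dimensions for the vector type. A 3072-dimension setup may therefore fail when the index is built, unless the embeddings are shortened (the text-embedding-3 models accept a dimensions parameter) or the extension stores them in another type.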
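Before choosing a value, you can confirm how many dimensions your model actually returns. The following is a minimal sketch that uses pgvector's vector_dims() function. The three-element literal is only a placeholder for a real embedding. semantic_cache.get_vector_dimension() is the extension function used later on this page.

```sql
-- Number of dimensions in an embedding (pgvector function).
-- The short literal is a placeholder: paste a real embedding from your model.
SELECT vector_dims('[0.1, 0.2, 0.3]'::vector) AS model_dims;

-- Dimension the extension is currently configured for.
SELECT semantic_cache.get_vector_dimension() AS configured_dims;
```

If the two numbers differ, change the configured dimension as described in the next section.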

Setting Dimensions

Rebuild Required

Changing dimensions requires rebuilding the index, which clears all cached data.

-- Set vector dimension (default: 1536)
SELECT semantic_cache.set_vector_dimension(768);

-- Rebuild index to apply changes (WARNING: clears cache)
SELECT semantic_cache.rebuild_index();

-- Verify new dimension
SELECT semantic_cache.get_vector_dimension();

Initial Setup for Custom Dimensions

If you already know which embedding model you will use, set the dimension immediately after installing the extension, before anything is cached:

-- Install the extension
CREATE EXTENSION pg_semantic_cache;

-- Immediately configure dimensions
SELECT semantic_cache.set_vector_dimension(768);
SELECT semantic_cache.rebuild_index();

-- Now start caching

Vector Index Types

Choose between IVFFlat (fast to build, good recall) and HNSW (higher recall, slower to build). Both are approximate nearest-neighbor indexes provided by pgvector.

IVFFlat Index (Default)

Best for most use cases: fast lookups with good recall.

Characteristics:
- Lookup speed: Very fast (< 5 ms typical)
- Build time: Fast
- Recall: Good (95%+)
- Memory: Moderate
- Best for: Production caches with frequent updates

-- Set index type
SELECT semantic_cache.set_index_type('ivfflat');
SELECT semantic_cache.rebuild_index();

IVFFlat Parameters (set during init_schema()):

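-- Default configuration: lists = 100 (for < 100K entries)

-- For larger caches, increase lists, either in init_schema()
-- or by recreating the index manually. pgvector recommends
-- building an IVFFlat index after the table already contains data.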
DROP INDEX IF EXISTS semantic_cache.idx_cache_entries_embedding;
CREATE INDEX idx_cache_entries_embedding
ON semantic_cache.cache_entries
USING ivfflat (query_embedding vector_cosine_ops)
WITH (lists = 1000);  -- For 100K-1M entries
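The list counts in the comments above are consistent with the rule of thumb in the pgvector documentation:

- About rows / 1000 lists for up to 1M rows.
- About sqrt(rows) lists beyond 1M rows.
- A starting point of about sqrt(lists) for the query-time ivfflat.probes setting.

The following is a self-contained sketch of that arithmetic. The cache sizes are illustrative inputs, not measurements.

```sql
-- Rule-of-thumb IVFFlat sizing (pgvector guidance) for a few sample cache sizes.
WITH sizes(n) AS (
    VALUES (50000::bigint), (100000), (1000000), (4000000)
), tuned AS (
    SELECT n,
           -- rows / 1000 up to 1M rows, sqrt(rows) above that
           CASE WHEN n <= 1000000 THEN greatest(n / 1000, 1)
                ELSE ceil(sqrt(n))::bigint
           END AS lists
    FROM sizes
)
SELECT n AS entries,
       lists AS suggested_lists,
       -- query-time starting point: probes ~ sqrt(lists)
       ceil(sqrt(lists))::int AS suggested_probes
FROM tuned
ORDER BY n;
```

To size the index against a real cache, replace the sizes CTE with SELECT count(*) FROM semantic_cache.cache_entries. Probes are set per session, for example SET ivfflat.probes = 10; for 100 lists. This assumes the extension's lookup runs in your session and does not override the setting.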

HNSW Index

More accurate but slower to build; requires pgvector 0.5.0 or later.

Characteristics:
- Lookup speed: Fast (1-3 ms typical)
- Build time: Slower
- Recall: Excellent (98%+)
- Memory: Higher
- Best for: Read-heavy caches with infrequent updates

-- Set index type (requires pgvector 0.5.0+)
SELECT semantic_cache.set_index_type('hnsw');
SELECT semantic_cache.rebuild_index();

HNSW Parameters:

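-- Recreate the index manually to tune the build parameters:
-- m = max connections per node per graph layer (pgvector default 16)
-- ef_construction = candidate list size while building (pgvector default 64)
-- Higher values improve recall but slow the build and use more memory.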
DROP INDEX IF EXISTS semantic_cache.idx_cache_entries_embedding;
CREATE INDEX idx_cache_entries_embedding
ON semantic_cache.cache_entries
USING hnsw (query_embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
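m and ef_construction only affect how the index is built. At query time, pgvector exposes hnsw.ef_search (default 40), which is the size of the candidate list scanned per lookup. Higher values trade speed for recall.

The following is a minimal sketch of session-level settings. The values are starting points to test, not recommendations from this extension.

```sql
-- Larger candidate list per HNSW lookup: better recall, slower queries.
SET hnsw.ef_search = 100;

-- More memory for this session's index builds; pgvector builds HNSW
-- indexes much faster when the graph fits in maintenance_work_mem.
SET maintenance_work_mem = '2GB';

-- Confirm the session values.
SHOW hnsw.ef_search;
SHOW maintenance_work_mem;
```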

Index Comparison

Feature	IVFFlat	HNSW
Lookup speed	< 5 ms typical	1-3 ms typical
Recall	Good (95%+)	Excellent (98%+)
Build time	Fast	Slower
Memory	Moderate	Higher
Updates	Fast	Slower

Cache Configuration

The extension stores configuration in the semantic_cache.cache_config table.

View Current Configuration

SELECT * FROM semantic_cache.cache_config ORDER BY key;

Key Configuration Parameters

max_cache_size_mb

Maximum cache size in megabytes before auto-eviction triggers.

-- Set to 2000 MB (about 2 GB)
UPDATE semantic_cache.cache_config
SET value = '2000'
WHERE key = 'max_cache_size_mb';

-- Default: 1000 MB

default_ttl_seconds

Default time-to-live for cached entries, in seconds (can be overridden per query).

-- Set default to 2 hours
UPDATE semantic_cache.cache_config
SET value = '7200'
WHERE key = 'default_ttl_seconds';

-- Default: 3600 (1 hour)
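TTL values are plain seconds. You can read the TTL values used on this page as durations with PostgreSQL's built-in make_interval() function:

```sql
-- Convert TTL values (in seconds) to readable intervals.
SELECT ttl_seconds,
       make_interval(secs => ttl_seconds) AS duration
FROM unnest(ARRAY[900, 1800, 3600, 7200, 14400]) AS t(ttl_seconds)
ORDER BY ttl_seconds;
```

This lists 900 seconds as 15 minutes, 7200 as 2 hours, and 14400 as 4 hours.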

eviction_policy

Eviction strategy applied automatically when the cache size limit is reached.

-- Options: 'lru', 'lfu', 'ttl'
UPDATE semantic_cache.cache_config
SET value = 'lru'
WHERE key = 'eviction_policy';

Eviction Policies:

lru: Least Recently Used - evicts the entries that were accessed least recently
lfu: Least Frequently Used - evicts the entries with the fewest accesses
ttl: Time To Live - evicts the entries closest to expiration
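The three policies differ only in which entry they remove first. The following is a self-contained sketch with made-up sample rows. The column names last_accessed, access_count, and expires_at are hypothetical; they are not necessarily the extension's schema.

```sql
-- Hypothetical sample entries with the metadata each policy looks at.
WITH entries(id, last_accessed, access_count, expires_at) AS (
    VALUES (1, timestamptz '2024-01-01 10:00', 50, timestamptz '2024-01-01 12:30'),
           (2, timestamptz '2024-01-01 11:30',  2, timestamptz '2024-01-01 14:00'),
           (3, timestamptz '2024-01-01 09:00', 20, timestamptz '2024-01-01 13:00')
)
-- lru: oldest last access; lfu: fewest accesses; ttl: earliest expiry
SELECT 'lru' AS policy,
       (SELECT id FROM entries ORDER BY last_accessed ASC LIMIT 1) AS evicted_first
UNION ALL
SELECT 'lfu', (SELECT id FROM entries ORDER BY access_count ASC LIMIT 1)
UNION ALL
SELECT 'ttl', (SELECT id FROM entries ORDER BY expires_at ASC LIMIT 1);
```

With these rows, lru picks entry 3, lfu picks entry 2, and ttl picks entry 1. The same cache can therefore lose different entries depending on the policy.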

similarity_threshold

Default minimum similarity score (0.0 to 1.0) an entry must reach to count as a cache hit.

-- More strict matching (fewer hits, more accurate)
UPDATE semantic_cache.cache_config
SET value = '0.98'
WHERE key = 'similarity_threshold';

-- More lenient matching (more hits, less accurate)
UPDATE semantic_cache.cache_config
SET value = '0.90'
WHERE key = 'similarity_threshold';

-- Default: 0.95 (recommended)
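Because the embedding index is built with vector_cosine_ops, this sketch assumes the score is cosine similarity. In pgvector this is written 1 - (a <=> b). To keep the sketch runnable without pgvector, it computes the same formula, dot(a, b) / (|a| * |b|), on plain arrays using toy 2-dimensional vectors:

```sql
-- Toy query/cached-entry pairs (2 dimensions instead of 768 or 1536).
WITH pairs(label, a, b) AS (
    VALUES ('close paraphrase', ARRAY[1.0, 0.0]::float8[], ARRAY[0.96, 0.28]::float8[]),
           ('related topic',    ARRAY[1.0, 0.0]::float8[], ARRAY[0.80, 0.60]::float8[])
),
scores AS (
    -- Cosine similarity = dot product divided by the product of the norms.
    SELECT p.label,
           sum(x * y) / (sqrt(sum(x * x)) * sqrt(sum(y * y))) AS sim
    FROM pairs AS p, unnest(p.a, p.b) AS t(x, y)
    GROUP BY p.label
)
SELECT label,
       round(sim::numeric, 2) AS similarity,
       sim >= 0.95 AS hit_at_0_95,
       sim >= 0.98 AS hit_at_0_98
FROM scores
ORDER BY sim DESC;
```

The close paraphrase scores 0.96, so it is a hit at the default 0.95 but a miss at the stricter 0.98. The related topic scores 0.80 and misses at both thresholds.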

Production Configurations

High-Throughput Configuration

For applications with thousands of queries per second:

-- Use IVFFlat with optimized lists
SELECT semantic_cache.set_index_type('ivfflat');
SELECT semantic_cache.rebuild_index();

-- Increase cache size
UPDATE semantic_cache.cache_config SET value = '5000' WHERE key = 'max_cache_size_mb';

-- Use LRU for fast eviction
UPDATE semantic_cache.cache_config SET value = 'lru' WHERE key = 'eviction_policy';

-- Shorter TTL to keep cache fresh
UPDATE semantic_cache.cache_config SET value = '1800' WHERE key = 'default_ttl_seconds';

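PostgreSQL settings (example values for a large dedicated server; scale them to your hardware):

# postgresql.conf
shared_buffers = 8GB          # about 25% of RAM is a common starting point
effective_cache_size = 24GB   # planner estimate, not an allocation
work_mem = 512MB              # per sort/hash operation, per connection: risky with many connections
maintenance_work_mem = 2GB    # used by index builds and rebuilds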
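You can also set the same values from SQL with ALTER SYSTEM, which writes them to postgresql.auto.conf. shared_buffers only takes effect after a server restart. The other three settings apply after a configuration reload. The values are copied from the example above and still assume a large dedicated server.

```sql
-- Requires superuser; ALTER SYSTEM cannot run inside a transaction block.
ALTER SYSTEM SET shared_buffers = '8GB';          -- takes effect after restart
ALTER SYSTEM SET effective_cache_size = '24GB';
ALTER SYSTEM SET work_mem = '512MB';
ALTER SYSTEM SET maintenance_work_mem = '2GB';

-- Apply the reloadable settings without restarting.
SELECT pg_reload_conf();
```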

High-Accuracy Configuration

For applications requiring maximum precision:

-- Use HNSW for best recall
SELECT semantic_cache.set_index_type('hnsw');
SELECT semantic_cache.rebuild_index();

-- Strict similarity threshold
UPDATE semantic_cache.cache_config SET value = '0.98' WHERE key = 'similarity_threshold';

-- Longer TTL for stable results
UPDATE semantic_cache.cache_config SET value = '14400' WHERE key = 'default_ttl_seconds';

LLM/AI Application Configuration

Optimized for caching expensive AI API calls:

-- OpenAI ada-002 dimensions
SELECT semantic_cache.set_vector_dimension(1536);
SELECT semantic_cache.rebuild_index();

-- Balance between accuracy and coverage
UPDATE semantic_cache.cache_config SET value = '0.93' WHERE key = 'similarity_threshold';

-- Cache longer (AI responses tend to be stable)
UPDATE semantic_cache.cache_config SET value = '7200' WHERE key = 'default_ttl_seconds';

-- Large cache for many queries
UPDATE semantic_cache.cache_config SET value = '10000' WHERE key = 'max_cache_size_mb';

Analytics Query Configuration

For caching expensive analytical queries:

-- Match a 768-dimension embedding model
SELECT semantic_cache.set_vector_dimension(768);
SELECT semantic_cache.rebuild_index();

-- More lenient threshold (query variations are common)
UPDATE semantic_cache.cache_config SET value = '0.90' WHERE key = 'similarity_threshold';

-- Short TTL (data changes frequently)
UPDATE semantic_cache.cache_config SET value = '900' WHERE key = 'default_ttl_seconds';

-- LFU policy (popular queries cached longer)
UPDATE semantic_cache.cache_config SET value = 'lfu' WHERE key = 'eviction_policy';

Monitoring Configuration Impact

Check Index Performance

-- View index usage
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'semantic_cache';

Measure Lookup Times

-- Enable timing in psql (client-side, includes the network round trip)
\timing on

-- Test lookup
SELECT * FROM semantic_cache.get_cached_result(
    '[0.1, 0.2, ...]'::text,
    0.95
);

Target: < 5ms for most queries

Cache Hit Rate

-- Monitor hit rate with current config
SELECT * FROM semantic_cache.cache_stats();

Target: > 70% for effective caching
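Hit rate here means hits divided by total lookups. This page does not show the exact columns returned by cache_stats(), so the following sketch uses made-up counters to show the arithmetic against the 70% target:

```sql
-- Hit rate = hits / (hits + misses); NULLIF avoids division by zero on an empty cache.
SELECT hits,
       misses,
       round(100.0 * hits / NULLIF(hits + misses, 0), 1) AS hit_rate_pct,
       100.0 * hits / NULLIF(hits + misses, 0) > 70 AS meets_target
FROM (VALUES (800, 200), (450, 550), (0, 0)) AS sample(hits, misses);
```

The first row reaches 80.0% and meets the target, the second reaches 45.0% and does not, and the empty cache yields NULL rather than an error.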

Configuration Best Practices

Start Simple

Begin with defaults (1536 dimensions, IVFFlat, 0.95 threshold) and adjust based on monitoring.

Test Before Production

Always test configuration changes in development before applying to production.

Tuning Checklist

[ ] Choose dimension matching your embedding model
[ ] Select index type based on workload (IVFFlat for most cases)
[ ] Set similarity threshold based on accuracy requirements
[ ] Configure cache size based on available memory
[ ] Choose eviction policy matching access patterns
[ ] Set TTL based on data freshness requirements
[ ] Monitor hit rate and adjust as needed

Common Mistakes

❌ Using wrong dimensions

-- Extension configured for 1536, but sending 768-dim vectors
-- Result: Error or poor performance

✓ Match model dimensions

SELECT semantic_cache.set_vector_dimension(768);  -- Match your model
SELECT semantic_cache.rebuild_index();

❌ Threshold too strict

UPDATE semantic_cache.cache_config SET value = '0.99' WHERE key = 'similarity_threshold';
-- Result: Very low hit rate

✓ Balanced threshold

UPDATE semantic_cache.cache_config SET value = '0.93' WHERE key = 'similarity_threshold';
-- Allows reasonable variation

❌ Forgetting to rebuild

SELECT semantic_cache.set_vector_dimension(768);
-- Forgot: SELECT semantic_cache.rebuild_index();
-- Result: Old index still in use!
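To confirm that a dimension or index change actually took effect, you can inspect the system catalogs directly. This sketch assumes the table, column, and index names used elsewhere on this page: semantic_cache.cache_entries, query_embedding, and idx_cache_entries_embedding. It also assumes the dimension is part of the column's declared type.

```sql
-- Declared type of the embedding column, e.g. vector(768)
-- (assumes the dimension is stored in the column type).
SELECT format_type(atttypid, atttypmod) AS embedding_type
FROM pg_attribute
WHERE attrelid = 'semantic_cache.cache_entries'::regclass
  AND attname = 'query_embedding';

-- Current index definition: shows the access method (ivfflat or hnsw)
-- and its WITH options such as lists or m / ef_construction.
SELECT indexdef
FROM pg_indexes
WHERE schemaname = 'semantic_cache'
  AND indexname = 'idx_cache_entries_embedding';
```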

Next Steps

Functions Reference - Learn about all configuration functions
Monitoring - Track performance and tune configuration
Use Cases - See configuration examples in practice