Version:

Frequently Asked Questions

General Questions

What is semantic caching?

Semantic caching uses vector embeddings to understand the meaning of queries, not just exact text matching. When you search for "What was Q4 revenue?", the cache can return results for semantically similar queries like "Show Q4 revenue" or "Q4 revenue please" even though the exact text is different.

Traditional caching requires exact string matches, while semantic caching matches based on similarity scores (typically 90-98%).

Why use pg_semantic_cache instead of a traditional cache like Redis?

Use pg_semantic_cache when: - Queries are phrased differently but mean the same thing (LLM applications, natural language queries) - You need semantic understanding of query similarity - You're already using PostgreSQL and want tight integration - You need persistent caching with complex querying capabilities

Use traditional caching (Redis, Memcached) when: - You need exact key-value matching - Sub-millisecond latency is critical - Queries are deterministic and rarely vary - You need distributed caching across multiple services

Use both: pg_semantic_cache for semantic matching + Redis for hot-path exact matches!

How does it compare to application-level caching?

Feature	pg_semantic_cache	Application Cache
Semantic Matching	✅ Yes	❌ No
Database Integration	✅ Native	⚠️ Requires sync
Multi-language	✅ Yes	⚠️ Per-instance
Persistence	✅ Automatic	⚠️ Manual
Vector Operations	✅ Optimized	❌ Not available
Shared Across Apps	✅ Yes	❌ No

Is it production-ready?

Yes! pg_semantic_cache is: - Written in C using stable PostgreSQL APIs - Tested with PostgreSQL 14-18 - Used in production environments - Small, focused codebase (~900 lines) - No complex dependencies (just pgvector)

Installation & Setup

Do I need to install pgvector separately?

Yes, pgvector is a required dependency. Install it before pg_semantic_cache:

# Install pgvector
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make && sudo make install

# Then install pg_semantic_cache
cd ../pg_semantic_cache
make && sudo make install

Can I use it with managed PostgreSQL services?

It depends on the service:

Self-hosted PostgreSQL: ✅ Yes
AWS RDS: ✅ Yes (if you can install extensions)
Azure Database for PostgreSQL: ✅ Yes (flexible server)
Google Cloud SQL: ⚠️ Check extension support
Supabase: ✅ Yes (pgvector supported)
Neon: ✅ Yes (pgvector supported)

Check if your provider supports custom C extensions and pgvector.

What PostgreSQL versions are supported?

PostgreSQL 14, 15, 16, 17, and 18 are fully supported and tested.

How do I upgrade the extension?

-- Drop and recreate (WARNING: clears cache)
DROP EXTENSION pg_semantic_cache CASCADE;
CREATE EXTENSION pg_semantic_cache;

-- Or use ALTER EXTENSION (when upgrade scripts available)
ALTER EXTENSION pg_semantic_cache UPDATE TO '0.4.0';

Performance

How fast are cache lookups?

Target: < 5ms for most queries

Typical Performance: - IVFFlat index: 2-5ms - HNSW index: 1-3ms - Without index: 50-500ms (don't do this!)

Factors affecting speed: - Cache size (more entries = slightly slower) - Vector dimension (1536 vs 3072) - Index type and parameters - PostgreSQL configuration (work_mem)

-- Test your lookup speed
\timing on
SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.95);

How much storage does it use?

Storage per entry: - Vector embedding: ~6KB (1536 dimensions) - Result data: Varies (your cached JSONB) - Metadata: ~200 bytes - Total: 6KB + your data size

Example: - 10K entries with 10KB results each = ~160MB - 100K entries with 5KB results each = ~1.1GB

What's the maximum cache size?

There's no hard limit, but practical considerations:

< 100K entries: Excellent performance with default settings
100K - 1M entries: Increase IVFFlat lists parameter
> 1M entries: Consider partitioning or HNSW index

Configure max size:

UPDATE semantic_cache.cache_config
SET value = '5000'  -- 5GB
WHERE key = 'max_cache_size_mb';

Does it work with large result sets?

Yes, but consider: - Large results (> 1MB) consume more storage - Serializing/deserializing large JSONB has overhead - Consider caching aggregated results instead of full datasets

-- Don't cache this:
SELECT * FROM huge_table;  -- 100MB result

-- Cache this instead:
SELECT COUNT(*), AVG(value), summary_stats
FROM huge_table;  -- 1KB result

Embeddings

What embedding models can I use?

Any embedding model that produces fixed-dimension vectors:

Popular Models: - OpenAI text-embedding-ada-002 (1536 dim) - OpenAI text-embedding-3-small (1536 dim) - OpenAI text-embedding-3-large (3072 dim) - Cohere embed-english-v3.0 (1024 dim) - Sentence Transformers all-MiniLM-L6-v2 (384 dim) - Sentence Transformers all-mpnet-base-v2 (768 dim)

Configure dimension:

SELECT semantic_cache.set_vector_dimension(768);
SELECT semantic_cache.rebuild_index();

Do I need to generate embeddings myself?

Yes. pg_semantic_cache stores and searches embeddings, but doesn't generate them.

Typical workflow: 1. Generate embedding using your chosen model/API 2. Pass embedding to cache_query() or get_cached_result() 3. Extension handles similarity search

See Use Cases for integration examples.

Can I change embedding models later?

Yes, but you need to rebuild the cache:

-- Change dimension
SELECT semantic_cache.set_vector_dimension(3072);

-- Rebuild (WARNING: clears all cached data)
SELECT semantic_cache.rebuild_index();

-- Re-cache entries with new embeddings

What similarity threshold should I use?

Recommendations:

0.98-0.99: Nearly identical queries (financial data, strict matching)
0.95-0.97: Very similar queries (recommended starting point)
0.90-0.94: Similar queries (good for exploratory queries)
0.85-0.89: Somewhat related (use with caution)
< 0.85: Too lenient (likely irrelevant results)

Start with 0.95 and adjust based on your hit rate: - Hit rate too low? Lower threshold (0.92) - Getting irrelevant results? Raise threshold (0.97)

Configuration

How do I choose between IVFFlat and HNSW?

Use IVFFlat (default) when: - Cache updates frequently - Build time matters - < 100K entries - Good enough recall (95%+)

Use HNSW when: - Maximum accuracy needed - Cache mostly read-only - Have pgvector 0.5.0+ - Can afford slower builds

-- Switch to HNSW
SELECT semantic_cache.set_index_type('hnsw');
SELECT semantic_cache.rebuild_index();

What TTL should I set?

Depends on data freshness requirements:

-- Real-time data (stock prices, weather)
ttl_seconds := 60  -- 1 minute

-- Dynamic data (user dashboards, reports)
ttl_seconds := 1800  -- 30 minutes

-- Semi-static data (analytics, LLM responses)
ttl_seconds := 7200  -- 2 hours

-- Static data (reference data)
ttl_seconds := NULL  -- Never expires

How often should I run maintenance?

Recommended Schedule:

-- Every 15 minutes: Evict expired entries
SELECT semantic_cache.evict_expired();

-- Every hour: Auto-eviction based on policy
SELECT semantic_cache.auto_evict();

-- Daily: Analyze tables
ANALYZE semantic_cache.cache_entries;

Set up with pg_cron:

SELECT cron.schedule('cache-evict', '*/15 * * * *',
    'SELECT semantic_cache.evict_expired()');

Troubleshooting

Why is my hit rate so low?

Common causes:

Threshold too high

-- Lower from 0.95 to 0.90
SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.90);

TTL too short

-- Check average entry lifetime
SELECT AVG(EXTRACT(EPOCH FROM (NOW() - created_at))) / 3600 as avg_age_hours
FROM semantic_cache.cache_entries;

Poor embedding quality
Use better embedding model
Ensure consistent embedding generation

Cache too small

-- Check if entries being evicted too quickly
SELECT * FROM semantic_cache.cache_stats();

Cache lookups are returning no results

Debugging steps:

-- 1. Check cache has entries
SELECT COUNT(*) FROM semantic_cache.cache_entries;

-- 2. Check for expired entries
SELECT COUNT(*) FROM semantic_cache.cache_entries
WHERE expires_at IS NULL OR expires_at > NOW();

-- 3. Try very low threshold
SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.70);

-- 4. Check vector dimension
SELECT semantic_cache.get_vector_dimension();

-- 5. Manually check similarity
SELECT
    query_text,
    (1 - (query_embedding <=> '[...]'::vector)) as similarity
FROM semantic_cache.cache_entries
ORDER BY similarity DESC
LIMIT 5;

Extension won't load

ERROR:  could not open extension control file

Solution:

# Check installation
ls -l $(pg_config --sharedir)/extension/pg_semantic_cache*

# Reinstall
cd pg_semantic_cache
sudo make install

# Verify pgvector installed
ls -l $(pg_config --pkglibdir)/vector.so

Build errors

fatal error: postgres.h: No such file or directory

Solution:

# Debian/Ubuntu
sudo apt-get install postgresql-server-dev-17

# RHEL/Rocky
sudo yum install postgresql17-devel

# macOS
brew install postgresql@17
export PATH="/opt/homebrew/opt/postgresql@17/bin:$PATH"

Out of memory errors

ERROR:  out of memory

Solutions:

Increase work_mem
```
SET work_mem = '512MB';
```

Reduce cache size

SELECT semantic_cache.evict_lru(5000);  -- Keep only 5K entries

Lower vector dimension

SELECT semantic_cache.set_vector_dimension(768);  -- Use smaller model
SELECT semantic_cache.rebuild_index();

Best Practices

Should I cache everything?

No! Cache queries that are: - ✅ Expensive (slow execution) - ✅ Frequently repeated (similar queries) - ✅ Tolerant of slight staleness - ✅ Semantically searchable

Don't cache: - ❌ Simple key-value lookups (use Redis) - ❌ Real-time critical data - ❌ Unique, one-off queries - ❌ Queries that must be current

How do I test if caching helps?

-- Measure query time without cache
\timing on
SELECT expensive_query();
-- Time: 450.234 ms

-- With cache (first call - miss)
SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.95);
-- Time: 3.456 ms (cache miss) + 450.234 ms (execution)

-- With cache (subsequent calls - hit)
SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.95);
-- Time: 3.456 ms (cache hit!)

-- Speedup: 450 / 3.5 = 128x faster!

Should I use tags?

Yes! Tags are useful for: - Organization: Group by feature (ARRAY['dashboard', 'sales']) - Bulk invalidation: invalidate_cache(tag := 'user_123') - Analytics: SELECT * FROM semantic_cache.cache_by_tag - Debugging: Find entries by category

-- Tag everything
SELECT semantic_cache.cache_query(
    query_text,
    embedding,
    result,
    3600,
    ARRAY['app_name', 'feature_name', 'user_id']
);

Still Have Questions?

GitHub Issues: Report bugs or ask questions
Discussions: Community discussions

Frequently Asked Questions

General Questions

What is semantic caching?

Why use pg_semantic_cache instead of a traditional cache like Redis?

How does it compare to application-level caching?

Is it production-ready?

Installation & Setup

Do I need to install pgvector separately?

Can I use it with managed PostgreSQL services?

What PostgreSQL versions are supported?

How do I upgrade the extension?

Performance

How fast are cache lookups?

How much storage does it use?

What's the maximum cache size?

Does it work with large result sets?

Embeddings

What embedding models can I use?

Do I need to generate embeddings myself?

Can I change embedding models later?

What similarity threshold should I use?

Configuration

How do I choose between IVFFlat and HNSW?

What TTL should I set?

How often should I run maintenance?

Troubleshooting

Why is my hit rate so low?

Cache lookups are returning no results

Extension won't load

Build errors

Out of memory errors

Best Practices

Should I cache everything?

How do I test if caching helps?

Should I use tags?

See Also

Still Have Questions?