Configuration

This guide describes how to configure pg_semantic_cache for your use case, including vector dimensions, index types, and cache behavior.

Start Simple

When configuring semantic caching, start with simple defaults (1536 dimensions, an IVFFlat index, and a 0.95 similarity threshold), then adjust them based on what monitoring shows.

Test Before Production

Always test configuration changes in development before applying them to production!

Vector Dimensions

The extension supports configurable embedding dimensions so that the cache matches your chosen embedding model. The following table lists common dimensions and the models that produce them:

Dimension | Common Models
768 | BERT, Sentence Transformers (base)
1024 | Sentence Transformers (large)
1536 | OpenAI text-embedding-ada-002, text-embedding-3-small
3072 | OpenAI text-embedding-3-large
Custom | Any dimension supported by your model

Setting Dimensions

Rebuild Required

Changing dimensions requires rebuilding the index, which clears all cached data.

In the following example, the set_vector_dimension function changes the vector dimension to 768, the rebuild_index function applies the change, and the get_vector_dimension function confirms the new setting:

SELECT semantic_cache.set_vector_dimension(768);
SELECT semantic_cache.rebuild_index();
SELECT semantic_cache.get_vector_dimension();
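
Because the rebuild clears cached data, it can be worth checking how much would be lost before running it. The following is a minimal sketch that assumes only the semantic_cache.cache_entries table and the get_vector_dimension function named in this guide:

```sql
-- Count the entries that a rebuild would discard.
SELECT count(*) AS cached_entries
FROM semantic_cache.cache_entries;

-- Note the current dimension so it can be restored if needed.
SELECT semantic_cache.get_vector_dimension() AS current_dimension;
```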

Initial Setup For Custom Dimensions

If you know your embedding model before installation, configure the dimensions immediately after creating the extension.

In the following example, the dimensions are set to 768 right after creating the extension:

CREATE EXTENSION pg_semantic_cache;
SELECT semantic_cache.set_vector_dimension(768);
SELECT semantic_cache.rebuild_index();

Vector Index Types

Choose between IVFFlat, which builds quickly and handles frequent updates well, and HNSW, which offers higher recall at the cost of slower builds and more memory.

IVFFlat Index (Default)

The IVFFlat index is best for most use cases and provides fast lookups with good recall.

The index provides:

  • very fast lookups (typically under 5ms).
  • fast build times.
  • excellent recall (95% or higher).
  • moderate memory usage.

This index is best for production caches with frequent updates.

In the following example, the set_index_type function sets the index type to IVFFlat:

SELECT semantic_cache.set_index_type('ivfflat');
SELECT semantic_cache.rebuild_index();

In the following example, the IVFFlat index is recreated manually with 1000 lists, a setting suited to caches with 100K to 1M entries:

DROP INDEX IF EXISTS semantic_cache.idx_cache_entries_embedding;
CREATE INDEX idx_cache_entries_embedding
ON semantic_cache.cache_entries
USING ivfflat (query_embedding vector_cosine_ops)
WITH (lists = 1000);
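
To pick a lists value for a different cache size, one option is the rule of thumb from the pgvector documentation: rows / 1000 for up to 1M rows, and the square root of the row count above that. The query below is a sketch of that heuristic applied to the current entry count. The suggested_lists column name is illustrative, and the result is a starting point rather than a tuned value:

```sql
-- Suggest an IVFFlat lists value from the current number of cached entries.
SELECT
    count(*) AS entries,
    CASE
        WHEN count(*) <= 1000000
            THEN greatest(count(*) / 1000, 1)                     -- rows / 1000, at least 1
        ELSE ceil(sqrt(count(*)::double precision))::bigint       -- sqrt(rows) above 1M rows
    END AS suggested_lists
FROM semantic_cache.cache_entries;
```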

HNSW Index

The HNSW index is more accurate but slower to build and requires pgvector 0.5.0 or later.

The index provides:

  • fast lookups (typically 1-3ms).
  • slower build times.
  • excellent recall (98% or higher).
  • higher memory usage.

This index is best for read-heavy caches with infrequent updates.
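
Since HNSW depends on pgvector 0.5.0 or later, it may help to confirm the installed version before switching index types. The following sketch reads the standard pg_extension catalog; pgvector registers itself under the extension name vector:

```sql
-- Show the installed versions of pgvector and pg_semantic_cache.
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('vector', 'pg_semantic_cache');
```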

In the following example, the set_index_type function sets the index type to HNSW:

SELECT semantic_cache.set_index_type('hnsw');
SELECT semantic_cache.rebuild_index();

In the following example, the HNSW index is created with m = 16 and ef_construction = 64, which are the pgvector defaults; higher values can improve recall at the cost of build time and memory:

DROP INDEX IF EXISTS semantic_cache.idx_cache_entries_embedding;
CREATE INDEX idx_cache_entries_embedding
ON semantic_cache.cache_entries
USING hnsw (query_embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
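
After creating an index manually, you may want to confirm which definition is actually in place. The following sketch queries the standard pg_indexes view for the cache table:

```sql
-- List the indexes on the cache table together with their full definitions,
-- including the access method (ivfflat or hnsw) and any WITH options.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'semantic_cache'
  AND tablename = 'cache_entries';
```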

Index Comparison

The following table compares the performance characteristics of IVFFlat and HNSW indexes:

Feature | IVFFlat | HNSW
Speed | Very Fast | Fast
Accuracy | Good | Excellent
Build Time | Very Fast | Slow
Memory | Moderate | High
Updates | Fast | Slower
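
Both index types also have a query-time setting in pgvector that trades speed for recall. The sketch below sets them for the current session; the values 10 and 100 are illustrative. Whether the extension's lookup functions run with these session settings is an assumption that should be verified in your environment:

```sql
-- IVFFlat: number of lists probed per query (pgvector default is 1).
-- More probes improve recall but slow down each lookup.
SET ivfflat.probes = 10;

-- HNSW: size of the candidate list during search (pgvector default is 40).
-- A larger value improves recall but slows down each lookup.
SET hnsw.ef_search = 100;
```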

Cache Configuration

The extension stores configuration details in the semantic_cache.cache_config table.

View Current Configuration

Use the following command to view the current configuration:

SELECT * FROM semantic_cache.cache_config ORDER BY key;

Key Configuration Parameters

Use the following configuration parameters to control cache settings.

max_cache_size_mb

Use max_cache_size_mb to specify the maximum cache size in megabytes before auto-eviction triggers.

In the following example, the maximum cache size is set to 2000 MB (approximately 2GB):

UPDATE semantic_cache.cache_config
SET value = '2000'
WHERE key = 'max_cache_size_mb';
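
To judge whether the limit is reasonable, it can help to compare it with the space the cache table currently occupies. The following is a rough sketch: pg_total_relation_size includes indexes and TOAST data, and this guide does not state how the extension itself measures cache size, so treat the comparison as approximate:

```sql
-- Compare the on-disk size of the cache table with the configured limit.
SELECT
    pg_size_pretty(pg_total_relation_size('semantic_cache.cache_entries')) AS on_disk_size,
    (SELECT value
     FROM semantic_cache.cache_config
     WHERE key = 'max_cache_size_mb') AS max_cache_size_mb;
```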

default_ttl_seconds

Use default_ttl_seconds to specify the default time-to-live for cached entries, which can be overridden per query.

In the following example, the default TTL is set to 2 hours:

UPDATE semantic_cache.cache_config
SET value = '7200'
WHERE key = 'default_ttl_seconds';
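
TTL values in seconds are easy to misread, so a quick conversion to an interval can serve as a sanity check. This sketch assumes the value column holds the number as text, as the UPDATE statements in this guide suggest:

```sql
-- Display the configured default TTL as a readable interval.
SELECT
    value AS ttl_seconds,
    value::integer * interval '1 second' AS ttl_interval
FROM semantic_cache.cache_config
WHERE key = 'default_ttl_seconds';
```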

eviction_policy

Use eviction_policy to specify the automatic eviction strategy when the cache size limit is reached.

In the following example, the eviction policy is set to LRU:

UPDATE semantic_cache.cache_config
SET value = 'lru'
WHERE key = 'eviction_policy';

Eviction policies include the following options:

  • The lru policy evicts the least recently used entries.
  • The lfu policy evicts the least frequently used entries.
  • The ttl policy evicts entries closest to expiration.
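
A plain UPDATE accepts any string and silently affects zero rows if the key is misspelled. The following sketch wraps the change in a DO block that rejects values outside the three documented policies and fails if no row was updated. The new_policy variable is illustrative:

```sql
DO $$
DECLARE
    new_policy text := 'lfu';  -- policy to apply
BEGIN
    -- Accept only the policies documented above.
    IF new_policy NOT IN ('lru', 'lfu', 'ttl') THEN
        RAISE EXCEPTION 'unsupported eviction policy: %', new_policy;
    END IF;

    UPDATE semantic_cache.cache_config
    SET value = new_policy
    WHERE key = 'eviction_policy';

    -- Fail loudly if the key does not exist.
    IF NOT FOUND THEN
        RAISE EXCEPTION 'eviction_policy key not found in cache_config';
    END IF;
END
$$;
```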

similarity_threshold

Use similarity_threshold to specify the default similarity threshold for cache hits. Values range from 0.0 to 1.0; a higher value requires a closer match and produces fewer hits.

In the following example, the similarity threshold is set to 0.98 for stricter matching:

UPDATE semantic_cache.cache_config
SET value = '0.98'
WHERE key = 'similarity_threshold';

In the following example, the similarity threshold is set to 0.90 for more lenient matching:

UPDATE semantic_cache.cache_config
SET value = '0.90'
WHERE key = 'similarity_threshold';
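
To get a feel for what a threshold means, you can compute the similarity of two vectors directly with pgvector. The sketch below assumes the extension scores matches as 1 minus the cosine distance, which is consistent with the vector_cosine_ops index shown earlier but is not confirmed by this guide. The three-dimensional vectors are illustrative only:

```sql
-- Compute cosine similarity for a pair of toy vectors and test it
-- against a 0.95 threshold. The <=> operator is pgvector's cosine distance.
WITH pair AS (
    SELECT '[1, 0, 0]'::vector   AS cached,
           '[0.9, 0.1, 0]'::vector AS incoming
)
SELECT
    1 - (cached <=> incoming)         AS cosine_similarity,
    1 - (cached <=> incoming) >= 0.95 AS passes_threshold
FROM pair;
```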

Production Configurations

The following sections detail configuration settings useful in a production environment.

High-Throughput Configuration

Use the following configuration options for applications with thousands of queries per second.

In the following example, the cache is configured for high throughput with IVFFlat index, large cache size, LRU eviction, and short TTL:

SELECT semantic_cache.set_index_type('ivfflat');
SELECT semantic_cache.rebuild_index();

UPDATE semantic_cache.cache_config SET value = '5000'
WHERE key = 'max_cache_size_mb';

UPDATE semantic_cache.cache_config SET value = 'lru'
WHERE key = 'eviction_policy';

UPDATE semantic_cache.cache_config SET value = '1800'
WHERE key = 'default_ttl_seconds';
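
When several settings change together, applying them in one statement inside a transaction keeps the configuration consistent and shows exactly which rows changed. The following sketch applies the same three values as above and assumes the value column is text:

```sql
BEGIN;

-- Apply all three settings at once and return the rows that changed.
UPDATE semantic_cache.cache_config AS c
SET value = v.new_value
FROM (VALUES
    ('max_cache_size_mb',   '5000'),
    ('eviction_policy',     'lru'),
    ('default_ttl_seconds', '1800')
) AS v(key, new_value)
WHERE c.key = v.key
RETURNING c.key, c.value;

-- Expect three rows; if fewer are returned, use ROLLBACK instead of COMMIT.
COMMIT;
```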

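In the following example, PostgreSQL is configured in postgresql.conf for high throughput. Scale the values to your server's memory; work_mem is allocated per sort or hash operation in each session, so a value this large can exhaust memory when many connections run concurrently: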

shared_buffers = 8GB
effective_cache_size = 24GB
work_mem = 512MB
maintenance_work_mem = 2GB
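
After editing the configuration file, you can check which values the running server actually uses. The following sketch reads the standard pg_settings view; the pending_restart column indicates a changed value that needs a server restart to take effect, as shared_buffers does:

```sql
-- Show the active values of the memory settings listed above.
SELECT name, setting, unit, pending_restart
FROM pg_settings
WHERE name IN (
    'shared_buffers',
    'effective_cache_size',
    'work_mem',
    'maintenance_work_mem'
)
ORDER BY name;
```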

High-Accuracy Configuration

Use the following configuration for applications requiring maximum precision.

In the following example, the cache is configured for high accuracy with HNSW index, strict similarity threshold, and longer TTL:

SELECT semantic_cache.set_index_type('hnsw');
SELECT semantic_cache.rebuild_index();

UPDATE semantic_cache.cache_config SET value = '0.98'
WHERE key = 'similarity_threshold';

UPDATE semantic_cache.cache_config SET value = '14400'
WHERE key = 'default_ttl_seconds';

LLM/AI Application Configuration

Use the following configuration settings to optimize caching for expensive AI API calls.

In the following example, the cache is configured for LLM applications with OpenAI ada-002 dimensions, balanced threshold, long TTL, and large cache size:

SELECT semantic_cache.set_vector_dimension(1536);
SELECT semantic_cache.rebuild_index();

UPDATE semantic_cache.cache_config SET value = '0.93'
WHERE key = 'similarity_threshold';

UPDATE semantic_cache.cache_config SET value = '7200'
WHERE key = 'default_ttl_seconds';

UPDATE semantic_cache.cache_config SET value = '10000'
WHERE key = 'max_cache_size_mb';
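
For sizing, it can help to estimate how much of the cache the embeddings alone consume. pgvector stores each vector in 4 bytes per dimension plus 8 bytes of overhead. The sketch below applies that to 1536 dimensions and a 10000 MB limit. It ignores the cached results themselves, row overhead, and index size, so the real capacity is lower:

```sql
-- Upper bound on the number of 1536-dimension embeddings in 10000 MB.
SELECT
    4 * 1536 + 8 AS bytes_per_embedding,
    (10000::bigint * 1024 * 1024) / (4 * 1536 + 8) AS max_embeddings_upper_bound;
```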

Analytics Query Configuration

The following configuration is well-suited for caching expensive analytical queries.

In the following example, the cache is configured for analytics with 768 dimensions, a moderate threshold, a short TTL, and the LFU policy:

SELECT semantic_cache.set_vector_dimension(768);
SELECT semantic_cache.rebuild_index();

UPDATE semantic_cache.cache_config SET value = '0.90'
WHERE key = 'similarity_threshold';

UPDATE semantic_cache.cache_config SET value = '900'
WHERE key = 'default_ttl_seconds';

UPDATE semantic_cache.cache_config SET value = 'lfu'
WHERE key = 'eviction_policy';

Monitoring Configuration Impact

Use the following queries to measure how configuration changes affect index usage, lookup latency, and hit rate.

Check Index Performance

Use the following query to view index usage statistics:

-- View index usage
SELECT
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'semantic_cache';
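
Index size is a useful companion to the usage counters, especially when comparing IVFFlat and HNSW memory needs. The following sketch uses the same statistics view together with pg_relation_size:

```sql
-- Show each index in the schema with its scan count and on-disk size,
-- largest first. An idx_scan of 0 suggests the index is not being used.
SELECT
    indexrelname AS index_name,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'semantic_cache'
ORDER BY pg_relation_size(indexrelid) DESC;
```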

Measure Lookup Times

Use the psql \timing command to measure lookup performance.

In the following example, \timing is enabled before a test lookup; replace the placeholder vector with a full embedding from your model:

\timing on
SELECT * FROM semantic_cache.get_cached_result(
    '[0.1, 0.2, ...]'::text,
    0.95
);

Target performance is less than 5ms for most queries.
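
If lookups are slower than expected, one way to investigate is to run a nearest-neighbor query directly against the cache table and inspect the plan. This is a sketch rather than the extension's own lookup query: it reuses an existing embedding as the search vector and assumes the cosine distance operator matches the index definition shown earlier:

```sql
-- Check whether a nearest-neighbor search uses the vector index.
-- An "Index Scan using idx_cache_entries_embedding" node indicates it does;
-- a "Seq Scan" node indicates the index is being bypassed.
EXPLAIN (ANALYZE, BUFFERS)
SELECT 1
FROM semantic_cache.cache_entries
ORDER BY query_embedding <=> (
    SELECT query_embedding
    FROM semantic_cache.cache_entries
    LIMIT 1
)
LIMIT 1;
```

On a very small table the planner may choose a sequential scan regardless, so run the check against a realistically sized cache.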

Cache Hit Rate

Use the cache_stats function to monitor the cache hit rate:

SELECT * FROM semantic_cache.cache_stats();

Target hit rate is greater than 70% for effective caching.
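
The hit rate is the number of hits divided by the total number of lookups. This guide does not list the columns that cache_stats returns, so the sketch below uses hypothetical literal counts (700 hits, 300 misses) to show the arithmetic only:

```sql
-- Hit rate as a percentage, guarding against division by zero.
-- For these sample counts, hit_rate_pct is 70.0.
SELECT
    hits,
    misses,
    round(100.0 * hits / nullif(hits + misses, 0), 1) AS hit_rate_pct
FROM (VALUES (700, 300)) AS sample(hits, misses);
```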

Tuning Checklist

Follow this checklist when tuning your cache configuration:

  • Choose a dimension matching your embedding model.
  • Select an index type based on workload, using IVFFlat for most cases.
  • Set a similarity threshold based on accuracy requirements.
  • Configure cache size based on available memory.
  • Choose an eviction policy matching access patterns.
  • Set TTL based on data freshness requirements.
  • Monitor hit rate and adjust as needed.

Common Mistakes

The following common mistakes have simple remediations.

Using Wrong Dimensions

If the extension is configured for 1536 dimensions but you send 768-dimension vectors, the result is an error or poor performance.

Set the vector dimension to match your embedding model.

In the following example, the vector dimension is set to match the model:

SELECT semantic_cache.set_vector_dimension(768);
SELECT semantic_cache.rebuild_index();
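
A quick way to catch a mismatch is to compare the configured dimension with the length of a sample embedding. The following sketch uses pgvector's vector_dims function; the three-element vector is a stand-in for a real embedding from your model:

```sql
-- The two values should be equal before you start caching results.
SELECT
    semantic_cache.get_vector_dimension()    AS configured_dimension,
    vector_dims('[0.1, 0.2, 0.3]'::vector)   AS embedding_dimension;
```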

Too Strict Threshold

If the similarity threshold is set too high (for example, 0.99), the hit rate is very low.

Use a more balanced threshold.

In the following example, the threshold is set to 0.93 to allow reasonable variation:

UPDATE semantic_cache.cache_config SET value = '0.93'
WHERE key = 'similarity_threshold';

Forgetting To Rebuild

If you set the vector dimension but forget to rebuild the index, the old index is still in use. Rebuild the index after every dimension change so that the new setting takes effect.

In the following example, the index is rebuilt after changing the dimension:

SELECT semantic_cache.set_vector_dimension(768);
SELECT semantic_cache.rebuild_index();
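
One way to spot a forgotten rebuild is to compare the configured dimension with the declared type of the embedding column. This sketch assumes that rebuild_index recreates the query_embedding column with the new dimension, which this guide implies but does not state:

```sql
-- Show the declared type of the embedding column, for example vector(768),
-- next to the configured dimension. A mismatch suggests a pending rebuild.
SELECT
    format_type(a.atttypid, a.atttypmod)   AS column_type,
    semantic_cache.get_vector_dimension()  AS configured_dimension
FROM pg_attribute AS a
WHERE a.attrelid = 'semantic_cache.cache_entries'::regclass
  AND a.attname = 'query_embedding';
```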