Monitoring
This guide provides comprehensive information about monitoring and optimizing pg_semantic_cache performance.
Quick Health Check
The following sections describe how to perform a quick health check on your semantic cache.
In the following example, the cache_health view provides an overview
of cache performance metrics:
SELECT * FROM semantic_cache.cache_health;
The query produces output similar to the following example:
total_entries | expired_entries | total_size | avg_access_count | total_hits | total_misses | hit_rate_pct
---------------+-----------------+------------+------------------+------------+--------------+--------------
1543 | 23 | 145 MB | 5.78 | 8921 | 2103 | 80.93
Key Metrics
The following sections describe the key metrics for monitoring cache performance and effectiveness.
Cache Hit Rate
The cache hit rate is the most important metric for measuring cache effectiveness.
In the following example, the query calculates the current hit rate with a rating based on performance thresholds:
-- Get current hit rate
SELECT
total_hits,
total_misses,
(total_hits + total_misses) as total_queries,
hit_rate_percent,
CASE
WHEN hit_rate_percent >= 80 THEN 'Excellent'
WHEN hit_rate_percent >= 60 THEN 'Good'
WHEN hit_rate_percent >= 40 THEN 'Fair'
ELSE 'Poor'
END as rating
FROM semantic_cache.cache_stats();
The following list shows target hit rates for different use cases:
- LLM and AI applications should achieve 70 to 85 percent hit rates.
- Analytics workloads should achieve 60 to 75 percent hit rates.
- API caching should achieve 75 to 90 percent hit rates.
- Real-time data should achieve 40 to 60 percent hit rates.
Cache Size and Growth
The cache size and growth metrics help you monitor storage usage and identify growth trends.
In the following example, the query calculates the current cache size and entry count statistics:
SELECT
COUNT(*) as total_entries,
pg_size_pretty(SUM(result_size_bytes)::BIGINT) as total_size,
pg_size_pretty(AVG(result_size_bytes)::BIGINT) as avg_entry_size,
pg_size_pretty(MAX(result_size_bytes)::BIGINT) as largest_entry,
pg_size_pretty(MIN(result_size_bytes)::BIGINT) as smallest_entry
FROM semantic_cache.cache_entries;
In the following example, the queries create a tracking table and log cache size over time to identify growth trends:
CREATE TABLE IF NOT EXISTS monitoring.cache_size_history (
timestamp TIMESTAMPTZ DEFAULT NOW(),
entry_count BIGINT,
total_bytes BIGINT
);
INSERT INTO monitoring.cache_size_history (entry_count, total_bytes)
SELECT COUNT(*), SUM(result_size_bytes)
FROM semantic_cache.cache_entries;
SELECT
timestamp,
entry_count,
pg_size_pretty(total_bytes) as size,
entry_count - LAG(entry_count) OVER (ORDER BY timestamp)
as entry_delta,
pg_size_pretty((total_bytes - LAG(total_bytes)
OVER (ORDER BY timestamp))::BIGINT) as size_delta
FROM monitoring.cache_size_history
ORDER BY timestamp DESC
LIMIT 20;
Access Patterns
The access pattern metrics help you understand which cache entries are most valuable to your application.
In the following example, the query identifies the most frequently accessed cache entries:
SELECT
id,
LEFT(query_text, 60) as query_preview,
access_count,
pg_size_pretty(result_size_bytes::BIGINT) as size,
created_at,
last_accessed_at,
EXTRACT(EPOCH FROM (NOW() - created_at)) / 3600 as age_hours
FROM semantic_cache.cache_entries
ORDER BY access_count DESC
LIMIT 20;
In the following example, the query groups cache entries by access frequency to show the distribution of cache usage:
SELECT
CASE
WHEN access_count = 0 THEN '0 (Never)'
WHEN access_count BETWEEN 1 AND 5 THEN '1-5 (Low)'
WHEN access_count BETWEEN 6 AND 20 THEN '6-20 (Medium)'
WHEN access_count BETWEEN 21 AND 100 THEN '21-100 (High)'
ELSE '100+ (Very High)'
END as access_range,
COUNT(*) as entry_count,
pg_size_pretty(SUM(result_size_bytes)::BIGINT) as total_size,
ROUND(AVG(access_count), 2) as avg_accesses
FROM semantic_cache.cache_entries
GROUP BY 1
ORDER BY 1;
Entry Age and Freshness
The entry age metrics help you monitor how old cached entries are and identify stale data.
In the following example, the query groups cache entries by age to show the distribution of entry freshness:
SELECT
CASE
WHEN age_minutes < 5 THEN '< 5 min'
WHEN age_minutes < 30 THEN '5-30 min'
WHEN age_minutes < 60 THEN '30-60 min'
WHEN age_minutes < 360 THEN '1-6 hours'
WHEN age_minutes < 1440 THEN '6-24 hours'
ELSE '> 24 hours'
END as age_range,
COUNT(*) as entry_count,
pg_size_pretty(SUM(result_size_bytes)::BIGINT) as total_size
FROM (
SELECT
EXTRACT(EPOCH FROM (NOW() - created_at)) / 60 as age_minutes,
result_size_bytes
FROM semantic_cache.cache_entries
) ages
GROUP BY 1
ORDER BY 1;
Built-in Monitoring Views
The extension provides several built-in views for monitoring cache performance and health.
cache_health
The cache_health view provides real-time cache health metrics.
In the following example, the query retrieves the current cache health status:
SELECT * FROM semantic_cache.cache_health;
The view includes:
- the total entries and expired entries.
- the total cache size in megabytes.
- the average access count per entry.
- hit and miss statistics.
- the hit rate percentage.
recent_cache_activity
The recent_cache_activity view shows the most recently accessed cache
entries.
In the following example, the query retrieves the ten most recently accessed cache entries:
SELECT * FROM semantic_cache.recent_cache_activity LIMIT 10;
The view shows:
- a query preview with the first 80 characters.
- the access count for each entry.
- timestamps for creation, last access, and expiration.
- the result size in bytes.
cache_by_tag
The cache_by_tag view shows cache entries grouped by tag.
In the following example, the query retrieves cache statistics grouped by tag:
SELECT * FROM semantic_cache.cache_by_tag;
The view is useful for:
- understanding cache composition by feature.
- identifying which features use the cache most.
- planning targeted invalidation strategies.
cache_access_summary
The cache_access_summary view provides hourly access statistics with
cost savings information.
In the following example, the query retrieves hourly access statistics for the last 24 hours:
SELECT * FROM semantic_cache.cache_access_summary
ORDER BY hour DESC
LIMIT 24;
cost_savings_daily
The cost_savings_daily view provides a daily breakdown of cost
savings from cache hits.
In the following example, the query retrieves daily cost savings for the last 30 days:
SELECT * FROM semantic_cache.cost_savings_daily
ORDER BY date DESC
LIMIT 30;
top_cached_queries
The top_cached_queries view shows the queries that provide the
greatest cost savings.
In the following example, the query retrieves the ten queries with the highest cost savings:
SELECT * FROM semantic_cache.top_cached_queries
LIMIT 10;
Performance Monitoring
The following sections describe how to monitor cache performance and optimize query execution.
Query Performance
The query performance metrics help you track how fast cache lookups execute.
In the following example, the timing is enabled and a cache lookup is tested with a random embedding vector:
\timing on
SELECT * FROM semantic_cache.get_cached_result(
(SELECT array_agg(random()::float4)::text
FROM generate_series(1, 1536)),
0.95
);
Target performance is less than 5ms for most queries.
In the following example, the benchmark code measures average cache lookup time over 100 iterations:
DO $$
DECLARE
start_time TIMESTAMPTZ;
end_time TIMESTAMPTZ;
test_embedding TEXT;
i INTEGER;
BEGIN
-- Generate test embedding
SELECT array_agg(random()::float4)::text INTO test_embedding
FROM generate_series(1, 1536);
-- Run 100 lookups
start_time := clock_timestamp();
FOR i IN 1..100 LOOP
PERFORM * FROM semantic_cache.get_cached_result(test_embedding, 0.95);
END LOOP;
end_time := clock_timestamp();
RAISE NOTICE 'Average lookup time: % ms',
ROUND((EXTRACT(MILLISECONDS FROM (end_time - start_time))
/ 100)::NUMERIC, 2);
END $$;
Index Performance
The index performance metrics help you monitor vector index effectiveness and usage.
In the following example, the query checks index usage statistics for the semantic cache schema:
SELECT
schemaname,
tablename,
indexname,
idx_scan as times_used,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'semantic_cache'
ORDER BY idx_scan DESC;
In the following example, the query retrieves detailed index statistics including tuples per scan:
SELECT
i.indexrelname as index_name,
t.tablename as table_name,
pg_size_pretty(pg_relation_size(i.indexrelid)) as index_size,
idx_scan as scans,
idx_tup_read as tuples_read,
ROUND(idx_tup_read::NUMERIC / NULLIF(idx_scan, 0), 2)
as tuples_per_scan
FROM pg_stat_user_indexes i
JOIN pg_stat_user_tables t ON i.relid = t.relid
WHERE i.schemaname = 'semantic_cache';
PostgreSQL Statistics
The PostgreSQL statistics views provide detailed information about table and index operations.
In the following example, the query retrieves table statistics for the semantic cache schema:
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch,
n_tup_ins as inserts,
n_tup_upd as updates,
n_tup_del as deletes,
n_live_tup as live_tuples,
n_dead_tup as dead_tuples
FROM pg_stat_user_tables
WHERE schemaname = 'semantic_cache';
Alerting
The following sections describe how to set up automated alerts for cache health monitoring.
Set Up Alerts
The alert function monitors cache health and returns warnings when metrics fall outside acceptable ranges.
In the following example, the function creates a monitoring alert system that checks for common cache health issues:
CREATE OR REPLACE FUNCTION monitoring.check_cache_alerts()
RETURNS TABLE(
alert_level TEXT,
alert_type TEXT,
message TEXT,
metric_value NUMERIC
) AS $$
BEGIN
RETURN QUERY
SELECT
'WARNING'::TEXT,
'low_hit_rate'::TEXT,
'Cache hit rate below 60%'::TEXT,
hit_rate_percent::NUMERIC
FROM semantic_cache.cache_stats()
WHERE hit_rate_percent < 60;
RETURN QUERY
SELECT
'WARNING'::TEXT,
'cache_size'::TEXT,
'Cache size exceeding 80% of limit'::TEXT,
(SUM(result_size_bytes) / 1024 / 1024)::NUMERIC
FROM semantic_cache.cache_entries
HAVING SUM(result_size_bytes) / 1024 / 1024 > 800;
RETURN QUERY
SELECT
'INFO'::TEXT,
'expired_entries'::TEXT,
'More than 10% entries expired'::TEXT,
COUNT(*)::NUMERIC
FROM semantic_cache.cache_entries
WHERE expires_at <= NOW()
HAVING COUNT(*) > (SELECT COUNT(*) * 0.1
FROM semantic_cache.cache_entries);
RETURN QUERY
SELECT
'CRITICAL'::TEXT,
'no_activity'::TEXT,
'No cache activity in last hour'::TEXT,
0::NUMERIC
FROM semantic_cache.cache_entries
WHERE last_accessed_at < NOW() - INTERVAL '1 hour'
HAVING COUNT(*) = (SELECT COUNT(*)
FROM semantic_cache.cache_entries);
END;
$$ LANGUAGE plpgsql;
SELECT * FROM monitoring.check_cache_alerts();
Schedule Alert Checks
You can use pg_cron to schedule regular alert checks and notifications.
In the following example, the pg_cron schedule checks for cache alerts every 15 minutes:
SELECT cron.schedule(
'cache-alerts',
'*/15 * * * *',
$$
DO $$
DECLARE
alert RECORD;
BEGIN
FOR alert IN
SELECT * FROM monitoring.check_cache_alerts()
LOOP
RAISE WARNING '[%] %: % (value: %)',
alert.alert_level,
alert.alert_type,
alert.message,
alert.metric_value;
END LOOP;
END $$;
$$
);
Integration with Monitoring Tools
The following sections describe how to integrate cache metrics with external monitoring tools.
Prometheus and Grafana
You can export cache metrics in Prometheus format for visualization in Grafana.
In the following example, the function exports cache statistics in Prometheus text format:
CREATE OR REPLACE FUNCTION monitoring.prometheus_metrics()
RETURNS TEXT AS $$
DECLARE
stats RECORD;
result TEXT := '';
BEGIN
SELECT * INTO stats FROM semantic_cache.cache_stats();
result := result || '# HELP cache_entries_total Total entries'
|| E'\n';
result := result || '# TYPE cache_entries_total gauge' || E'\n';
result := result || 'cache_entries_total ' || stats.total_entries
|| E'\n';
result := result || '# HELP cache_hits_total Total cache hits'
|| E'\n';
result := result || '# TYPE cache_hits_total counter' || E'\n';
result := result || 'cache_hits_total ' || stats.total_hits || E'\n';
result := result || '# HELP cache_misses_total Total cache misses'
|| E'\n';
result := result || '# TYPE cache_misses_total counter' || E'\n';
result := result || 'cache_misses_total ' || stats.total_misses
|| E'\n';
result := result || '# HELP cache_hit_rate Cache hit rate percent'
|| E'\n';
result := result || '# TYPE cache_hit_rate gauge' || E'\n';
result := result || 'cache_hit_rate ' || stats.hit_rate_percent
|| E'\n';
RETURN result;
END;
$$ LANGUAGE plpgsql;
SELECT monitoring.prometheus_metrics();
Application Logging
You can integrate cache metrics into your application logging and monitoring infrastructure.
In the following example, the Python code logs cache metrics to application logs and optionally sends them to a metrics service:
import psycopg2
import logging
logger = logging.getLogger(__name__)
def log_cache_metrics():
"""Log cache metrics to application logs"""
conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()
cur.execute("SELECT * FROM semantic_cache.cache_stats()")
stats = cur.fetchone()
logger.info(
"Cache Stats - Entries: %d, Hits: %d, Misses: %d, " +
"Hit Rate: %.2f%%",
stats[0], stats[1], stats[2], stats[3]
)
Optimization Guidelines
The following sections provide guidelines for optimizing cache performance based on common issues.
When Hit Rate is Low
If your cache hit rate is below 60 percent, use the following optimization strategies.
In the following example, the similarity threshold is lowered to 0.90 to allow more cache hits:
SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.90);
In the following example, the query checks if entries are expiring too quickly by calculating the average TTL:
SELECT COUNT(*), AVG(EXTRACT(EPOCH FROM (expires_at - created_at)))
FROM semantic_cache.cache_entries
WHERE expires_at IS NOT NULL;
In the following example, the query examines similarity scores to verify embedding quality:
SELECT
query_text,
(1 - (query_embedding <=>
(SELECT query_embedding
FROM semantic_cache.cache_entries
LIMIT 1))) as similarity
FROM semantic_cache.cache_entries
ORDER BY similarity DESC
LIMIT 10;
When Cache Size is Growing Too Fast
If your cache is growing faster than expected, use the following optimization strategies.
In the following example, the TTL is reduced to 30 minutes to expire entries more quickly:
UPDATE semantic_cache.cache_config
SET value = '1800'
WHERE key = 'default_ttl_seconds';
In the following example, the maximum cache size is reduced and auto-eviction is triggered:
UPDATE semantic_cache.cache_config
SET value = '500'
WHERE key = 'max_cache_size_mb';
SELECT semantic_cache.auto_evict();
In the following example, entries with zero accesses that are older than one hour are deleted:
DELETE FROM semantic_cache.cache_entries
WHERE access_count = 0
AND created_at < NOW() - INTERVAL '1 hour';
When Lookups are Slow
If cache lookups are taking more than 10ms, use the following optimization strategies.
In the following example, the IVFFlat index is rebuilt with more lists for better performance on larger caches:
DROP INDEX semantic_cache.idx_cache_entries_embedding;
CREATE INDEX idx_cache_entries_embedding
ON semantic_cache.cache_entries
USING ivfflat (query_embedding vector_cosine_ops)
WITH (lists = 1000);
In the following example, the index type is switched to HNSW for better query performance:
SELECT semantic_cache.set_index_type('hnsw');
SELECT semantic_cache.rebuild_index();
In the following example, the work_mem setting is increased to provide more memory for vector operations:
SET work_mem = '512MB';
Regular Maintenance Checklist
The following checklist provides recommended maintenance tasks at different intervals.
Daily tasks include:
- checking the hit rate with
cache_statsfunction. - reviewing the cache size with
cache_healthview. - clearing expired entries with
evict_expiredfunction.
Weekly tasks include:
- reviewing top queries with
recent_cache_activityview. - checking for alerts with
check_cache_alertsfunction. - analyzing tables with the ANALYZE command.
Monthly tasks include:
- reviewing configuration settings for optimization opportunities.
- optimizing the index if needed based on cache size.
- archiving old access logs to prevent table bloat.
- reviewing cost savings with
get_cost_savingsfunction.
See Also
The following resources provide additional information:
- the Functions Reference document describes all monitoring functions.
- the Configuration document explains tuning parameters.
- the Use Cases document provides monitoring patterns in practice.