Logging and Cost Tracking
This guide describes how to track cache hits and misses and calculate cost savings from avoided LLM API calls.
The following examples show the basic workflow: record cache access events with the log_cache_access function, then retrieve cost savings reports:
-- Log a cache miss (the full API cost was paid)
SELECT semantic_cache.log_cache_access('query_hash', false, NULL, 0.006);

-- Log a cache hit (the API cost was avoided)
SELECT semantic_cache.log_cache_access('query_hash', true, 0.95, 0.006);

-- Report savings for the last 7 days
SELECT * FROM semantic_cache.get_cost_savings(7);

-- Daily breakdown for the last 7 days
SELECT * FROM semantic_cache.cost_savings_daily
ORDER BY date DESC
LIMIT 7;
Functions
The following sections describe the functions available for logging cache access and calculating cost savings.
log_cache_access
The log_cache_access function records a cache access event with cost
information.
In the following example, the function signature shows the required parameters for logging cache access:
SELECT semantic_cache.log_cache_access(
    query_hash text,
    cache_hit boolean,
    similarity_score float4,
    query_cost numeric
);
In the following example, the function logs a cache miss when the LLM API must be called:
SELECT semantic_cache.log_cache_access('abc123...', false, NULL, 0.008);
In the following example, the function logs a cache hit when a cached result is returned:
SELECT semantic_cache.log_cache_access('def456...', true, 0.97, 0.008);
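The query_hash values above are abbreviated. In the integration examples later in this guide, the hash is the SHA-256 hex digest of the query text. A minimal Python sketch of that convention (the function name make_query_hash is illustrative, not part of the extension):

```python
import hashlib

def make_query_hash(query_text: str) -> str:
    # SHA-256 of the UTF-8 query text, as a 64-character hex string.
    return hashlib.sha256(query_text.encode("utf-8")).hexdigest()

# The same text always produces the same hash, so repeated queries
# map to the same log identity.
h = make_query_hash("What is semantic caching?")
```

Using a fixed-width digest keeps the query_hash column compact regardless of query length.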
get_cost_savings
The get_cost_savings function generates a cost savings report for a
specified time period.
In the following example, the function signature shows the optional days parameter defaulting to 30:
SELECT * FROM semantic_cache.get_cost_savings(
    days integer DEFAULT 30
);
The following table describes the columns returned by the function:
| Column | Type | Description |
|---|---|---|
| total_queries | bigint | Total number of queries |
| cache_hits | bigint | Number of cache hits |
| cache_misses | bigint | Number of cache misses |
| hit_rate | float4 | Hit rate as a percentage (0 to 100) |
| total_cost_saved | float8 | Total cost saved by cache hits |
| avg_cost_per_hit | float8 | Average cost saved per cache hit |
| total_cost_if_no_cache | float8 | Estimated total cost if every query had called the API |
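The report columns are simple aggregates over the access log. A hedged Python sketch of how they relate, assuming a uniform per-query cost for illustration (the real function computes these in SQL from the logged costs, which can vary per query):

```python
def summarize(cache_hits: int, cache_misses: int, cost_per_query: float) -> dict:
    # Each hit avoids one LLM API call, so savings are hits * per-query cost.
    total = cache_hits + cache_misses
    return {
        "total_queries": total,
        "hit_rate": 100.0 * cache_hits / total if total else 0.0,
        "total_cost_saved": cache_hits * cost_per_query,
        "total_cost_if_no_cache": total * cost_per_query,
    }
```

For example, 75 hits and 25 misses at $0.008 per query gives a 75% hit rate and $0.60 saved.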
In the following example, the function returns cost savings for the last 30 days:
SELECT * FROM semantic_cache.get_cost_savings();
In the following example, the function returns cost savings for the last 7 days:
SELECT * FROM semantic_cache.get_cost_savings(7);
In the following example, the query formats the cost savings output for display:
SELECT
    total_queries,
    cache_hits,
    ROUND(hit_rate::numeric, 1) || '%' as hit_rate,
    '$' || ROUND(total_cost_saved::numeric, 2) as saved,
    '$' || ROUND(total_cost_if_no_cache::numeric, 2) as would_have_cost
FROM semantic_cache.get_cost_savings(30);
Views
The following sections describe the views available for monitoring cache access patterns and cost savings.
cache_access_summary
The cache_access_summary view provides hourly cache access
statistics with cost savings information.
In the following example, the query retrieves hourly statistics for the last 24 hours:
SELECT * FROM semantic_cache.cache_access_summary
ORDER BY hour DESC
LIMIT 24;
The view includes:
- the hour timestamp.
- the total accesses in that hour.
- the number of hits and misses.
- the hit rate percentage.
- the total cost saved.
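A common use of this view is flagging hours where the hit rate drops. A hedged Python sketch, assuming each row is fetched as a dict with hour and hit_rate keys (the exact column names may differ in your installation):

```python
def low_hit_rate_hours(rows, threshold=50.0):
    # Return the hours whose hit rate fell below the threshold percentage.
    return [r["hour"] for r in rows if r["hit_rate"] < threshold]

# Illustrative rows, as might be fetched from cache_access_summary.
rows = [
    {"hour": "2026-01-10 09:00", "hit_rate": 72.0},
    {"hour": "2026-01-10 10:00", "hit_rate": 41.5},
    {"hour": "2026-01-10 11:00", "hit_rate": 68.0},
]
```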
cost_savings_daily
The cost_savings_daily view provides a daily breakdown of cost
savings and query statistics.
In the following example, the query retrieves daily cost savings for the last 7 days:
SELECT * FROM semantic_cache.cost_savings_daily
ORDER BY date DESC
LIMIT 7;
The view includes:
- the date.
- the total queries for that day.
- the number of cache hits and misses.
- the hit rate percentage.
- the total cost saved.
- the average savings per hit.
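Daily rows roll up naturally into longer windows on the client side. A sketch, assuming rows carry total_queries and total_cost_saved fields (names taken from the list above):

```python
def rollup(rows):
    # Aggregate daily rows into a single summary for the window.
    return {
        "days": len(rows),
        "queries": sum(r["total_queries"] for r in rows),
        "saved": round(sum(r["total_cost_saved"] for r in rows), 6),
    }

# Illustrative week of daily rows.
week = [
    {"date": f"2026-01-0{d}", "total_queries": 100, "total_cost_saved": 0.5}
    for d in range(1, 8)
]
```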
top_cached_queries
The top_cached_queries view ranks queries by total cost savings.
In the following example, the query retrieves the top ten queries with the highest cost savings:
SELECT * FROM semantic_cache.top_cached_queries
LIMIT 10;
The view includes:
- the query hash identifier.
- the number of times served from cache.
- the average similarity score.
- the total cost saved by this query.
- the last access time.
Integration Examples
The following sections provide integration examples for Python and Node.js applications.
Python with OpenAI
In the following example, the Python code integrates cache logging with OpenAI API calls:
import psycopg2
import openai
import hashlib
conn = psycopg2.connect("dbname=mydb")
client = openai.OpenAI(api_key="your-key")
def query_with_cache(query_text, embedding):
    cur = conn.cursor()
    query_hash = hashlib.sha256(query_text.encode()).hexdigest()

    # Check the semantic cache first.
    cur.execute("""
        SELECT * FROM semantic_cache.get_cached_result(%s, 0.95)
    """, (embedding,))
    result = cur.fetchone()

    if result and result[0]:
        # Cache hit: log it with the cost that was avoided.
        cur.execute("""
            SELECT semantic_cache.log_cache_access(%s, true, %s, 0.008)
        """, (query_hash, result[2]))
        conn.commit()
        return result[1]

    # Cache miss: call the LLM API.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": query_text}]
    )
    usage = response.usage
    cost = (usage.prompt_tokens / 1000) * 0.03 + \
           (usage.completion_tokens / 1000) * 0.06
    result_json = response.choices[0].message.content

    # Store the result and log the miss with its actual cost.
    cur.execute("""
        SELECT semantic_cache.cache_query(%s, %s, %s::jsonb, 3600)
    """, (query_text, embedding, result_json))
    cur.execute("""
        SELECT semantic_cache.log_cache_access(%s, false, NULL, %s)
    """, (query_hash, cost))
    conn.commit()
    return result_json
Node.js with Anthropic
In the following example, the Node.js code integrates cache logging with Anthropic API calls:
const { Pool } = require('pg');
const Anthropic = require('@anthropic-ai/sdk');
const crypto = require('crypto');
const pool = new Pool({ database: 'mydb' });
const anthropic = new Anthropic();
async function queryWithCache(queryText, embedding) {
  const client = await pool.connect();
  const queryHash = crypto.createHash('sha256')
    .update(queryText).digest('hex');
  try {
    // Check the semantic cache first.
    const cache = await client.query(
      'SELECT * FROM semantic_cache.get_cached_result($1, 0.95)',
      [embedding]
    );
    if (cache.rows[0]?.found) {
      // Cache hit: log it with the cost that was avoided.
      await client.query(
        'SELECT semantic_cache.log_cache_access($1, $2, $3, $4)',
        [queryHash, true, cache.rows[0].similarity_score, 0.008]
      );
      return cache.rows[0].result_data;
    }

    // Cache miss: call the LLM API.
    const message = await anthropic.messages.create({
      model: "claude-3-5-sonnet-20241022",
      max_tokens: 1024,
      messages: [{ role: "user", content: queryText }]
    });
    const cost = (message.usage.input_tokens / 1_000_000) * 3.00 +
                 (message.usage.output_tokens / 1_000_000) * 15.00;

    // Store the result and log the miss with its actual cost.
    await client.query(
      'SELECT semantic_cache.cache_query($1, $2, $3, 3600)',
      [queryText, embedding, JSON.stringify(message.content)]
    );
    await client.query(
      'SELECT semantic_cache.log_cache_access($1, $2, $3, $4)',
      [queryHash, false, null, cost]
    );
    return message.content;
  } finally {
    client.release();
  }
}
Cost Calculation
The following sections describe how to calculate costs for logging cache access.
Where Costs Come From
You must calculate the cost from your LLM API response and provide it
when calling the log_cache_access function.
In the following example, the Python code calculates the cost for an OpenAI GPT-4 API call:
usage = response.usage
input_cost = (usage.prompt_tokens / 1000) * 0.03
output_cost = (usage.completion_tokens / 1000) * 0.06
total_cost = input_cost + output_cost
Current Pricing
The following table shows current pricing for common LLM models as of January 2026:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4 Turbo | $10.00 | $30.00 |
| GPT-3.5 Turbo | $0.50 | $1.50 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
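The table above can be turned into a small cost helper. A sketch using those prices (LLM pricing changes often, so treat these numbers as a snapshot and verify against your provider's current price list; the model keys are illustrative):

```python
# Prices in dollars per 1M tokens, (input, output), from the table above.
PRICING = {
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-3.5-turbo": (0.50, 1.50),
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
}

def api_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Scale token counts to millions and apply the per-direction rates.
    in_price, out_price = PRICING[model]
    return (input_tokens / 1_000_000) * in_price + \
           (output_tokens / 1_000_000) * out_price
```

The result can be passed directly as the query_cost argument to log_cache_access.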
Monitoring Dashboard
The following sections describe how to create monitoring dashboards for cache performance.
In the following example, the query retrieves key cache metrics for a monitoring dashboard:
SELECT
    (SELECT COUNT(*) FILTER (WHERE cache_hit = true)
     FROM semantic_cache.cache_access_log
     WHERE access_time >= NOW() - INTERVAL '24 hours') as hits_24h,
    (SELECT ROUND(SUM(cost_saved)::numeric, 4)
     FROM semantic_cache.cache_access_log
     WHERE access_time >= NOW() - INTERVAL '24 hours') as saved_24h,
    (SELECT total_cost_saved
     FROM semantic_cache.cache_metadata
     WHERE id = 1) as saved_all_time,
    (SELECT COUNT(*) FROM semantic_cache.cache_entries) as entries;
Maintenance
The following sections describe maintenance tasks for the cache access log.
Manual Cleanup
In the following example, the commands perform manual cleanup of old log entries:
DELETE FROM semantic_cache.cache_access_log
WHERE access_time < NOW() - INTERVAL '30 days';
SELECT pg_size_pretty(
pg_total_relation_size('semantic_cache.cache_access_log'));
VACUUM semantic_cache.cache_access_log;
Automated Cleanup
You can use the pg_cron extension to schedule automated cleanup tasks.
In the following example, the pg_cron extension schedules daily cleanup at 2 AM:
CREATE EXTENSION IF NOT EXISTS pg_cron;
SELECT cron.schedule(
'semantic-cache-log-cleanup',
'0 2 * * *',
$$DELETE FROM semantic_cache.cache_access_log
WHERE access_time < NOW() - INTERVAL '30 days'$$
);
Database Schema
The following sections describe the database schema for logging and cost tracking.
Tables
The cache_metadata table tracks overall cache statistics:
id SERIAL PRIMARY KEY
total_hits BIGINT DEFAULT 0
total_misses BIGINT DEFAULT 0
total_cost_saved NUMERIC(12,6) DEFAULT 0.0
The cache_access_log table records individual cache access events:
id BIGSERIAL PRIMARY KEY
access_time TIMESTAMPTZ DEFAULT NOW()
query_hash TEXT
cache_hit BOOLEAN NOT NULL
similarity_score REAL
query_cost NUMERIC(10,6)
cost_saved NUMERIC(10,6)
The table includes the following indexes:
- the idx_access_log_time index on access_time.
- the idx_access_log_hash index on query_hash.
Troubleshooting
The following sections address common troubleshooting scenarios for logging and cost tracking.
No data in reports
If reports show no data, use the following troubleshooting queries.
The following query checks if logging is being performed:
SELECT COUNT(*) FROM semantic_cache.cache_access_log;
The following query checks the date range of logs:
SELECT MIN(access_time), MAX(access_time)
FROM semantic_cache.cache_access_log;
The following query tries a longer time period:
SELECT * FROM semantic_cache.get_cost_savings(365);
Costs showing as zero
Ensure you are passing actual costs to the log_cache_access
function.
In the following example, an incorrect call passes zero as the cost:
SELECT semantic_cache.log_cache_access('hash', true, 0.95, 0);
In the following example, the correct call passes the actual cost:
SELECT semantic_cache.log_cache_access('hash', true, 0.95, 0.008);
Storage growing too large
If the cache access log table is growing too large, archive old logs before deleting them.
In the following example, the commands archive and delete old log entries:
CREATE TABLE semantic_cache.cache_access_log_archive AS
SELECT * FROM semantic_cache.cache_access_log
WHERE access_time < NOW() - INTERVAL '90 days';
DELETE FROM semantic_cache.cache_access_log
WHERE access_time < NOW() - INTERVAL '90 days';
VACUUM semantic_cache.cache_access_log;
Performance
The logging system has the following performance characteristics:
- The overhead is approximately 1 to 2ms per log entry.
- The storage requirement is approximately 100 bytes per log entry.
- The system creates automatic indexes on access_time and query_hash.
- The recommendation is to archive logs older than 30 to 90 days.
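The approximately 100 bytes-per-entry figure makes retention sizing easy to estimate. A back-of-the-envelope sketch (the traffic numbers are illustrative, and the estimate excludes index overhead and table bloat):

```python
def log_storage_bytes(queries_per_day: int, retention_days: int,
                      bytes_per_entry: int = 100) -> int:
    # Approximate table size for the access log over the retention window.
    return queries_per_day * retention_days * bytes_per_entry

# e.g. 50,000 queries/day retained for 30 days is about 150 MB.
size = log_storage_bytes(50_000, 30)
```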