Security Audit Report - pg_semantic_cache Extension
Date: 2024-12-18 Version: 0.1.0-beta3 Auditor: Security Review Process
Executive Summary
This security audit reviews the pg_semantic_cache PostgreSQL extension for common vulnerabilities including SQL injection, buffer overflows, input validation issues, and PostgreSQL-specific security concerns.
Overall Risk Level: LOW-MEDIUM Critical Issues: 1 (SQL Injection vulnerability) High Issues: 0 Medium Issues: 2 Low Issues: 3
Critical Issues
1. SQL Injection Vulnerability in Query Construction
Location: pg_semantic_cache.c:365-373 (evict_lru), pg_semantic_cache.c:400-408 (evict_lfu), pg_semantic_cache.c:260-270 (get_cached_result)
Issue: User-controlled integer is used in SQL query construction via appendStringInfo without proper parameterization.
// VULNERABLE CODE (lines 366-373)
appendStringInfo(&buf,
"DELETE FROM semantic_cache.cache_entries "
"WHERE id NOT IN ("
" SELECT id FROM semantic_cache.cache_entries "
" ORDER BY last_accessed_at DESC "
" LIMIT %d" // ← Integer format, but what if keep_count is manipulated?
")",
keep_count);
Risk: While %d format specifier for integers provides some protection, the value is not properly validated before use.
Status: ✅ MITIGATED - Input validation added (lines 357-363, 392-398)
if (keep_count < 0)
elog(ERROR, "evict_lru: keep_count must be non-negative");
Remaining Concern: No upper bound validation (could cause performance issues with very large values)
Recommendation:
if (keep_count < 0 || keep_count > 1000000)
elog(ERROR, "evict_lru: keep_count must be between 0 and 1000000");
2. String Escaping Function - Potential Issues
Location: pg_semantic_cache.c:45-76 (pg_escape_string)
Issue: Custom string escaping function instead of using PostgreSQL's built-in functions.
static char *
pg_escape_string(const char *str)
{
size_t len = strlen(str);
char *result = palloc(len * 2 + 3); // Allocates enough space
// ... manual escaping logic
}
Risk: Custom escaping is error-prone. PostgreSQL provides quote_literal_cstr() for this purpose.
Status: ⚠️ NEEDS REVIEW
Recommendation: Replace with PostgreSQL's built-in functions:
#include "utils/quote.h"
// Use quote_literal_cstr(str) instead of pg_escape_string(str)
High Issues
None identified.
Medium Issues
3. Unbounded String Concatenation
Location: pg_semantic_cache.c:161-200 (cache_query)
Issue: Large JSONB results are converted to strings and concatenated without size limits.
rstr = JsonbToCString(NULL, &result->root, VARSIZE(result));
// No size check before using rstr
Risk: Very large JSONB documents could cause memory exhaustion.
Recommendation:
if (strlen(rstr) > 10 * 1024 * 1024) // 10MB limit
elog(ERROR, "Result data too large (max 10MB)");
4. Missing NULL Checks in Helper Functions
Location: pg_semantic_cache.c:38-43 (execute_sql)
Issue: No NULL check on input parameter.
static void execute_sql(const char *query)
{
int ret = SPI_execute(query, false, 0);
// What if query is NULL?
}
Recommendation:
static void execute_sql(const char *query)
{
if (query == NULL)
elog(ERROR, "execute_sql: query is NULL");
int ret = SPI_execute(query, false, 0);
if (ret < 0)
elog(ERROR, "SPI_execute failed: %d", ret);
}
Low Issues
5. Memory Leak Potential
Location: pg_semantic_cache.c:165-238 (cache_query)
Issue: If error occurs after memory allocation, pfree() calls may be skipped.
Current Code:
qstr = text_to_cstring(query_text);
estr = text_to_cstring(emb_text);
rstr = JsonbToCString(NULL, &result->root, VARSIZE(result));
// ... lots of code that could error ...
pfree(qstr); // Only freed at the end
Status: ✅ LOW RISK - PostgreSQL's memory context system will clean up on error.
Recommendation: Use PG_TRY/PG_CATCH for explicit cleanup in critical paths.
6. Integer Overflow in String Length Calculation
Location: pg_semantic_cache.c:194-200
Issue: strlen(rstr) result is cast to int without overflow check.
appendStringInfo(&buf,
// ...
"%d, %d, " // result_size_bytes
// ...
(int)strlen(rstr), ttl, ttl);
Risk: If rstr length exceeds INT_MAX, the cast will overflow.
Recommendation:
size_t result_len = strlen(rstr);
if (result_len > INT_MAX)
elog(ERROR, "Result data too large");
int result_size = (int)result_len;
7. SPI Error Handling Inconsistency
Location: Multiple locations
Issue: Some SPI operations check return codes, others don't.
Examples:
- ✅ Good: pg_semantic_cache.c:217-218 checks ret < 0
- ⚠️ Inconsistent: pg_semantic_cache.c:279 doesn't check SPI_connect result
- ⚠️ Inconsistent: pg_semantic_cache.c:321 checks SELECT result but not all operations
Recommendation: Standardize SPI error handling:
if (SPI_connect() != SPI_OK_CONNECT)
elog(ERROR, "SPI_connect failed");
// ... do work ...
if (SPI_finish() != SPI_OK_FINISH)
elog(WARNING, "SPI_finish failed");
Input Validation Summary
✅ Well-Validated Inputs
- evict_lru/evict_lfu: Checks for NULL and negative values
- get_cost_savings: Uses default for NULL days parameter
- cache_query: Checks for NULL tags parameter
⚠️ Needs Validation
- Embedding vectors: No dimension validation (assumes 1536)
- Similarity thresholds: No range check (should be 0.0-1.0)
- TTL values: No upper bound (could be set to extreme values)
- JSONB result size: No size limit
Recommendations
Immediate Actions (P0)
- ✅ Replace custom escaping with PostgreSQL's
quote_literal_cstr() - ✅ Add upper bounds to eviction function parameters
- ✅ Add result size limits for cached data
Short-term (P1)
- Validate embedding dimensions against expected size
- Validate similarity thresholds (0.0 - 1.0 range)
- Standardize SPI error handling
Long-term (P2)
- Add rate limiting for cache writes
- Implement query allowlisting for production use
- Add encryption for sensitive cached data
- Add audit logging for security events
Code Fixes
Fix 1: Replace Custom Escaping
// BEFORE
#include "utils/builtins.h"
static char *
pg_escape_string(const char *str)
{
// ... custom escaping logic ...
}
// AFTER
#include "utils/builtins.h"
#include "utils/quote.h"
// Remove pg_escape_string function entirely
// Use quote_literal_cstr() directly:
char *escaped = quote_literal_cstr(qstr);
Fix 2: Add Input Validation Function
static void validate_similarity_threshold(float4 threshold)
{
if (threshold < 0.0 || threshold > 1.0)
elog(ERROR, "Similarity threshold must be between 0.0 and 1.0");
}
static void validate_ttl(int32 ttl)
{
if (ttl < 0)
elog(ERROR, "TTL must be non-negative");
if (ttl > 86400 * 365) // 1 year max
elog(ERROR, "TTL exceeds maximum (1 year)");
}
static void validate_embedding_size(text *embedding_text)
{
char *emb_str = text_to_cstring(embedding_text);
// Parse and count dimensions
// Ensure it matches expected vector size
pfree(emb_str);
}
Fix 3: Standardized Error Handling
#define SPI_CONNECT_OR_ERROR() \
do { \
if (SPI_connect() != SPI_OK_CONNECT) \
elog(ERROR, "%s: SPI_connect failed", __func__); \
} while(0)
#define SPI_FINISH_OR_WARN() \
do { \
if (SPI_finish() != SPI_OK_FINISH) \
elog(WARNING, "%s: SPI_finish failed", __func__); \
} while(0)
Security Testing Checklist
- [x] SQL injection attempts (integers, strings)
- [x] NULL input handling
- [x] Negative value handling
- [ ] Extremely large values (INT_MAX, LONG_MAX)
- [ ] Very long strings (> 1GB)
- [ ] Malformed embeddings
- [ ] Concurrent access testing
- [ ] Resource exhaustion testing
- [ ] Privilege escalation attempts
Conclusion
The pg_semantic_cache extension has reasonable security for a prototype but requires hardening for production use. The main concerns are:
- Custom string escaping should be replaced with PostgreSQL built-ins
- Input validation should be more comprehensive
- Resource limits should be enforced
Recommended Actions Before Production: 1. Implement all P0 fixes 2. Add comprehensive fuzz testing 3. Perform load testing with malicious inputs 4. Add security documentation for users 5. Consider third-party security audit
Next Review Date: Before 1.0.0 release Status: CONDITIONALLY APPROVED for development use Production Readiness: BLOCKED pending P0 fixes