Alert Rule Reference
This document lists all built-in alert rules included with the alerter. Each rule monitors a specific PostgreSQL metric and triggers an alert when the threshold is exceeded.
Connection Rules
High Connection Utilization
This rule alerts when database connection usage approaches the maximum limit.
| Property | Value |
|---|---|
| Metric | connection_utilization_percent |
| Operator | > |
| Default Threshold | 80 |
| Default Severity | warning |
A high connection utilization indicates the database may
run out of available connections. Consider increasing
max_connections or implementing connection pooling.
High Max Connections
This rule alerts when the max_connections setting
exceeds a threshold.
| Property | Value |
|---|---|
| Metric | pg_settings.max_connections |
| Operator | > |
| Default Threshold | 500 |
| Default Severity | warning |
A very high max_connections setting can degrade
performance. Consider using a connection pooler such as
PgBouncer instead of increasing the connection limit.
Blocked Sessions
This rule alerts when sessions are waiting for locks held by other sessions.
| Property | Value |
|---|---|
| Metric | pg_stat_activity.blocked_count |
| Operator | > |
| Default Threshold | 5 |
| Default Severity | warning |
Blocked sessions indicate lock contention. Investigate the blocking queries and consider optimizing the workload.
Long-Running Idle Transactions
This rule alerts when a transaction has been idle in transaction state too long.
| Property | Value |
|---|---|
| Metric | pg_stat_activity.idle_in_transaction_seconds |
| Operator | > |
| Default Threshold | 300 |
| Default Severity | warning |
Idle in transaction connections hold locks and prevent
vacuum from reclaiming space. Configure
idle_in_transaction_session_timeout to automatically
terminate these connections.
Long Lock Wait Time
This rule alerts when a session has been waiting for a lock too long.
| Property | Value |
|---|---|
| Metric | pg_stat_activity.max_lock_wait_seconds |
| Operator | > |
| Default Threshold | 60 |
| Default Severity | warning |
Long lock waits can indicate deadlock-prone workloads or inefficient query patterns.
Query Performance Rules
Long-Running Query
This rule alerts when a query has been executing for longer than the threshold.
| Property | Value |
|---|---|
| Metric | pg_stat_activity.max_query_duration_seconds |
| Operator | > |
| Default Threshold | 300 |
| Default Severity | warning |
Long-running queries may indicate missing indexes, inefficient query plans, or inappropriate workloads.
Long-Running Transaction
This rule alerts when a transaction has been active for longer than the threshold.
| Property | Value |
|---|---|
| Metric | pg_stat_activity.max_xact_duration_seconds |
| Operator | > |
| Default Threshold | 600 |
| Default Severity | warning |
Long transactions can cause bloat and prevent vacuum from running effectively.
Slow Query Count
This rule alerts when the number of slow queries exceeds
a threshold. The rule requires the pg_stat_statements
extension.
| Property | Value |
|---|---|
| Metric | pg_stat_statements.slow_query_count |
| Operator | > |
| Default Threshold | 10 |
| Default Severity | warning |
| Required Extension | pg_stat_statements |
A high slow query count indicates performance problems that should be investigated.
Replication Rules
High Replication Lag (Time)
This rule alerts when replication replay is behind the primary.
| Property | Value |
|---|---|
| Metric | pg_stat_replication.replay_lag_seconds |
| Operator | > |
| Default Threshold | 30 |
| Default Severity | warning |
Replication lag can indicate network issues, replica resource constraints, or write-heavy workloads.
High Replication Lag (Bytes)
This rule alerts when replication is behind by more than the specified byte count.
| Property | Value |
|---|---|
| Metric | pg_stat_replication.lag_bytes |
| Operator | > |
| Default Threshold | 104857600 (100 MB) |
| Default Severity | warning |
This metric provides a more accurate view of replication lag when write activity is bursty.
Inactive Replication Slot
This rule alerts when a replication slot becomes inactive.
| Property | Value |
|---|---|
| Metric | pg_replication_slots.inactive |
| Operator | >= |
| Default Threshold | 1 |
| Default Severity | critical |
Inactive replication slots prevent WAL cleanup and can cause disk exhaustion. Drop unused slots or reconnect the subscriber.
High Replication Slot WAL Retention
This rule alerts when a replication slot retains more WAL data than the threshold.
| Property | Value |
|---|---|
| Metric | pg_replication_slots.retained_bytes |
| Operator | > |
| Default Threshold | 1073741824 (1 GB) |
| Default Severity | warning |
Large WAL retention by a replication slot can lead to disk exhaustion. Investigate the subscriber connection or consider dropping unused slots.
Standby Disconnected
This rule alerts when a standby has no active WAL receiver process.
| Property | Value |
|---|---|
| Metric | pg_stat_replication.standby_disconnected |
| Operator | == |
| Default Threshold | 1 |
| Default Severity | critical |
A disconnected standby is in recovery mode but not receiving WAL from the primary. The standby will fall further behind until the WAL receiver is restarted and replication resumes. Check the PostgreSQL log on the standby for connection errors and verify that the primary server is accessible.
Subscription Worker Down
This rule alerts when a subscription's apply worker is not running.
| Property | Value |
|---|---|
| Metric | pg_node_role.subscription_worker_down |
| Operator | == |
| Default Threshold | 1 |
| Default Severity | critical |
A subscription worker that is not running means logical
replication has stopped for that subscription. This
alert covers both native PostgreSQL logical replication
and Spock subscriptions. Check the PostgreSQL log on the
subscriber for errors and verify that the publisher is
accessible. Use ALTER SUBSCRIPTION ... ENABLE to
restart a disabled subscription.
Storage Rules
High Disk Usage
This rule alerts when disk usage exceeds the threshold.
| Property | Value |
|---|---|
| Metric | pg_sys_disk_info.used_percent |
| Operator | > |
| Default Threshold | 85 |
| Default Severity | warning |
High disk usage can lead to database failures. Add storage capacity or clean up unnecessary data.
Critical Disk Usage
This rule alerts when disk usage is critically high.
| Property | Value |
|---|---|
| Metric | pg_sys_disk_info.used_percent |
| Operator | > |
| Default Threshold | 95 |
| Default Severity | critical |
Critical disk usage requires immediate action to prevent database outages.
High Dead Tuple Percentage
This rule alerts when tables have accumulated too many dead tuples.
| Property | Value |
|---|---|
| Metric | pg_stat_all_tables.dead_tuple_percent |
| Operator | > |
| Default Threshold | 10 |
| Default Severity | warning |
Dead tuples indicate vacuum is not keeping up with updates. Check vacuum settings and consider running manual vacuum. The alerter excludes tables with fewer than 1,000 total tuples from evaluation to reduce noise from small catalog and system tables.
High Table Bloat
This rule alerts when table bloat exceeds the threshold.
| Property | Value |
|---|---|
| Metric | table_bloat_ratio |
| Operator | > |
| Default Threshold | 50 |
| Default Severity | warning |
Table bloat reduces query performance and wastes
storage. Consider running VACUUM FULL during a
maintenance window.
Stale Autovacuum
This rule alerts when a table has not been autovacuumed recently.
| Property | Value |
|---|---|
| Metric | table_last_autovacuum_hours |
| Operator | > |
| Default Threshold | 168 (7 days) |
| Default Severity | warning |
Tables that have not been vacuumed may have accumulated dead tuples or outdated statistics.
High Transaction ID Age
This rule alerts when transaction IDs are approaching wraparound.
| Property | Value |
|---|---|
| Metric | age_percent |
| Operator | > |
| Default Threshold | 50 |
| Default Severity | warning |
Transaction ID wraparound prevention requires aggressive vacuuming. Monitor this metric carefully on busy databases.
Database Performance Rules
Low Cache Hit Ratio
This rule alerts when the buffer cache hit ratio falls below the threshold.
| Property | Value |
|---|---|
| Metric | pg_stat_database.cache_hit_ratio |
| Operator | < |
| Default Threshold | 90 |
| Default Severity | warning |
A low cache hit ratio indicates the database needs more
memory for shared_buffers; the working set may also be
too large. The alerter calculates the ratio from the
change in block reads between collection intervals. This
delta-based approach reflects recent performance rather
than cumulative counters. The alerter excludes databases
with fewer than 10,000 total block operations in an
interval to avoid noise from idle databases.
Deadlocks Detected
This rule alerts when deadlocks occur.
| Property | Value |
|---|---|
| Metric | pg_stat_database.deadlocks_delta |
| Operator | > |
| Default Threshold | 0 |
| Default Severity | warning |
Deadlocks indicate lock ordering problems in the application. Review the application logic to prevent deadlocks.
High Temporary File Usage
This rule alerts when temporary file creation exceeds the threshold.
| Property | Value |
|---|---|
| Metric | pg_stat_database.temp_files_delta |
| Operator | > |
| Default Threshold | 10 |
| Default Severity | warning |
Temporary files are created when work_mem is
insufficient for sort and hash operations. Consider
increasing work_mem.
System Resource Rules
High CPU Usage
This rule alerts when CPU usage exceeds the threshold.
| Property | Value |
|---|---|
| Metric | pg_sys_cpu_usage_info.processor_time_percent |
| Operator | > |
| Default Threshold | 80 |
| Default Severity | warning |
High CPU usage may indicate inefficient queries, missing indexes, or insufficient hardware capacity.
High Memory Usage
This rule alerts when memory usage exceeds the threshold.
| Property | Value |
|---|---|
| Metric | pg_sys_memory_info.used_percent |
| Operator | > |
| Default Threshold | 85 |
| Default Severity | warning |
High memory usage can lead to swap usage and performance degradation. Review memory allocation settings.
High System Load
This rule alerts when the 15-minute load average exceeds the threshold.
| Property | Value |
|---|---|
| Metric | pg_sys_load_avg_info.load_avg_fifteen_minutes |
| Operator | > |
| Default Threshold | 4 |
| Default Severity | warning |
High system load indicates the server is overloaded. Investigate the source of the load and consider scaling resources.
Archive Rules
Archive Failures
This rule alerts when WAL archiving fails.
| Property | Value |
|---|---|
| Metric | pg_stat_wal.failed_count_delta |
| Operator | > |
| Default Threshold | 0 |
| Default Severity | critical |
Archive failures can prevent point-in-time recovery. Check the archive command and destination storage.
Checkpoint Rules
Frequent Requested Checkpoints
This rule alerts when checkpoints are requested too frequently.
| Property | Value |
|---|---|
| Metric | pg_stat_checkpointer.checkpoints_req_delta |
| Operator | > |
| Default Threshold | 5 |
| Default Severity | warning |
Frequent requested checkpoints indicate
checkpoint_segments or max_wal_size may be too low
for the workload.
Customizing Rules
All built-in rules can be customized through per-connection overrides. Administrators configure overrides through the admin panel. See the Alerts documentation for details on the alert lifecycle and management.