Scheduling ACE Operations (Beta)

ACE supports automated scheduling of table-diff operations through configuration settings in ace_config.py. This allows for regular consistency checks without manual intervention.

Configuration

In ace_config.py, you define jobs and their schedules in two separate lists:

# Define the jobs
schedule_jobs = [
    {
        "name": "t1",
        "cluster_name": "my_cluster",
        "table_name": "public.users"
    },
    {
        "name": "t2",
        "cluster_name": "my_cluster",
        "table_name": "public.orders",
        "args": {
            "max_cpu_ratio": 0.7,
            "batch_size": 1000,
            "block_rows": 10000,
            "nodes": "all",
            "output": "json",
            "quiet": False,
            "dbname": "mydb"
        }
    }
]

Define the schedule for each job

schedule_config = [
    {
        "job_name": "t1",
        "crontab_schedule": "0 0 * * *",    # Run at midnight
        "run_frequency": "30s",             # Alternative to crontab
        "enabled": True,
        "rerun_after": "1h"                # Rerun if diff found after 1 hour
    },
    {
        "job_name": "t2",
        "crontab_schedule": "0 */4 * * *",  # Every 4 hours
        "run_frequency": "5m",              # Alternative to crontab
        "enabled": True,
        "rerun_after": "30m"
    }
]

Job Configuration Options

Each job in schedule_jobs supports:

  • name (required): Unique identifier for the job
  • cluster_name (required): Name of the cluster
  • table_name (required): Fully qualified table name
  • args (optional): Dictionary of table-diff parameters
    • max_cpu_ratio: Maximum CPU usage ratio
    • batch_size: Batch size for processing
    • block_rows: Number of rows per block
    • table_filter: SQL WHERE clause to filter rows for comparison
    • nodes: Nodes to include
    • output: Output format ["json", "csv", "html"]
    • quiet: Suppress output
    • dbname: Database name

Schedule Configuration Options

Each schedule in schedule_config supports:

  • job_name (required): Name of the job to schedule (must match a job name)
  • crontab_schedule: Cron-style schedule expression
  • run_frequency: Alternative to crontab, using time units (e.g., "30s", "5m", "1h")
  • enabled: Whether the schedule is active (default: False)
  • rerun_after: Time to wait before rerunning if differences found

Time Formats

  • Cron Format: * * * * * (minute hour day_of_month month day_of_week)

    • Examples:
      • 0 0 * * *: Daily at midnight
      • 0 */4 * * *: Every 4 hours
      • 0 0 * * 0: Weekly on Sunday
  • Run Frequency Format: <number><unit>

    • Units: "s" (seconds), "m" (minutes), "h" (hours)
    • Minimum: 5 minutes
    • Examples:
      • "30s": Every 30 seconds
      • "5m": Every 5 minutes
      • "1h": Every hour

Starting and Stopping the Scheduler

The scheduler starts automatically when ACE is started.

./pgedge start ace

To stop the scheduler:

./pgedge stop ace

Best Practices

  1. Resource Management:
    • Stagger schedules to avoid overlapping resource-intensive jobs
    • Set appropriate max_cpu_ratio, block_rows, and batch_size values based on the table size and expected load
  2. Frequency Selection:
    • Use crontab_schedule for specific times
    • Use run_frequency for regular intervals

Auto-Repair Module (Beta)

The auto-repair module automatically monitors and repairs data inconsistencies in your cluster. It runs as a background process, periodically checking for inconsistencies and applying repairs based on configured settings.

Configuration

Auto-repair settings are defined in ace_config.py:

auto_repair_config = {
    "enabled": True,
    "cluster_name": "my_cluster",
    "dbname": "mydb",
    "poll_interval": "1000s",
    "status_update_interval": "1000s"
}

Configuration Options

  • enabled: Enable/disable auto-repair functionality (default: False)
  • cluster_name: Name of the cluster to monitor
  • dbname: Database name to monitor
  • poll_interval: How often the Spock exception log is polled to check for new exceptions.
  • status_update_interval: How often the Spock exception status and detail tables are updated with the latest exception status.

Time Intervals

Both poll_interval and status_update_interval accept time strings in the format:

  • <number>s: Seconds (e.g., "60s")
  • <number>m: Minutes (e.g., "5m")
  • <number>h: Hours (e.g., "1h")

Usage

The auto-repair daemon starts automatically when ACE is started.

./pgedge start ace

To stop the auto-repair daemon:

./pgedge stop ace

Common Use Cases

Auto-repair is a great candidate for handling use-cases that have a high probability of insert-insert conflicts. E.g., bidding, reservations, etc., where insert-insert conflicts are likely to arise across multiple nodes.

Limitations and Considerations

  • The auto-repair daemon is currently limited to handling insert-insert conflicts only.
  • Handling other types of conflicts is planned for future releases.