Skip to content

table-repair

Performs repairs on tables of divergent nodes based on the diff report generated by table-diff.

Syntax

./ace table-repair --diff-file=<diff_file> [--source-of-truth=<node>] [options] <cluster_name> <schema.table_name>
  • --diff-file: Path to the diff JSON file.
  • --source-of-truth (-r): Authoritative node for repairs. Required unless you run in bidirectional insert-only mode or --fix-nulls.
    • cluster_name: Name of the cluster. Optional. Overrides default_cluster in ace.yaml if specified.
  • schema.table_name: Schema‑qualified table name to repair.

Options

Option Alias Description Default
--diff-file <path> -f Path to the diff file (required)
--dbname <name> -d Database name override
--nodes <list> -n Nodes to include when resolving cluster metadata all
--source-of-truth <node> -r Authoritative node for repairs (required unless --bidirectional)
--dry-run -y Emit a dry-run plan instead of executing changes false
--generate-report -g Write a JSON repair report to reports/<YYYY-MM-DD>/repair_report_<HHMMSS.mmm>.json false
--insert-only -i Only insert missing rows; skip updates/deletes false
--upsert-only -P Insert or update rows; skip deletes false
--repair-plan <path> -p Path to an advanced repair plan (YAML/JSON). Overrides --source-of-truth and uses rule-based actions.
--fix-nulls -X Fill NULL columns on each node using non-NULL values from its peers (no source-of-truth needed) false
--bidirectional -Z Perform insert-only repairs in both directions false
--fire-triggers -t Execute triggers (otherwise runs with session_replication_role='replica') false
--recovery-mode Enable recovery-mode repair when the diff was generated with --against-origin; can auto-select a source of truth using Spock LSNs false
--preserve-origin Preserve replication origin node ID and LSN with per-row timestamp accuracy for repaired rows. When enabled, repaired rows will have commits with the original node's origin ID and exact commit timestamp (microsecond precision) instead of the local node ID. Requires LSN to be available from a survivor node. false
--quiet -q Suppress non-essential logging false
--debug -v Enable verbose logging false

Advanced repair plans

  • Use --repair-plan to drive repairs from a plan. Source-of-truth becomes optional; the plan sets per-row decisions.
  • Mutually exclusive with --bidirectional and --fix-nulls (those are separate modes).
  • Dry-run and reports include rule usage counts per node.
  • See Advanced repair for grammar and Examples for recipes.

Example

./ace table-repair \
  --diff-file=public_customers_large_diffs-20250718134542.json \
  --source-of-truth=n1 \
  --generate-report \
  acctg public.customers_large

Diff reports share the same prefix generated by table-diff (for example public_customers_large_diffs-20250718134542.json). When you request a dry run or report, ACE also writes JSON summaries under reports/<YYYY-MM-DD>/repair_report_<HHMMSS.mmm>.json (or dry_run_report_<...>.json).

Recovery-mode behavior

  • If the diff file indicates only_origin, table-repair refuses to run unless --recovery-mode is set.
  • In recovery-mode, if no --source-of-truth is provided, ACE probes surviving nodes for the failed node’s Spock origin LSN (preferred) and slot LSN (fallback) and picks the highest. Ties or missing LSNs require you to provide --source-of-truth.
  • The chosen source (auto or explicit) is recorded in the repair report along with the LSN probes.
  • Recovery-mode still accepts --repair-plan; the plan is applied after the source of truth is determined. If no repair plan is provided, ACE performs a standard repair with the recovery-mode source selection.

Sample Output

2025/07/22 12:05:24 INFO Starting table repair for public.customers_large on cluster acctg
2025/07/22 12:05:24 INFO Processing repairs for divergent node: n2
2025/07/22 12:05:24 INFO Executed 99 upsert operations on n2
2025/07/22 12:05:24 INFO Repair of public.customers_large complete in 0.003s. Nodes n2 repaired (99 upserted).

Fixing null-only drifts (--fix-nulls)

Replication hiccups can leave some columns NULL on one node while populated on another. The --fix-nulls mode cross-fills those NULLs in both directions using values from the paired node(s); it does not require a source-of-truth. Use it when the diff shows only NULL/NOT NULL mismatches and you want to reconcile columns without preferring a single node.

Preserving replication origin (--preserve-origin)

When --preserve-origin is enabled, repaired rows maintain the correct replication origin node ID and LSN from the original transaction, along with precise per-row timestamp preservation. This is particularly important in recovery scenarios where:

  • A node fails and rows are repaired from a survivor
  • The failed node may come back online
  • Without origin tracking, the repaired rows would have the local node's origin ID, which could cause conflicts when the original node resumes replication
  • Temporal ordering and conflict resolution depend on accurate commit timestamps

How it works

  1. Origin extraction: ACE extracts the node_origin and commit_ts from the diff file metadata for each row being repaired.

  2. LSN retrieval: For each origin node, ACE queries a survivor node to obtain the origin LSN. If the LSN is not available, the affected rows fall back to a regular repair without origin preservation (a warning will be logged).

  3. Replication origin session: Before executing repairs for each origin group, ACE:

  4. Gets or creates a replication origin for the origin node
  5. Sets up a replication origin session
  6. Configures the session with the origin LSN and timestamp
  7. Executes the repairs
  8. Resets the session

  9. Transaction management: When preserve-origin is active, repairs use a two-phase commit pattern (not a single atomic transaction):

  10. Phase 1: All deletes are executed and committed in a single transaction.
  11. Phase 2: Upserts are grouped by (origin node, LSN, timestamp) tuples. Each group runs in its own transaction with its replication origin xact configured (LSN and commit timestamp), then commits. This is required because pg_replication_origin_xact_setup is transaction-scoped.
  12. Failure/rollback boundaries: If a failure occurs during phase 2, deletes remain committed and any upsert batches that already committed remain committed; only the current batch rolls back. The repair is not all-or-nothing.

  13. Timestamp precision: Timestamps are stored in RFC3339Nano format (e.g., 2026-01-15T14:23:45.123456Z) to preserve microsecond-level accuracy. This precision is essential when:

  14. Multiple transactions occurred within the same second on the origin node
  15. Conflict resolution depends on precise temporal ordering
  16. Recovery scenarios require exact timestamp matching for conflict-free reintegration

Requirements and limitations

  • LSN availability: The origin LSN should be available from at least one survivor node. If not available, the affected rows fall back to a regular repair without origin preservation.
  • Survivor nodes: At least one survivor node must be accessible to fetch the origin LSN.
  • Privileges: Replication origin functions require superuser or replication privileges on the target database.
  • Missing metadata: If origin metadata is missing from the diff file for some rows, those rows will be repaired without origin tracking (a warning will be logged).

When to use

Enable --preserve-origin in recovery scenarios where: - You want to prevent replication conflicts when the origin node returns - You need to maintain the original transaction timestamps and origin metadata

Origin preservation is off by default. Consider leaving it disabled if: - You're certain the origin node will not come back online - You've permanently removed the origin node from the cluster - You want repaired rows to be treated as local writes