table-diff
This command compares the data in the specified table across nodes in a cluster and generates a diff report if any inconsistencies are found.
Syntax
./ace table-diff <cluster_name> <schema.table_name> [options]
cluster_name: The name of the pgEdge cluster in which the table resides.schema.table_name: The schema‑qualified name of the table to compare across nodes.
Options
| Option | Alias | Description |
|---|---|---|
-d, --dbname <name> |
Database name. Defaults to the first DB in the cluster config. | |
--block-size <int> |
-b |
Rows processed per comparison block. Default 100000. Honours ace.yaml limits unless --override-block-size is set. |
--concurrency-factor <int> |
-c |
Number of workers per node (1–10). Default 1. |
--compare-unit-size <int> |
-s |
Recursive split size for mismatched blocks. Default 10000. |
--output <json\|html> |
-o |
Report format. Default json. When html, both JSON and HTML files share the same timestamped prefix. |
--nodes <list> |
-n |
Comma-separated node list or all. Up to three-way diffs are supported. |
--table-filter <WHERE> |
Optional SQL WHERE clause applied on every node before hashing. |
|
--override-block-size |
Skip block-size safety checks defined in ace.yaml. |
|
--quiet |
Suppress progress output. Results still write to the diff file. | |
--debug |
-v |
Enable verbose logging. |
--schedule |
Run the diff repeatedly on a timer (requires --every). |
|
--every <duration> |
Go duration string (for example, 15m, 1h30m). Used with --schedule. |
Example
Compare a table across all nodes and write a JSON report (add --output html to emit both JSON and HTML using the same <schema>_<table>_diffs-<timestamp> prefix):
./ace table-diff acctg public.customers_large
Sample Output (with differences)
2025/07/22 12:03:51 INFO Cluster acctg exists
2025/07/22 12:03:51 INFO Connections successful to nodes in cluster
2025/07/22 12:03:51 INFO Table public.customers_large is comparable across nodes
2025/07/22 12:03:51 INFO Using 16 CPUs, max concurrent workers = 16
Hashing initial ranges: 171 / 177 [=======================================================================>---] 1s | 0s
Analysing mismatches: 2 / 2 [=================================================================================] 0s | done
2025/07/22 12:03:53 INFO Table diff comparison completed for public.customers_large
2025/07/22 12:03:53 WARN ✘ TABLES DO NOT MATCH
2025/07/22 12:03:53 WARN Found 99 differences between n1/n2
2025/07/22 12:03:53 WARN Found 99 differences between n2/n3
2025/07/22 12:03:53 INFO Diff report written to public_customers_large_diffs-20250722120353.json
Sample Output (no differences)
2025/07/22 12:05:59 INFO Cluster acctg exists
2025/07/22 12:05:59 INFO Connections successful to nodes in cluster
2025/07/22 12:05:59 INFO Table public.customers_large is comparable across nodes
2025/07/22 12:05:59 INFO Using 16 CPUs, max concurrent workers = 16
Hashing initial ranges: 168 / 177 [======================================================================>----] 1s | 0s
2025/07/22 12:06:01 INFO Table diff comparison completed for public.customers_large
2025/07/22 12:06:01 INFO ✔ TABLES MATCH
How table-diff Works
ACE optimises comparisons with multiprocessing and block hashing:
- It splits work into blocks (
--block-size) and uses multiple workers per node (--concurrency-factor) to compute hashes. If hashes mismatch for a block, rows are materialised and, if necessary, recursively split using--compare-unit-size. - Runtime factors include host resources (CPU/memory), allowed parallelism, table size and row width (e.g., large JSON/bytea/embedding columns can slow hashing), distribution of differences (widely scattered diffs trigger more block fetches), and network latency to database nodes.
Tuning tips
- Tune
--block-sizeand--concurrency-factorfor your hardware and data profile. - Use
--table-filterto narrow scope on very large tables. - Prefer
--output htmlwhen you’ll manually review diffs. - Use
--override-block-sizesparingly; the guardrails inace.yamlprevent allocations that can overwhelm memory.
Scheduling runs
Add --schedule --every=<duration> to keep the diff running on a loop. The command performs an initial comparison immediately, then repeats after each interval until you stop the process:
./ace table-diff --schedule --every=1h my-cluster public.orders
All other flags (--nodes, --block-size, --output, and so on) are reused for every iteration.