Using a .json File to Deploy a Remote Cluster
In this tutorial, we'll walk through using pgEdge Platform on a console host to create and modify a .json file that specifies your cluster preferences, and then deploy that cluster on remote hosts. The deployment validates the SSH connection to each node, creates the replication sets and subscriptions, and ensures that communication between the nodes works as expected.
Installing the pgEdge Platform
Before installing the pgEdge Platform, you will need to:
- Generate an SSH keypair. After creating the keypair, copy the private half of the key pair to `~/.ssh` on the console host (for example, `id_rsa.pem`), and the public half of the key pair to `~/.ssh` on each node host (for example, `id_rsa.pub`).
- Set SELinux to `permissive` or `disabled` mode on any PostgreSQL host.
- Open any firewalls that could obstruct access between the console and the remote hosts.
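The keypair step above can be sketched as follows; this is a minimal example that assumes RSA keys, an `ec2-user` login, and placeholder node addresses — substitute your own file names, user, and hosts:

```shell
# On the console host: generate a keypair (file names here are illustrative)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa.pem -N ""

# Install the public half on each node host (placeholder addresses)
ssh-copy-id -i ~/.ssh/id_rsa.pem.pub ec2-user@10.0.0.11
ssh-copy-id -i ~/.ssh/id_rsa.pem.pub ec2-user@10.0.0.12
ssh-copy-id -i ~/.ssh/id_rsa.pem.pub ec2-user@10.0.0.13

# Confirm passwordless access from the console host to a node
ssh -i ~/.ssh/id_rsa.pem ec2-user@10.0.0.11 true
```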
After meeting the prerequisites, you can install pgEdge Platform with the following command. Paste the command into your command line client and press `Return`:

```shell
python3 -c "$(curl -fsSL https://pgedge-download.s3.amazonaws.com/REPO/install.py)"
```
Creating a Cluster Configuration File
After installing the pgEdge Platform, use the `pgedge cluster json-template` command to create a .json file that describes the configuration of your replication cluster. Move into the `pgedge` directory, and invoke the command:

```shell
./pgedge cluster json-template cluster_name db_name node_count db_superuser password pg_version port
```
When you invoke the command, specify the following values for the arguments:

- `cluster_name` is the name of the cluster. A directory with this same name will be created in the `cluster` directory; the file describing the cluster configuration will be named `cluster_name.json`.
- `db_name` is the name of your PostgreSQL database.
- `node_count` specifies the number of nodes that will be in the cluster.
- `db_superuser` specifies the username of the database superuser created for this database.
- `password` is the password of the database superuser.
- `pg_version` specifies the PostgreSQL version of the database; choose from versions `14`, `15`, and `16`.
- `port` specifies the port number of the first node created in this cluster.
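For example, the following invocation uses the sample values that appear in the deployment output later in this tutorial: a three-node cluster named `demo`, database `lcdb`, superuser `lcusr`, PostgreSQL 16, and first port 6432. The password shown is a placeholder; substitute your own:

```shell
./pgedge cluster json-template demo lcdb 3 lcusr MyS3curePassw0rd 16 6432
```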
When the command completes, edit the .json file (located in `pgedge/cluster/cluster_name/cluster_name.json`), confirming your deployment details and adding any missing information. For detailed information about creating and modifying a JSON cluster configuration file, see Managing a Cluster's JSON File.
Validating the .json File
You can use the `pgedge cluster json-validate` command to verify that your .json file is structured properly. The syntax is:

```shell
./pgedge cluster json-validate cluster_name
```

For example, the following command validates the .json file that creates a cluster named `demo`:

```
./pgedge cluster json-validate demo
JSON defines a 3 node cluster
```

Note: The validate command checks only the structure of the .json file; it does not validate the values you provide.
Deploying a Remote Cluster from a .json File
After updating and validating the `cluster_name.json` file, you can invoke the `pgedge cluster init` command on the console host to deploy the defined cluster. The syntax is:

```shell
./pgedge cluster init cluster_name
```

The command installs pgEdge Platform and PostgreSQL on each node, creates the initial database, installs the spock and snowflake extensions, and creates the `default` replication set and subscriptions. For example, the following command deploys the sample `demo.json` file created above:

```shell
./pgedge cluster init demo
```
The CLI displays confirmation of node and subscription creation as it deploys the cluster.
```
## Loading cluster 'demo' json definition file
July 11, 2024, 14:56:04: 127.0.0.1 : n1 - Checking ssh on 127.0.0.1 [OK]
July 11, 2024, 14:56:05: 127.0.0.1 : n2 - Checking ssh on 127.0.0.1 [OK]
July 11, 2024, 14:56:05: 127.0.0.1 : n3 - Checking ssh on 127.0.0.1 [OK]
################################################################################
# REPO: https://pgedge-upstream.s3.amazonaws.com/REPO
# Cluster Name: demo
# Node Type: Primary
# Primary Node Name:
# Name: n1
# Host: 127.0.0.1
# Port: 6432
# Path: /home/ec2-user/work/platform_test/nc/pgedge/cluster/demo/n1
# Database: lcdb
# Database User: lcusr
# Number of Nodes: 3
################################################################################
July 11, 2024, 14:56:15: 127.0.0.1 : n1 - Installing pgedge [OK]
July 11, 2024, 14:57:03: 127.0.0.1 : n1 - Setup pgedge [OK]
################################################################################
# REPO: https://pgedge-upstream.s3.amazonaws.com/REPO
# Cluster Name: demo
# Node Type: Primary
# Primary Node Name:
# Name: n2
# Host: 127.0.0.1
# Port: 6433
# Path: /home/ec2-user/work/platform_test/nc/pgedge/cluster/demo/n2
# Database: lcdb
# Database User: lcusr
# Number of Nodes: 3
################################################################################
July 11, 2024, 14:57:22: 127.0.0.1 : n2 - Installing pgedge [OK]
July 11, 2024, 14:58:10: 127.0.0.1 : n2 - Setup pgedge [OK]
################################################################################
# REPO: https://pgedge-upstream.s3.amazonaws.com/REPO
# Cluster Name: demo
# Node Type: Primary
# Primary Node Name:
# Name: n3
# Host: 127.0.0.1
# Port: 6434
# Path: /home/ec2-user/work/platform_test/nc/pgedge/cluster/demo/n3
# Database: lcdb
# Database User: lcusr
# Number of Nodes: 3
################################################################################
July 11, 2024, 14:58:29: 127.0.0.1 : n3 - Installing pgedge [OK]
July 11, 2024, 14:59:18: 127.0.0.1 : n3 - Setup pgedge [OK]
July 11, 2024, 14:59:21: 127.0.0.1 : n1 - Creating node n1 [OK]
July 11, 2024, 14:59:24: 127.0.0.1 : n2 - Creating node n2 [OK]
July 11, 2024, 14:59:26: 127.0.0.1 : n3 - Creating node n3 [OK]
July 11, 2024, 14:59:28: 127.0.0.1 : n1 - Creating subscriptions sub_n1n2 [OK]
July 11, 2024, 14:59:29: 127.0.0.1 : n1 - Creating subscriptions sub_n1n3 [OK]
July 11, 2024, 14:59:30: 127.0.0.1 : n2 - Creating subscriptions sub_n2n1 [OK]
July 11, 2024, 14:59:31: 127.0.0.1 : n2 - Creating subscriptions sub_n2n3 [OK]
July 11, 2024, 14:59:32: 127.0.0.1 : n3 - Creating subscriptions sub_n3n1 [OK]
July 11, 2024, 14:59:33: 127.0.0.1 : n3 - Creating subscriptions sub_n3n2 [OK]
July 11, 2024, 14:59:34: 127.0.0.1 : n3 - Listing spock nodes [OK]
[
  {
    "node_id": 673694252,
    "node_name": "n1"
  },
  {
    "node_id": 560818415,
    "node_name": "n2"
  },
  {
    "node_id": 193995617,
    "node_name": "n3"
  }
]
```
When cluster deployment completes, you can create one or more tables in the database, and use the `pgedge spock repset-add-table` command to add each table to the `default` replication set to start replication.
Adding Tables to the default Replication Set

For replication to begin, you will need to add tables to the `default` replication set; for this example, we'll use pgbench to create those tables. When you open pgbench or psql, specify the port number and database name to ensure you're working on the correct node. Before starting, source the PostgreSQL environment variables on each node to add pgbench and psql to your OS PATH; for example:
```shell
source cluster/demo/n1/pgedge/pg16/pg16.env
```
Then, use pgbench to set up a very simple four-table database. At the OS command line, create the pgbench tables in your database (`db_name`) with the `pgbench` command. You must create the tables on each node in your replication cluster:

```shell
pgbench -i --port=port_number db_name
```
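On the sample demo cluster, where the nodes listen on ports 6432 through 6434 and the database is named lcdb (per the deployment output above), the per-node initialization can be scripted as a short loop; the ports and database name come from that sample and may differ in your cluster:

```shell
# Initialize the pgbench tables on each node of the sample demo cluster
for port in 6432 6433 6434; do
    pgbench -i --port=$port lcdb
done
```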
Then, connect to each node with the psql client:

```shell
psql -p port_number -d db_name
```
Once connected, alter the balance columns, making them conflict-free delta-apply columns; this ensures that the value replicated is the delta of the committed changes (the old value plus or minus any new value) for a given record. If your cluster is configured to use Spock extension 4.0 (or later), use the commands:

```sql
ALTER TABLE pgbench_accounts ALTER COLUMN abalance SET (log_old_value=true, delta_apply_function=spock.delta_apply);
ALTER TABLE pgbench_branches ALTER COLUMN bbalance SET (log_old_value=true, delta_apply_function=spock.delta_apply);
ALTER TABLE pgbench_tellers ALTER COLUMN tbalance SET (log_old_value=true, delta_apply_function=spock.delta_apply);
```
If you're using an older version of the Spock extension (prior to 4.0), use the commands:

```sql
ALTER TABLE pgbench_accounts ALTER COLUMN abalance SET (log_old_value=true);
ALTER TABLE pgbench_branches ALTER COLUMN bbalance SET (log_old_value=true);
ALTER TABLE pgbench_tellers ALTER COLUMN tbalance SET (log_old_value=true);
```
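If you want to confirm that the settings took effect, one approach (a sketch using the standard PostgreSQL catalogs rather than a pgEdge-specific command, on the assumption that these per-column settings are recorded in `pg_attribute.attoptions` like other `ALTER COLUMN ... SET (...)` options) is:

```sql
-- List pgbench columns that carry non-default per-column options
SELECT c.relname, a.attname, a.attoptions
FROM pg_attribute a
JOIN pg_class c ON c.oid = a.attrelid
WHERE c.relname LIKE 'pgbench_%'
  AND a.attoptions IS NOT NULL;
```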
Then, exit psql:

```sql
db_name=# \q
```
On the OS command line for each node, use the `pgedge spock repset-add-table` command to add the tables to the system-created replication set (named `default`); the command is followed by your database name (`db_name`):

```shell
./pgedge spock repset-add-table default 'pgbench_*' db_name
```

```
Adding table public.pgbench_accounts to replication set default.
Adding table public.pgbench_branches to replication set default.
Adding table public.pgbench_history to replication set default.
⚠ table pgbench_history cannot be added to replication set default
⚠ DETAIL: table does not have PRIMARY KEY and given replication set is configured to replicate UPDATEs and/or DELETEs
Adding table public.pgbench_tellers to replication set default.
```
The fourth table, `pgbench_history`, is excluded from the replication set because it does not have a primary key.
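If you did want to replicate `pgbench_history`, one option is to give it a primary key and re-run `repset-add-table`. This is a sketch: the column name `hid` is hypothetical, and you would need to run the statement on every node:

```sql
-- Hypothetical surrogate key so pgbench_history can join the default replication set
ALTER TABLE pgbench_history ADD COLUMN hid BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
```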
Checking the Configuration
On the psql command line, check the configuration on each node with the following SQL statements:

```sql
db_name=# SELECT * FROM spock.node;
  node_id  | node_name | location | country | info
-----------+-----------+----------+---------+------
 193995617 | n3        |          |         |
 673694252 | n1        |          |         |
 560818415 | n2        |          |         |
(3 rows)
```
and:
```sql
db_name=# SELECT sub_id, sub_name, sub_slot_name, sub_replication_sets FROM spock.subscription;
   sub_id   | sub_name |    sub_slot_name     |         sub_replication_sets
------------+----------+----------------------+---------------------------------------
 3293941396 | sub_n1n2 | spk_lcdb_n2_sub_n1n2 | {default,default_insert_only,ddl_sql}
 1919895280 | sub_n1n3 | spk_lcdb_n3_sub_n1n3 | {default,default_insert_only,ddl_sql}
(2 rows)
```
The `sub_replication_sets` column shown above displays the system-created replication sets. You can add custom replication sets with the `spock repset-create` and `spock sub-add-repset` commands.
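As a sketch, creating a custom replication set and attaching it to one of the existing subscriptions might look like the following; the set name `demo_repset` is illustrative, and the argument order is an assumption — consult the CLI reference for the exact syntax in your version:

```shell
# Illustrative names; run against the sample demo database (lcdb)
./pgedge spock repset-create demo_repset lcdb
./pgedge spock sub-add-repset sub_n1n2 demo_repset lcdb
```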
Testing Replication
Now, if you update a row on `n1`, you should see the update to the same row on `n2` and `n3`.

On `n1`:

```sql
db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |        0 |
(1 row)

db_name=# UPDATE pgbench_tellers SET filler = 'test' WHERE tid = 1;
UPDATE 1
```

Check `n2` and `n3`:

```sql
db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
 tid | bid | tbalance | filler
-----+-----+----------+--------------------------------------------------
   1 |   1 |        0 | test
(1 row)
```
You can also use pgbench to exercise replication; exit psql:

```sql
db_name=# exit
```

Then, run the following command on all nodes at the same time to run pgbench for one minute:

```shell
pgbench -R 100 -T 60 -n --port=port_number db_name
```
When you connect with psql and check the results, you'll see that the sum of the `tbalance` column matches in each node's `pgbench_tellers` table. Without the conflict-free delta-apply columns, each conflict would have been resolved by accepting the first change received, potentially leading to sums that do not match between nodes.
On `n1`:

```sql
db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
   sum
---------
 -101244
(1 row)
```

On `n2`:

```sql
db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
   sum
---------
 -101244
(1 row)
```

On `n3`:

```sql
db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
   sum
---------
 -101244
(1 row)
```