Using a .json File to Deploy a Remote Cluster
In this tutorial, we'll walk you through the process of using pgEdge Platform on a console host to create and modify a .json file that specifies your cluster preferences, and then deploy that cluster on remote hosts. The deployment validates the SSH connection with each node, creates the replication sets and subscriptions, and ensures that communication between the nodes works as expected.
Installing the pgEdge Platform
Generate an SSH keypair. After creating the keypair, copy the private half of the key pair (for example, id_rsa.pem) to ~/.ssh on the console host, and the public half of the key pair (for example, id_rsa.pub) to ~/.ssh on each node host.
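As a sketch of that step (the key file name and the node address are placeholders, not values from this tutorial), the keypair can be generated and distributed with stock OpenSSH tools:

```shell
# Generate a keypair; the private half stays in ~/.ssh on the console host.
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

# Copy the public half into ~/.ssh/authorized_keys on each node host
# (the user name and node1.example.com are placeholders for your hosts).
ssh-copy-id -i ~/.ssh/id_rsa.pub susan@node1.example.com
```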
Set SELinux to permissive or disabled mode on any PostgreSQL host.
Open any firewalls that could obstruct access between the console and the remote hosts.
On the console host, install pgEdge Platform with the following command:
python3 -c "$(curl -fsSL https://pgedge-download.s3.amazonaws.com/REPO/install.py)"
Paste the command into your command line client and press Return.
Creating a Cluster Configuration File
After installing the pgEdge Platform, use the pgedge cluster json-template command to create a .json file that describes the configuration of your replication cluster. Move into the pgedge directory, and invoke the command:
./pgedge cluster json-template cluster_name db_name node_count db_superuser password pg_version port
When you invoke the command, specify the following values for the arguments:
cluster_name is the name of the cluster. A directory with this same name will be created in the cluster directory; the file describing the cluster configuration will be named cluster_name.json.
db_name is the name of your PostgreSQL database.
node_count specifies the number of nodes that will be in the cluster.
db_superuser specifies the username of the database superuser created for this database.
password is the password of the database superuser.
pg_version specifies the PostgreSQL version of the database; choose from versions 14, 15, and 16.
port specifies the port number of the first node created in this cluster.
For example, the following command:
./pgedge cluster json-template demo lcdb 3 lcusr 1safepassword 16 6432
creates a file named demo.json in the cluster/demo directory that describes a three-node cluster that replicates a PostgreSQL 16 database named lcdb. The first node of the cluster (n1) is listening on port 6432; when you edit the .json file, you'll specify the port numbers that will be used by n2 and n3. The database superuser is named lcusr, and the associated password is 1safepassword.
Edit the .json file, providing:
- The name of a non-root operating system user in the os_user property.
- The path to and name of the SSH key file on the console host in the ssh_key property.
- A unique name for each node in the replication cluster in the nodes clause's name property; in our example, we've used n1, n2, and n3.
- The IP addresses for each node; add the public IP address in the ip_address property, and the private IP address in the ip_address_private property.
- The PostgreSQL listener port in the port property.
- The installation path of pgEdge Platform in the path property.
The resulting demo.json file will look like this:
{
  "name": "demo",
  "style": "remote",
  "create_date": "2024-03-07T16:14:30.864336",
  "remote": {
    "os_user": "susan",
    "ssh_key": "~/.ssh/smd_key_pair.pem"
  },
  "database": {
    "databases": [
      {
        "username": "lcusr",
        "password": "1safepassword",
        "name": "lcdb"
      }
    ],
    "pg_version": 16
  },
  "node_groups": {
    "remote": [
      {
        "region": "",
        "availability_zones": "",
        "instance_type": "",
        "nodes": [
          {
            "name": "n1",
            "is_active": true,
            "ip_address": "18.216.182.44",
            "ip_address_private": "10.1.0.44",
            "port": 6432,
            "path": "~/pgedge/demo"
          }
        ]
      },
      {
        "region": "",
        "availability_zones": "",
        "instance_type": "",
        "nodes": [
          {
            "name": "n2",
            "is_active": true,
            "ip_address": "54.241.206.173",
            "ip_address_private": "10.2.0.102",
            "port": 6432,
            "path": "~/pgedge/demo"
          }
        ]
      },
      {
        "region": "",
        "availability_zones": "",
        "instance_type": "",
        "nodes": [
          {
            "name": "n3",
            "is_active": true,
            "ip_address": "34.212.52.56",
            "ip_address_private": "10.3.0.46",
            "port": 6432,
            "path": "~/pgedge/demo"
          }
        ]
      }
    ]
  }
}
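After hand-editing the file, a quick local syntax check with Python's stock json.tool can catch a stray comma or unbalanced brace before you hand the file to pgEdge (the path shown assumes the demo layout above):

```shell
# Prints a parse error and exits non-zero if the edited file is not valid JSON.
python3 -m json.tool cluster/demo/demo.json > /dev/null && echo "syntactically valid"
```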
Validating a .json File
You can use the pgedge cluster json-validate command to verify that your .json file is structured properly. The syntax is:
./pgedge cluster json-validate cluster_name
For example, the following command validates the .json file that creates a cluster named demo:
./pgedge cluster json-validate demo
JSON defines a 3 node cluster
Note: The validate command checks only the structure of the .json file; it does not validate the values you provide.
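Because the values themselves are not checked, you might add a small value-level check of your own. This sketch (assuming the demo file path used above; the checks chosen are illustrative, not part of pgEdge) asserts that every node entry has a non-empty IP address, an integer port, and a unique name:

```shell
# Hypothetical value-level checks that json-validate does not perform.
python3 - <<'EOF'
import json

with open("cluster/demo/demo.json") as f:
    cfg = json.load(f)

names = []
for group in cfg["node_groups"]["remote"]:
    for node in group["nodes"]:
        assert node["ip_address"], f"{node['name']}: empty ip_address"
        assert isinstance(node["port"], int), f"{node['name']}: port must be an integer"
        names.append(node["name"])
assert len(names) == len(set(names)), "node names must be unique"
print(f"checked {len(names)} node entries")
EOF
```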
Deploying a Remote Cluster from a .json File
After updating the cluster_name.json file, you can invoke the pgedge cluster init command on the console host to deploy the defined cluster. The syntax is:
./pgedge cluster init cluster_name
The command installs pgEdge Platform and PostgreSQL on each node, creates the initial database, installs the spock and snowflake extensions, and creates the default replication set and subscriptions.
For example, the following command deploys the sample demo.json file created above:
./pgedge cluster init demo
The CLI displays confirmation of node and subscription creation as it deploys the cluster. When cluster deployment completes, you need to create one or more tables in the database, and use the pgedge spock repset-add-table command to add each table to the default replication set to start replication.
Adding Tables to the default Replication Set
For replication to begin, you will need to add tables to the default replication set; for this example, we'll use pgbench to create those tables. When you open pgbench or psql, specify the port number and database name to ensure you're working on the correct node.
Before starting, source the PostgreSQL environment variables on each node to add pgbench and psql to your OS PATH; for example:
source pg16/pg16.env
Then, use pgbench to set up a very simple four-table database. At the OS command line, create the pgbench tables in your database (db_name) with the pgbench command. You must create the tables on each node in your replication cluster:
pgbench -i --port=port_number db_name
Then, connect to each node with the psql client:
psql --port=port_number -d db_name
Once connected, alter the numeric columns, setting LOG_OLD_VALUE equal to true. This will make these numeric fields conflict-free delta-apply columns, ensuring that the value replicated is the delta of the committed changes (the old value plus or minus any new value) to a given record:
ALTER TABLE pgbench_accounts ALTER COLUMN abalance SET (LOG_OLD_VALUE=true);
ALTER TABLE pgbench_branches ALTER COLUMN bbalance SET (LOG_OLD_VALUE=true);
ALTER TABLE pgbench_tellers ALTER COLUMN tbalance SET (LOG_OLD_VALUE=true);
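To see why delta-apply keeps balances consistent, the arithmetic can be sketched outside the database. In this toy model (not pgEdge code), two copies of the same balance each receive the same pair of concurrent deltas; because deltas rather than raw new values are replicated, both copies converge on the same result:

```shell
# Simulate conflict-free delta-apply on a numeric column.
balance_n1=100
balance_n2=100

# Concurrent updates: n1 subtracts 30 while n2 adds 50.
delta_n1=-30
delta_n2=50

# With LOG_OLD_VALUE=true, each node replicates its delta, so every
# node ends up applying both deltas instead of overwriting the value.
balance_n1=$(( balance_n1 + delta_n1 + delta_n2 ))
balance_n2=$(( balance_n2 + delta_n2 + delta_n1 ))

echo "n1=$balance_n1 n2=$balance_n2"   # prints: n1=120 n2=120
```

If the raw new value were replicated instead, whichever update arrived last would overwrite the other, and the two copies could disagree.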
Then, exit psql:
db_name=# exit
On the OS command line for each node, use the pgedge spock repset-add-table command to add the tables to the system-created replication set (named default); the command is followed by your database name (db_name):
./pgedge spock repset-add-table default 'pgbench_*' db_name
The fourth table, pgbench_history, is excluded from the replication set because it does not have a primary key.
Checking the Configuration
On the psql command line, check the configuration on each node with the following SQL statements:
db_name=# SELECT * FROM spock.node;
node_id | node_name | location | country | info
-----------+-----------+----------+---------+------
193995617 | n3 | | |
673694252 | n1 | | |
560818415 | n2 | | |
(3 rows)
and:
db_name=# SELECT sub_id, sub_name, sub_slot_name, sub_replication_sets FROM spock.subscription;
sub_id | sub_name | sub_slot_name | sub_replication_sets
------------+----------+--------------------------+---------------------------------------
3293941396 | sub_n1n2 | spk_lcdb_n2_sub_n1n2 | {default,default_insert_only,ddl_sql}
1919895280 | sub_n1n3 | spk_lcdb_n3_sub_n1n3 | {default,default_insert_only,ddl_sql}
(2 rows)
The sub_replication_sets column shown above displays the system-created replication sets. You can add custom replication sets with the spock repset-create and spock sub-add-repset commands.
Testing Replication
Now, if you update a row on n1, you should see the update to the same row on n2 and n3.
On n1:
db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
tid | bid | tbalance | filler
-----+-----+----------+--------
1 | 1 | 0 |
(1 row)
db_name=# UPDATE pgbench_tellers SET filler = 'test' WHERE tid = 1;
UPDATE 1
Check n2 and n3:
db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
tid | bid | tbalance | filler
-----+-----+----------+--------------------------------------------------
1 | 1 | 0 | test
(1 row)
You can also use pgbench to exercise replication; exit psql:
db_name=# exit
Then, run the following command on all nodes at the same time to exercise replication with pgbench for one minute:
pgbench -R 100 -T 60 -n --port=port_number db_name
When you connect with psql and check the results on each node, you'll see that the sums of the tbalance columns in the pgbench_tellers tables match on every node. Without the conflict-free delta-apply columns, each conflict would have been resolved by accepting only the first change received, potentially leading to sums that do not match between nodes.
On n1:
db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
   sum
---------
 -101244
(1 row)
On n2:
db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
   sum
---------
 -101244
(1 row)
On n3:
db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
   sum
---------
 -101244
(1 row)