Using a .json File to Deploy a Remote Cluster
In this tutorial, we'll walk you through the process of using pgEdge Platform on a console host to create and modify a .json file that specifies your cluster preferences, and then deploy that cluster on remote hosts. The deployment validates the SSH connection with each node, creates the replication sets and subscriptions, and ensures that communication between the nodes works as expected.
Installing the pgEdge Platform
Generate an SSH keypair. After creating the keypair, copy the private half of the key pair (for example, id_rsa.pem) to ~/.ssh on the console host, and the public half of the key pair (for example, id_rsa.pub) to ~/.ssh on each node host.
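As a sketch of that step (the key file name and the node address are placeholders, not values from this tutorial), the keypair can be generated and distributed with stock OpenSSH tools:

```shell
# Generate a keypair; the private half stays in ~/.ssh on the console host.
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

# Copy the public half into ~/.ssh/authorized_keys on each node host
# (the user name and node1.example.com are placeholders for your hosts).
ssh-copy-id -i ~/.ssh/id_rsa.pub susan@node1.example.com
```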
Set SELinux to permissive or disabled mode on any PostgreSQL host.
Open any firewalls that could obstruct access between the console and the remote hosts.
On the console host, install pgEdge Platform with the following command:
python3 -c "$(curl -fsSL https://pgedge-download.s3.amazonaws.com/REPO/install.py)"
Paste the command into your command line client and press Return.
Creating a Cluster Configuration File
After installing the pgEdge Platform, use the pgedge cluster json-template command to create a .json file that describes the configuration of your replication cluster. Move into the pgedge directory, and invoke the command:
./pgedge cluster json-template cluster_name db_name node_count db_superuser password pg_version port
When you invoke the command, specify the following values for the arguments:
cluster_name is the name of the cluster. A directory with this same name will be created in the cluster directory; the file describing the cluster configuration will be named cluster_name.json.
db_name is the name of your PostgreSQL database.
node_count specifies the number of nodes that will be in the cluster.
db_superuser specifies the username of the database superuser created for this database.
password is the password of the database superuser.
pg_version specifies the PostgreSQL version of the database; choose from versions 14, 15, and 16.
port specifies the port number of the first node created in this cluster.
For example, the following command:
./pgedge cluster json-template demo lcdb 3 lcusr 1safepassword 16 6432
creates a file named demo.json in the cluster/demo directory that describes a three-node cluster that replicates a PostgreSQL 16 database named lcdb. The first node of the cluster (n1) is listening on port 6432; when you edit the .json file, you'll specify the port numbers that will be used by n2 and n3. The database superuser is named lcusr, and the associated password is 1safepassword.
Edit the .json file, providing:
- The name of a non-root operating system user in the os_user property.
- The path to and name of the SSH key file on the console host in the ssh_key property.
- A unique name for each node in the replication cluster in the nodes clause's name property; in our example, we've used n1, n2, and n3.
- The IP addresses for each node; add the public IP address in the ip_address property, and the private IP address in the ip_address_private property.
- The PostgreSQL listener port in the port property.
- The installation path of pgEdge Platform in the path property.
The resulting demo.json file will look like this:
{
  "name": "demo",
  "style": "remote",
  "create_date": "2024-03-07T16:14:30.864336",
  "remote": {
    "os_user": "susan",
    "ssh_key": "~/.ssh/smd_key_pair.pem"
  },
  "database": {
    "databases": [
      {
        "username": "lcusr",
        "password": "1safepassword",
        "name": "lcdb"
      }
    ],
    "pg_version": 16
  },
  "node_groups": {
    "remote": [
      {
        "region": "",
        "availability_zones": "",
        "instance_type": "",
        "nodes": [
          {
            "name": "n1",
            "is_active": true,
            "ip_address": "18.216.182.44",
            "ip_address_private": "10.1.0.44",
            "port": 6432,
            "path": "~/pgedge/demo"
          }
        ]
      },
      {
        "region": "",
        "availability_zones": "",
        "instance_type": "",
        "nodes": [
          {
            "name": "n2",
            "is_active": true,
            "ip_address": "54.241.206.173",
            "ip_address_private": "10.2.0.102",
            "port": 6432,
            "path": "~/pgedge/demo"
          }
        ]
      },
      {
        "region": "",
        "availability_zones": "",
        "instance_type": "",
        "nodes": [
          {
            "name": "n3",
            "is_active": true,
            "ip_address": "34.212.52.56",
            "ip_address_private": "10.3.0.46",
            "port": 6432,
            "path": "~/pgedge/demo"
          }
        ]
      }
    ]
  }
}
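After hand-editing the file, a quick local syntax check with Python's stock json.tool can catch a stray comma or unbalanced brace before you hand the file to pgEdge (the path shown assumes the demo layout above):

```shell
# Prints a parse error and exits non-zero if the edited file is not valid JSON.
python3 -m json.tool cluster/demo/demo.json > /dev/null && echo "syntactically valid"
```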
Validating a .json File
You can use the pgedge cluster json-validate command to verify that your .json file is structured properly. The syntax is:
./pgedge cluster json-validate cluster_name
For example, the following command validates the .json file that creates a cluster named demo:
./pgedge cluster json-validate demo
JSON defines a 3 node cluster
Note: The validate command checks only the structure of the .json file; it does not validate the values you provide.
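Because the values themselves are not checked, you might add a small value-level check of your own. This sketch (assuming the demo file path used above; the checks chosen are illustrative, not part of pgEdge) asserts that every node entry has a non-empty IP address, an integer port, and a unique name:

```shell
# Hypothetical value-level checks that json-validate does not perform.
python3 - <<'EOF'
import json

with open("cluster/demo/demo.json") as f:
    cfg = json.load(f)

names = []
for group in cfg["node_groups"]["remote"]:
    for node in group["nodes"]:
        assert node["ip_address"], f"{node['name']}: empty ip_address"
        assert isinstance(node["port"], int), f"{node['name']}: port must be an integer"
        names.append(node["name"])
assert len(names) == len(set(names)), "node names must be unique"
print(f"checked {len(names)} node entries")
EOF
```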
Deploying a Remote Cluster from a .json File
After updating the cluster_name.json file, you can invoke the pgedge cluster init command on the console host to deploy the defined cluster. The syntax is:
./pgedge cluster init cluster_name
The command installs pgEdge Platform and PostgreSQL on each node, creates the initial database, installs the spock and snowflake extensions, and creates the default replication set and subscriptions.
For example, the following command deploys the sample demo.json file created above:
./pgedge cluster init demo
The CLI displays confirmation of node and subscription creation as it deploys the cluster. When cluster deployment completes, you need to create one or more tables in the database, and use the pgedge spock repset-add-table command to add each table to the default replication set to start replication.
Adding Tables to the default Replication Set
For replication to begin, you will need to add tables to the default replication set; for this example, we'll use pgbench to create those tables. When you open pgbench or psql, specify the port number and database name to ensure you're working on the correct node.
Before starting, source the PostgreSQL environment variables on each node to add pgbench and psql to your OS PATH; for example:
source pg16/pg16.env
Then, use pgbench to set up a very simple four-table database. At the OS command line, create the pgbench tables in your database (db_name) with the pgbench command. You must create the tables on each node in your replication cluster:
pgbench -i --port=port_number db_name
Then, connect to each node with the psql client:
psql --port=port_number -d db_name
Once connected, alter the numeric columns, setting LOG_OLD_VALUE equal to true. This will make these numeric fields conflict-free delta-apply columns, ensuring that the value replicated is the delta of the committed changes (the old value plus or minus any new value) to a given record:
ALTER TABLE pgbench_accounts ALTER COLUMN abalance SET (LOG_OLD_VALUE=true);
ALTER TABLE pgbench_branches ALTER COLUMN bbalance SET (LOG_OLD_VALUE=true);
ALTER TABLE pgbench_tellers ALTER COLUMN tbalance SET (LOG_OLD_VALUE=true);
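To see why delta-apply keeps balances consistent, the arithmetic can be sketched outside the database. In this toy model (not pgEdge code), two copies of the same balance each receive the same pair of concurrent deltas; because deltas rather than raw new values are replicated, both copies converge on the same result:

```shell
# Simulate conflict-free delta-apply on a numeric column.
balance_n1=100
balance_n2=100

# Concurrent updates: n1 subtracts 30 while n2 adds 50.
delta_n1=-30
delta_n2=50

# With LOG_OLD_VALUE=true, each node replicates its delta, so every
# node ends up applying both deltas instead of overwriting the value.
balance_n1=$(( balance_n1 + delta_n1 + delta_n2 ))
balance_n2=$(( balance_n2 + delta_n2 + delta_n1 ))

echo "n1=$balance_n1 n2=$balance_n2"   # prints: n1=120 n2=120
```

If the raw new value were replicated instead, whichever update arrived last would overwrite the other, and the two copies could disagree.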
Then, exit psql:
db_name=# exit
On the OS command line for each node, use the pgedge spock repset-add-table command to add the tables to the system-created replication set (named default); the command is followed by your database name (db_name):
./pgedge spock repset-add-table default 'pgbench_*' db_name
The fourth table, pgbench_history, is excluded from the replication set because it does not have a primary key.
Checking the Configuration
On the psql command line, check the configuration on each node with the following SQL statements:
db_name=# SELECT * FROM spock.node;
node_id | node_name | location | country | info
-----------+-----------+----------+---------+------
193995617 | n3 | | |
673694252 | n1 | | |
560818415 | n2 | | |
(3 rows)
and:
db_name=# SELECT sub_id, sub_name, sub_slot_name, sub_replication_sets FROM spock.subscription;
sub_id | sub_name | sub_slot_name | sub_replication_sets
------------+----------+--------------------------+---------------------------------------
3293941396 | sub_n1n2 | spk_lcdb_n2_sub_n1n2 | {default,default_insert_only,ddl_sql}
1919895280 | sub_n1n3 | spk_lcdb_n3_sub_n1n3 | {default,default_insert_only,ddl_sql}
(2 rows)
The sub_replication_sets column shown above displays the system-created replication sets. You can add custom replication sets with the spock repset-create and spock sub-add-repset commands.
Testing Replication
Now, if you update a row on n1, you should see the update to the same row on n2 and n3.
On n1:
db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
tid | bid | tbalance | filler
-----+-----+----------+--------
1 | 1 | 0 |
(1 row)
db_name=# UPDATE pgbench_tellers SET filler = 'test' WHERE tid = 1;
UPDATE 1
Check n2 and n3:
db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
tid | bid | tbalance | filler
-----+-----+----------+--------------------------------------------------
1 | 1 | 0 | test
(1 row)
You can also use pgbench to exercise replication; exit psql:
db_name=# exit
Then, run the following command on all nodes at the same time to exercise replication with pgbench for one minute:
pgbench -R 100 -T 60 -n --port=port_number db_name
When you connect with psql and check the results on each node, you'll see that the sums of the tbalance columns in the pgbench_tellers tables match on every node. Without the conflict-free delta-apply columns, each conflict would have been resolved by accepting only the first change received, potentially leading to sums that do not match between nodes.
On n1:
db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
   sum
---------
 -101244
(1 row)
On n2:
db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
   sum
---------
 -101244
(1 row)
On n3:
db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
   sum
---------
 -101244
(1 row)