Using a json File to Deploy a Remote Cluster

Using a .json File to Deploy a Remote Cluster

In this tutorial, we'll walk you through the process of using pgEdge Platform on a console host to create and modify a .json file that specifies your cluster preferences, and then deploy that cluster on remote hosts. The deployment validates the SSH connection with each node, creates the replication sets and subscriptions, and ensures that communications between the nodes works as expected.

Installing the pgEdge Platform

Generate an SSH keypair (opens in a new tab). After creating the keypair, copy the private half of a key pair to ~/.ssh (for example, id_rsa.pem) on the console host, and the public half of the key pair to ~/.ssh on each node host (for example, id_rsa.pub).

Set SELinux to permissive or disabled mode (opens in a new tab) on any PostgreSQL host.

Open any firewalls that could obstruct access between the console and the remote hosts.

On the console host, you'll install pgEdge Platform; you can install pgEdge Platform with the command:

python3 -c "$(curl -fsSL https://pgedge-download.s3.amazonaws.com/REPO/install.py)"

Paste the command into your command line client and press Return.

Creating a Cluster Configuration File

After installing the pgEdge Platform, use the pgedge cluster json-template command to create a .JSON file that describes the configuration of your replication cluster. Move into the pgedge directory, and invoke the command:

./pgedge cluster json-template cluster_name db_name node_count db_superuser password pg_version port

When you invoke the command, specify the following values for the arguments:

cluster_name is the name of the cluster. A directory with this same name will be created in the cluster directory; the file describing the cluster configuration will be named cluster_name.json.

db_name is the name of your PostgreSQL database.

node_count specifies the number of nodes that will be in the cluster.

db_superuser specifies the username of the database superuser created for this database.

password is the password of the database superuser.

pg_version specifies the PostgreSQL version of the database; choose from versions 14, 15, and 16.

port specifies the port number of the first node created in this cluster.

For example, the following command:

./pgedge cluster json-template demo lcdb 3 lcusr 1safepassword 16 6432

Creates a file named demo.json in cluster/demo directory that describes a 3 node cluster that replicates a PostgreSQL 16 database named lcdb. The first node of the cluster (n1) is listening on port 6432; when you edit the .JSON file, you'll specify the port numbers that will be used by n2 and n3. The database superuser is named lcusr, and the associated password is 1safepassword.

Edit the .JSON file, providing:

  • The name of a non-root operating system user in the os_user property.
  • The path to and name of the SSH key file on the console host in the ssh_key property.
  • A unique name for each node in the replication cluster in the nodes clause's name property; in our example, we've used n1, n2, and n3.
  • The IP addresses for each node; add the public IP address in the ip_address property, and the private IP address in the ip_address_private property.
  • The PostgreSQL listener port in the port property.
  • The installation path of pgEdge Platform in the path property.

The resulting demo.json file will look like this:

{
  "name": "demo",
  "style": "remote",
  "create_date": "2024-03-07T16:14:30.864336",
  "remote": {
    "os_user": "susan",
    "ssh_key": "~/.ssh/smd_key_pair.pem"
  },
  "database": {
    "databases": [
      {
        "username": "lcusr",
        "password": "1safepassword",
        "name": "lcdb"
      }
    ],
    "pg_version": 16
  },
  "node_groups": {
    "remote": [
      {
        "region": "",
        "availability_zones": "",
        "instance_type": "",
        "nodes": [
          {
            "name": "n1",
            "is_active": true,
            "ip_address": "18.216.182.44",
            "ip_address_private": "10.1.0.44",
            "port": 6432,
            "path": "~/pgedge/demo"
          }
        ]
      },
      {
        "region": "",
        "availability_zones": "",
        "instance_type": "",
        "nodes": [
          {
            "name": "n2",
            "is_active": true,
            "ip_address": "54.241.206.173",
            "ip_address_private": "10.2.0.102",
            "port": 6432,
            "path": "~/pgedge/demo"
          }
        ]
      },
      {
        "region": "",
        "availability_zones": "",
        "instance_type": "",
 
        "nodes": [
          {
            "name": "n3",
            "is_active": true,
            "ip_address": "34.212.52.56",
            "ip_address_private": "10.3.0.46",
            "port": 6432,
            "path": "~/pgedge/demo"
          }
        ]
      }
    ]
  }
}
 

Validating a .json File

You can use the pgedge cluster json-validate command to verify that your .JSON file is structured properly. The syntax is:

    ./pgedge cluster json-validate cluster_name

For example, the following command validates the .json file that creates a cluster named demo.

    ./pgedge cluster json-validate demo
    JSON defines a 3 node cluster

Note: The validate command checks only the structure of the .json file; it does not validate the values you provide.

Deploying a Remote Cluster from a .json File

After updating the cluster_name.json file, you can invoke the pgedge cluster init command on the console host to deploy the defined cluster. The syntax is:

    ./pgedge cluster init cluster_name

The command installs pgEdge Platform and PostgreSQL on each node, creates the initial database, installs the spock and snowflake extensions, and creates the default replication set and subscriptions.

For example, the following command deploys the sample demo.json file created above:

./pgedge cluster init demo

The CLI displays confirmation of node and subscription creation as it deploys the cluster. When cluster deployment completes, you need to create one or more tables in the database, and use the pgedge spock repset-add-table to add the table to the default replication set to start replication.

Adding Tables to the default Replication Set

For replication to begin, you will need to add tables to the default replication set; for this example, we'll use pgbench to create those tables. When you open pgbench or psql, specify the port number and database name to ensure you're working on the correct node.

Before starting, source the PostgreSQL environment variables on each node to add pgbench and psql to your OS PATH; for example:

source pg16/pg16.env

Then, use pgbench to set up a very simple four-table database. At the OS command line, create the pgbench tables in your database (db_name) with the pgbench command. You must create the tables on each node in your replication cluster:

pgbench -i --port=port_number db_name

Then, connect to each node with the psql client:

psql --port=port_number -d=db_name

Once connected, alter the numeric columns, setting LOG_OLD_VALUE equal to true. This will make these numeric fields conflict-free delta-apply columns, ensuring that the value replicated is the delta of the committed changes (the old value plus or minus any new value) to a given record:

ALTER TABLE pgbench_accounts ALTER COLUMN abalance SET (LOG_OLD_VALUE=true);
ALTER TABLE pgbench_branches ALTER COLUMN bbalance SET (LOG_OLD_VALUE=true);
ALTER TABLE pgbench_tellers ALTER COLUMN tbalance SET (LOG_OLD_VALUE=true);

Then, exit psql:

db_name=# exit

On the OS command line for each node, use the pgedge spock repset-add-table command to add the tables to the system-created replication set (named default); the command is followed by your database name (db_name):

./pgedge spock repset-add-table default 'pgbench_*' db_name

The fourth table, pgbench_history, is excluded from the replication set because it does not have a primary key.

Checking the Configuration

On the psql command line, check the configuration on each node with the following SQL statements:

db_name=# SELECT * FROM spock.node;
  node_id  | node_name | location | country | info 
-----------+-----------+----------+---------+------
 193995617 | n3        |          |         | 
 673694252 | n1        |          |         | 
 560818415 | n2        |          |         | 
(3 rows)
 

and:

db_name=# SELECT sub_id, sub_name, sub_slot_name, sub_replication_sets  FROM spock.subscription;
   sub_id   | sub_name |      sub_slot_name       |         sub_replication_sets          
------------+----------+--------------------------+---------------------------------------
 3293941396 | sub_n1n2 | spk_lcdb_n2_sub_n1n2 | {default,default_insert_only,ddl_sql}
 1919895280 | sub_n1n3 | spk_lcdb_n3_sub_n1n3 | {default,default_insert_only,ddl_sql}
(2 rows)

The sub_replication_sets column shown above displays the system-created replication sets. You can add custom replication sets with the spock repset-create and spock sub-add-repset commands.

Testing Replication

Now, if you update a row on n1, you should see the update to the same row on n2 and n3.

On n1:

db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |    	0 |
 (1 row)
db_name=# UPDATE pgbench_tellers SET filler = 'test' WHERE tid = 1;
UPDATE 1

Check n2 and n3:

db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
 tid | bid | tbalance | filler  	 
-----+-----+----------+--------------------------------------------------
   1 |   1 |    	0 | test                               
(1 row)

You can also use pgbench to exercise replication; exit psql:

db_name=# exit

Then, run the following command on all nodes at the same time to run pgbench for one minute.

pgbench -R 100 -T 60 -n --port=port_number db_name

When you connect with psql and check the results on both nodes, you'll see that the sum of the tbalance columns match on both pgbench_tellers tables. Without the conflict-free delta-apply columns, each conflict would have resulted in accepting the first in, potentially leading to sums that do not match between nodes.

n1:

db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
  sum  |
 ------+
 -101244
(1 row)

n2:

db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
  sum  |
 ------+
 -101244
(1 row)

n3:

db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
  sum  |
 ------+
 -101244
(1 row)