Deploying a Cluster on your Local Host

Deploying a Cluster on your Localhost

In this tutorial, we'll walk you through the process of using pgEdge Platform to create and modify a .json file to specify your cluster preferences, and then deploy the cluster with a single command. This is possibly the simplest way to deploy a cluster for experimentation and test purposes.

Installing pgEdge Platform

Before starting this tutorial, you should prepare a local host or virtual machine (VM) running EL9 or Ubuntu 22.04. On the host, you should:

Then, install the pgEdge Platform with the command:

python3 -c "$(curl -fsSL https://pgedge-download.s3.amazonaws.com/REPO/install.py)"

Paste the command into your command line client and press Return.

Deploying the Cluster

After installing the pgEdge Platform, use the pgedge localhost cluster-create command to deploy a cluster (with the default replication set and inter-node subscriptions), and create a .json file that describes the cluster configuration. Move into the pgedge directory, and invoke the command:

./pgedge cluster define-localhost cluster_name node_count -d db_name -U db_superuser -P password -port1 port -pg pg_version 

When you invoke the command, specify the following values for the positional arguments:

cluster_name is the name of the cluster. A directory with this same name will be created in the cluster directory; the file describing the cluster configuration will be named cluster_name.json.

node_count specifies the number of nodes that will be in the cluster.

After providing the values for the positional arguments, provide any optional flags and their corresponding values:

-d db_name specifies the name of your PostgreSQL database; the default is lcdb.

-U db_superuser specifies the username of the database superuser created for this database; the default is lcusr.

-P password specifies the password of the database superuser.

-port1 port specifies the port number of the first node created in this cluster; as each subsequent node is created, the port number will increase by 1.

-pg pg_version specifies the PostgreSQL version of the database; choose from versions 14, 15, and 16.

Example

./pgedge localhost cluster-create demo 3 -d lcdb -U lcusr -P 1safepassword -port1 6432 -pg 16  

Deploys a cluster described by the demo.json file located in the cluster/demo directory. The file describes a 3 node cluster with a PostgreSQL 16 database named lcdb. The first node of the cluster (n1) is listening on port 6432; n2 is listening for database connections on 6433 and n3 is listening for connections on 6434. The database superuser is named lcusr, and the associated password is 1safepassword.

The demo.json file contains:

{
  "name": "demo",
  "style": "localhost",
  "create_date": "2024-03-01",
  "localhost": {
    "os_user": "susan",
    "ssh_key": ""
  },
  "database": {
    "databases": [
      {
        "username": "lcusr",
        "password": "1safepassword",
        "name": "lcdb"
      }
    ],
    "pg_version": 16
  },
  "node_groups": {
    "localhost": [
      {
        "nodes": [
          {
            "name": "n1",
            "is_active": true,
            "ip_address": "127.0.0.1",
            "port": 6432,
            "path": "/home/susan/work/pgedge/cluster/demo/n1"
          }
        ]
      },
      {
        "nodes": [
          {
            "name": "n2",
            "is_active": true,
            "ip_address": "127.0.0.1",
            "port": 6433,
            "path": "/home/susan/work/pgedge/cluster/demo/n2"
          }
        ]
      },
      {
        "nodes": [
          {
            "name": "n3",
            "is_active": true,
            "ip_address": "127.0.0.1",
            "port": 6434,
            "path": "/home/susan/work/pgedge/cluster/demo/n3"
          }
        ]
      }
    ]
  }
}
 

Starting Replication on the Cluster

For replication to begin, you will need to add tables to the default replication set; for our example, we'll use pgbench to add some tables. When you open pgbench or psql, specify the port number and database name to ensure you're working on the correct node.

Before starting, source the PostgreSQL environment variables on each node to add pgbench and psql to your OS PATH; for example to source the variables on n1, use the command:

source cluster/demo/n1/pgedge/pg16/pg16.env

Then, use pgbench to set up a very simple four-table database. At the OS command line, create the pgbench tables in your database (db_name) with the pgbench command. You must create the tables on each node in your replication cluster:

pgbench -i --port port_number db_name

Then, connect to each node with the psql client:

psql --port port_number -d db_name

Once connected, alter the numeric columns, setting LOG_OLD_VALUE equal to true. This will make these numeric fields conflict-free delta-apply columns, ensuring that the value replicated is the delta of the committed changes (the old value plus or minus any new value) to a given record:

ALTER TABLE pgbench_accounts ALTER COLUMN abalance SET (LOG_OLD_VALUE=true);
ALTER TABLE pgbench_branches ALTER COLUMN bbalance SET (LOG_OLD_VALUE=true);
ALTER TABLE pgbench_tellers ALTER COLUMN tbalance SET (LOG_OLD_VALUE=true);

Then, exit psql:

db_name=# exit

On the OS command line for each node, use the pgedge spock repset-add-table command to add the tables to the system-created replication set (named default); the command is followed by your database name (db_name):

./pgedge spock repset-add-table default 'pgbench_*' db_name

The fourth table, pgbench_history, is excluded from the replication set because it does not have a primary key.

Checking the Configuration

On the psql command line, check the configuration on each node with the following SQL statements:

db_name=# SELECT * FROM spock.node;
  node_id  | node_name | location | country | info 
-----------+-----------+----------+---------+------
 193995617 | n3        |          |         | 
 673694252 | n1        |          |         | 
 560818415 | n2        |          |         | 
(3 rows)
 

and:

db_name=# SELECT sub_id, sub_name, sub_slot_name, sub_replication_sets  FROM spock.subscription;
   sub_id   | sub_name |      sub_slot_name       |         sub_replication_sets          
------------+----------+--------------------------+---------------------------------------
 3293941396 | sub_n1n2 | spk_lcdb_n2_sub_n1n2 | {default,default_insert_only,ddl_sql}
 1919895280 | sub_n1n3 | spk_lcdb_n3_sub_n1n3 | {default,default_insert_only,ddl_sql}
(2 rows)
 

The sub_replication_sets column shown above displays the system-created replication sets. You can add custom replication sets with the spock repset-create and spock sub-add-repset commands.

Testing Replication

Now, if you update a row on n1, you should see the update to the same row on n2.

On n1:

db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |    	0 |
 (1 row)
db_name=# UPDATE pgbench_tellers SET filler = 'test' WHERE tid = 1;
UPDATE 1

Check n2 and n3:

db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
 tid | bid | tbalance | filler  	 
-----+-----+----------+--------------------------------------------------
   1 |   1 |    	0 | test                               
(1 row)

You can also use pgbench to exercise replication; exit psql:

db_name=# exit

Then, run the following command on all nodes at the same time to run pgbench for one minute.

pgbench -R 100 -T 60 -n --port=port_number db_name

When you connect with psql and check the results on both nodes, you'll see that the sum of the tbalance columns match on both pgbench_tellers tables. Without the conflict-free delta-apply columns, each conflict would have resulted in accepting the first in, potentially leading to sums that do not match between nodes.

n1:

db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
  sum  |
 ------+
 -101244
(1 row)

n2:

db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
  sum  |
 ------+
 -101244
(1 row)

n3:

db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
  sum  |
 ------+
 -101244
(1 row)