Deploying a Cluster on your Localhost
In this tutorial, we'll walk you through the process of using pgEdge Platform to create and modify a .json file to specify your cluster preferences, and then deploy the cluster with a single command. This is possibly the simplest way to deploy a cluster for experimentation and test purposes.
Installing pgEdge Platform
Before starting this tutorial, you should prepare a local host or virtual machine (VM) running EL9 or Ubuntu 22.04. On the host, you should:
- Set SELinux to
permissive
ordisabled
mode (opens in a new tab), and reboot the host. - Configure passwordless sudo access for a non-root OS user.
- Configure passwordless ssh access for the same non-root OS user.
- Open any firewalls that could obstruct access between nodes.
Then, install the pgEdge Platform with the command:
python3 -c "$(curl -fsSL https://pgedge-download.s3.amazonaws.com/REPO/install.py)"
Paste the command into your command line client and press Return
.
Deploying the Cluster
After installing the pgEdge Platform, use the pgedge localhost cluster-create
command to deploy a cluster (with the default
replication set and inter-node subscriptions), and create a .json file that describes the cluster configuration. Move into the pgedge
directory, and invoke the command:
./pgedge localhost cluster-create cluster_name node_count -d db_name -U db_superuser -P password -port1 port -pg pg_version
When you invoke the command, specify the following values for the positional arguments:
cluster_name
is the name of the cluster. A directory with this same name will be created in the cluster
directory; the file describing the cluster configuration will be named cluster_name.json
.
node_count
specifies the number of nodes that will be in the cluster.
After providing the values for the positional arguments, provide any optional flags and their corresponding values:
-d db_name
specifies the name of your PostgreSQL database; the default is lcdb
.
-U db_superuser
specifies the username of the database superuser created for this database; the default is lcusr
.
-P password
specifies the password of the database superuser.
-port1 port
specifies the port number of the first node created in this cluster; as each subsequent node is created, the port number will increase by 1
.
-pg pg_version
specifies the PostgreSQL version of the database; choose from versions 14
, 15
, and 16
.
Example
./pgedge localhost cluster-create demo 3 -d lcdb -U lcusr -P 1safepassword -port1 6432 -pg 16
Deploys a cluster described by the demo.json
file located in the cluster/demo
directory. The file describes a 3
node cluster with a PostgreSQL 16
database named lcdb
. The first node of the cluster (n1
) is listening on port 6432
; n2
is listening for database connections on 6433
and n3
is listening for connections on 6434
. The database superuser is named lcusr
, and the associated password is 1safepassword
.
Starting Replication on the Cluster
For replication to begin, you will need to add tables to the default
replication set; for our example, we'll use pgbench to add some tables. When you open pgbench or psql, specify the port number and database name to ensure you're working on the correct node.
Before starting, source the PostgreSQL environment variables on each node to add pgbench and psql to your OS PATH; for example to source the variables on n1
, use the command:
source cluster/demo/n1/pgedge/pg16/pg16.env
Then, use pgbench to set up a very simple four-table database. At the OS command line, create the pgbench tables in your database (db_name
) with the pgbench
command. You must create the tables on each node in your replication cluster:
pgbench -i --port port_number db_name
Then, connect to each node with the psql client:
psql --port port_number -d db_name
Once connected, alter the numeric columns, making the numeric fields conflict-free delta-apply columns, ensuring that the value replicated is the delta of the committed changes (the old value plus or minus any new value) to a given record. If your cluster is configured to use Spock extension 4.0 (or later) use the commands:
ALTER TABLE pgbench_accounts ALTER COLUMN abalance SET (log_old_value=true, delta_apply_function=spock.delta_apply);
ALTER TABLE pgbench_branches ALTER COLUMN bbalance SET (log_old_value=true, delta_apply_function=spock.delta_apply);
ALTER TABLE pgbench_tellers ALTER COLUMN tbalance SET (log_old_value=true, delta_apply_function=spock.delta_apply);
If you're using an older version of the Spock extension (prior to 4.0), use the commands:
ALTER TABLE pgbench_accounts ALTER COLUMN abalance SET (log_old_value=true);
ALTER TABLE pgbench_branches ALTER COLUMN bbalance SET (log_old_value=true);
ALTER TABLE pgbench_tellers ALTER COLUMN tbalance SET (log_old_value=true);
Then, exit psql:
db_name=# exit
On the OS command line for each node (for example, in the pgedge/cluster/demo/n1/pgedge
directory), use the pgedge spock repset-add-table
command to add the tables to the system-created replication set (named default
); the command is followed by your database name (db_name
):
./pgedge spock repset-add-table default 'pgbench_*' db_name
The fourth table, pgbench_history
, is excluded from the replication set because it does not have a primary key.
Checking the Configuration
On the psql command line, check the configuration on each node with the following SQL statements:
db_name=# SELECT * FROM spock.node;
node_id | node_name | location | country | info
-----------+-----------+----------+---------+------
193995617 | n3 | | |
673694252 | n1 | | |
560818415 | n2 | | |
(3 rows)
and:
db_name=# SELECT sub_id, sub_name, sub_slot_name, sub_replication_sets FROM spock.subscription;
sub_id | sub_name | sub_slot_name | sub_replication_sets
------------+----------+--------------------------+---------------------------------------
3293941396 | sub_n1n2 | spk_lcdb_n2_sub_n1n2 | {default,default_insert_only,ddl_sql}
1919895280 | sub_n1n3 | spk_lcdb_n3_sub_n1n3 | {default,default_insert_only,ddl_sql}
(2 rows)
The sub_replication_sets
column shown above displays the system-created replication sets. You can add custom replication sets with the spock repset-create
and spock sub-add-repset
commands.
Testing Replication
Now, if you update a row on n1
, you should see the update to the same row on n2
.
On n1
:
db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
tid | bid | tbalance | filler
-----+-----+----------+--------
1 | 1 | 0 |
(1 row)
db_name=# UPDATE pgbench_tellers SET filler = 'test' WHERE tid = 1;
UPDATE 1
Check n2
and n3
:
db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
tid | bid | tbalance | filler
-----+-----+----------+--------------------------------------------------
1 | 1 | 0 | test
(1 row)
You can also use pgbench to exercise replication; exit psql:
db_name=# exit
Then, run the following command on all nodes at the same time to run pgbench for one minute.
pgbench -R 100 -T 60 -n --port=port_number db_name
When you connect with psql and check the results on both nodes, you'll see that the sum of the tbalance
columns match on both pgbench_tellers
tables. Without the conflict-free delta-apply columns, each conflict would have resulted in accepting the first in, potentially leading to sums that do not match between nodes.
n1
:
db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
sum |
------+
-101244
(1 row)
n2
:
db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
sum |
------+
-101244
(1 row)
n3
:
db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
sum |
------+
-101244
(1 row)