Installing and Configuring the Spock Extension
The Spock extension can be installed with pgEdge Platform or built from source.
Installing Spock with pgEdge Platform
The latest Spock extension is automatically installed and created with any pgEdge Platform installation. A pgEdge Platform deployment provides quick and easy access to:
- the latest minor version of your preferred PostgreSQL version.
- Spock functions and procedures.
- ACE (the Active Consistency Engine) consistency monitoring.
- Snowflake sequences.
- pre-installed pgBackRest for simplified backup and restore capabilities.
- LOLOR large object support.
- management tools for your active-active distributed cluster.
pgEdge software is open-source and available for review at GitHub.
pgEdge Platform deployments also provide a simplified upgrade path so you can take advantage of important new features quickly and easily.
Creating the Extension from Source
Spock is available as an open-source project that you can build from the source code distributed at the GitHub project page. The Spock extension must be built against a version-specific build of PostgreSQL that is patched to provide 'hooks' so Spock can access the Postgres engine. When building Spock from source, you'll need to first build the PostgreSQL database from the source available at the Postgres project, applying the version-specific .diff files from the patches directory (also in the spock repository). To apply a patch, use the command:
patch -p1 < path_to_patch/patch_name
After applying the version-specific patches, you can configure, make, and make install the Postgres server as described in the PostgreSQL documentation. When the build completes, add the directory that contains your pg_config file to your PATH variable:
export PATH=path_to_pg_config_directory:$PATH
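For example, if PostgreSQL was installed under the default /usr/local/pgsql prefix (an assumed layout; substitute your own installation path), the command would be:
export PATH=/usr/local/pgsql/bin:$PATH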
Then, build the Spock extension from source code. Before building the extension, copy the files in the version-specific compatXX directory (where XX is your Postgres version) in the spock repository into the base directory of the repository. After copying the files, use a build process much like any other PostgreSQL extension; you will need to make and make install the code.
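As a rough sketch of that build process (assuming the repository layout described above and a pg_config already on your PATH; adjust the version suffix and paths for your environment):
cd spock
cp compatXX/* .      # XX is your Postgres major version
make
sudo make install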
Then, update your PostgreSQL postgresql.conf file, setting:
shared_preload_libraries = 'spock'
track_commit_timestamp = on # needed for conflict resolution
Then, connect to the server and use the CREATE EXTENSION command to install the spock extension on each node in the database you wish to replicate:
CREATE EXTENSION spock;
Basic Configuration and Usage without pgEdge Platform
pgEdge Platform simplifies installation and management of a distributed multi-master active-active replication cluster. When deployed with pgEdge Platform, replication clusters are automatically configured and deployed with the Spock extension, ACE extension, Snowflake extension and more.
If you're not deploying a cluster with pgEdge Platform, you will need to configure your PostgreSQL server to support logical decoding:
wal_level = 'logical'
max_worker_processes = 10   # one per database needed on provider node
                            # one per node needed on subscriber node
max_replication_slots = 10  # one per node needed on provider node
max_wal_senders = 10        # one per node needed on provider node
shared_preload_libraries = 'spock'
track_commit_timestamp = on # needed for conflict resolution
Also, you will need to configure your pg_hba.conf file to allow logical replication connections from localhost. Logical replication connections are treated by pg_hba.conf as regular connections to the provider database.
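As an illustration (the database name, user, and authentication method here are assumptions; use the values appropriate for your cluster), an entry that permits those connections from localhost might look like:
host    db    repuser    127.0.0.1/32    scram-sha-256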
Then, use the spock.node_create command to create the provider node:
SELECT spock.node_create(
    node_name := 'provider1',
    dsn := 'host=providerhost port=5432 dbname=db'
);
Then, add the tables in the public schema to the default replication set.
SELECT spock.repset_add_all_tables('default', ARRAY['public']);
Once the provider node is configured, you can add subscribers. First create the subscriber node:
SELECT spock.node_create(
    node_name := 'subscriber1',
    dsn := 'host=thishost port=5432 dbname=db'
);
Then, create the subscription, which will start synchronization and replication on the subscriber node:
SELECT spock.sub_create(
    subscription_name := 'subscription1',
    provider_dsn := 'host=providerhost port=5432 dbname=db'
);
SELECT spock.sub_wait_for_sync('subscription1');
Advanced Configuration Options for Spock
You can use the following configuration parameters (GUCs) on the psql command line with a SET statement or in a PostgreSQL configuration file (like postgresql.conf) to configure a Spock installation.
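For example (spock.conflict_resolution is used here purely as an illustration, and some of the parameters below can only be set at postmaster start), you can set a parameter for the current session from psql:
SET spock.conflict_resolution = 'last_update_wins';
or make it permanent in postgresql.conf:
spock.conflict_resolution = 'last_update_wins'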
spock.allow_ddl_from_functions
spock.allow_ddl_from_functions enables Spock to automatically replicate DDL statements that are called within functions. This can be turned off if these functions are expected to run on every node. When this is set to off, statements replicated from functions adhere to the same rule described for spock.include_ddl_repset (below): if a table possesses a defined primary key, it will be added to the default replication set; otherwise, it will be added to the default_insert_only replication set.
spock.batch_inserts
spock.batch_inserts tells Spock to use the batch insert mechanism if possible. The batch mechanism uses the PostgreSQL internal batch insert mode, which is also used by the COPY command.
spock.channel_counters
spock.channel_counters is a boolean value (the default is true) that enables or disables collection of Spock channel statistics. This option can only be set when the postmaster starts.
spock.check_all_uc_indexes
This feature is experimental and OFF by default. Use this feature at your own risk.
This parameter enhances conflict resolution during INSERT operations in scenarios where a row exists that meets unique constraints defined on the table (rather than primary keys or replication identity).
If this GUC is enabled, Spock will continue to check unique constraint indexes after checking the primary key / replica identity index. Only one conflict will be resolved, using Last-Write-Wins logic; this includes the primary key / replica identity index. If a second conflict occurs, an exception is recorded in the spock.exception_log table.
Partial unique constraints are supported, but nullable unique constraints are not. Deferrable constraints are not supported; they are filtered out and not checked, and may therefore still cause an exception during the final commit.
Unique constraint indexes are checked in OID order, after the primary key. It is therefore possible for the resolution of a unique constraint to change the primary key value.
Child tables / foreign keys are not checked or handled. Therefore, if resolving a unique constraint changes the primary key value, you can inadvertently create orphaned foreign key records.
spock.conflict_resolution
spock.conflict_resolution sets the resolution method for any detected conflicts between local data and incoming changes. Possible values are:
- error - replication will stop on error if a conflict is detected, and manual action is needed to resolve it
- apply_remote - always apply the change that conflicts with the local data
- keep_local - keep the local version of the data and ignore the conflicting change coming from the remote node
- last_update_wins - the version of the data with the newest commit timestamp will be kept (this can be either the local or the remote version)
To enable conflict resolution, the track_commit_timestamp PostgreSQL setting must be enabled.
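For example, an illustrative postgresql.conf snippet (last_update_wins is just one of the values listed above):
track_commit_timestamp = on                      # required for conflict resolution
spock.conflict_resolution = 'last_update_wins'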
spock.deny_all_ddl
spock.deny_all_ddl is a boolean value (the default is false) that enables or disables the execution of DDL statements within a Spock-configured cluster. This option can be set at postmaster startup, with the SIGHUP mechanism, or on the command line with SQL if you're a superuser.
spock.enable_ddl_replication
spock.enable_ddl_replication enables automatic replication of DDL statements through the default replication set.
spock.exception_behaviour
Use this GUC to specify the commit behavior of pgEdge Platform when it encounters an ERROR within a transaction:
- transdiscard (the default) - Set spock.exception_behaviour to transdiscard to instruct the server to discard a transaction if any operation within that transaction returns an ERROR.
- discard - Set spock.exception_behaviour to discard to continue processing (and replicating) without interrupting server use. The server will commit any operation (any portion of a transaction) that does not return an ERROR statement.
- sub_disable - Set spock.exception_behaviour to sub_disable to disable the subscription for the node on which the exception was reported. Transactions for the disabled node will be added to a queue that is written to the WAL log file; when the subscription is enabled, replication will resume with the transaction that caused the exception, followed by the other queued transactions. You can use the spock.alter_sub_skip_lsn function to skip the transaction that caused the exception and resume processing with the next transaction in the queue.
Note that the value you choose for spock.exception_behaviour could potentially result in a large WAL log if transactions are allowed to accumulate.
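For example, an illustrative postgresql.conf line (sub_disable is just one of the values described above):
spock.exception_behaviour = 'sub_disable'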
spock.exception_logging
Use this GUC to specify which operations/transactions are written to the exception log table:
- all (the default) - Set spock.exception_logging to all to instruct the server to record all transactions that contain one or more failed operations in the spock.exception_log table. Note that this setting will log all operations that are part of a transaction if one operation fails.
- discard - Add a row to the spock.exception_log table for any discarded operation; successful transactions are not logged.
- none - Instructs the server to not log any operations or transactions to the exception log table.
spock.exception_replay_queue_size
When Spock encounters a replication exception, it attempts to resolve the exception by entering exception-handling mode, based on the value of spock.exception_behaviour. Spock then writes any transaction up to a default size of 4MB to memory, and the apply worker replays the transaction from memory. This provides a massive speed and performance increase in the handling of the vast majority of exceptions. The memory size is configurable with spock.exception_replay_queue_size.
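For example, an illustrative postgresql.conf line (this assumes the parameter accepts standard PostgreSQL memory units; verify against your Spock version):
spock.exception_replay_queue_size = '16MB'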
spock.extra_connection_options
You can use the spock.extra_connection_options parameter in the postgresql.conf file to assign connection options that apply to all connections made by Spock. This can be a useful place to set up custom keepalive options, etc.
spock defaults to enabling TCP keepalives to ensure that it notices when the upstream server disappears unexpectedly. To disable them, add keepalives = 0 to spock.extra_connection_options.
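For example, an illustrative postgresql.conf line using standard libpq keepalive options (the values shown are arbitrary):
spock.extra_connection_options = 'keepalives_idle=20 keepalives_interval=5 keepalives_count=5'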
spock.include_ddl_repset
spock.include_ddl_repset enables Spock to automatically add tables to replication sets at the time they are created on each node. Tables with primary keys will be added to the default replication set, and tables without primary keys will be added to the default_insert_only replication set. Altering a table to add or remove a primary key will adjust which replication set the table belongs to accordingly. Setting a table to unlogged will remove it from replication. Detaching a partition will not remove it from replication.
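To illustrate the behavior described above (the table names are hypothetical, and this assumes spock.include_ddl_repset is enabled):
-- has a primary key, so it is added to the 'default' replication set
CREATE TABLE orders (order_id bigint PRIMARY KEY, total numeric);
-- has no primary key, so it is added to the 'default_insert_only' replication set
CREATE TABLE audit_log (event text, logged_at timestamptz);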
spock.save_resolutions
spock.save_resolutions is a boolean value (the default is false) that logs all conflict resolutions to the spock.resolutions table. This option can only be set when the postmaster starts.
spock.stats_max_entries
spock.stats_max_entries specifies the maximum number of entries that can be stored in the Spock channel statistics. This option can only be set when the postmaster starts. The parameter accepts values from -1 to INT_MAX (the default is -1).
spock.temp_directory
spock.temp_directory defines the system path where temporary files needed for schema synchronization are written.
This path needs to exist and be writable by the user running PostgreSQL. The default is empty, which tells Spock to use the default temporary directory based on environment or operating system settings.
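For example, an illustrative postgresql.conf line (the path is an assumption and must already exist and be writable by the PostgreSQL user):
spock.temp_directory = '/var/tmp/spock'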