Configuring Spock
Add the following postgresql.conf settings on each node in your Spock
replication scenario before creating the Spock extension:
wal_level = 'logical'
max_worker_processes = 10 # one per database needed on provider node
# one per node needed on subscriber node
max_replication_slots = 10 # one per node needed on provider node
max_wal_senders = 10 # one per node needed on provider node
shared_preload_libraries = 'spock'
track_commit_timestamp = on # needed for conflict resolution
After modifying the parameters and restarting the Postgres server with your OS-specific restart command, connect with psql and create the Spock extension:
CREATE EXTENSION spock;
You will also need to modify your
pg_hba.conf file
to allow logical replication connections from localhost and between nodes.
Logical replication connections are treated by pg_hba.conf as regular
connections to the provider database.
After modifying the pg_hba.conf file on each node, restart the server to apply the changes.
Advanced Configuration Options for Spock
You can use the following configuration parameters (GUCs) on the psql
command line with a SET statement or in a Postgres configuration file
(like postgresql.conf) to configure a Spock installation.
spock.allow_ddl_from_functions
spock.allow_ddl_from_functions enables spock to automatically replicate
DDL statements that are called within functions to also be automatically
replicated. This can be turned off if these functions are expected to run
on every node. When this is set to off, statements replicated from
functions adhere to the same rule previously described for
include_ddl_repset. If a table possesses a defined primary key, it will be
added into the default replication set; alternatively, they will be added
to the default_insert_only replication set.
spock.batch_inserts
spock.batch_inserts tells Spock to use batch insert mechanism if
possible. The batch mechanism uses Postgres internal batch insert mode which
is also used by COPY command.
spock.channel_counters
spock.channel_counters is a boolean value (the default is true) that
enables or disables the Spock channel statistic information collection. This
option can only be set when the postmaster starts.
spock.check_all_uc_indexes
Info
This feature is experimental and OFF by default. Use this feature at
your own risk.
This parameter enhances conflict resolution during INSERT operations in
scenarios where a row exists that meets unique constraints defined on the
table (rather than primary keys or replication identity).
If this GUC is enabled, Spock will continue to check unique constraint
indexes, after checking the primary key / replica identity index. Only one
conflict will be resolved, using Last-Write-Wins logic. This includes the
primary key / replica identity index. If a second conflict occurs, an
exception is recorded in the spock.exception_log table.
Partial unique constraints are supported, but nullable unique constraints are not. Deferable constraints are not supported, are filtered out and are not checked, and therefore may still cause an exception during the final commit.
Unique constraint indexes are checked in OID order, after the primary key. It is therefore possible for the resolution of a unique constraint to change the primary key value.
Child tables / foreign keys are not checked or handled. Therefore, if resolving a unique constraint changes the primary key value, you can inadvertently create orphaned foreign key records.
spock.conflict_resolution
spock.conflict_resolution sets the resolution method for any detected
conflicts between local data and incoming changes. Possible values are:
error- the replication will stop on error if conflict is detected and manual action is needed for resolvingapply_remote- always apply the change that's conflicting with local datakeep_local- keep the local version of the data and ignore the conflicting change that is coming from the remote nodelast_update_wins- the version of data with newest commit timestamp will be kept (this can be either local or remote version)
To enable conflict resolution, the track_commit_timestamp setting must be
enabled.
spock.deny_all_ddl
spock.deny_all_ddl is a boolean value (the default is false) that
enables or disables the execution of DDL statements within a Spock
configured cluster. This option can be set at postmaster startup, with the
SIGHUP mechanism, or on the command line with SQL if you're a superuser.
spock.enable_ddl_replication
spock.enable_ddl_replication enables
automatic replication of ddl statements through
the default replication set.
spock.exception_behaviour
Use this GUC to specify the commit behavior of Postgres when it encounters an ERROR within a transaction:
transdiscard(the default) - Setspock.exception_behaviourtotransdiscardto instruct the server to discard a transaction if any operation within that transaction returns anERROR.discard- Setspock.exception_behaviourtodiscardto continue processing (and replicating) without interrupting server use. The server will commit any operation (any portion of a transaction) that does not return anERRORstatement.sub_disable- Setspock.exception_behaviourtosub_disableto disable the subscription for the node on which the exception was reported. Transactions for the disabled node will be added to a queue that is written to the WAL log file; when the subscription is enabled, replication will resume with the transaction that caused the exception, followed by the other queued transactions. You can use thespock.alter_sub_skip_lsnfunction to skip the transaction that caused the exception and resume processing with the next transaction in the queue.
Note that the value you choose for spock.exception_behaviour could
potentially result in a large WAL log if transactions are allowed to
accumulate.
spock.exception_logging
Use this GUC to specify which operations/transactions are written to the exception log table:
all(the default) - Setspock.exception_loggingtoallto instruct the server to record all transactions that contain one or more failed operations in thespock.exception_logtable. Note that this setting will log all operations that are part of a transaction if one operation fails.discard- Add a row to thespock.exception_logtable for any discarded operation; successful transactions are not logged.none- Instructs the server to not log any operation or transactions to the exception log table.
spock.exception_replay_queue_size
When Spock encounters a replication exception, it attempts to resolve the
exception by entering exception-handling mode, based on the value of
spock.exception_behaviour. Spock then writes any transaction up to a
default size of 4MB to memory, and the apply worker replays the
transaction from memory. This provides a massive speed and performance
increase in the handling of the vast majority of exceptions. The memory
size is configurable with spock.exception_replay_queue_size.
If the transaction exceeds the configured size, Spock performs as specified
by the spock.exception_behaviour parameter.
spock.extra_connection_options
You can use the spock.extra_connection_options parameter in the
postgresql.conf file to assign connection options that apply to all
connections made by Spock. This can be a useful place to set up custom
keepalive options, etc.
spock defaults to enabling TCP keepalives to ensure that it notices when
the upstream server disappears unexpectedly. To disable them add
keepalives = 0 to spock.extra_connection_options.
spock.feedback_frequency
Controls how many WAL messages the apply worker processes before sending an LSN feedback packet to the provider. Lower values increase feedback overhead due to synchronous socket flushes; higher values reduce overhead during bulk catch-up. There is a time-based guard (wal_sender_timeout / 2) that ensures connection liveness regardless of this setting. The default is 200.
spock.include_ddl_repset
spock.include_ddl_repset enables spock to automatically add tables to
replication sets at the time they are created on each node. Tables with
primary keys will be added to the default replication set, and tables
without primary keys will be added to the default_insert_only replication
set. Altering a table to add or remove a primary key will make the correct
adjustment to which replication set the table is part of. Setting a table to
unlogged will remove it from replication. Detaching a partition will not
remove it from replication.
spock.log_origin_change
spock.log_origin_change indicates whether changes to a row's
origin should be logged to the PostgreSQL log. Rows may be being updated
locally by regular SQL operations, or by replication from apply workers.
Note that rows that are changed locally (not from replication) have the
origin value of 0.
The default of none is recommended because otherwise the amount of entries
may become numerous. The other options allow for monitoring when updates
occur outside expected patterns.
The following configuration values are possible:
none(the default)- do not log any origin change informationremote_only_differs- only log origin changes when the existing row was from one remote publisher and was changed by another remote publishersince_sub_creation- log origin changes whether a publisher changed a row that was previously from another publisher or updated it locally, but only since the time when the subscription was created.
spock.save_resolutions
spock.save_resolutions is a boolean value (the default is false) that
logs all conflict resolutions to the spock.resolutions table. This option
can only be set when the postmaster starts.
spock.stats_max_entries
spock.stats_max_entries specifies the maximum number of entries that can
be stored into the Spock channel statistic. This option can only be set when
the postmaster starts. The parameter accepts values from -1 to INT_MAX
(the default is -1).
spock.temp_directory
spock.temp_directory defines the system path where temporary files
needed for schema synchronization are written. This path needs to exist
and be writable by the user running Postgres. The default is empty,
which tells Spock to use the default temporary directory based on
environment or operating system settings.