Snowflake Sequences
Using a Snowflake
sequence allows you to take full advantage of the benefits offered by a distributed, multi-master replication solution. The Snowflake
(opens in a new tab) extension provides a replication-friendly identifier that you can use to replace the PostgreSQL sequence definitions in your tables. pgEdge automatically installs and creates the extension in your pgEdge cluster.
Snowflake sequences let you:
- add or modify data in different regions while ensuring a unique transaction sequence.
- preserve unique transaction identifiers without manual/administrative management of a numbering scheme.
- accurately identify the order in which globally distributed transactions are performed.
Snowflake sequences also alleviate concerns that network lag could disrupt sequences in distributed transactions.
A Snowflake sequence is made up of a timestamp, a counter, and a unique node identifier; these components are sized to ensure over 18 trillion unique sequence numbers. pgEdge Platform provides versatile commands and functions that simplify creating a new Snowflake sequence or converting an existing sequence in a distributed, multi-master replication cluster.
Creating a Snowflake Sequence in a New Cluster
Before using Snowflake sequences, each node in your cluster must be assigned a unique snowflake.node
identity in the postgresql.conf
file. There are three ways to set the value of snowflake.node
. You can:
- use the
pgedge spock node-create
command. - use the
pgedge db guc-set snowflake.node
command. - manually modify the
postgresql.conf
file.
If your cluster has 9 nodes (or fewer), and you use the node naming convention N1
, N2
, N3
for your nodes, the pgedge spock node-create
command will set the snowflake.node
value for you. When you invoke the pgedge spock node-create
command to define a new node, the value of snowflake.node
is displayed after the command confirmation:
[pgedge]$ ./pgedge spock node-create n2 'host=10.2.1.2 user=pgedge port=5432 dbname=acctg' acctg
[
{
"node_create": 560818415
}
]
new: snowflake.node = 2
If you have a cluster with more than 9 nodes or a different node naming convention (other than N1, N2, etc.), you can use the pgedge db guc-set
command to manually set snowflake.node
to a numeric value that is unique within your cluster. The syntax is:
[pgedge]$ ./pgedge database_name guc-set snowflake.node node_number
Where
node_number
is a unique number associated with the node. For example, the following command sets the snowflake.node
value to 10
:
[pgedge]$ ./pgedge acctg guc-set snowflake.node 10
You also have the option to manually edit the postgresql.conf
file, adding the value of snowflake.node
. For example, you can add the following statement at the end of the file to identify the host as node 11
:
snowflake.node = 11
After setting the node identifier, you need to reload the server before using a Snowflake sequence. You can use the pgedge reload
command:
[pgedge]$ ./pgedge reload pg16
pg16 reloading
or use the pg_ctl reload
command to reload the server:
[pgedge]$ /home/pgedge/pg16/bin/pg_ctl reload -D "/home/pgedge/data/pg16"
server signaled
Converting a PostgreSQL Sequence to a Snowflake Sequence
You can use the pgedge spock sequence-convert
command at the psql command line to convert your existing sequence definitions into Snowflake sequences. Note that this converts the sequence definition; existing values in a sequence column will not change. The syntax is:
pgedge spock sequence-convert sequence_name database_name
Where
sequence_name
is the name of a sequence; you can use wildcards when specifying the sequence name to convert sequences in more than one table.
database_name
is the name of your database.
For example, the following command:
[pgedge]$ ./pgedge spock sequence-convert public.* acctg
Converts the sequences in tables that reside in the public
schema of the acctg
database to use Snowflake sequences. If you invoke the command on one node of a replicating cluster, the table definition updates are propagated to the other nodes in the cluster.
Example: Converting an Existing Sequence
The example that follows starts at the psql command line; in this example, we are using a table named orders
that has three columns; the last column is a sequence named id
:
acctg=# CREATE TABLE orders (customer VARCHAR, invoice VARCHAR, id bigserial PRIMARY KEY);
CREATE TABLE
After creating the table, we insert data into the orders
table. We only need to provide content for the first two columns, since the sequence definition keeps track of the value of the third column and adds it as needed:
acctg=# INSERT INTO orders VALUES ('Chesterfield Schools', 'art_9338');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Chesterfield Schools', 'math_9663');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Albemarle Schools', 'sci_2009');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('King William Schools', 'sci_7399');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('King William Schools', 'art_9484');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Hanover Schools', 'music_1849');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Washington Schools', 'hist_2983');
INSERT 0 1
When we select the rows from our table, we can see the sequence numbers in the id
column:
acctg=# SELECT * FROM orders;
customer | invoice | id
----------------------+------------+----
Chesterfield Schools | art_9338 | 1
Chesterfield Schools | math_9663 | 2
Albemarle Schools | sci_2009 | 3
King William Schools | sci_7399 | 4
King William Schools | art_9484 | 5
Hanover Schools | music_1849 | 6
Washington Schools | hist_2983 | 7
(7 rows)
To convert the sequence definition for the orders
table to use Snowflake sequences, we exit psql on n1
, and invoke the command:
[pgedge]$ ./pgedge spock sequence-convert public.orders_id_seq acctg
Converting sequence public.orders_id_seq to snowflake sequence.
The conversion process modifies the sequence definition to use Snowflake sequences, but does not update existing rows. If we reconnect with psql and add new rows to the table, the new row's id
will be a Snowflake sequence:
acctg=# INSERT INTO orders VALUES ('Prince William Schools', 'math_8330');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Fluvanna Schools', 'art_9447');
INSERT 0 1
In the query results that follows, you can see the unformatted sequence value in the id
column, and the same information in the format
column, formatted with the snowflake.format(id)
function. The rows added before the conversion to Snowflake sequences show a fixed timestamp of 2022-12-31 19:00:00-05
, while the Snowflake sequences have a unique id
and timestamp.
Original entries in the table display a PostgreSQL sequence, while entries made after the conversion display Snowflake sequences:
acctg=# SELECT id, snowflake.format(id), customer, invoice FROM orders;
id | format | customer | invoice
--------------------+-----------------------------------------------------------+------------------------+------------
1 | {"id": 1, "ts": "2022-12-31 19:00:00-05", "count": 0} | Chesterfield Schools | art_9338
2 | {"id": 2, "ts": "2022-12-31 19:00:00-05", "count": 0} | Chesterfield Schools | math_9663
3 | {"id": 3, "ts": "2022-12-31 19:00:00-05", "count": 0} | Albemarle Schools | sci_2009
4 | {"id": 4, "ts": "2022-12-31 19:00:00-05", "count": 0} | King William Schools | sci_7399
5 | {"id": 5, "ts": "2022-12-31 19:00:00-05", "count": 0} | King William Schools | art_9484
6 | {"id": 6, "ts": "2022-12-31 19:00:00-05", "count": 0} | Hanover Schools | music_1849
7 | {"id": 7, "ts": "2022-12-31 19:00:00-05", "count": 0} | Washington Schools | hist_2983
135824181823537153 | {"id": 1, "ts": "2024-01-10 14:16:48.438-05", "count": 0} | Prince William Schools | math_8330
135824609030176769 | {"id": 1, "ts": "2024-01-10 14:18:30.292-05", "count": 0} | Fluvanna Schools | art_9447
(9 rows)
Snowflake Functions
You can query the following Snowflake functions at the psql command line to return useful information about tables that use Snowflake sequences.
snowflake.nextval([sequence regclass])
Generates the next snowflake in the specified sequence. If no sequence is specified, the internal, database-wide sequence snowflake.id_seq
is used. For example, the following query returns the next number in the orders_id_seq
sequence:
acctg=# SELECT * FROM snowflake.nextval('orders_id_seq'::regclass);
nextval
--------------------
136169504773242881
(1 row)
snowflake.currval([sequence regclass])
Returns the current value of the specified sequence. This value is undefined until the function snowflake.nextval()
has been called for the sequence in the current session. For example, the following query returns the current value of the orders_id_seq
sequence:
acctg=# SELECT * FROM snowflake.currval('orders_id_seq'::regclass);
currval
--------------------
136169504773242881
(1 row)
snowflake.get_epoch(snowflake int8)
Returns the timestamp part of the snowflake as EPOCH
(seconds since 2023-01-01) as a NUMERIC
value with precision of three digits. For example, the timestamp of the following sequence value is 1704996539.845
seconds past Jan. 1, 2023:
acctg=# SELECT * FROM snowflake.get_epoch(136169504773242881);
get_epoch
----------------
1704996539.845
(1 row)
You can use the to_timestamp(snowflake.get_epoch(<value>))
function to convert the epoch into a timestamp. For example:
acctg=# SELECT to_timestamp(snowflake.get_epoch(136169504773242881));
to_timestamp
----------------------------
2024-01-11 13:08:59.845-05
(1 row)
snowflake.get_count(snowflake int8)
Returns the count part of the given snowflake as int4
value; count resets to 0
for each new millisecond. For example:
acctg=# SELECT snowflake.get_count(136169504773242881);
get_count
-----------
3
(1 row)
snowflake.get_node(snowflake int8)
Returns the value of snowflake.node
in postgresql.conf
for the host of the specified Snowflake sequence. For example, the Snowflake sequence number 136169504773242881
resides on Node 1
:
acctg=# SELECT * FROM snowflake.get_node(136169504773242881);
get_node
----------
1
(1 row)
snowflake.format(snowflake int8)
Returns a jsonb
object of the given snowflake in the form: {"node": 1, "ts": "2023-10-16 17:57:26.361+00", "count": 0}
. For example:
acctg=# SELECT * FROM snowflake.format(136169504773242881);
format
-----------------------------------------------------------
{"id": 1, "ts": "2024-01-11 13:08:59.845-05", "count": 0}
(1 row)
Reviewing Sequence Definitions in your Table Meta-data
If you query the table meta-data before performing a sequence conversion with the \d+ orders
command, the PostgreSQL sequence is defined in the Default
column:
acctg=# \d+ orders;
Table "public.orders"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
----------+-------------------+-----------+----------+------------------------------------+----------+-------------+--------------+-------------
customer | character varying | | | | extended | | |
invoice | character varying | | | | extended | | |
id | bigint | | not null | nextval('orders_id_seq'::regclass) | plain | | |
Indexes:
"orders_pkey" PRIMARY KEY, btree (id)
Access method: heap
After performing a conversion, the \d+ table_name
command shows a Default
value of snowflake.nextval('orders_id_seq'::regclass)
:
acctg=# \d+ orders
Table "public.orders"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Descripti
on
----------+-------------------+-----------+----------+----------------------------------------------+----------+-------------+--------------+----------
---
customer | character varying | | | | extended | | |
invoice | character varying | | | | extended | | |
id | bigint | | not null | snowflake.nextval('orders_id_seq'::regclass) | plain | | |
Indexes:
"orders_pkey" PRIMARY KEY, btree (id)
Access method: heap