Snowflake - pgEdge

Snowflake Sequences

Using a Snowflake sequence allows you to take full advantage of the benefits offered by a distributed, multi-master replication solution. The Snowflake (opens in a new tab) extension provides a replication-friendly identifier that you can use to replace the PostgreSQL sequence definitions in your tables. pgEdge automatically installs and creates the extension in your pgEdge cluster.

Snowflake sequences let you:

add or modify data in different regions while ensuring a unique transaction sequence.
preserve unique transaction identifiers without manual/administrative management of a numbering scheme.
accurately identify the order in which globally distributed transactions are performed.

Snowflake sequences also alleviate concerns that network lag could disrupt sequences in distributed transactions.

A Snowflake sequence is made up of a timestamp, a counter, and a unique node identifier; these components are sized to ensure over 18 trillion unique sequence numbers. pgEdge Platform provides versatile commands and functions that simplify creating a new Snowflake sequence or converting an existing sequence in a distributed, multi-master replication cluster.

Creating a Snowflake Sequence in a New Cluster

Before using Snowflake sequences, each node in your cluster must be assigned a unique snowflake.node identity in the postgresql.conf file. There are three ways to set the value of snowflake.node. You can:

use the pgedge spock node-create command.
use the pgedge db guc-set snowflake.node command.
manually modify the postgresql.conf file.

If your cluster has 9 nodes (or fewer), and you use the node naming convention N1, N2, N3 for your nodes, the pgedge spock node-create command will set the snowflake.node value for you. When you invoke the pgedge spock node-create command to define a new node, the value of snowflake.node is displayed after the command confirmation:

[pgedge]$ ./pgedge spock node-create n2 'host=10.2.1.2 user=pgedge port=5432 dbname=acctg' acctg
 
[
  {
	"node_create": 560818415
  }
]
  new: snowflake.node = 2

If you have a cluster with more than 9 nodes or a different node naming convention (other than N1, N2, etc.), you can use the pgedge db guc-set command to manually set snowflake.node to a numeric value that is unique within your cluster. The syntax is:

[pgedge]$ ./pgedge database_name guc-set snowflake.node node_number

Where

node_number is a unique number associated with the node. For example, the following command sets the snowflake.node value to 10:

[pgedge]$ ./pgedge acctg guc-set snowflake.node 10

You also have the option to manually edit the postgresql.conf file, adding the value of snowflake.node. For example, you can add the following statement at the end of the file to identify the host as node 11:

snowflake.node = 11

After setting the node identifier, you need to reload the server before using a Snowflake sequence. You can use the pgedge reload command:

[pgedge]$ ./pgedge reload pg16
pg16 reloading

or use the pg_ctl reload command to reload the server:

[pgedge]$ /home/pgedge/pg16/bin/pg_ctl reload -D "/home/pgedge/data/pg16"
server signaled

Converting a PostgreSQL Sequence to a Snowflake Sequence

You can use the pgedge spock sequence-convert command at the psql command line to convert your existing sequence definitions into Snowflake sequences. Note that this converts the sequence definition; existing values in a sequence column will not change. The syntax is:

pgedge spock sequence-convert sequence_name database_name

Where

sequence_name is the name of a sequence; you can use wildcards when specifying the sequence name to convert sequences in more than one table.

database_name is the name of your database.

For example, the following command:

[pgedge]$ ./pgedge spock sequence-convert public.* acctg

Converts the sequences in tables that reside in the public schema of the acctg database to use Snowflake sequences. If you invoke the command on one node of a replicating cluster, the table definition updates are propagated to the other nodes in the cluster.

Example: Converting an Existing Sequence

The example that follows starts at the psql command line; in this example, we are using a table named orders that has three columns; the last column is a sequence named id:

acctg=# CREATE TABLE orders (customer VARCHAR, invoice VARCHAR, id bigserial PRIMARY KEY);
CREATE TABLE

After creating the table, we insert data into the orders table. We only need to provide content for the first two columns, since the sequence definition keeps track of the value of the third column and adds it as needed:

acctg=# INSERT INTO orders VALUES ('Chesterfield Schools', 'art_9338');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Chesterfield Schools', 'math_9663');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Albemarle Schools', 'sci_2009');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('King William Schools', 'sci_7399');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('King William Schools', 'art_9484');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Hanover Schools', 'music_1849');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Washington Schools', 'hist_2983');
INSERT 0 1

When we select the rows from our table, we can see the sequence numbers in the id column:

acctg=# SELECT * FROM orders;
       customer       |  invoice   | id 
----------------------+------------+----
 Chesterfield Schools | art_9338   |  1
 Chesterfield Schools | math_9663  |  2
 Albemarle Schools    | sci_2009   |  3
 King William Schools | sci_7399   |  4
 King William Schools | art_9484   |  5
 Hanover Schools      | music_1849 |  6
 Washington Schools   | hist_2983  |  7
(7 rows)

To convert the sequence definition for the orders table to use Snowflake sequences, we exit psql on n1, and invoke the command:

[pgedge]$ ./pgedge spock sequence-convert public.orders_id_seq acctg
Converting sequence public.orders_id_seq to snowflake sequence.

The conversion process modifies the sequence definition to use Snowflake sequences, but does not update existing rows. If we reconnect with psql and add new rows to the table, the new row's id will be a Snowflake sequence:

acctg=# INSERT INTO orders VALUES ('Prince William Schools', 'math_8330');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Fluvanna Schools', 'art_9447');
INSERT 0 1

In the query results that follows, you can see the unformatted sequence value in the id column, and the same information in the format column, formatted with the snowflake.format(id) function. The rows added before the conversion to Snowflake sequences show a fixed timestamp of 2022-12-31 19:00:00-05, while the Snowflake sequences have a unique id and timestamp.

Original entries in the table display a PostgreSQL sequence, while entries made after the conversion display Snowflake sequences:

acctg=# SELECT id, snowflake.format(id), customer, invoice FROM orders;
         id         |                          format                           |        customer        |  invoice   
--------------------+-----------------------------------------------------------+------------------------+------------
                  1 | {"id": 1, "ts": "2022-12-31 19:00:00-05", "count": 0}     | Chesterfield Schools   | art_9338
                  2 | {"id": 2, "ts": "2022-12-31 19:00:00-05", "count": 0}     | Chesterfield Schools   | math_9663
                  3 | {"id": 3, "ts": "2022-12-31 19:00:00-05", "count": 0}     | Albemarle Schools      | sci_2009
                  4 | {"id": 4, "ts": "2022-12-31 19:00:00-05", "count": 0}     | King William Schools   | sci_7399
                  5 | {"id": 5, "ts": "2022-12-31 19:00:00-05", "count": 0}     | King William Schools   | art_9484
                  6 | {"id": 6, "ts": "2022-12-31 19:00:00-05", "count": 0}     | Hanover Schools        | music_1849
                  7 | {"id": 7, "ts": "2022-12-31 19:00:00-05", "count": 0}     | Washington Schools     | hist_2983
 135824181823537153 | {"id": 1, "ts": "2024-01-10 14:16:48.438-05", "count": 0} | Prince William Schools | math_8330
 135824609030176769 | {"id": 1, "ts": "2024-01-10 14:18:30.292-05", "count": 0} | Fluvanna Schools       | art_9447
(9 rows)

Snowflake Functions

You can query the following Snowflake functions at the psql command line to return useful information about tables that use Snowflake sequences.

snowflake.nextval([sequence regclass]) Generates the next snowflake in the specified sequence. If no sequence is specified, the internal, database-wide sequence snowflake.id_seq is used. For example, the following query returns the next number in the orders_id_seq sequence:

acctg=# SELECT * FROM snowflake.nextval('orders_id_seq'::regclass);
      nextval       
--------------------
 136169504773242881
(1 row)

snowflake.currval([sequence regclass]) Returns the current value of the specified sequence. This value is undefined until the function snowflake.nextval() has been called for the sequence in the current session. For example, the following query returns the current value of the orders_id_seq sequence:

acctg=# SELECT * FROM snowflake.currval('orders_id_seq'::regclass);
      currval       
--------------------
 136169504773242881
(1 row)

snowflake.get_epoch(snowflake int8) Returns the timestamp part of the snowflake as EPOCH (seconds since 2023-01-01) as a NUMERIC value with precision of three digits. For example, the timestamp of the following sequence value is 1704996539.845 seconds past Jan. 1, 2023:

acctg=# SELECT * FROM snowflake.get_epoch(136169504773242881);
   get_epoch    
----------------
 1704996539.845
(1 row)

You can use the to_timestamp(snowflake.get_epoch(<value>)) function to convert the epoch into a timestamp. For example:

acctg=# SELECT to_timestamp(snowflake.get_epoch(136169504773242881));
        to_timestamp        
----------------------------
 2024-01-11 13:08:59.845-05
(1 row)

snowflake.get_count(snowflake int8) Returns the count part of the given snowflake as int4 value; count resets to 0 for each new millisecond. For example:

acctg=# SELECT snowflake.get_count(136169504773242881);
 get_count 
-----------
         3
(1 row)

snowflake.get_node(snowflake int8) Returns the value of snowflake.node in postgresql.conf for the host of the specified Snowflake sequence. For example, the Snowflake sequence number 136169504773242881 resides on Node 1:

acctg=# SELECT * FROM snowflake.get_node(136169504773242881);
 get_node 
----------
        1
(1 row)

snowflake.format(snowflake int8) Returns a jsonb object of the given snowflake in the form: {"node": 1, "ts": "2023-10-16 17:57:26.361+00", "count": 0}. For example:

acctg=# SELECT * FROM snowflake.format(136169504773242881);
                          format                           
-----------------------------------------------------------
 {"id": 1, "ts": "2024-01-11 13:08:59.845-05", "count": 0}
(1 row)

Reviewing Sequence Definitions in your Table Meta-data

If you query the table meta-data before performing a sequence conversion with the \d+ orders command, the PostgreSQL sequence is defined in the Default column:

acctg=# \d+ orders;
                                                             Table "public.orders"
  Column  |       Type        | Collation | Nullable |              Default               | Storage  | Compression | Stats target | Description 
----------+-------------------+-----------+----------+------------------------------------+----------+-------------+--------------+-------------
 customer | character varying |           |          |                                    | extended |             |              | 
 invoice  | character varying |           |          |                                    | extended |             |              | 
 id       | bigint            |           | not null | nextval('orders_id_seq'::regclass) | plain    |             |              | 
Indexes:
    "orders_pkey" PRIMARY KEY, btree (id)
Access method: heap

After performing a conversion, the \d+ table_name command shows a Default value of snowflake.nextval('orders_id_seq'::regclass):

acctg=# \d+ orders
                                                                  Table "public.orders"
  Column  |       Type        | Collation | Nullable |                   Default                    | Storage  | Compression | Stats target | Descripti
on 
----------+-------------------+-----------+----------+----------------------------------------------+----------+-------------+--------------+----------
---
 customer | character varying |           |          |                                              | extended |             |              | 
 invoice  | character varying |           |          |                                              | extended |             |              | 
 id       | bigint            |           | not null | snowflake.nextval('orders_id_seq'::regclass) | plain    |             |              | 
Indexes:
    "orders_pkey" PRIMARY KEY, btree (id)
Access method: heap

Using Auto DDL Filtering Data with Replication Sets