Snowflake Sequences
Using a [Snowflake sequence] (https://github.com/pgEdge/snowflake?tab=readme-ov-file#snowflake-sequences-for-postgresql)allows (opens in a new tab) you to take full advantage of the benefits offered by a distributed, multi-master replication solution. Snowflake is a Postgres extension that provides an int8 and sequence based unique ID solution to optionally replace the Postgres built-in bigserial data type.
Internally, snowflakes are 64 bit integers represented externally as bigint values. The 64 bits are divided into bit fields:
- bit 63 - unused (sign of int8)
- bits 22-62 - timestamp with millisecond precision
- bits 10-21 - counter within one millisecond
- bits 0-9 - unique pgEdge Distributed Postgres node number
- The timestamp is a 41-bit unsigned value representing millisecond precision and an epoch of 2023-01-01.
- The counter is a 12-bit unsigned value that increments per ID allocation. This provides for 4096 unique IDs per millisecond, or 4 million IDs per second.
- The node number is a 10-bit unique identifier for the Postgres instance inside a global cluster. This value is set with the snowflake.nodeGUC in thepostgresql.conffile.
Snowflake sequences let you:
- add or modify data in different regions while ensuring a unique transaction sequence.
- preserve unique transaction identifiers without manual/administrative management of a numbering scheme.
- accurately identify the order in which globally distributed transactions are performed.
Snowflake sequences also alleviate concerns that network lag could disrupt sequences in distributed transactions.
A Snowflake sequence is made up of a timestamp, a counter, and a unique node identifier; these components are sized to ensure over 18 trillion unique sequence numbers. The extension supports versatile commands and functions that simplify creating a new Snowflake sequence or converting an existing sequence in a pgEdge Distributed Postgres cluster.
Creating a Snowflake Sequence in a New Cluster
Before using Snowflake sequences, each node in your cluster must be assigned a unique snowflake.node identity in the postgresql.conf file. There are three ways to set the value of snowflake.node. You can:
- use the pgedge spock node-createcommand.
- use the pgedge db guc-set snowflake.nodecommand.
- manually modify the postgresql.conffile.
If your cluster has 9 nodes (or fewer), and you use the node naming convention N1, N2, N3 for your nodes, the pgedge spock node-create command will set the snowflake.node value for you. When you invoke the pgedge spock node-create command to define a new node, the value of snowflake.node is displayed after the command confirmation:
[pgedge]$ ./pgedge spock node-create n2 'host=10.2.1.2 user=pgedge port=5432 dbname=acctg' acctg
 
[
  {
	"node_create": 560818415
  }
]
  new: snowflake.node = 2If you have a cluster with more than 9 nodes or a different node naming convention (other than N1, N2, etc.), you can use the pgedge db guc-set command to manually set snowflake.node to a numeric value that is unique within your cluster. The syntax is:
[pgedge]$ ./pgedge database_name guc-set snowflake.node node_number
Where
node_number is a unique number associated with the node. For example, the following command sets the snowflake.node value to 10:
[pgedge]$ ./pgedge acctg guc-set snowflake.node 10You also have the option to manually edit the postgresql.conf file, adding the value of snowflake.node. For example, you can add the following statement at the end of the file to identify the host as node 11:
snowflake.node = 11After setting the node identifier, you need to reload the server before using a Snowflake sequence.  You can use the pgedge reload command:
[pgedge]$ ./pgedge reload pg16
pg16 reloadingor use the pg_ctl reload command to reload the server:
[pgedge]$ /home/pgedge/pg16/bin/pg_ctl reload -D "/home/pgedge/data/pg16"
server signaledConverting a Postgres Sequence to a Snowflake Sequence
You can use the pgedge spock sequence-convert command at the psql command line to convert your existing sequence definitions into Snowflake sequences. Note that this converts the sequence definition; existing values in a sequence column will not change. The syntax is:
pgedge spock sequence-convert sequence_name database_name
Where
sequence_name is the name of a sequence; you can use wildcards when specifying the sequence name to convert sequences in more than one table.
database_name is the name of your database.
For example, the following command:
[pgedge]$ ./pgedge spock sequence-convert public.* acctgConverts the sequences in tables that reside in the public schema of the acctg database to use Snowflake sequences. If you invoke the command on one node of a replicating cluster, the table definition updates are propagated to the other nodes in the cluster.
Example: Converting an Existing Sequence
The example that follows starts at the psql command line; in this example, we are using a table named orders that has three columns; the last column is a sequence named id:
acctg=# CREATE TABLE orders (customer VARCHAR, invoice VARCHAR, id bigserial PRIMARY KEY);
CREATE TABLEAfter creating the table, we insert data into the orders table. We only need to provide content for the first two columns, since the sequence definition keeps track of the value of the third column and adds it as needed:
acctg=# INSERT INTO orders VALUES ('Chesterfield Schools', 'art_9338');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Chesterfield Schools', 'math_9663');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Albemarle Schools', 'sci_2009');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('King William Schools', 'sci_7399');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('King William Schools', 'art_9484');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Hanover Schools', 'music_1849');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Washington Schools', 'hist_2983');
INSERT 0 1When we select the rows from our table, we can see the sequence numbers in the id column:
acctg=# SELECT * FROM orders;
       customer       |  invoice   | id 
----------------------+------------+----
 Chesterfield Schools | art_9338   |  1
 Chesterfield Schools | math_9663  |  2
 Albemarle Schools    | sci_2009   |  3
 King William Schools | sci_7399   |  4
 King William Schools | art_9484   |  5
 Hanover Schools      | music_1849 |  6
 Washington Schools   | hist_2983  |  7
(7 rows)To convert the sequence definition for the orders table to use Snowflake sequences, we exit psql on n1, and invoke the command:
[pgedge]$ ./pgedge spock sequence-convert public.orders_id_seq acctg
Converting sequence public.orders_id_seq to snowflake sequence.The conversion process modifies the sequence definition to use Snowflake sequences, but does not update existing rows. If we reconnect with psql and add new rows to the table, the new row's id will be a Snowflake sequence:
acctg=# INSERT INTO orders VALUES ('Prince William Schools', 'math_8330');
INSERT 0 1
acctg=# INSERT INTO orders VALUES ('Fluvanna Schools', 'art_9447');
INSERT 0 1In the query results that follows, you can see the unformatted sequence value in the id column, and the same information in the format column, formatted with the snowflake.format(id) function. The rows added before the conversion to Snowflake sequences show a fixed timestamp of 2022-12-31 19:00:00-05, while the Snowflake sequences have a unique id and timestamp.
Original entries in the table display a Postgres sequence, while entries made after the conversion display Snowflake sequences:
acctg=# SELECT id, snowflake.format(id), customer, invoice FROM orders;
         id         |                          format                           |        customer        |  invoice   
--------------------+-----------------------------------------------------------+------------------------+------------
                  1 | {"id": 1, "ts": "2022-12-31 19:00:00-05", "count": 0}     | Chesterfield Schools   | art_9338
                  2 | {"id": 2, "ts": "2022-12-31 19:00:00-05", "count": 0}     | Chesterfield Schools   | math_9663
                  3 | {"id": 3, "ts": "2022-12-31 19:00:00-05", "count": 0}     | Albemarle Schools      | sci_2009
                  4 | {"id": 4, "ts": "2022-12-31 19:00:00-05", "count": 0}     | King William Schools   | sci_7399
                  5 | {"id": 5, "ts": "2022-12-31 19:00:00-05", "count": 0}     | King William Schools   | art_9484
                  6 | {"id": 6, "ts": "2022-12-31 19:00:00-05", "count": 0}     | Hanover Schools        | music_1849
                  7 | {"id": 7, "ts": "2022-12-31 19:00:00-05", "count": 0}     | Washington Schools     | hist_2983
 135824181823537153 | {"id": 1, "ts": "2024-01-10 14:16:48.438-05", "count": 0} | Prince William Schools | math_8330
 135824609030176769 | {"id": 1, "ts": "2024-01-10 14:18:30.292-05", "count": 0} | Fluvanna Schools       | art_9447
(9 rows)Snowflake Functions
You can query the following Snowflake functions at the psql command line to return useful information about tables that use Snowflake sequences.
snowflake.nextval([sequence regclass])
Generates the next snowflake in the specified sequence. If no sequence is specified, the internal, database-wide sequence snowflake.id_seq is used. For example, the following query returns the next number in the orders_id_seq sequence:
acctg=# SELECT * FROM snowflake.nextval('orders_id_seq'::regclass);
      nextval       
--------------------
 136169504773242881
(1 row)snowflake.currval([sequence regclass])
Returns the current value of the specified sequence. This value is undefined until the function snowflake.nextval() has been called for the sequence in the current session. For example, the following query returns the current value of the orders_id_seq sequence:
acctg=# SELECT * FROM snowflake.currval('orders_id_seq'::regclass);
      currval       
--------------------
 136169504773242881
(1 row)snowflake.get_epoch(snowflake int8) Returns the timestamp part of the snowflake as EPOCH (seconds since 2023-01-01) as a NUMERIC value with precision of three digits. For example, the timestamp of the following sequence value is 1704996539.845 seconds past Jan. 1, 2023:
acctg=# SELECT * FROM snowflake.get_epoch(136169504773242881);
   get_epoch    
----------------
 1704996539.845
(1 row)You can use the to_timestamp(snowflake.get_epoch(<value>)) function to convert the epoch into a timestamp.  For example:
acctg=# SELECT to_timestamp(snowflake.get_epoch(136169504773242881));
        to_timestamp        
----------------------------
 2024-01-11 13:08:59.845-05
(1 row)snowflake.get_count(snowflake int8) Returns the count part of the given snowflake as int4 value; count resets to 0 for each new millisecond. For example:
acctg=# SELECT snowflake.get_count(136169504773242881);
 get_count 
-----------
         3
(1 row)snowflake.get_node(snowflake int8) Returns the value of snowflake.node in postgresql.conf for the host of the specified Snowflake sequence. For example, the Snowflake sequence number 136169504773242881 resides on Node 1:
acctg=# SELECT * FROM snowflake.get_node(136169504773242881);
 get_node 
----------
        1
(1 row)snowflake.format(snowflake int8) Returns a jsonb object of the given snowflake in the form: {"node": 1, "ts": "2023-10-16 17:57:26.361+00", "count": 0}.  For example:
acctg=# SELECT * FROM snowflake.format(136169504773242881);
                          format                           
-----------------------------------------------------------
 {"id": 1, "ts": "2024-01-11 13:08:59.845-05", "count": 0}
(1 row)Reviewing Sequence Definitions in your Table Meta-data
If you query the table meta-data before performing a sequence conversion with the \d+ orders command, the Postgres sequence is defined in the Default column:
acctg=# \d+ orders;
                                                             Table "public.orders"
  Column  |       Type        | Collation | Nullable |              Default               | Storage  | Compression | Stats target | Description 
----------+-------------------+-----------+----------+------------------------------------+----------+-------------+--------------+-------------
 customer | character varying |           |          |                                    | extended |             |              | 
 invoice  | character varying |           |          |                                    | extended |             |              | 
 id       | bigint            |           | not null | nextval('orders_id_seq'::regclass) | plain    |             |              | 
Indexes:
    "orders_pkey" PRIMARY KEY, btree (id)
Access method: heapAfter performing a conversion, the \d+ table_name command shows a Default value of snowflake.nextval('orders_id_seq'::regclass):
acctg=# \d+ orders
                                                                  Table "public.orders"
  Column  |       Type        | Collation | Nullable |                   Default                    | Storage  | Compression | Stats target | Descripti
on 
----------+-------------------+-----------+----------+----------------------------------------------+----------+-------------+--------------+----------
---
 customer | character varying |           |          |                                              | extended |             |              | 
 invoice  | character varying |           |          |                                              | extended |             |              | 
 id       | bigint            |           | not null | snowflake.nextval('orders_id_seq'::regclass) | plain    |             |              | 
Indexes:
    "orders_pkey" PRIMARY KEY, btree (id)
Access method: heap