Building ColdFront from source
Most users should install from packages - see Installation in the README. This document is the build-from-source workflow: build the patched DuckDB-1.5.x stack yourself, in Docker or bare-metal.
ColdFront runs on a DuckDB 1.5.x stack: PostgreSQL + pg_duckdb (DuckDB 1.5.3) and a patched duckdb-iceberg that carries ColdFront's four patches - the bakery-aware commit-refresh patch (the no-409 guarantee for concurrent cold-tier writers) and three strict-reader interop patches (so apache/iceberg-go, the cold-tier compactor, can read the manifests duckdb-iceberg writes). The patch internals are in DUCKDB_1.5_PATCHED.md. No released pg_duckdb tag carries DuckDB 1.5.x yet, so the stack is built from a pinned upstream PR plus our patches - all from sources you can fetch.
What gets built
docker/Dockerfile.duckdb15-base is the recipe; it fetches the
requirements, applies our patches, and compiles the following
components:
| Component | Source |
|---|---|
| libcurl 8.12.0 | curl.se, built from source (compile-time dep of DuckDB 1.5.3 httpfs, which needs curl >= 7.77; the pgEdge base ships 7.76.1) |
| pg_duckdb (DuckDB 1.5.3) | github.com/duckdb/pg_duckdb, PR #1025 |
| duckdb-iceberg | github.com/duckdb/duckdb-iceberg, v1.5-variegata @ 0fad545a |
| vcpkg | github.com/microsoft/vcpkg |
The base build runs as three Docker stages: the first builds libcurl and
pg_duckdb; the second clones duckdb-iceberg at the pinned ref, applies
ColdFront's four patches, and compiles the iceberg, avro, azure, and
postgres_scanner extensions under vcpkg; the third assembles the runtime.
The build runs git apply --check on each patch before applying it, so it
fails loudly on patch rot rather than silently shipping stock iceberg (which
returns 409s under concurrency and writes manifests that strict Apache readers reject).
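The check-then-apply step described above can be sketched as a small shell function. This is a minimal illustration, not the Dockerfile's actual code: `apply_checked` and its arguments are hypothetical names, and patch paths are assumed to be absolute because `git -C` changes directory first.

```shell
# Sketch: dry-run each patch with --check, then apply it; stop at the first failure.
# apply_checked and its arguments are hypothetical, not names from the Dockerfile.
apply_checked() {
    tree=$1; shift                      # source tree to patch
    for patch in "$@"; do               # absolute paths to patch files
        if ! git -C "$tree" apply --check "$patch"; then
            echo "patch no longer applies: $patch" >&2
            return 1
        fi
        git -C "$tree" apply "$patch" || return 1
    done
}
```

Because the function returns non-zero as soon as one patch fails its dry run, a build step that calls it aborts instead of compiling an unpatched tree.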
ColdFront applies the following four patches to duckdb-iceberg; the full rationale is in DUCKDB_1.5_PATCHED.md:
| Patch | What it does |
|---|---|
| iceberg-bakery-aware-commit-refresh-v15 | Re-stamps the parent snapshot at the commit POST so concurrent cold writers never get a Lakekeeper 409 (the no-409 guarantee). |
| iceberg-manifest-list-format-version-v15 | Adds the spec-optional format-version key to the manifest-list metadata so strict Apache readers parse the entries as v2. |
| iceberg-manifest-content-v15 | Writes the manifest's real content type instead of a hardcoded value, so strict readers accept delete manifests. |
| iceberg-data-file-format-v15 | Upper-cases the data-file format in the manifest to match the spec enum strict readers check case-sensitively. |
The bakery patch is mandatory for the no-409 guarantee. The other three
are interop patches so the manifests duckdb-iceberg writes are readable
by strict Apache readers such as apache/iceberg-go, the cold-tier
compactor; they are inert to pg_duckdb's own reads. The canonical recipe
- every source pin and compile step - is
docker/Dockerfile.duckdb15-base itself.
Build the image (Docker)
Build the stack in two stages, the prebuilt base and the thin app layer:
git clone <coldfront-repo> && cd coldfront
# 1. Build the base (fetches the deps above, applies our patches, compiles
# pg_duckdb 1.5.3 + the patched duckdb-iceberg). ~30–60 min,
# needs network + a few GB of disk/RAM. Repeat with PG_MAJOR=16 / PG_MAJOR=17 for those majors.
docker build -f docker/Dockerfile.duckdb15-base --build-arg PG_MAJOR=18 \
-t ghcr.io/pgedge/coldfront-duckdb-base:pg18 .
# 2. Build the thin coldfront app layer + bring up the stack (seconds — it only
# compiles the coldfront extension on top of the base).
docker compose up -d --build # end-user single-node stack (ports published)
# (CI uses docker-compose.matrix.yml / docker-compose.mesh.yml — NOT for end-user setup)
The split keeps app builds fast and ensures they always test current source: the
expensive, stable compiles (pg_duckdb 1.5.3 + the patched duckdb-iceberg)
live in the prebuilt base, published to
ghcr.io/pgedge/coldfront-duckdb-base:pg{16,17,18}; the app build
(docker/Dockerfile.duckdb15) just FROMs it and
compiles the coldfront extension in seconds. If you build the base
yourself (step 1) the app layer FROMs your local image; otherwise it
FROMs the published ghcr.io/pgedge/coldfront-duckdb-base:pg<major>.
Rebuild the published base via the
base-image workflow (gh workflow run
base-image.yml) when its inputs change.
Then follow usage.md → One-time setup (bootstrap Lakekeeper → create a table → tier → verify).
Pin pg_duckdb for reproducible builds. The base pins pg_duckdb to
pull/1025/head (a moving, unreleased PR ref). For reproducible builds, pin it to a specific commit SHA (or the eventual DuckDB-1.5.x release) instead of the live PR head.
Base foundation. The base is
FROM ghcr.io/pgedge/pgedge-postgres:<pg>-spock5-minimal; you need pull access to that image (or substitute an equivalent PostgreSQL base with the same layout).
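To pin the moving PR ref to a fixed commit, you first need the SHA it currently points at. A minimal sketch, assuming the standard GitHub pull-request ref layout (`refs/pull/<n>/head`): `resolve_ref` is a hypothetical helper name, while `git ls-remote` is the real command doing the work.

```shell
# Sketch: print the commit SHA a remote ref currently points at.
# resolve_ref is a hypothetical helper; git ls-remote is the real command.
resolve_ref() {
    # $1 = repository URL or path, $2 = fully qualified ref
    git ls-remote "$1" "$2" | cut -f1   # output is "<sha><TAB><ref>"
}

# Example (needs network):
#   resolve_ref https://github.com/duckdb/pg_duckdb refs/pull/1025/head
```

The printed SHA can then stand in for the PR-head ref wherever the base recipe pins pg_duckdb. How that pin is expressed is defined in docker/Dockerfile.duckdb15-base and is not reproduced here.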
Verify the build
A self-contained smoke test confirms the freshly built stack works end
to end: pg_duckdb, the patched duckdb-iceberg, Lakekeeper, and the object
store. The fastest path needs no cloud credentials; bring the stack up
with the in-compose SeaweedFS S3 emulator under the local-store
profile:
docker compose --profile local-store up -d --build
Bootstrap Lakekeeper, create the wh warehouse against the SeaweedFS
credentials in
docker/seaweedfs-s3.json,
and seed the default namespace:
curl -sf -X POST http://localhost:8181/management/v1/bootstrap \
-H 'Content-Type: application/json' -d '{"accept-terms-of-use":true}'
curl -s -X POST http://localhost:8181/management/v1/warehouse \
-H 'Content-Type: application/json' -d '{
"warehouse-name":"wh",
"storage-profile":{"type":"s3","bucket":"iceberg","region":"us-east-1",
"endpoint":"http://seaweedfs:8333","path-style-access":true,
"flavor":"s3-compat","sts-enabled":false,"remote-signing-enabled":false},
"storage-credential":{"type":"s3","credential-type":"access-key",
"aws-access-key-id":"admin","aws-secret-access-key":"adminsecret"}
}'
WID=$(curl -s http://localhost:8181/management/v1/warehouse \
| grep -oE '"warehouse-id":"[^"]+"' | head -1 | cut -d'"' -f4)
curl -s -X POST "http://localhost:8181/catalog/v1/$WID/namespaces" \
-H 'Content-Type: application/json' -d '{"namespace":["default"]}'
Create the extensions, set the cold-store secret, create a decoupled table, insert a row, and read it back through Iceberg:
psql -h localhost -U coldfront -d coldfront <<'SQL'
CREATE EXTENSION IF NOT EXISTS pg_duckdb;
CREATE EXTENSION IF NOT EXISTS coldfront;
SELECT coldfront.set_storage_secret('admin', 'adminsecret', 'seaweedfs:8333');
SELECT coldfront.create_iceberg_table('public', 'events',
'[{"name":"id","type":"bigint"},{"name":"ts","type":"timestamptz"},{"name":"note","type":"text"}]'::jsonb);
INSERT INTO events VALUES (1, now(), 'hello');
SELECT count(*) FROM events;
SQL
A row count of 1 read back through Iceberg confirms the full path. For a
real cloud store, drop the local-store profile, point the warehouse at
your own bucket, and follow
usage.md → One-time setup for the full
tier-and-verify journey.
Build prerequisites
The following table lists the prerequisites for each build path:
| For | You need |
|---|---|
| Docker build (above) | Docker; network access (GitHub / quay.io); ~a few GB disk + RAM and 30-60 min for the base compile |
| The archiver (all paths) | Go 1.26.4+ (pinned in go.mod), make (make build → ./bin/archiver) |
| Bare metal (below) | pg_config, PostgreSQL server dev headers, make, gcc |
Bare metal (no Docker)
The coldfront extension is a standard PGXS C extension:
cd extension/coldfront
make && make install # needs pg_config on PATH + PG server dev headers installed
You separately need pg_duckdb (DuckDB 1.5.3) and the patched iceberg
DuckDB extension installed in your PostgreSQL - follow the compile steps
in
docker/Dockerfile.duckdb15-base - plus, in
postgresql.conf:
shared_preload_libraries = 'pg_duckdb,coldfront'
coldfront.warehouse = '<warehouse-name>'
coldfront.lakekeeper_endpoint = 'http://<lakekeeper-host>:8181/catalog'
coldfront.local_pg_dsn = 'host=/var/run/postgresql dbname=<db> user=<role>'
(See the README for the full GUC set and the optional turnkey non-superuser role.)
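Before restarting PostgreSQL, it can help to confirm that the four settings above are present. The following is a minimal sketch: `check_coldfront_conf` is a hypothetical helper, not part of ColdFront, and it only checks that each key is set on an uncommented line, not that the value is correct.

```shell
# Sketch: report any required setting that is missing from a postgresql.conf.
# check_coldfront_conf is a hypothetical helper, not part of ColdFront.
check_coldfront_conf() {
    conf=$1
    missing=0
    for key in shared_preload_libraries coldfront.warehouse \
               coldfront.lakekeeper_endpoint coldfront.local_pg_dsn; do
        # Match "key = ..." at line start; lines beginning with # do not match.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
            echo "missing: $key" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A call such as `check_coldfront_conf /path/to/postgresql.conf` returns non-zero and names each missing key on stderr. The path here is a placeholder for your own data directory.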
Testing & CI
One canonical user journey (ci/journey.sh) runs identically in
every deployment cell; ci/matrix.sh drives the cells and
ci/topo/*.sh brings up each topology. All cells share the DuckDB 1.5.x
app image
(docker/Dockerfile.duckdb15, built on the prebuilt
base; --build-arg PG_MAJOR=16|17|18).
Pre-commit gate
./run-ci-local.sh runs ci/matrix.sh --quick: gofmt, golangci-lint,
unit tests, build, the pg_regress unit layer, and the full journey on
one representative cell (PG18 · vanilla · tiered · s3). Fast; runs on
every commit. GitHub Actions (.github/workflows/ci.yml)
runs the identical ci/matrix.sh harness - --quick on every push/PR,
--full nightly and on demand - so local and CI never diverge.
Full matrix
ci/matrix.sh --full is the beta gate. It covers PG {16, 17, 18} ×
{vanilla, mesh (3-node Spock)} × {tiered, decoupled} × {primary, standby}
× {s3, aws, azure, gcs}. The mesh cells add the cross-node stories - hot
visibility via Spock, cold visibility via the shared Lakekeeper catalog,
the R-A bakery serialising concurrent cold writers (same-node and
cross-node) with no 409, and an N×(N-1) probe that the bakery's
coldfront.claims table replicates in every direction.
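To get a feel for the size of the matrix, the five dimensions above can be expanded as a plain Cartesian product. This is only an illustration: `list_cells` is a hypothetical function, and the real ci/matrix.sh may skip, merge, or order cells differently.

```shell
# Sketch: list every cell of the full Cartesian product named above.
# list_cells is hypothetical; the real ci/matrix.sh may skip or order cells differently.
list_cells() {
    for pg in 16 17 18; do
        for topo in vanilla mesh; do
            for mode in tiered decoupled; do
                for role in primary standby; do
                    for store in s3 aws azure gcs; do
                        echo "PG$pg $topo $mode $role $store"
                    done
                done
            done
        done
    done
}
```

Taken as a full product, this is 3 × 2 × 2 × 2 × 4 = 96 cells, which `list_cells | grep -c .` confirms.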
Storage-backend gating
The same policy applies locally and in GitHub CI: the hermetic
SeaweedFS-as-S3 backend (s3) always runs - that is the default
coverage with no credentials. The real cloud stores run only when
their credentials are present in the environment, else they are
reported PENDING and never invoked (no real cloud calls without
explicit creds). The following table shows each backend and its gating
environment variables:
| Backend | Store | Gating env vars |
|---|---|---|
| s3 | SeaweedFS (in-compose, hermetic) | - always runs |
| aws | real AWS S3 (native vhost+HTTPS) | COLDFRONT_AWS_ACCESS_KEY, _SECRET_KEY, _BUCKET, _REGION |
| azure | real Azure ADLS Gen2 | COLDFRONT_AZURE_ACCOUNT, _FILESYSTEM, _KEY, _CONNECTION_STRING |
| gcs | real GCS via S3-interop (HMAC) | COLDFRONT_GCS_ACCESS_KEY, _SECRET_KEY, _BUCKET |
In GitHub Actions these come from repo secrets; an unset secret arrives
empty, so that backend stays PENDING. Fork PRs (no secret access) run
SeaweedFS-only.
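The gating rule can be sketched as a function that treats a backend as runnable only when every one of its gating variables is non-empty. This is an assumption-laden illustration: `backend_status` is a hypothetical helper, the abbreviated names in the table are assumed to expand with the same prefix (for example COLDFRONT_GCS_SECRET_KEY), and which variables a backend actually requires is decided by the real harness.

```shell
# Sketch: a backend runs only if every gating variable is non-empty.
# backend_status is a hypothetical helper; which variables gate a backend
# is decided by the real harness.
backend_status() {
    # $1 = backend name, remaining args = names of its gating env vars
    backend=$1; shift
    for name in "$@"; do
        eval "value=\${$name:-}"        # indirect lookup, POSIX sh
        if [ -z "$value" ]; then
            echo "$backend PENDING"     # unset or empty: never invoke the store
            return 0
        fi
    done
    echo "$backend RUN"
}
```

For example, `backend_status s3` always prints `s3 RUN` because it has no gating variables, while `backend_status gcs COLDFRONT_GCS_ACCESS_KEY COLDFRONT_GCS_SECRET_KEY COLDFRONT_GCS_BUCKET` prints `gcs PENDING` unless all three are set. An empty value counts as missing, which matches how an unset GitHub secret arrives.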