Adding a Supported Service

This guide explains how to add a new supported service type to the control plane. MCP is the reference implementation throughout. All touch points are summarized in the checklist at the end.

Caution

This document is subject to change. Some areas are still in active design:

  • Workflow refactoring: Some of the workflow code (e.g., service provisioning) needs to be refactored based on PR feedback. This work is being coordinated with the SystemD migration tasks.

Treat this as a snapshot of the current architecture, not a stable contract.

Overview

A supported service is a containerized application deployed alongside a pgEdge database. Services are declared in DatabaseSpec.Services, provisioned after the database is available, and deleted when the database is deleted.

One ServiceSpec produces N ServiceInstance records — one per host_id in the spec. Each instance runs as its own Docker Swarm service on the database's overlay network, with its own dedicated Postgres credentials.

Design Constraint: Services Depend on the Database Only

The current architecture constrains every service to have exactly one dependency: the parent database. There is no concept of service provisioning ordering or dependencies between services. All services declared in a database spec are provisioned independently and in parallel once the database is available.

This means:

  • A service cannot declare a dependency on another service
  • There is no ordering guarantee between services (service A may start before or after service B)
  • A service cannot discover another service's hostname or connection info at provisioning time

This constraint keeps the model simple and parallelizable, but it will likely need to be relaxed in the future. Scaling out the AI workbench will require multi-container service types where one component depends on another (e.g., a web client that proxies to an API server). Supporting this will require introducing service-to-service dependencies, provisioning ordering, and health-gated startup — effectively a dependency graph within the service layer.

The data flow from API request to running container looks like this:

API Request (spec.services)
  → Validation (API layer)
  → Store spec in etcd
  → UpdateDatabase workflow
    → PlanUpdate sub-workflow
      → For each (service, host_id):
          → Resolve service image from registry
          → Validate Postgres/Spock version compatibility
          → Build multi-host connection info (BuildServiceHostList)
          → Generate resource objects (network, user role, dir, config, spec, instance, monitor)
      → Merge service resources into EndState
      → Compute plan (diff current state vs. desired state)
    → Apply plan (execute resource Create/Update/Delete)
  → Service container running

API Layer

The ServiceSpec Goa type is defined in api/apiv1/design/database.go. It lives inside the DatabaseSpec as an array:

g.Attribute("services", g.ArrayOf(ServiceSpec), func() { ... })

The ServiceSpec type has these attributes:

| Attribute | Type | Description |
|---|---|---|
| service_id | Identifier | Unique ID for this service within the database |
| service_type | String (enum) | The type of service (e.g., "mcp") |
| version | String | Semver (e.g., "1.0.0") or "latest" |
| host_ids | []Identifier | Which hosts should run this service (one instance per host) |
| port | Int (optional) | Host port to publish; 0 = random; omitted = not published |
| config | MapOf(String, Any) | Service-specific configuration |
| database_connection | DatabaseConnection (optional) | Controls database connection topology (see Optional: database_connection) |
| cpus | String (optional) | CPU limit; accepts SI suffix m (e.g., "500m", "1") |
| memory | String (optional) | Memory limit in SI or IEC notation (e.g., "512M", "1GiB") |
| orchestrator_opts | OrchestratorOpts (optional) | Orchestrator-specific options (e.g., Swarm extra labels) |
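
Putting the attributes together, a services entry in a database spec might look like this (all values, including the model name and API key, are illustrative):

```json
{
  "spec": {
    "services": [
      {
        "service_id": "mcp1",
        "service_type": "mcp",
        "version": "latest",
        "host_ids": ["host1", "host2"],
        "port": 0,
        "config": {
          "llm_provider": "anthropic",
          "llm_model": "example-model",
          "anthropic_api_key": "sk-example"
        },
        "cpus": "500m",
        "memory": "512M"
      }
    ]
  }
}
```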

To add a new service type, add its string value to the enum on service_type:

g.Attribute("service_type", g.String, func() {
    g.Enum("mcp", "my-new-service")
})

The config attribute is intentionally MapOf(String, Any) so the API schema doesn't change when new service types are added. Config structure is validated at the application layer (see Validation).

After editing the design file, regenerate the API code:

make -C api generate

Validation

There are two validation layers that catch different classes of errors.

API validation (HTTP 400)

validateServiceSpec() in server/internal/api/apiv1/validate.go runs on every API request that includes services. It performs fast, syntactic checks:

  1. service_id: must be a valid identifier (the shared validateIdentifier function)
  2. service_type: must be in the allowlist. Currently this is a direct check:
    if svc.ServiceType != "mcp" {
        // error: unsupported service type
    }
    
    Add your new type here.
  3. version: must match semver format or be the literal "latest"
  4. host_ids: must be unique within the service (duplicate host IDs are rejected)
  5. config: dispatches to a per-type config validator:
    if svc.ServiceType == "mcp" {
        errs = append(errs, validateMCPServiceConfig(svc.Config, ...)...)
    }
    

The per-type config validator is where you enforce required fields, type correctness, and service-specific constraints. For example, validateMCPServiceConfig() in the same file:

  • Requires llm_provider and llm_model
  • Validates llm_provider is one of "anthropic", "openai", "ollama"
  • Requires the provider-specific API key (anthropic_api_key, openai_api_key, or ollama_url) based on the chosen provider

Write a parallel validateMyServiceConfig() function for your service type and add a dispatch branch in validateServiceSpec().
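A minimal sketch of such a validator — the field names (endpoint, replica_count) are hypothetical, and the real validators return the API layer's error types rather than plain errors:

```go
package main

import "fmt"

// validateMyServiceConfig is a hypothetical per-type config validator,
// mirroring the validateMCPServiceConfig pattern: check required fields,
// type correctness, and service-specific constraints.
func validateMyServiceConfig(config map[string]any) []error {
	var errs []error

	// Required field: must be present and a non-empty string.
	endpoint, ok := config["endpoint"].(string)
	if !ok || endpoint == "" {
		errs = append(errs, fmt.Errorf("config.endpoint is required and must be a string"))
	}

	// Optional field: validate type and range only if present.
	if raw, present := config["replica_count"]; present {
		n, ok := raw.(float64) // JSON numbers decode as float64 in map[string]any
		if !ok || n < 1 {
			errs = append(errs, fmt.Errorf("config.replica_count must be a positive number"))
		}
	}
	return errs
}

func main() {
	errs := validateMyServiceConfig(map[string]any{"endpoint": "http://example"})
	fmt.Println(len(errs)) // 0 for a valid config
}
```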

Workflow-time validation

GetServiceImage() in server/internal/orchestrator/swarm/service_images.go is called during workflow execution. If the service_type/version combination is not registered in the image registry, it returns an error that fails the workflow task and sets the database to "failed" state. Note that this is distinct from post-provision health-check failures detected by the service instance monitor, which transition individual ServiceInstance records to "failed" but do not change the parent database's state.

This catches cases where the API validation passes (valid semver, known type) but the specific version hasn't been registered. The E2E test TestProvisionMCPServiceUnsupportedVersion in e2e/service_provisioning_test.go demonstrates this: version "99.99.99" passes API validation but fails at workflow time.

Service Image Registry

The image registry maps (serviceType, version) pairs to container image references. It lives in server/internal/orchestrator/swarm/service_images.go.

Data structures

type ServiceImage struct {
    Tag                string                  // Full image:tag reference
    PostgresConstraint *host.VersionConstraint // Optional: restrict PG versions
    SpockConstraint    *host.VersionConstraint // Optional: restrict Spock versions
}

type ServiceVersions struct {
    images map[string]map[string]*ServiceImage // serviceType → version → image
}

Registering a new service type

Add your image in NewServiceVersions():

func NewServiceVersions(cfg config.Config) *ServiceVersions {
    versions := &ServiceVersions{...}

    // Existing MCP registration
    versions.addServiceImage("mcp", "latest", &ServiceImage{
        Tag: serviceImageTag(cfg, "postgres-mcp:latest"),
    })

    // Your new service
    versions.addServiceImage("my-service", "1.0.0", &ServiceImage{
        Tag: serviceImageTag(cfg, "my-service:1.0.0"),
    })

    return versions
}

serviceImageTag() prepends the configured registry host (cfg.DockerSwarm.ImageRepositoryHost) unless the image reference already contains a registry prefix (detected by checking whether the first path component contains a "." or ":" or is "localhost").
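
A sketch of that detection rule (the helper names and registry host are illustrative, not the real implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// hasRegistryPrefix applies the rule described above: the reference
// already names a registry if its first path component contains a "."
// or ":" or is exactly "localhost". This mirrors how Docker itself
// distinguishes "myrepo/img:tag" from "registry.example.com/img:tag".
func hasRegistryPrefix(ref string) bool {
	first, _, found := strings.Cut(ref, "/")
	if !found {
		return false // single-component refs like "postgres-mcp:latest" carry no registry
	}
	return strings.ContainsAny(first, ".:") || first == "localhost"
}

// prependRegistry approximates serviceImageTag: prepend the configured
// registry host unless the ref already carries one.
func prependRegistry(registryHost, ref string) string {
	if registryHost == "" || hasRegistryPrefix(ref) {
		return ref
	}
	return registryHost + "/" + ref
}

func main() {
	fmt.Println(prependRegistry("registry.example.com", "postgres-mcp:latest"))
	fmt.Println(prependRegistry("registry.example.com", "ghcr.io/acme/tool:1.0"))
}
```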

Version constraints

If your service is only compatible with specific Postgres or Spock versions, set the constraint fields:

versions.addServiceImage("my-service", "1.0.0", &ServiceImage{
    Tag: serviceImageTag(cfg, "my-service:1.0.0"),
    PostgresConstraint: &host.VersionConstraint{
        Min: host.MustParseVersion("15"),
        Max: host.MustParseVersion("17"),
    },
    SpockConstraint: &host.VersionConstraint{
        Min: host.MustParseVersion("4.0.0"),
    },
})

ValidateCompatibility() checks these constraints against the database's running versions during GenerateServiceInstanceResources() in the workflow. Constraint failures produce errors like "postgres version 14 does not satisfy constraint >=15".

Resource Lifecycle

Every service instance is represented by a chain of resources that participate in the standard plan/apply reconciliation cycle. The generic resources (Network, ServiceUserRole, DirResource, ServiceInstanceSpec, ServiceInstance, Monitor) are shared by all service types. Service-type-specific config resources (e.g., MCPConfigResource) slot into the chain between DirResource and ServiceInstanceSpec.

Dependency chain

Phase 1: Network (swarm.network)                    — no dependencies
         ServiceUserRole (swarm.service_user_role)   — no dependencies
Phase 2: DirResource (filesystem.dir)                — no dependencies
Phase 3: MCPConfigResource (swarm.mcp_config)        — depends on DirResource + ServiceUserRole
Phase 4: ServiceInstanceSpec (swarm.service_instance_spec) — depends on Network + MCPConfigResource
Phase 5: ServiceInstance (swarm.service_instance)    — depends on ServiceUserRole + ServiceInstanceSpec
Phase 6: ServiceInstanceMonitor (monitor.service_instance) — depends on ServiceInstance

On deletion, the order reverses: monitor first, then instance, spec, config, directory, and finally the network and user role.

A new service type replaces MCPConfigResource (Phase 3) with its own config resource. The rest of the chain is unchanged.

What each resource does

Network (server/internal/orchestrator/swarm/network.go): Creates a Docker Swarm overlay network for the database. The network is shared between Postgres instances and service instances of the same database, so it deduplicates naturally via identifier matching (both generate the same "{databaseID}-database" name). Uses the IPAM service to allocate subnets. Runs on ManagerExecutor.

ServiceUserRole (server/internal/orchestrator/swarm/service_user_role.go): Manages the Postgres user lifecycle for a service. This resource is keyed by ServiceID, so one role is shared across all instances of the same service. On Create, it generates a deterministic username via database.GenerateServiceUsername() (format: svc_{serviceID}), generates a random 32-byte password, creates the Postgres role with LOGIN and grants read-only access to the public schema. Credentials are persisted in the resource state and reused on subsequent reconciliation cycles. The role is created on the primary instance and Spock replicates it to all other nodes automatically. On Delete, it drops the role. Runs on PrimaryExecutor(nodeName). See docs/development/service-credentials.md for full details on credential generation.

DirResource (server/internal/filesystem/dir_resource.go): Creates and manages a host-side directory for the service instance's data files. The directory is bind-mounted into the container at /app/data. Config resources (like MCPConfigResource) write files into this directory before the container starts. On Delete, the directory and its contents are removed. Runs on HostExecutor(hostID).

MCPConfigResource (server/internal/orchestrator/swarm/mcp_config_resource.go): Generates and writes MCP server config files to the data directory. Manages three files:

  • config.yaml: CP-owned, overwritten on every Create/Update
  • tokens.yaml: Application-owned, written only on first Create
  • users.yaml: Application-owned, written only on first Create

Credentials are populated from the ServiceUserRole resource at runtime. A new service type would create an analogous config resource for its own config format. Runs on HostExecutor(hostID).

ServiceInstanceSpec (server/internal/orchestrator/swarm/service_instance_spec.go): A virtual resource that generates the Docker Swarm ServiceSpec. Its Refresh(), Create(), and Update() methods all call ServiceContainerSpec() to compute the spec from the current inputs. The computed spec is stored in the Spec field and consumed by the ServiceInstance resource during deployment. Delete() is a no-op. Runs on HostExecutor(hostID).

ServiceInstance (server/internal/orchestrator/swarm/service_instance.go): The resource that actually deploys the Docker Swarm service. On Create, it stores an initial etcd record with state="creating", then calls client.ServiceDeploy() with the spec from ServiceInstanceSpec, waits up to 5 minutes for the service to start, and transitions the state to "running". On Delete, it scales the service to 0 replicas (waiting for containers to stop), removes the Docker service, and deletes the etcd record. Runs on ManagerExecutor.

ServiceInstanceMonitor (server/internal/monitor/service_instance_monitor_resource.go): Registers (or deregisters) a health monitor for the service instance. The monitor periodically checks the service's /health endpoint and updates status in etcd. Runs on HostExecutor(hostID).

How resources are generated

GenerateServiceInstanceResources() in server/internal/orchestrator/swarm/orchestrator.go is the entry point. It:

  1. Resolves the ServiceImage via GetServiceImage(serviceType, version)
  2. Validates Postgres/Spock version compatibility if constraints exist
  3. Constructs the resource chain: Network → ServiceUserRole → DirResource → config resource (e.g., MCPConfigResource) → ServiceInstanceSpec → ServiceInstance
  4. Serializes them to []*resource.ResourceData via resource.ToResourceData()
  5. Returns a *database.ServiceInstanceResources wrapper

The monitor resource is added separately in the workflow layer (see Workflow Integration).

All resource types are registered in server/internal/orchestrator/swarm/resources.go via resource.RegisterResourceType[*T](registry, ResourceTypeConstant), which enables the resource framework to deserialize stored state back into typed structs.

Container Spec

ServiceContainerSpec() in server/internal/orchestrator/swarm/service_spec.go builds the Docker Swarm ServiceSpec from a ServiceContainerSpecOptions struct. The options contain everything needed to build the spec: the ServiceSpec, credentials, image, network IDs, database connection info, and placement constraints.

The generated spec configures:

  • Placement: pins the container to a specific Swarm node via node.id==<cohortMemberID>
  • Networks: attaches to both the default bridge network (for control plane access and external connectivity) and the database overlay network (for Postgres connectivity)
  • Bind mount: the host-side data directory (managed by DirResource) is mounted into the container at /app/data. Config files written by the service-type-specific config resource (e.g., MCPConfigResource) are available to the container at startup.
  • Entrypoint: overrides the default container entrypoint to pass the config file path as a CLI argument (e.g., -config /app/data/config.yaml for MCP)
  • Port publication: buildServicePortConfig() publishes port 8080 in host mode. If the port field in the spec is nil, no port is published. If it's 0, Docker assigns a random port. If it's a specific value, that port is used.
  • Health check: currently configured to curl -f http://localhost:8080/health with a 30s start period, 10s interval, 5s timeout, and 3 retries
  • Resource limits: CPU and memory limits from the spec, if provided
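The three-way port logic can be sketched as follows, using a simplified stand-in for the Docker SDK's port config type (the real buildServicePortConfig returns Docker SDK values):

```go
package main

import "fmt"

// PortConfig is a simplified stand-in for the Docker SDK's swarm.PortConfig.
type PortConfig struct {
	TargetPort    uint32
	PublishedPort uint32 // 0 lets Docker assign a random host port
	PublishMode   string
}

// buildPortConfig sketches the three cases for the spec's optional port
// field: nil means "do not publish", 0 means "publish on a random port",
// and any other value means "publish on exactly that port".
func buildPortConfig(specPort *int) []PortConfig {
	if specPort == nil {
		return nil // port omitted from the spec: service is not published
	}
	return []PortConfig{{
		TargetPort:    8080, // the service's internal port
		PublishedPort: uint32(*specPort),
		PublishMode:   "host",
	}}
}

func main() {
	p := 5434
	fmt.Println(len(buildPortConfig(nil)))            // 0: nothing published
	fmt.Println(buildPortConfig(&p)[0].PublishedPort) // 5434
}
```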

Configuration delivery

Configuration reaches service containers via bind-mounted config files, not environment variables. The pattern follows Patroni's config delivery:

  1. DirResource creates a host-side directory for the service instance
  2. A service-type-specific config resource (e.g., MCPConfigResource) writes config files into that directory
  3. ServiceContainerSpec() bind-mounts the directory into the container
  4. The container entrypoint reads the config file from the mount path

This approach has several advantages over environment variables:

  • Config files can be updated without recreating the container (the config resource's Update() method overwrites the file, and the service can reload)
  • Structured formats (YAML, JSON) are easier to validate and debug than flattened env vars
  • Sensitive values (API keys, passwords) are stored in files with restricted permissions (0600) rather than being visible in docker inspect
  • Application-owned files (like MCP's tokens.yaml and users.yaml) can be written once and preserved across config updates

For a new service type, you will need to:

  • Create a config resource (analogous to MCPConfigResource) that generates your service's config files and writes them to the data directory
  • Adjust the entrypoint/args in ServiceContainerSpec() if your service reads config from a different path or uses a different CLI flag
  • Extend ServiceContainerSpecOptions if your service needs additional container-level settings (different health check endpoint, different target port, additional mount points, etc.)

Workflow Integration

The workflow layer is generic. No per-service-type changes are needed here.

PlanUpdate

PlanUpdate in server/internal/workflows/plan_update.go is the sub-workflow that computes the reconciliation plan. It:

  1. Computes NodeInstances from the spec
  2. Generates node resources (same as before services existed)
  3. Determines a nodeName for ServiceUserRole executor routing — ServiceUserRole runs on PrimaryExecutor(nodeName) so the role is created on the primary instance and Spock replicates it to all nodes
  4. Iterates spec.Services and for each (service, hostID) pair, calls getServiceResources()
  5. Passes both node and service resources to operations.UpdateDatabase()

getServiceResources

getServiceResources() in the same file builds an operations.ServiceResources for a single service instance:

  1. Generates the ServiceInstanceID via database.GenerateServiceInstanceID(databaseID, serviceID, hostID)
  2. Resolves target_session_attrs via resolveTargetSessionAttrs() (see Database Connection Topology)
  3. Builds the ordered host list via database.BuildServiceHostList(), which produces an ordered []ServiceHostEntry array with co-located and local-node instances prioritized
  4. Constructs a database.ServiceInstanceSpec with all the inputs (including DatabaseHosts and TargetSessionAttrs)
  5. Fires the GenerateServiceInstanceResources activity (executes on the manager queue)
  6. Wraps the result in operations.ServiceResources, adding the ServiceInstanceMonitorResource

EndState

EndState() in server/internal/database/operations/end.go merges service resources into the desired end state:

for _, svc := range services {
    state, err := svc.State()
    end.Merge(state)
}

Service resources always land in the final plan, after all node operations. This is because intermediate states (from UpdateNodes, AddNodes, PopulateNodes) only contain node diffs. PlanAll produces one plan per state transition, so services end up in the last plan (the diff from the last intermediate state to EndState).

Resources that exist in the current state but are absent from the end state are automatically marked PendingDeletion by the plan engine, which generates delete events in reverse dependency order.

Database Connection Topology

Services connect to Postgres via the Docker overlay network. Rather than connecting to a single instance, each service instance receives an ordered list of hosts so that libpq (or the service's connection library) can try multiple instances in priority order and select one matching the desired role (primary vs standby).

The architecture separates generic host ordering (reusable by all service types) from service-specific configuration (how each service type maps its own semantics to connection parameters).

Generic layer: BuildServiceHostList

BuildServiceHostList() in server/internal/database/service_connection.go is the shared host list builder. It is service-type agnostic — it knows nothing about MCP, allow_writes, or any service-specific config. It accepts:

type BuildServiceHostListParams struct {
    ServiceHostID      string              // Host where the service instance runs
    NodeInstances      []*NodeInstances    // All database instances, grouped by node
    TargetNodes        []string            // Optional ordered node filter (from database_connection.target_nodes)
    TargetSessionAttrs string              // Caller-provided: "primary", "prefer-standby", etc.
}

And returns:

type ServiceConnectionInfo struct {
    Hosts              []ServiceHostEntry  // Ordered host:port pairs
    TargetSessionAttrs string              // Passed through unchanged
}

Ordering algorithm:

  1. Determine node list: If TargetNodes is set, use only listed nodes in the specified order. Otherwise, use all nodes with the local node (containing the service's host) first, then remaining nodes in iteration order.
  2. Group by node: For each node, list instances with the co-located instance (same host as the service) first, then remaining instances.
  3. Pass through TargetSessionAttrs: The builder does not interpret this value — it comes from the caller.

Hostname format: postgres-{instanceID} (matches swarm convention). Port: always 5432 (internal container port via overlay network).
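A minimal sketch of the ordering algorithm, using simplified stand-ins for the domain types (the real BuildServiceHostList also applies the TargetNodes filter and returns ServiceHostEntry values rather than plain strings):

```go
package main

import "fmt"

// Simplified stand-ins for the domain types; field names are illustrative.
type Instance struct {
	ID     string
	HostID string
}
type Node struct {
	Name      string
	Instances []Instance
}

// orderHosts sketches the default ordering: the local node (containing the
// service's host) first, the co-located instance first within each node,
// then remaining nodes in iteration order.
func orderHosts(serviceHostID string, nodes []Node) []string {
	contains := func(n Node) bool {
		for _, inst := range n.Instances {
			if inst.HostID == serviceHostID {
				return true
			}
		}
		return false
	}
	// Within a node, the co-located instance leads; hostnames follow the
	// swarm convention postgres-{instanceID}.
	nodeHosts := func(n Node) []string {
		var out []string
		for _, inst := range n.Instances {
			if inst.HostID == serviceHostID {
				out = append([]string{"postgres-" + inst.ID}, out...)
			} else {
				out = append(out, "postgres-"+inst.ID)
			}
		}
		return out
	}
	var hosts []string
	for _, n := range nodes { // local node first
		if contains(n) {
			hosts = append(hosts, nodeHosts(n)...)
		}
	}
	for _, n := range nodes { // then remaining nodes in iteration order
		if !contains(n) {
			hosts = append(hosts, nodeHosts(n)...)
		}
	}
	return hosts
}

func main() {
	nodes := []Node{
		{Name: "n1", Instances: []Instance{{ID: "a", HostID: "h1"}, {ID: "b", HostID: "h2"}}},
		{Name: "n2", Instances: []Instance{{ID: "c", HostID: "h3"}}},
	}
	// Service runs on h2: its co-located instance leads, local node before remote.
	fmt.Println(orderHosts("h2", nodes))
}
```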

Service-specific layer: resolveTargetSessionAttrs

Each service type maps its own config semantics to a target_session_attrs value. This dispatch lives in server/internal/workflows/plan_update.go:

func resolveTargetSessionAttrs(serviceSpec *database.ServiceSpec) string {
    // Tier 1: Explicit user setting in database_connection
    if serviceSpec.DatabaseConnection != nil && serviceSpec.DatabaseConnection.TargetSessionAttrs != "" {
        return serviceSpec.DatabaseConnection.TargetSessionAttrs
    }
    // Tier 2: Per-service-type default
    switch serviceSpec.ServiceType {
    case "mcp":
        if allowWrites, ok := serviceSpec.Config["allow_writes"].(bool); ok && allowWrites {
            return "primary"
        }
        return "prefer-standby"
    default:
        return "prefer-standby"
    }
}

This is the only service-type-specific code in the workflow path. If the user explicitly sets database_connection.target_session_attrs, that value is used directly (already validated at the API layer). Otherwise, the per-service-type default applies. The fallback is "prefer-standby" for safety — a new service type defaults to read-only behavior. prefer-standby falls back to the primary when no standbys exist, so it works in all topologies.

Services without multi-host support

BuildServiceHostList always produces the full ordered host list regardless of whether the service supports multi-host connections. Each service type's config generator decides how to consume the list:

  • Multi-host services (e.g., MCP) use the full hosts array and target_session_attrs for automatic failover at the connection layer.
  • Single-host services take hosts[0] — the ordering algorithm ensures this is the optimal choice (co-located instance, local node). This is equivalent to the pre-PLAT-463 findPostgresInstance behavior.

Single-host services lose automatic connection-layer failover. After a Patroni failover within a node, the service points at the old primary (now a replica) until the Control Plane intervenes. This is acceptable for standard multi-active deployments (1 host per node) where every host runs a writable primary and Swarm self-heals. For HA topologies, Phase 3 (proactive config regeneration on failover) narrows the window by detecting role changes and regenerating the config with the new primary as hosts[0].

No changes to the shared infrastructure are needed — this is purely a config generator concern. When adding a new service type, document whether it supports multi-host connections in the service type's config generator.

Config generation (per service type)

Each service type has its own config generator that maps the generic []ServiceHostEntry into whatever format the service expects. For MCP, this is GenerateMCPConfig() in server/internal/orchestrator/swarm/mcp_config.go, which produces a YAML config with a structured hosts array:

databases:
  - name: mydb
    hosts:
      - host: postgres-abc123
        port: 5432
      - host: postgres-def456
        port: 5432
    target_session_attrs: prefer-standby
    # ... other fields

A future service type would implement its own config generator, converting []ServiceHostEntry into the format that service expects (e.g., a comma-separated DSN, a JSON config, environment variables, etc.).
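
For example, a hypothetical config generator could flatten the host list into a libpq multi-host DSN. Credential handling is omitted, and the function name and URL shape are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// ServiceHostEntry mirrors the shape described above: an ordered host:port pair.
type ServiceHostEntry struct {
	Host string
	Port int
}

// buildMultiHostDSN emits a libpq-style multi-host connection URL with
// target_session_attrs appended, so the connection library tries hosts
// in the given priority order.
func buildMultiHostDSN(dbName, user string, hosts []ServiceHostEntry, targetSessionAttrs string) string {
	pairs := make([]string, len(hosts))
	for i, h := range hosts {
		pairs[i] = fmt.Sprintf("%s:%d", h.Host, h.Port)
	}
	return fmt.Sprintf("postgresql://%s@%s/%s?target_session_attrs=%s",
		user, strings.Join(pairs, ","), dbName, targetSessionAttrs)
}

func main() {
	hosts := []ServiceHostEntry{
		{Host: "postgres-abc123", Port: 5432},
		{Host: "postgres-def456", Port: 5432},
	}
	fmt.Println(buildMultiHostDSN("mydb", "svc_x", hosts, "prefer-standby"))
}
```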

What a new service type inherits automatically

By using BuildServiceHostList, any new service type automatically gets:

  • Co-location preference: Instances on the same host as the service are tried first for lowest latency
  • Local-node preference: Instances on the same database node are tried before remote nodes
  • database_connection.target_nodes filtering: API users can override the default ordering with an explicit node list in the service spec
  • Multi-host failover: The full host list enables the service's connection library to try alternate instances if the preferred one is unavailable

What a new service type must implement

| Touch point | What to do |
|---|---|
| resolveTargetSessionAttrs in plan_update.go | Add a case for the per-service-type default target_session_attrs (used when database_connection.target_session_attrs is not set) |
| Config generator | Create a config generator (analogous to GenerateMCPConfig) that maps []ServiceHostEntry + TargetSessionAttrs into your service's config format |
| GenerateServiceInstanceResources in orchestrator.go | Wire the config generator into the resource generation pipeline (analogous to MCPConfigResource) |
| Validation in validate.go (if needed) | Cross-validate database_connection.target_session_attrs against service-specific config (e.g., MCP rejects allow_writes: true with standby-targeting attrs) |

Optional: database_connection

The ServiceSpec includes an optional database_connection struct:

type DatabaseConnection struct {
    TargetNodes        []string `json:"target_nodes,omitempty"`
    TargetSessionAttrs string   `json:"target_session_attrs,omitempty"`
}
  • target_nodes: When set, BuildServiceHostList uses only the listed nodes in the specified order, ignoring co-location-based ordering. Useful when the API user wants explicit control over which database nodes a service connects to (e.g., pinning a read-heavy service to a specific replica node).
  • target_session_attrs: When set, overrides the per-service-type default (e.g., MCP's allow_writes → primary/prefer-standby mapping). Valid values: primary, prefer-standby, standby, read-write, any.

Validation: format, uniqueness, node-name existence (against the database spec's node names), and target_session_attrs enum are all checked at the API layer. MCP cross-validates allow_writes vs target_session_attrs to reject unsafe combinations (e.g., allow_writes: true with target_session_attrs: prefer-standby).

Domain Model

ServiceSpec

server/internal/database/spec.go:

type ServiceSpec struct {
    ServiceID          string              `json:"service_id"`
    ServiceType        string              `json:"service_type"`
    Version            string              `json:"version"`
    HostIDs            []string            `json:"host_ids"`
    Config             map[string]any      `json:"config"`
    DatabaseConnection *DatabaseConnection `json:"database_connection,omitempty"`
    Port               *int                `json:"port,omitempty"`
    CPUs               *float64            `json:"cpus,omitempty"`
    MemoryBytes        *uint64             `json:"memory,omitempty"`
    OrchestratorOpts   *OrchestratorOpts   `json:"orchestrator_opts,omitempty"`
}

This is the spec-level declaration that lives inside Spec.Services. It's service-type-agnostic — no fields are MCP-specific. The Config map holds all service-specific settings. The optional DatabaseConnection controls connection topology (see Optional: database_connection).

ServiceInstance

server/internal/database/service_instance.go:

The runtime artifact that tracks an individual container's state. Key fields: ServiceInstanceID, ServiceID, DatabaseID, HostID, State (creating/running/failed/deleting), Status (container ID, hostname, IP, port mappings, health), and Credentials (ServiceUser with username and password).

ID generation

| Function | Format | Example |
|---|---|---|
| GenerateServiceInstanceID(dbID, svcID, hostID) | "{dbID}-{svcID}-{hostID}" | "mydb-mcp-host1" |
| GenerateServiceUsername(svcID) | "svc_{svcID}" | "svc_mcp_server" |
| GenerateDatabaseNetworkID(dbID) | "{dbID}" | "mydb" |

GenerateDatabaseNetworkID returns the resource identifier used to look up the overlay network in the resource registry. The actual Docker Swarm network name is "{databaseID}-database" (set in the Network.Name field in orchestrator.go).

Usernames longer than 63 characters are truncated with a deterministic hash suffix. Because the username is now per-service (not per-instance), all instances of the same service share one set of credentials. See docs/development/service-credentials.md for details.

ServiceResources

server/internal/database/operations/common.go:

type ServiceResources struct {
    ServiceInstanceID string
    Resources         []*resource.ResourceData
    MonitorResource   resource.Resource
}

The operations-layer wrapper that bridges the orchestrator output and the planning system. Resources holds the serialized orchestrator resources (from GenerateServiceInstanceResources). MonitorResource is the ServiceInstanceMonitorResource. The State() method merges both into a resource.State for use in EndState().

Testing

Unit tests

Container spec tests in server/internal/orchestrator/swarm/service_spec_test.go:

Table-driven tests for ServiceContainerSpec() and its helpers. The test pattern uses per-case check functions:

{
    name: "basic MCP service",
    opts: &ServiceContainerSpecOptions{...},
    checks: []checkFunc{
        checkLabels(expectedLabels),
        checkNetworks("bridge", "my-db-database"),
        checkContainerSpec(image, mounts, command, args),
        checkPlacement("node.id==swarm-node-1"),
        checkHealthcheck("/health", 8080),
        checkPorts(8080, 5434),
    },
}

Add test cases for your new service type here, particularly if it has different health check endpoints, ports, or other spec differences.

Image registry tests in server/internal/orchestrator/swarm/service_images_test.go:

Tests GetServiceImage(), SupportedServiceVersions(), and ValidateCompatibility(). Covers both happy path (valid type + version) and error cases (unsupported type, unregistered version, constraint violations). Add test cases for your new service type's image registration and any version constraints.

Golden plan tests

The golden plan tests in server/internal/database/operations/update_database_test.go validate that service resources are correctly integrated into the plan/apply reconciliation cycle.

How they work:

  1. Build a start *resource.State representing the current state of the world
  2. Build []*operations.NodeResources and []*operations.ServiceResources representing the desired state
  3. Call operations.UpdateDatabase() to compute the plan
  4. Summarize the plans via resource.SummarizePlans()
  5. Compare against a committed JSON golden file via testutils.GoldenTest[T]

Test helpers in server/internal/database/operations/helpers_test.go define stub resource types that mirror the real swarm types' Identifier(), Dependencies(), DiffIgnore(), and Executor() without importing the swarm package. This avoids pulling in the Docker SDK and keeps tests self-contained. The stubs use the orchestratorResource embedding pattern already established in the file.

Note

The service-specific test stubs (makeServiceResources() and its companion types) are being added as part of PLAT-412. Once merged, makeServiceResources() will construct a complete set of stub resources for a single service instance, serialize them to []*resource.ResourceData, create the real monitor.ServiceInstanceMonitorResource, and return the operations.ServiceResources wrapper.

Five standard test cases will cover the full lifecycle:

| Test case | Start state | Services | Verifies |
|---|---|---|---|
| single node with service from empty | empty | 1 service | Service resources created in correct phase order alongside database resources |
| single node with service no-op | node + service | same service | Unchanged services produce an empty plan (the core regression test) |
| add service to existing database | node only | 1 new service | Only service create events, no database changes |
| remove service from existing database | node + service | nil | Service delete events in reverse dependency order, database unchanged |
| update database node with unchanged service | node + service | same service | Only database update events, service resources untouched |

These test cases are generic and apply regardless of service type. To regenerate golden files after changes:

go test ./server/internal/database/operations/... -run TestUpdateDatabase -update

Always review the generated JSON files in golden_test/TestUpdateDatabase/ before committing.

E2E tests

E2E tests in e2e/service_provisioning_test.go validate service provisioning against a real control plane cluster.

Build tag: //go:build e2e_test

The tests use fixture.NewDatabaseFixture() for auto-cleanup and poll the API for state transitions. Key patterns to replicate for a new service type:

| Pattern | Example test | What it validates |
|---|---|---|
| Single-host provision | TestProvisionMCPService | Service reaches "running" state |
| Multi-host provision | TestProvisionMultiHostMCPService | One instance per host, all reach "running" |
| Add to existing DB | TestUpdateDatabaseAddService | Service added without affecting database |
| Remove from DB | TestUpdateDatabaseRemoveService | Empty Services array removes the service |
| Stability | TestUpdateDatabaseServiceStable | Unrelated DB update doesn't recreate service (checks created_at and container_id unchanged) |
| Bad version | TestProvisionMCPServiceUnsupportedVersion | Unregistered version fails task, DB goes to "failed" |
| Recovery | TestProvisionMCPServiceRecovery | Failed DB recovered by updating with valid version |

Run service E2E tests:

make test-e2e E2E_RUN=TestProvisionMCPService

API Usage & Verification

This section shows how services look in the API request and response payloads. Use these examples to verify your integration or to hand-test with curl.

Creating a Database with a Service

POST /v1/databases

{
  "id": "my-app",
  "spec": {
    "database_name": "storefront",
    "port": 5432,
    "database_users": [
      {
        "username": "admin",
        "password": "secret",
        "db_owner": true,
        "attributes": ["LOGIN", "SUPERUSER"]
      }
    ],
    "nodes": [
      {
        "name": "n1",
        "host_ids": ["host-1"]
      }
    ],
    "services": [
      {
        "service_id": "mcp-server",
        "service_type": "mcp",
        "version": "latest",
        "host_ids": ["host-1"],
        "port": 8080,
        "config": {
          "llm_provider": "anthropic",
          "llm_model": "claude-sonnet-4-5",
          "anthropic_api_key": "sk-ant-..."
        },
        "database_connection": {
          "target_nodes": ["n1"],
          "target_session_attrs": "primary"
        }
      }
    ]
  }
}

A successful response returns HTTP 200 with a task object you can poll for completion.

Validation Error Response

If the request payload fails validation, the API returns HTTP 400 with an APIError. All service validation errors use the invalid_input error name.

Example — missing a required MCP config field:

{
  "name": "invalid_input",
  "message": "services[0].config: missing required field 'llm_provider'"
}

Other common validation errors:

| Condition | Example message |
|---|---|
| Duplicate service_id | services[1]: service IDs must be unique within a database |
| Unsupported service_type | services[0].service_type: unsupported service type 'foo' (only 'mcp' is currently supported) |
| Bad version format | services[0].version: version must be in semver format (e.g., '1.0.0') or 'latest' |
| Missing provider API key | services[0].config: missing required field 'anthropic_api_key' for anthropic provider |
| Unsupported llm_provider | services[0].config[llm_provider]: unsupported llm_provider 'foo' (must be one of: anthropic, openai, ollama) |

Reading a Database with Service Instances

GET /v1/databases/my-app returns the full database, including runtime service_instances:

{
  "id": "my-app",
  "state": "available",
  "created_at": "2025-06-01T12:00:00Z",
  "updated_at": "2025-06-01T12:05:00Z",
  "spec": {
    "services": [
      {
        "service_id": "mcp-server",
        "service_type": "mcp",
        "version": "latest",
        "host_ids": ["host-1"],
        "port": 8080,
        "config": {
          "llm_provider": "anthropic",
          "llm_model": "claude-sonnet-4-5"
        }
      }
    ]
  },
  "service_instances": [
    {
      "service_instance_id": "my-app-mcp-server-host-1",
      "service_id": "mcp-server",
      "database_id": "my-app",
      "host_id": "host-1",
      "state": "running",
      "status": {
        "container_id": "a1b2c3d4e5f6",
        "image_version": "latest",
        "addresses": [
            "10.0.1.5",
            "mcp-server-host-1.internal"
        ],
        "ports": [
          {
            "name": "http",
            "container_port": 8080,
            "host_port": 8080
          }
        ],
        "health_check": {
          "status": "healthy",
          "message": "Service responding normally",
          "checked_at": "2025-06-01T12:04:50Z"
        },
        "last_health_at": "2025-06-01T12:04:50Z",
        "service_ready": true
      },
      "created_at": "2025-06-01T12:00:30Z",
      "updated_at": "2025-06-01T12:01:00Z"
    }
  ]
}

Note

Sensitive config keys (anthropic_api_key, openai_api_key, and any key matching patterns like password, secret, token, credential, private_key, access_key) are stripped from all API responses. The config object in spec.services will only contain non-sensitive keys.
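A sketch of that redaction behavior (the pattern list below is a simplification of the keys and patterns named above, and this is not the actual implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// sensitivePatterns approximates the substrings listed above; any config key
// containing one of them is removed from API responses. "api_key" covers both
// anthropic_api_key and openai_api_key.
var sensitivePatterns = []string{
	"api_key", "password", "secret", "token", "credential", "private_key", "access_key",
}

// redactConfig returns a copy of config with sensitive keys removed.
func redactConfig(config map[string]any) map[string]any {
	out := make(map[string]any, len(config))
	for k, v := range config {
		lower := strings.ToLower(k)
		sensitive := false
		for _, p := range sensitivePatterns {
			if strings.Contains(lower, p) {
				sensitive = true
				break
			}
		}
		if !sensitive {
			out[k] = v
		}
	}
	return out
}

func main() {
	cfg := map[string]any{
		"llm_provider":      "anthropic",
		"llm_model":         "claude-sonnet-4-5",
		"anthropic_api_key": "sk-ant-...",
	}
	fmt.Println(redactConfig(cfg)) // anthropic_api_key is stripped
}
```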

Updating Services on an Existing Database

Services are managed through the spec.services array in PUT /v1/databases/{id}. The control plane diffs the desired state against the current state and creates, updates, or deletes service instances accordingly.

Add a service — include it in the services array:

{
  "spec": {
    "services": [
      {
        "service_id": "mcp-server",
        "service_type": "mcp",
        "version": "latest",
        "host_ids": ["host-1"],
        "config": {
          "llm_provider": "anthropic",
          "llm_model": "claude-sonnet-4-5",
          "anthropic_api_key": "sk-ant-..."
        }
      },
      {
        "service_id": "mcp-analytics",
        "service_type": "mcp",
        "version": "1.0.0",
        "host_ids": ["host-2"],
        "config": {
          "llm_provider": "openai",
          "llm_model": "gpt-4",
          "openai_api_key": "sk-..."
        }
      }
    ]
  }
}

Remove a service — omit it from the services array. The control plane deletes the corresponding service instances:

{
  "spec": {
    "services": []
  }
}

Update a service — change fields (e.g., version, config, host_ids) in the existing entry. The control plane will update or recreate service instances as needed.
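The reconciliation described above is effectively a set diff keyed on service_id. A simplified sketch (the real planner diffs full resource trees, not just a Version field):

```go
package main

import "fmt"

type ServiceSpec struct {
	ServiceID string
	Version   string
}

// diffServices computes which services must be created, updated, or deleted
// to move from current to desired, keyed on ServiceID.
func diffServices(current, desired []ServiceSpec) (creates, updates, deletes []string) {
	cur := make(map[string]ServiceSpec, len(current))
	for _, s := range current {
		cur[s.ServiceID] = s
	}
	seen := make(map[string]bool, len(desired))
	for _, s := range desired {
		seen[s.ServiceID] = true
		old, ok := cur[s.ServiceID]
		switch {
		case !ok:
			creates = append(creates, s.ServiceID)
		case old != s:
			updates = append(updates, s.ServiceID)
		}
	}
	for _, s := range current {
		if !seen[s.ServiceID] {
			deletes = append(deletes, s.ServiceID)
		}
	}
	return creates, updates, deletes
}

func main() {
	current := []ServiceSpec{{"mcp-server", "latest"}}
	desired := []ServiceSpec{{"mcp-server", "1.0.0"}, {"mcp-analytics", "1.0.0"}}
	c, u, d := diffServices(current, desired)
	fmt.Println("creates:", c, "updates:", u, "deletes:", d)
	// creates: [mcp-analytics] updates: [mcp-server] deletes: []

	// An empty desired list deletes everything:
	_, _, d2 := diffServices(current, nil)
	fmt.Println("deletes:", d2) // deletes: [mcp-server]
}
```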

Service Health & Failure in API State

Service instance health is tracked by the service instance monitor, which periodically polls the Docker Swarm orchestrator for container status. The results are reflected in the service_instances array of the database response.

State lifecycle:

| State | Meaning |
|---|---|
| creating | Instance provisioned; waiting for container to become healthy (up to 5 min timeout) |
| running | Container is healthy and accepting requests |
| failed | Container health check failed, creation timed out, or container disappeared |
| deleting | Instance is being removed |

How failures surface:

  • When a running instance's health check fails, the monitor transitions it to failed and populates the error field with a diagnostic message (e.g., "container is no longer healthy").
  • When a creating instance exceeds the 5-minute creation timeout, it transitions to failed with an error like "creation timeout after 5m0s - container not healthy".
  • If the container disappears entirely (nil status from orchestrator), a running instance transitions to failed after a 30-second grace period with "container status not available".

Example of a failed service instance in the API response:

{
  "service_instance_id": "my-app-mcp-server-host-1",
  "service_id": "mcp-server",
  "database_id": "my-app",
  "host_id": "host-1",
  "state": "failed",
  "status": null,
  "error": "creation timeout after 5m0s - no status available",
  "created_at": "2025-06-01T12:00:30Z",
  "updated_at": "2025-06-01T12:05:30Z"
}

Note

Service instance failures detected by the monitor do not automatically change the parent database's state. A database can be "available" while one or more of its service instances are "failed". This is distinct from provisioning/workflow failures (e.g., an unregistered image version), which do set the database to "failed" because the workflow itself fails. Monitor both database.state and service_instances[].state for a complete health picture.

Checklist: Adding a New Service Type

| Step | File | Change |
|---|---|---|
| 1. API enum | api/apiv1/design/database.go | Add to g.Enum(...) on service_type |
| 2. Regenerate | — | make -C api generate |
| 3. Validation | server/internal/api/apiv1/validate.go | Add type to allowlist in validateServiceSpec(); add validateMyServiceConfig() function |
| 4. Image registry | server/internal/orchestrator/swarm/service_images.go | Call versions.addServiceImage() in NewServiceVersions() |
| 5. Connection topology | server/internal/workflows/plan_update.go | Add case to resolveTargetSessionAttrs() for your service type |
| 6. Config generator | server/internal/orchestrator/swarm/ | Create config generator mapping []ServiceHostEntry to your service's config format; document whether the service supports multi-host connections (see Services without multi-host support) |
| 7. Container spec | server/internal/orchestrator/swarm/service_spec.go | Service-specific configuration delivery, health check, mounts, entrypoint |
| 8. Unit tests | swarm/service_spec_test.go, swarm/service_images_test.go | Add cases for new type |
| 9. Golden plan tests | operations/update_database_test.go | Already covered generically; regenerate with -update if resource shape changes |
| 10. E2E tests | e2e/service_provisioning_test.go | Add provision, lifecycle, stability, and failure/recovery tests |

What Doesn't Change

The following are service-type-agnostic and require no modification:

  • ServiceSpec struct — server/internal/database/spec.go
  • ServiceInstance domain model — server/internal/database/service_instance.go
  • Workflow code — server/internal/workflows/plan_update.go (except adding a case to resolveTargetSessionAttrs for your service type's target_session_attrs mapping)
  • Generic resource types — Network, ServiceUserRole, DirResource, ServiceInstanceSpec, ServiceInstance, ServiceInstanceMonitor are service-type-agnostic (you add a service-type-specific config resource alongside them)
  • Operations layer — server/internal/database/operations/ (UpdateDatabase, EndState)
  • Store/etcd layer

Future Work

  • Read/write service user accounts: Service users are currently provisioned with the pgedge_application_read_only role. Some service types will require write access (INSERT, UPDATE, DELETE, DDL). This will require a mechanism for the service spec to declare the required access level and for ServiceUserRole to provision the appropriate role accordingly.
  • Primary-aware database connection routing (in progress — PLAT-463): BuildServiceHostList and resolveTargetSessionAttrs provide multi-host connection topology with target_session_attrs support. Services receive an ordered host list and the connection library selects the appropriate instance (primary or standby) at connect time. See Database Connection Topology for details. Future phases will add proactive config regeneration on failover/switchover events.
  • Persistent bind mounts (implemented): Service containers now use a DirResource to create a host-side data directory, which is bind-mounted into the container at /app/data. Config files are written to this directory by service-type-specific config resources (e.g., MCPConfigResource). Application-owned files (like token and user stores) are preserved across config updates. See Configuration delivery for details.

Appendix: MCP Reference Implementation for AI-Assisted Development

Note

This section is designed for consumption by Claude Code (or similar AI assistants). When a developer asks Claude to help add a new service type, point it at this document. The code below provides complete, copy-editable reference implementations from MCP at each touch point, so the assistant can work from concrete examples rather than having to read every source file independently.

A.1 API Enum (api/apiv1/design/database.go)

The ServiceSpec Goa type. To add a new service type, add it to the g.Enum() call on service_type:

// lines 125–191
var ServiceSpec = g.Type("ServiceSpec", func() {
    g.Attribute("service_id", Identifier, func() {
        g.Description("The unique identifier for this service.")
        g.Example("mcp-server")
        g.Example("analytics-service")
        g.Meta("struct:tag:json", "service_id")
    })
    g.Attribute("service_type", g.String, func() {
        g.Description("The type of service to run.")
        g.Enum("mcp") // ← add new type here, e.g. g.Enum("mcp", "my-service")
        g.Example("mcp")
        g.Meta("struct:tag:json", "service_type")
    })
    g.Attribute("version", g.String, func() {
        g.Description("The version of the service in semver format (e.g., '1.0.0') or the literal 'latest'.")
        g.Pattern(serviceVersionPattern) // `^\d+\.\d+\.\d+|latest$`
        g.Example("1.0.0")
        g.Example("latest")
        g.Meta("struct:tag:json", "version")
    })
    g.Attribute("host_ids", HostIDs, func() {
        g.Description("The IDs of the hosts that should run this service.")
        g.MinLength(1)
        g.Meta("struct:tag:json", "host_ids")
    })
    g.Attribute("port", g.Int, func() {
        g.Description("The port to publish the service on the host.")
        g.Minimum(0)
        g.Maximum(65535)
        g.Meta("struct:tag:json", "port,omitempty")
    })
    g.Attribute("config", g.MapOf(g.String, g.Any), func() {
        g.Description("Service-specific configuration.")
        g.Meta("struct:tag:json", "config")
    })
    g.Attribute("cpus", g.String, func() {
        g.Description("CPU limit. Accepts SI suffix 'm', e.g. '500m'.")
        g.Pattern(cpuPattern)
        g.Meta("struct:tag:json", "cpus,omitempty")
    })
    g.Attribute("memory", g.String, func() {
        g.Description("Memory limit in SI or IEC notation, e.g. '512M', '1GiB'.")
        g.MaxLength(16)
        g.Meta("struct:tag:json", "memory,omitempty")
    })

    g.Required("service_id", "service_type", "version", "host_ids", "config")
})

After editing, run make -C api generate.

A.2 Validation (server/internal/api/apiv1/validate.go)

validateServiceSpec() — the dispatcher. Add your type to the allowlist and config dispatch:

// lines 229–280
func validateServiceSpec(svc *api.ServiceSpec, path []string) []error {
    var errs []error

    serviceIDPath := appendPath(path, "service_id")
    errs = append(errs, validateIdentifier(string(svc.ServiceID), serviceIDPath))

    // ← add your type to this check
    if svc.ServiceType != "mcp" {
        err := fmt.Errorf("unsupported service type '%s' (only 'mcp' is currently supported)", svc.ServiceType)
        errs = append(errs, newValidationError(err, appendPath(path, "service_type")))
    }

    if svc.Version != "latest" && !semverPattern.MatchString(svc.Version) {
        err := errors.New("version must be in semver format (e.g., '1.0.0') or 'latest'")
        errs = append(errs, newValidationError(err, appendPath(path, "version")))
    }

    seenHostIDs := make(ds.Set[string], len(svc.HostIds))
    for i, hostID := range svc.HostIds {
        hostIDStr := string(hostID)
        hostIDPath := appendPath(path, "host_ids", arrayIndexPath(i))
        errs = append(errs, validateIdentifier(hostIDStr, hostIDPath))
        if seenHostIDs.Has(hostIDStr) {
            err := errors.New("host IDs must be unique within a service")
            errs = append(errs, newValidationError(err, hostIDPath))
        }
        seenHostIDs.Add(hostIDStr)
    }

    // ← add dispatch for your type here
    if svc.ServiceType == "mcp" {
        errs = append(errs, validateMCPServiceConfig(svc.Config, appendPath(path, "config"))...)
    }

    if svc.Cpus != nil {
        errs = append(errs, validateCPUs(svc.Cpus, appendPath(path, "cpus"))...)
    }
    if svc.Memory != nil {
        errs = append(errs, validateMemory(svc.Memory, appendPath(path, "memory"))...)
    }

    return errs
}

validateMCPServiceConfig() — the per-type config validator to use as a template:

// lines 283–330
func validateMCPServiceConfig(config map[string]any, path []string) []error {
    var errs []error

    requiredFields := []string{"llm_provider", "llm_model"}
    for _, field := range requiredFields {
        if _, ok := config[field]; !ok {
            err := fmt.Errorf("missing required field '%s'", field)
            errs = append(errs, newValidationError(err, path))
        }
    }

    if val, exists := config["llm_provider"]; exists {
        provider, ok := val.(string)
        if !ok {
            err := errors.New("llm_provider must be a string")
            errs = append(errs, newValidationError(err, appendPath(path, mapKeyPath("llm_provider"))))
        } else {
            validProviders := []string{"anthropic", "openai", "ollama"}
            if !slices.Contains(validProviders, provider) {
                err := fmt.Errorf("unsupported llm_provider '%s' (must be one of: %s)",
                    provider, strings.Join(validProviders, ", "))
                errs = append(errs, newValidationError(err, appendPath(path, mapKeyPath("llm_provider"))))
            }

            switch provider {
            case "anthropic":
                if _, ok := config["anthropic_api_key"]; !ok {
                    err := errors.New("missing required field 'anthropic_api_key' for anthropic provider")
                    errs = append(errs, newValidationError(err, path))
                }
            case "openai":
                if _, ok := config["openai_api_key"]; !ok {
                    err := errors.New("missing required field 'openai_api_key' for openai provider")
                    errs = append(errs, newValidationError(err, path))
                }
            case "ollama":
                if _, ok := config["ollama_url"]; !ok {
                    err := errors.New("missing required field 'ollama_url' for ollama provider")
                    errs = append(errs, newValidationError(err, path))
                }
            }
        }
    }

    return errs
}

A.3 Image Registry (server/internal/orchestrator/swarm/service_images.go)

Register the image in NewServiceVersions():

// lines 39–68
func NewServiceVersions(cfg config.Config) *ServiceVersions {
    versions := &ServiceVersions{
        cfg:    cfg,
        images: make(map[string]map[string]*ServiceImage),
    }

    // MCP service versions
    versions.addServiceImage("mcp", "latest", &ServiceImage{
        Tag: serviceImageTag(cfg, "postgres-mcp:latest"),
    })

    // ← add your service here:
    // versions.addServiceImage("my-service", "1.0.0", &ServiceImage{
    //     Tag: serviceImageTag(cfg, "my-service:1.0.0"),
    // })

    return versions
}

Supporting types:

type ServiceImage struct {
    Tag                string                  `json:"tag"`
    PostgresConstraint *host.VersionConstraint `json:"postgres_constraint,omitempty"`
    SpockConstraint    *host.VersionConstraint `json:"spock_constraint,omitempty"`
}

// GetServiceImage resolves (serviceType, version) → *ServiceImage.
// Returns an error if the type or version is unregistered.
func (sv *ServiceVersions) GetServiceImage(serviceType string, version string) (*ServiceImage, error) {
    versionMap, ok := sv.images[serviceType]
    if !ok {
        return nil, fmt.Errorf("unsupported service type %q", serviceType)
    }
    image, ok := versionMap[version]
    if !ok {
        return nil, fmt.Errorf("unsupported version %q for service type %q", version, serviceType)
    }
    return image, nil
}

// serviceImageTag prepends the configured registry host unless the image
// reference already contains a registry prefix.
func serviceImageTag(cfg config.Config, imageRef string) string {
    if strings.Contains(imageRef, "/") {
        parts := strings.Split(imageRef, "/")
        firstPart := parts[0]
        if strings.Contains(firstPart, ".") || strings.Contains(firstPart, ":") || firstPart == "localhost" {
            return imageRef
        }
    }
    if cfg.DockerSwarm.ImageRepositoryHost == "" {
        return imageRef
    }
    return fmt.Sprintf("%s/%s", cfg.DockerSwarm.ImageRepositoryHost, imageRef)
}
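To make the prefixing rule concrete, here is a standalone copy of that logic with a few worked examples (tag is an illustrative stand-in for serviceImageTag, taking the repository host directly instead of a config.Config):

```go
package main

import (
	"fmt"
	"strings"
)

// tag mirrors serviceImageTag's logic: a reference is left untouched if its
// first path segment looks like a registry host (contains '.', ':', or is
// "localhost"); otherwise the configured repository host is prepended.
func tag(repoHost, imageRef string) string {
	if strings.Contains(imageRef, "/") {
		first := strings.Split(imageRef, "/")[0]
		if strings.Contains(first, ".") || strings.Contains(first, ":") || first == "localhost" {
			return imageRef
		}
	}
	if repoHost == "" {
		return imageRef
	}
	return repoHost + "/" + imageRef
}

func main() {
	fmt.Println(tag("registry.example.com", "postgres-mcp:latest"))   // registry.example.com/postgres-mcp:latest
	fmt.Println(tag("registry.example.com", "ghcr.io/org/mcp:1.0.0")) // ghcr.io/org/mcp:1.0.0 (already qualified)
	fmt.Println(tag("", "postgres-mcp:latest"))                       // postgres-mcp:latest (no registry configured)
}
```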

A.4 Container Spec (server/internal/orchestrator/swarm/service_spec.go)

ServiceContainerSpecOptions — the input struct:

type ServiceContainerSpecOptions struct {
    ServiceSpec        *database.ServiceSpec
    ServiceInstanceID  string
    DatabaseID         string
    DatabaseName       string
    HostID             string
    ServiceName        string
    Hostname           string
    CohortMemberID     string
    ServiceImage       *ServiceImage
    Credentials        *database.ServiceUser
    DatabaseNetworkID  string
    Port               *int
    DataPath           string // Host-side directory for bind mount
}

ServiceContainerSpec() — builds the Docker Swarm ServiceSpec:

func ServiceContainerSpec(opts *ServiceContainerSpecOptions) (swarm.ServiceSpec, error) {
    labels := map[string]string{
        "pgedge.component":           "service",
        "pgedge.service.instance.id": opts.ServiceInstanceID,
        "pgedge.service.id":          opts.ServiceSpec.ServiceID,
        "pgedge.database.id":         opts.DatabaseID,
        "pgedge.host.id":             opts.HostID,
    }

    // Merge user-provided extra labels
    if opts.ServiceSpec.OrchestratorOpts != nil && opts.ServiceSpec.OrchestratorOpts.Swarm != nil {
        for k, v := range opts.ServiceSpec.OrchestratorOpts.Swarm.ExtraLabels {
            labels[k] = v
        }
    }

    networks := []swarm.NetworkAttachmentConfig{
        {Target: "bridge"},
        {Target: opts.DatabaseNetworkID},
    }

    image := opts.ServiceImage.Tag
    ports := buildServicePortConfig(opts.Port)

    // Bind mount for config/auth files
    mounts := []mount.Mount{
        docker.BuildMount(opts.DataPath, "/app/data", false),
    }

    // ... resource limits computed into the `resources` variable used below; omitted for brevity ...

    return swarm.ServiceSpec{
        TaskTemplate: swarm.TaskSpec{
            ContainerSpec: &swarm.ContainerSpec{
                Image:    image,
                Labels:   labels,
                Hostname: opts.Hostname,
                User:     fmt.Sprintf("%d", mcpContainerUID),
                Command:  []string{"/app/pgedge-postgres-mcp"},
                Args:     []string{"-config", "/app/data/config.yaml"},
                Healthcheck: &container.HealthConfig{
                    Test:        []string{"CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"},
                    StartPeriod: time.Second * 30,
                    Interval:    time.Second * 10,
                    Timeout:     time.Second * 5,
                    Retries:     3,
                },
                Mounts: mounts,
            },
            Networks: networks,
            Placement: &swarm.Placement{
                Constraints: []string{
                    "node.id==" + opts.CohortMemberID,
                },
            },
            Resources: resources,
        },
        EndpointSpec: &swarm.EndpointSpec{
            Mode:  swarm.ResolutionModeVIP,
            Ports: ports,
        },
        Annotations: swarm.Annotations{
            Name:   opts.ServiceName,
            Labels: labels,
        },
    }, nil
}

Configuration delivery — Config is delivered via bind-mounted YAML files, not environment variables. The container spec sets up the bind mount and overrides the entrypoint to pass the config path:

// Bind mount for config/auth files
mounts := []mount.Mount{
    docker.BuildMount(opts.DataPath, "/app/data", false),
}

// Container entrypoint passes config file path
Command: []string{"/app/pgedge-postgres-mcp"},
Args:    []string{"-config", "/app/data/config.yaml"},

The config files themselves are generated by MCPConfigResource (see server/internal/orchestrator/swarm/mcp_config_resource.go), which writes three files to the data directory:

  • config.yaml — CP-owned, regenerated on every Create/Update
  • tokens.yaml — application-owned, written only on first Create
  • users.yaml — application-owned, written only on first Create

For a new service type, create an analogous config resource that writes your service's config files to the data directory. The DirResource and bind mount are generic and reusable.

buildServicePortConfig() — port publication:

// lines 175–196
func buildServicePortConfig(port *int) []swarm.PortConfig {
    if port == nil {
        return nil
    }
    config := swarm.PortConfig{
        PublishMode: swarm.PortConfigPublishModeHost,
        TargetPort:  8080,
        Name:        "http",
        Protocol:    swarm.PortConfigProtocolTCP,
    }
    if *port > 0 {
        config.PublishedPort = uint32(*port)
    } else if *port == 0 {
        config.PublishedPort = 0
    }
    return []swarm.PortConfig{config}
}

A.5 Domain Model (server/internal/database/spec.go)

// lines 116–125
type ServiceSpec struct {
    ServiceID   string         `json:"service_id"`
    ServiceType string         `json:"service_type"`
    Version     string         `json:"version"`
    HostIDs     []string       `json:"host_ids"`
    Config      map[string]any `json:"config"`
    Port        *int           `json:"port,omitempty"`
    CPUs        *float64       `json:"cpus,omitempty"`
    MemoryBytes *uint64        `json:"memory,omitempty"`
}

This struct is service-type-agnostic. No changes needed when adding a new type.

A.6 E2E Test Pattern (e2e/service_provisioning_test.go)

Complete example of provisioning a service and verifying it reaches "running" state:

// lines 18–147
func TestProvisionMCPService(t *testing.T) {
    t.Parallel()

    host1 := fixture.HostIDs()[0]

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
    defer cancel()

    t.Log("Creating database with MCP service")

    db := fixture.NewDatabaseFixture(ctx, t, &controlplane.CreateDatabaseRequest{
        Spec: &controlplane.DatabaseSpec{
            DatabaseName: "test_mcp_service",
            DatabaseUsers: []*controlplane.DatabaseUserSpec{
                {
                    Username:   "admin",
                    Password:   pointerTo("testpassword"),
                    DbOwner:    pointerTo(true),
                    Attributes: []string{"LOGIN", "SUPERUSER"},
                },
            },
            Port: pointerTo(0),
            Nodes: []*controlplane.DatabaseNodeSpec{
                {
                    Name:    "n1",
                    HostIds: []controlplane.Identifier{controlplane.Identifier(host1)},
                },
            },
            Services: []*controlplane.ServiceSpec{
                {
                    ServiceID:   "mcp-server",
                    ServiceType: "mcp",
                    Version:     "latest",
                    HostIds:     []controlplane.Identifier{controlplane.Identifier(host1)},
                    Config: map[string]any{
                        "llm_provider":      "anthropic",
                        "llm_model":         "claude-sonnet-4-5",
                        "anthropic_api_key": "sk-ant-test-key-12345",
                    },
                },
            },
        },
    })

    t.Log("Database created, verifying service instances")

    require.NotNil(t, db.ServiceInstances, "ServiceInstances should not be nil")
    require.Len(t, db.ServiceInstances, 1, "Expected 1 service instance")

    serviceInstance := db.ServiceInstances[0]
    assert.Equal(t, "mcp-server", serviceInstance.ServiceID)
    assert.Equal(t, string(host1), serviceInstance.HostID)
    assert.NotEmpty(t, serviceInstance.ServiceInstanceID)

    validStates := []string{"creating", "running"}
    assert.Contains(t, validStates, serviceInstance.State)

    // Poll until running
    if serviceInstance.State == "creating" {
        t.Log("Service is still creating, waiting for running...")
        maxWait := 5 * time.Minute
        pollInterval := 5 * time.Second
        deadline := time.Now().Add(maxWait)

        for time.Now().Before(deadline) {
            err := db.Refresh(ctx)
            require.NoError(t, err)
            if len(db.ServiceInstances) > 0 && db.ServiceInstances[0].State == "running" {
                break
            }
            time.Sleep(pollInterval)
        }

        require.Len(t, db.ServiceInstances, 1)
        assert.Equal(t, "running", db.ServiceInstances[0].State)
    }

    // Verify status/connection info
    serviceInstance = db.ServiceInstances[0]
    if serviceInstance.Status != nil {
        assert.NotNil(t, serviceInstance.Status.Hostname)
        assert.NotNil(t, serviceInstance.Status.Ipv4Address)

        foundHTTPPort := false
        for _, port := range serviceInstance.Status.Ports {
            if port.Name == "http" && port.ContainerPort != nil && *port.ContainerPort == 8080 {
                foundHTTPPort = true
                break
            }
        }
        assert.True(t, foundHTTPPort, "HTTP port (8080) should be configured")
    }
}

A.7 Step-by-Step Instructions for Adding a New Service

When a developer asks you to add a new service type (e.g., "my-service"), follow these steps in order:

  1. api/apiv1/design/database.go: Add "my-service" to the g.Enum() call on service_type (see A.1). Run make -C api generate.

  2. server/internal/api/apiv1/validate.go: Change the allowlist check in validateServiceSpec() to accept "my-service". Write a validateMyServiceConfig() function following the pattern in validateMCPServiceConfig() (see A.2). Add a dispatch branch in validateServiceSpec().

  3. server/internal/orchestrator/swarm/service_images.go: Add a versions.addServiceImage("my-service", ...) call in NewServiceVersions() (see A.3).

  4. Config resource: Create a config resource (analogous to MCPConfigResource in mcp_config_resource.go) that generates your service's config files and writes them to the data directory. The DirResource creates the host-side directory; your config resource writes files into it.

  5. server/internal/workflows/plan_update.go: Add a case to resolveTargetSessionAttrs() mapping your service's config to the appropriate target_session_attrs value (see Database Connection Topology).

  6. server/internal/orchestrator/swarm/orchestrator.go: Wire your config resource into the resource chain in GenerateServiceInstanceResources(), in the same position as MCPConfigResource (between DirResource and ServiceInstanceSpec).

  7. server/internal/orchestrator/swarm/service_spec.go: If your service needs a different health check endpoint, different port, entrypoint, or additional mounts, modify ServiceContainerSpec() to branch on ServiceType (see A.4).

  8. server/internal/orchestrator/swarm/service_spec_test.go: Add table-driven test cases for the new service type's container spec, bind mounts, and port config.

  9. server/internal/orchestrator/swarm/service_images_test.go: Add test cases for GetServiceImage() with the new type and version(s).

  10. e2e/service_provisioning_test.go: Add E2E tests following the patterns in A.6: single-host provision, multi-host, add/remove from existing database, stability (unrelated update doesn't recreate), bad version failure + recovery.

Files that do NOT need changes: spec.go, service_instance.go, resources.go, end.go, any store/etcd code.