pgEdge RAG Server
The RAG (Retrieval-Augmented Generation) service runs an intelligent query server alongside your database. The service uses vector and keyword search to retrieve relevant document chunks from PostgreSQL and synthesizes LLM-generated answers based on the retrieved context. For more information, see the pgEdge RAG Server project.
Overview
The Control Plane provisions a RAG service container on each specified
host. The service connects to the database using an existing user
specified in the connect_as field, which must be defined in
database_users, and automatically embeds that user's credentials in
the service configuration. Client applications submit natural language
queries to the service, which performs hybrid vector and keyword search
against document tables and returns LLM-synthesized answers with source
citations.
See Managing Services for instructions on adding, updating, and removing services. The sections below cover RAG-specific configuration.
Database Prerequisites
Before deploying a RAG service, your PostgreSQL database must have the following items configured:
- The pgvector extension must be installed and enabled.
- The database must have document tables with text and vector columns.
- An HNSW index on vector columns enables fast similarity search.
- A GIN index on text columns enables keyword search (BM25).
The Control Plane can automatically provision all of these during
database creation using the scripts.post_database_create hook. See
Preparing the Database for a complete
example. Alternatively, you can provision these manually after
database creation.
Configuration Reference
All configuration fields are provided in the config object of the
service spec.
Service Connection
The connect_as field at the service level specifies which database
user the RAG service authenticates as. This user must already be
defined in the database_users array when creating the database. The
Control Plane automatically embeds that user's credentials in the
service configuration.
The following example shows the connect_as field in the service
spec:
{
"service_id": "rag",
"service_type": "rag",
"connect_as": "app_read_only",
"config": { ... }
}
In this example, app_read_only must be defined in database_users:
{
"username": "app_read_only",
"password": "your_password",
"attributes": ["LOGIN"]
}
Pipeline Configuration
The pipelines array (required) defines one or more RAG workflows.
Each pipeline specifies which tables to search, which embedding
provider to use, and which LLM to use to generate answers.
The following table describes the pipeline configuration fields:
| Field | Type | Description |
|---|---|---|
| pipelines[].name | string | Required. Pipeline identifier used in query URLs. Lowercase alphanumeric characters, hyphens, and underscores; must not start with a hyphen. |
| pipelines[].description | string | Optional. Human-readable pipeline description. |
| pipelines[].tables[] | array | Required. Array of table specifications. See Table Configuration. |
| pipelines[].embedding_llm | object | Required. Embedding provider configuration. See Embedding Configuration. |
| pipelines[].rag_llm | object | Required. LLM provider configuration. See LLM Configuration. |
| pipelines[].token_budget | integer | Optional. Maximum number of tokens for context documents sent to the LLM. |
| pipelines[].top_n | integer | Optional. Number of documents to retrieve per query. |
| pipelines[].system_prompt | string | Optional. Custom system prompt prepended to every LLM request for this pipeline. |
| pipelines[].search | object | Optional. Search behavior settings. See Search Configuration. |
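Assembled from the fields above, a single entry in the pipelines array might look like the following sketch (the table name, system prompt, and search values are illustrative, not required defaults):

```json
{
  "name": "docs",
  "description": "Product documentation",
  "tables": [
    { "table": "doc_chunks", "text_column": "content", "vector_column": "embedding" }
  ],
  "embedding_llm": { "provider": "openai", "model": "text-embedding-3-small", "api_key": "sk-..." },
  "rag_llm": { "provider": "anthropic", "model": "claude-sonnet-4-5", "api_key": "sk-ant-..." },
  "token_budget": 4000,
  "top_n": 10,
  "system_prompt": "Answer using only the provided context.",
  "search": { "hybrid_enabled": true, "vector_weight": 0.5 }
}
```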
Embedding Configuration
The embedding_llm object configures the embedding provider used to
vectorize each incoming query. The resulting vector is used for
similarity search against stored document vectors. The api_key field
is required for all providers except ollama.
The following table describes the embedding configuration fields:
| Field | Type | Description |
|---|---|---|
| provider | string | Required. The embedding provider. One of: openai, voyage, ollama. |
| model | string | Required. The embedding model name (e.g., text-embedding-3-small, voyage-3, nomic-embed-text). |
| api_key | string | API key for the provider. Required for openai and voyage; not used for ollama. |
| base_url | string | Optional. Custom base URL for the provider API. Required for ollama; set this to the network-accessible address of your Ollama server (e.g., http://192.168.1.10:11434). |
LLM Configuration
The rag_llm object configures the LLM provider used to synthesize
the final answer from retrieved documents. api_key is required for
all providers except ollama.
The following table describes the LLM configuration fields:
| Field | Type | Description |
|---|---|---|
| provider | string | Required. The LLM provider. One of: anthropic, openai, ollama. |
| model | string | Required. The model name (e.g., claude-sonnet-4-5, gpt-4o, llama3.2). |
| api_key | string | API key for the provider. Required for anthropic and openai. Not used for ollama. |
| base_url | string | Optional. Custom base URL for API gateway routing. Required for ollama; set this to the network-accessible address of your Ollama server (e.g., http://192.168.1.10:11434). |
Note
If embedding_llm and rag_llm share the same provider and both
specify an api_key, the values must be identical. The pgEdge RAG
Server maintains one key slot per provider and cannot reconcile
two different values.
Table Configuration
Each table in a pipeline specifies how to access document text and embeddings. The following table describes the table configuration fields:
| Field | Type | Description |
|---|---|---|
| table | string | Required. The table or view name containing documents. |
| text_column | string | Required. Column name containing the document text. |
| vector_column | string | Required. Column name containing the embedding vectors. |
| id_column | string | Optional. Column name for document IDs. Defaults to the table's primary key. Required for views. |
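When a pipeline searches a view rather than a table, id_column must be set explicitly because a view has no primary key to fall back on. For example (the view and column names here are illustrative):

```json
{
  "table": "doc_chunks_view",
  "text_column": "content",
  "vector_column": "embedding",
  "id_column": "chunk_id"
}
```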
Search Configuration
The search object tunes how documents are retrieved before being
passed to the LLM. The following table describes the search
configuration fields:
| Field | Type | Default | Description |
|---|---|---|---|
| hybrid_enabled | boolean | true | Enable hybrid search combining vector similarity and BM25 keyword matching. Set to false for vector-only search. |
| vector_weight | float | 0.5 | Weight for vector search versus BM25 (0.0-1.0). Higher values prioritize semantic relevance. |
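For example, a pipeline that should lean more heavily on semantic similarity than on keyword matching might raise the weight (the value below is illustrative):

```json
"search": {
  "hybrid_enabled": true,
  "vector_weight": 0.7
}
```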
Defaults Configuration
The optional defaults object sets fallback values applied to any
pipeline that does not specify its own token_budget or top_n. The
following table describes the defaults configuration fields:
| Field | Type | Description |
|---|---|---|
| defaults.token_budget | integer | Default maximum number of tokens for context documents. Must be a positive integer. |
| defaults.top_n | integer | Default number of documents to retrieve. Must be a positive integer. |
Preparing the Database
Before deploying a RAG service, you must prepare your PostgreSQL
database with pgvector, document tables, and indexes. The Control
Plane automatically executes these during database creation when you
include them in the scripts.post_database_create array in your
database specification.
Required Schema
Include the following SQL statements in scripts.post_database_create
to automatically initialize the database schema during creation:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create documents table with embeddings
CREATE TABLE IF NOT EXISTS documents_content_chunks (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
embedding vector(1536),
title TEXT,
source TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- HNSW index for vector similarity search
CREATE INDEX IF NOT EXISTS documents_embedding_idx
ON documents_content_chunks USING hnsw (embedding vector_cosine_ops);
-- GIN index for keyword search (BM25)
CREATE INDEX IF NOT EXISTS documents_content_idx
ON documents_content_chunks USING gin (to_tsvector('english', content));
These statements are included as individual entries in the
scripts.post_database_create array (see examples below).
Vector Dimensions
Adjust the vector(N) dimension to match your embedding model. The
following table shows common models and their vector dimensions:
| Provider | Model | Dimensions |
|---|---|---|
| OpenAI | text-embedding-3-small | 1536 |
| OpenAI | text-embedding-3-large | 3072 |
| Voyage AI | voyage-3 / voyage-3-large | 1024 |
| Ollama | nomic-embed-text | 768 |
| Ollama | Other models | Check the model documentation |
Examples
The following examples show how to configure the RAG service for
common use cases. The first example includes the complete
scripts.post_database_create setup to automatically provision the
database schema (pgvector extension, tables, and indexes) using
vector(1536) for OpenAI embeddings. Subsequent examples focus on
service configuration variations and omit the schema setup for brevity.
If you use a different embedding model, adjust the vector(N) dimension
in your schema to match - for example, vector(1024) for voyage-3 or
vector(768) for nomic-embed-text.
Minimal (OpenAI + Anthropic)
In the following example, a curl command provisions a RAG service
that uses OpenAI for embeddings and Anthropic Claude to generate answers:
curl -X POST http://host-1:3000/v1/databases \
-H 'Content-Type: application/json' \
--data '{
"id": "knowledge-base",
"spec": {
"database_name": "knowledge_base",
"database_users": [
{
"username": "admin",
"password": "admin_password",
"db_owner": true,
"attributes": ["SUPERUSER", "LOGIN"]
}
],
"port": 5432,
"nodes": [
{ "name": "n1", "host_ids": ["host-1"] }
],
"scripts": {
"post_database_create": [
"CREATE EXTENSION IF NOT EXISTS vector",
"CREATE TABLE IF NOT EXISTS documents_content_chunks (id BIGSERIAL PRIMARY KEY, content TEXT NOT NULL, embedding vector(1536), title TEXT, source TEXT)",
"CREATE INDEX ON documents_content_chunks USING hnsw (embedding vector_cosine_ops)",
"CREATE INDEX ON documents_content_chunks USING gin (to_tsvector('\''english'\'', content))"
]
},
"services": [
{
"service_id": "rag",
"service_type": "rag",
"version": "latest",
"host_ids": ["host-1"],
"port": 9200,
"connect_as": "admin",
"config": {
"pipelines": [
{
"name": "default",
"description": "Main RAG pipeline",
"tables": [
{
"table": "documents_content_chunks",
"text_column": "content",
"vector_column": "embedding"
}
],
"embedding_llm": {
"provider": "openai",
"model": "text-embedding-3-small",
"api_key": "sk-..."
},
"rag_llm": {
"provider": "anthropic",
"model": "claude-sonnet-4-5",
"api_key": "sk-ant-..."
},
"token_budget": 4000,
"top_n": 10
}
]
}
}
]
}
}'
OpenAI End-to-End
In the following example, OpenAI is used for both embeddings and to generate answers:
curl -X POST http://host-1:3000/v1/databases \
-H 'Content-Type: application/json' \
--data '{
"id": "knowledge-base",
"spec": {
"database_name": "knowledge_base",
"database_users": [
{
"username": "admin",
"password": "admin_password",
"db_owner": true,
"attributes": ["SUPERUSER", "LOGIN"]
}
],
"nodes": [
{ "name": "n1", "host_ids": ["host-1"] }
],
"services": [
{
"service_id": "rag",
"service_type": "rag",
"version": "latest",
"host_ids": ["host-1"],
"port": 9200,
"connect_as": "admin",
"config": {
"pipelines": [
{
"name": "default",
"tables": [
{
"table": "documents_content_chunks",
"text_column": "content",
"vector_column": "embedding"
}
],
"embedding_llm": {
"provider": "openai",
"model": "text-embedding-3-small",
"api_key": "sk-..."
},
"rag_llm": {
"provider": "openai",
"model": "gpt-4o",
"api_key": "sk-..."
}
}
]
}
}
]
}
}'
Voyage AI with Vector-Only Search
In the following example, Voyage AI is used for embeddings and the service is configured for vector-only search (disabling BM25 keyword matching):
curl -X POST http://host-1:3000/v1/databases \
-H 'Content-Type: application/json' \
--data '{
"id": "knowledge-base",
"spec": {
"database_name": "knowledge_base",
"database_users": [
{
"username": "admin",
"password": "admin_password",
"db_owner": true,
"attributes": ["SUPERUSER", "LOGIN"]
}
],
"nodes": [
{ "name": "n1", "host_ids": ["host-1"] }
],
"services": [
{
"service_id": "rag",
"service_type": "rag",
"version": "latest",
"host_ids": ["host-1"],
"port": 9200,
"connect_as": "admin",
"config": {
"pipelines": [
{
"name": "default",
"tables": [
{
"table": "documents_content_chunks",
"text_column": "content",
"vector_column": "embedding"
}
],
"embedding_llm": {
"provider": "voyage",
"model": "voyage-3",
"api_key": "pa-..."
},
"rag_llm": {
"provider": "anthropic",
"model": "claude-sonnet-4-5",
"api_key": "sk-ant-..."
},
"search": {
"hybrid_enabled": false
}
}
]
}
}
]
}
}'
Ollama (Self-Hosted)
In the following example, the RAG service uses a self-hosted Ollama
server for both embeddings and answer generation. No API key is
required; the Ollama server URL is provided via base_url:
curl -X POST http://host-1:3000/v1/databases \
-H 'Content-Type: application/json' \
--data '{
"id": "knowledge-base",
"spec": {
"database_name": "knowledge_base",
"database_users": [
{
"username": "admin",
"password": "admin_password",
"db_owner": true,
"attributes": ["SUPERUSER", "LOGIN"]
}
],
"nodes": [
{ "name": "n1", "host_ids": ["host-1"] }
],
"services": [
{
"service_id": "rag",
"service_type": "rag",
"version": "latest",
"host_ids": ["host-1"],
"port": 9200,
"connect_as": "admin",
"config": {
"pipelines": [
{
"name": "default",
"tables": [
{
"table": "documents_content_chunks",
"text_column": "content",
"vector_column": "embedding"
}
],
"embedding_llm": {
"provider": "ollama",
"model": "nomic-embed-text",
"base_url": "http://ollama-host:11434"
},
"rag_llm": {
"provider": "ollama",
"model": "llama3.2",
"base_url": "http://ollama-host:11434"
}
}
]
}
}
]
}
}'
Multiple Pipelines with Shared Defaults
In the following example, two pipelines share default values for
token_budget and top_n, set with the defaults properties:
curl -X POST http://host-1:3000/v1/databases \
-H 'Content-Type: application/json' \
--data '{
"id": "knowledge-base",
"spec": {
"database_name": "knowledge_base",
"database_users": [
{
"username": "admin",
"password": "admin_password",
"db_owner": true,
"attributes": ["SUPERUSER", "LOGIN"]
}
],
"nodes": [
{ "name": "n1", "host_ids": ["host-1"] }
],
"services": [
{
"service_id": "rag",
"service_type": "rag",
"version": "latest",
"host_ids": ["host-1"],
"port": 9200,
"connect_as": "admin",
"config": {
"defaults": {
"token_budget": 4000,
"top_n": 10
},
"pipelines": [
{
"name": "docs",
"description": "Product documentation",
"tables": [
{
"table": "doc_chunks",
"text_column": "content",
"vector_column": "embedding"
}
],
"embedding_llm": {
"provider": "openai",
"model": "text-embedding-3-small",
"api_key": "sk-..."
},
"rag_llm": {
"provider": "anthropic",
"model": "claude-sonnet-4-5",
"api_key": "sk-ant-..."
}
},
{
"name": "support",
"description": "Support ticket history",
"tables": [
{
"table": "ticket_chunks",
"text_column": "body",
"vector_column": "embedding"
}
],
"embedding_llm": {
"provider": "openai",
"model": "text-embedding-3-small",
"api_key": "sk-..."
},
"rag_llm": {
"provider": "anthropic",
"model": "claude-sonnet-4-5",
"api_key": "sk-ant-..."
},
"top_n": 5
}
]
}
}
]
}
}'
Deployment Guide
This section shows the complete flow from database creation to a working pipeline query.
Step 1 - Create the Database
Include scripts.post_database_create to automatically provision the
pgvector schema during database creation. This avoids any manual setup
after deployment. Use a fixed port value for the RAG service so the
URL stays stable across container restarts.
curl -X POST http://host-1:3000/v1/databases \
-H 'Content-Type: application/json' \
--data '{
"id": "knowledge-base",
"spec": {
"database_name": "knowledge_base",
"database_users": [
{
"username": "admin",
"password": "admin_password",
"db_owner": true,
"attributes": ["SUPERUSER", "LOGIN"]
},
{
"username": "app_read_only",
"password": "readonly_password",
"attributes": ["LOGIN"]
}
],
"port": 5432,
"nodes": [
{ "name": "n1", "host_ids": ["host-1"] }
],
"scripts": {
"post_database_create": [
"CREATE EXTENSION IF NOT EXISTS vector",
"CREATE TABLE IF NOT EXISTS documents_content_chunks (id BIGSERIAL PRIMARY KEY, content TEXT NOT NULL, embedding vector(1536), title TEXT, source TEXT)",
"CREATE INDEX ON documents_content_chunks USING hnsw (embedding vector_cosine_ops)",
"CREATE INDEX ON documents_content_chunks USING gin (to_tsvector('\''english'\'', content))",
"GRANT SELECT ON documents_content_chunks TO app_read_only"
]
},
"services": [
{
"service_id": "rag",
"service_type": "rag",
"version": "latest",
"host_ids": ["host-1"],
"port": 9200,
"connect_as": "app_read_only",
"config": {
"pipelines": [
{
"name": "default",
"description": "Main RAG pipeline",
"tables": [
{
"table": "documents_content_chunks",
"text_column": "content",
"vector_column": "embedding"
}
],
"embedding_llm": {
"provider": "openai",
"model": "text-embedding-3-small",
"api_key": "sk-..."
},
"rag_llm": {
"provider": "anthropic",
"model": "claude-sonnet-4-5",
"api_key": "sk-ant-..."
},
"token_budget": 4000,
"top_n": 10
}
]
}
}
]
}
}'
Step 2 - Check the Database and Service Status
Run the following command after approximately 60-90 seconds to check that the database is ready and the RAG service is running:
curl -s http://host-1:3000/v1/databases/knowledge-base
In the response, look for the following items:
- The state: "available" field at the top level confirms that the database is provisioned and healthy.
- The service_ready: true field inside service_instances[].status confirms that the RAG container is up and accepting requests.
{
  "state": "available",
  "instances": [
    {
      "state": "available",
      "postgres": {
        "patroni_state": "running",
        "role": "primary"
      }
    }
  ],
  "service_instances": [
    {
      "state": "running",
      "status": {
        "service_ready": true,
        "ports": [
          {
            "container_port": 8080,
            "host_port": 9200,
            "name": "tcp"
          }
        ],
        "last_health_at": "2026-04-22T10:00:00Z"
      }
    }
  ]
}
The host_port value is the port to use when querying the RAG
service. If you used a fixed port: 9200 in the service spec, the
host port will always be 9200.
Tip
Use a fixed port value (e.g. 9200) in the service spec rather
than port: 0. When port: 0 is used, Docker assigns a random
host port that changes each time the RAG container is replaced
(e.g. after an API key update), requiring you to look up the new
port each time.
Step 3 - Load Documents
The RAG service needs documents with embeddings in the database before
it can answer queries. The following Python script generates embeddings
using OpenAI and inserts them into documents_content_chunks:
#!/usr/bin/env python3
import os

import psycopg2
from psycopg2.extras import execute_values
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

conn = psycopg2.connect(
    host=os.environ.get("DB_HOST", "host-1"),
    port=int(os.environ.get("DB_PORT", "5432")),
    user=os.environ.get("DB_USER", "admin"),
    password=os.environ.get("DB_PASSWORD", "admin_password"),
    database=os.environ.get("DB_NAME", "knowledge_base"),
)
cur = conn.cursor()

documents = [
    {"title": "My Doc", "content": "Full document text goes here...", "source": "docs"},
]

def chunk_text(text, size=500, overlap=50):
    # Split into fixed-size chunks with a small overlap between neighbors.
    return [text[i:i+size] for i in range(0, len(text), size - overlap) if text[i:i+size].strip()]

for doc in documents:
    chunks = chunk_text(doc["content"])
    resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    embeddings = [item.embedding for item in resp.data]
    execute_values(
        cur,
        "INSERT INTO documents_content_chunks (content, embedding, title, source) VALUES %s",
        # str(e) yields pgvector's text form ('[0.1, 0.2, ...]'); passing the
        # raw Python list would be adapted as a PostgreSQL array and rejected.
        [(c, str(e), doc["title"], doc["source"]) for c, e in zip(chunks, embeddings)],
    )
    conn.commit()
    print(f"Loaded {len(chunks)} chunks from '{doc['title']}'")

cur.close()
conn.close()
Install the dependencies and run the script with the following commands:
pip install psycopg2-binary openai
export OPENAI_API_KEY="sk-..."
export DB_HOST="host-1"
export DB_USER="admin"
export DB_PASSWORD="admin_password"
export DB_NAME="knowledge_base"
python3 load_documents.py
To verify that documents were inserted, run the following query:
psql "postgresql://admin:admin_password@host-1:5432/knowledge_base" \
-c "SELECT COUNT(*), COUNT(embedding) FROM documents_content_chunks;"
Step 4 - Query the Pipeline
Send a query to the RAG service using the following command:
curl -X POST http://host-1:9200/v1/pipelines/default \
-H "Content-Type: application/json" \
-d '{
"query": "How does multi-active replication work?",
"include_sources": true
}'
A successful response looks like this:
{
"answer": "Multi-active replication allows multiple PostgreSQL nodes to accept writes simultaneously...",
"sources": [
{"id": "5", "content": "...", "score": 0.00820},
{"id": "1", "content": "...", "score": 0.00806}
],
"tokens_used": 1243
}
sources is only populated when include_sources: true is set in
the request.
Step 5 - Update the Service Config
To update the service (for example, to rotate an API key or change
the LLM model), submit a POST /v1/databases/{id} with the complete
updated spec. The update endpoint requires all fields - include
database_name, nodes, database_users, and the full services
array:
curl -X POST http://host-1:3000/v1/databases/knowledge-base \
-H 'Content-Type: application/json' \
--data '{
"spec": {
"database_name": "knowledge_base",
"port": 5432,
"nodes": [
{ "name": "n1", "host_ids": ["host-1"] }
],
"database_users": [
{
"username": "admin",
"password": "admin_password",
"db_owner": true,
"attributes": ["SUPERUSER", "LOGIN"]
},
{
"username": "app_read_only",
"password": "readonly_password",
"attributes": ["LOGIN"]
}
],
"services": [
{
"service_id": "rag",
"service_type": "rag",
"version": "latest",
"host_ids": ["host-1"],
"port": 9200,
"connect_as": "app_read_only",
"config": {
"pipelines": [
{
"name": "default",
"tables": [
{
"table": "documents_content_chunks",
"text_column": "content",
"vector_column": "embedding"
}
],
"embedding_llm": {
"provider": "openai",
"model": "text-embedding-3-small",
"api_key": "sk-..."
},
"rag_llm": {
"provider": "anthropic",
"model": "claude-sonnet-4-5",
"api_key": "sk-ant-NEW-KEY"
},
"token_budget": 4000,
"top_n": 10
}
]
}
}
]
}
}'
The RAG service container is replaced with the new configuration.
Poll the database status until state is "available" and
service_ready is true before sending queries.
Querying the RAG Service
Once the service is running, submit queries to retrieve answers based on your documents.
List Available Pipelines
To list all configured pipelines, send the following request:
curl http://host-1:9200/v1/pipelines
Query a Pipeline
To submit a query to a pipeline, send a POST request with the query text:
curl -X POST http://host-1:9200/v1/pipelines/default \
-H "Content-Type: application/json" \
-d '{
"query": "How does RAG improve LLM responses?",
"include_sources": true
}'
Request Fields
The following table describes the query request fields:
| Field | Type | Default | Description |
|---|---|---|---|
| query | string | - | Required. The natural language question to answer. |
| include_sources | boolean | false | Return the source documents used to generate the answer. |
| top_n | integer | - | Override the pipeline's top_n for this request. |
| stream | boolean | false | Stream the answer as Server-Sent Events. |
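For example, a request body that overrides the pipeline's top_n and streams the answer might look like the following (the query text is illustrative):

```json
{
  "query": "How does failover work?",
  "top_n": 3,
  "stream": true
}
```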
Response Format
A successful query response looks like this:
{
"answer": "RAG (Retrieval-Augmented Generation) improves LLM responses by retrieving relevant documents from your database before generating answers. This grounds the LLM in your specific data, reducing hallucinations and improving accuracy...",
"sources": [
{
"id": "42",
"content": "The RAG service enables retrieval-augmented generation workflows...",
"score": 0.00820
}
],
"tokens_used": 1243
}
sources is only populated when include_sources is true in the
request.
Hybrid Search
The RAG service's hybrid search combines two complementary retrieval techniques, merged using Reciprocal Rank Fusion (RRF):
- Vector similarity search retrieves documents semantically similar to the query using cosine distance on embeddings.
- BM25 keyword search retrieves documents containing exact keyword matches, scored with a BM25 ranking function (a refinement of TF-IDF).
This combination ensures the LLM receives context that is both semantically relevant and keyword-relevant. Documents appearing in both result sets receive higher fused scores, naturally prioritizing highly relevant results.
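The pgEdge RAG Server's fusion parameters are internal, but the general RRF technique can be sketched in a few lines of Python; the constant k=60 and the document IDs below are illustrative assumptions, not the server's actual values:

```python
def rrf_fuse(rankings, k=60):
    """Merge ranked result lists with Reciprocal Rank Fusion.

    Each ranking is a list of document IDs, best first. A document's
    fused score is the sum of 1 / (k + rank) over every list it appears
    in, so items ranked well in multiple lists float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical result lists from vector search and BM25 keyword search:
vector_hits = ["d5", "d1", "d9"]
bm25_hits = ["d1", "d5", "d7"]
fused = rrf_fuse([vector_hits, bm25_hits])
# d5 and d1 appear in both lists, so they outrank d9 and d7.
```

Because the score depends only on rank positions, RRF needs no tuning to reconcile the very different score scales of cosine distance and BM25.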
Token Budget
The token_budget field controls how much context is sent to the LLM.
The service ranks documents and packs them in order until the budget
is exhausted. The final document is truncated at a sentence boundary.
Increase the budget to send more context, or decrease it to reduce
LLM costs.
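The packing loop described above can be sketched in Python. The four-characters-per-token estimate and the helper names are assumptions for illustration; the server's actual tokenizer will differ:

```python
def estimate_tokens(text):
    # Rough heuristic: roughly four characters per token for English text.
    return max(1, len(text) // 4)

def pack_context(ranked_docs, token_budget):
    """Pack ranked documents into the budget, truncating the last one."""
    packed, used = [], 0
    for doc in ranked_docs:
        cost = estimate_tokens(doc)
        if used + cost <= token_budget:
            packed.append(doc)
            used += cost
        else:
            # Truncate the final document at a sentence boundary.
            remaining_chars = (token_budget - used) * 4
            cut = doc[:remaining_chars]
            end = cut.rfind(". ")
            if end > 0:
                packed.append(cut[:end + 1])
            break
    return packed

packed = pack_context(["First part. Second part. Third part."], 5)
# Only "First part." fits: the document is cut back to a sentence boundary.
```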
Troubleshooting
The following sections describe common issues and how to resolve them.
About Automated Scripts
The scripts.post_database_create field executes SQL automatically
during database creation. The following details apply:
| Property | Details |
|---|---|
| Execution timing | Scripts run once, immediately after Spock is initialized. |
| Transactional | All statements execute within a single transaction. |
| No re-execution | If you update the database spec later, scripts are not re-run. |
| Constraints | Some SQL commands are not allowed within transactions, including VACUUM, ANALYZE, CREATE INDEX CONCURRENTLY, CREATE DATABASE, and DROP DATABASE. |
If a script fails during database creation, you can use
update-database to retry after fixing the problematic statement.
Service Fails to Start
To diagnose a service that fails to start, check database connectivity and user permissions.
To verify that the database is accessible, run the following command:
psql -h host-1 -U admin -d knowledge_base -c "SELECT 1"
To verify that the service user (app_read_only) exists and has table
access, run the following psql meta-commands:
\du+ app_read_only
\dt documents_content_chunks
Poor Query Results
To diagnose poor query results, verify that documents are loaded and embeddings are present.
To check document counts and embedding coverage, run the following queries:
SELECT COUNT(*) FROM documents_content_chunks;
SELECT COUNT(*) FROM documents_content_chunks WHERE embedding IS NOT NULL;
To find documents similar to a test query embedding, run the following query:
SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) as similarity
FROM documents_content_chunks
ORDER BY similarity DESC
LIMIT 5;
Start with factual, keyword-based questions before complex analytical questions to verify that the pipeline is working correctly.
Empty Context Window
If the RAG service returns limited context, the token budget may be exhausted. Increase the budget in the pipeline configuration:
"token_budget": 8000
Alternatively, store smaller, more focused document chunks to fit more context within the budget.
Responsibility Summary
The following table summarizes which tasks are handled by the Control Plane and which are your responsibility:
| Step | Who | How |
|---|---|---|
| Provision schema (pgvector, tables, indexes) | Control Plane | scripts.post_database_create in database spec |
| Deploy RAG container | Control Plane | Automatic on POST /v1/databases |
| Inject database credentials | Control Plane | Automatic via connect_as field |
| Health monitoring and restart | Control Plane | Automatic |
| Generate embeddings | You | Call OpenAI / Voyage / Ollama API |
| Load documents into table | You | INSERT using psycopg2 or any Postgres client |
| Submit queries | Your application | POST /v1/pipelines/{name} on the RAG service |
Next Steps
The following resources provide more information on related topics:
- The Managing Services guide describes how to add, update, and remove services.
- The pgEdge RAG Server repository contains the pgEdge RAG Server source code.
- The pgEdge RAG Server Documentation covers the pgEdge RAG Server API and configuration in detail.
- The pgvector Documentation explains how to install and use the pgvector extension.