Configuration

pgEdge Vectorizer is configured through PostgreSQL's GUC (Grand Unified Configuration) system. These settings control how the extension connects to embedding providers, manages background workers, splits text into chunks, and maintains the processing queue. Most settings can be changed by any user and take effect after reloading the configuration with pg_reload_conf(), though some require a server restart. In each table below, the Reload, Restart, and Superuser columns indicate whether a change needs a configuration reload, needs a server restart, or requires superuser privileges.
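
As a minimal sketch of the reload workflow, the statements below change one setting, reload the configuration, and then inspect the result. The sketch assumes the extension is already loaded (so its parameters are registered) and that the session has the privileges ALTER SYSTEM requires; the value 20 is purely illustrative.

```sql
-- Write the new value to postgresql.auto.conf (illustrative value).
ALTER SYSTEM SET pgedge_vectorizer.batch_size = 20;

-- Ask the server to re-read its configuration files.
SELECT pg_reload_conf();

-- Confirm the value now in effect.
SHOW pgedge_vectorizer.batch_size;

-- List all extension parameters. In the context column, 'postmaster'
-- means a restart is needed and 'sighup' means a reload is enough.
SELECT name, setting, context
FROM pg_settings
WHERE name LIKE 'pgedge_vectorizer.%'
ORDER BY name;
```

The pg_settings query is a convenient way to cross-check the Reload and Restart columns against what the running server reports.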

Provider Settings

These settings configure the connection to your embedding provider, including the API endpoint, authentication, and model selection.

| Parameter | Default | Description | Reload | Restart | Superuser |
| --- | --- | --- | --- | --- | --- |
| pgedge_vectorizer.provider | openai | Embedding provider (openai, voyage, ollama) | No | No | No |
| pgedge_vectorizer.api_key_file | ~/.pgedge-vectorizer-llm-api-key | API key file path (not needed for Ollama) | No | No | No |
| pgedge_vectorizer.api_url | https://api.openai.com/v1 | API endpoint | No | No | No |
| pgedge_vectorizer.model | text-embedding-3-small | Model name | No | No | No |
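
For illustration, switching from the default OpenAI provider to a local Ollama instance might look like the sketch below. The endpoint URL and model name are assumptions chosen for the example, not values taken from this documentation; substitute whatever your Ollama installation actually serves. No API key file is set because the table notes that Ollama does not need one.

```sql
-- Hypothetical values: the URL and model name are illustrative assumptions.
ALTER SYSTEM SET pgedge_vectorizer.provider = 'ollama';
ALTER SYSTEM SET pgedge_vectorizer.api_url = 'http://localhost:11434';
ALTER SYSTEM SET pgedge_vectorizer.model = 'nomic-embed-text';

-- ALTER SYSTEM only writes postgresql.auto.conf; the server must still
-- re-read the file before the new values are visible.
SELECT pg_reload_conf();

-- Check what the server reports.
SHOW pgedge_vectorizer.provider;
SHOW pgedge_vectorizer.api_url;
SHOW pgedge_vectorizer.model;
```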

Worker Settings

These settings control the background workers that process the embedding queue, including concurrency, batch sizes, and retry behavior.

| Parameter | Default | Description | Reload | Restart | Superuser |
| --- | --- | --- | --- | --- | --- |
| pgedge_vectorizer.num_workers | 2 | Number of workers | No | Yes | Yes |
| pgedge_vectorizer.databases | (empty) | Comma-separated list of databases to monitor | Yes | Yes | No |
| pgedge_vectorizer.batch_size | 10 | Batch size for embeddings | Yes | No | No |
| pgedge_vectorizer.max_retries | 3 | Max retry attempts | Yes | No | No |
| pgedge_vectorizer.worker_poll_interval | 1000 | Poll interval in ms | Yes | No | No |
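
The sketch below shows one way to adjust worker settings and then see which changes are still waiting for a restart. The database names and numeric values are hypothetical, and the sketch assumes the extension is loaded and the session may run ALTER SYSTEM.

```sql
-- Hypothetical database names and values, for illustration only.
ALTER SYSTEM SET pgedge_vectorizer.databases = 'app_db,analytics_db';
ALTER SYSTEM SET pgedge_vectorizer.num_workers = 4;
ALTER SYSTEM SET pgedge_vectorizer.worker_poll_interval = 500;

-- Apply everything that can be applied without a restart.
SELECT pg_reload_conf();

-- pending_restart is true for parameters whose new value in the
-- configuration file will only take effect after a server restart.
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name LIKE 'pgedge_vectorizer.%'
ORDER BY name;
```

Parameters marked Yes in the Restart column, such as pgedge_vectorizer.num_workers, should keep their old value in the setting column until the server is restarted.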

Chunking Settings

These settings determine how text content is split into chunks before embedding generation.

| Parameter | Default | Description | Reload | Restart | Superuser |
| --- | --- | --- | --- | --- | --- |
| pgedge_vectorizer.auto_chunk | true | Enable auto-chunking | Yes | No | No |
| pgedge_vectorizer.default_chunk_strategy | token_based | Chunking strategy | Yes | No | No |
| pgedge_vectorizer.default_chunk_size | 400 | Chunk size in tokens | Yes | No | No |
| pgedge_vectorizer.default_chunk_overlap | 50 | Overlap in tokens | Yes | No | No |
| pgedge_vectorizer.strip_non_ascii | true | Strip non-ASCII characters (emoji, box-drawing, etc.) | Yes | No | No |
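
To give a feel for how chunk size and overlap interact, the query below computes chunk boundaries for a hypothetical 1000-token document using the default values of 400 and 50. It assumes a simple sliding window in which each chunk starts (chunk size minus overlap) tokens after the previous one; this is a common scheme, but the extension's actual token_based strategy may place boundaries differently.

```sql
-- Assumption: each chunk starts (chunk_size - overlap) tokens after the
-- previous one. This is an illustration, not the extension's code.
WITH params AS (
    SELECT 1000 AS total_tokens,  -- hypothetical document length
           400  AS chunk_size,    -- default_chunk_size
           50   AS overlap        -- default_chunk_overlap
)
SELECT s AS start_token,
       LEAST(s + chunk_size, total_tokens) AS end_token
FROM params,
     generate_series(0, total_tokens - 1, chunk_size - overlap) AS s;

-- Result under these assumptions:
--  start_token | end_token
--            0 |       400
--          350 |       750
--          700 |      1000
```

Under this model, a larger overlap repeats more context between neighbouring chunks at the cost of producing more chunks, and therefore more embedding requests, per document.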

Queue Management

These settings control automatic cleanup of completed queue items to prevent unbounded growth.

| Parameter | Default | Description | Reload | Restart | Superuser |
| --- | --- | --- | --- | --- | --- |
| pgedge_vectorizer.auto_cleanup_hours | 24 | Automatically delete completed queue items older than this many hours. Set to 0 to disable. Workers clean up once per hour. | Yes | No | No |
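
As a short sketch, the retention window can be lengthened or cleanup disabled as shown below. The value 72 is illustrative, and the sketch assumes the extension is loaded and the session may run ALTER SYSTEM.

```sql
-- Keep completed queue items for three days instead of one (illustrative).
ALTER SYSTEM SET pgedge_vectorizer.auto_cleanup_hours = 72;
SELECT pg_reload_conf();

-- Alternatively, disable automatic cleanup entirely:
-- ALTER SYSTEM SET pgedge_vectorizer.auto_cleanup_hours = 0;
-- SELECT pg_reload_conf();

SHOW pgedge_vectorizer.auto_cleanup_hours;
```

Because workers clean up once per hour, a new value may take up to an hour to show any effect on the queue. With cleanup disabled, completed items accumulate until they are removed by other means.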