pgEdge Document Loader
Overview
The pgEdge Document Loader is a command-line tool written in Go that loads documents from various formats into a PostgreSQL database. The tool automatically converts documents to Markdown format and extracts metadata before storing them in the database.
Supported Formats
The tool supports the following document formats:
- HTML (.html, .htm) - Extracts title from <title> tag
- Markdown (.md) - Extracts title from first # heading
- reStructuredText (.rst) - Extracts title from underlined headings
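To make the extension-to-format mapping concrete, here is a minimal Go sketch of how detection by file extension and Markdown title extraction could work. The function names `detectFormat` and `markdownTitle` are hypothetical and are not taken from the tool's source; the real implementation may detect formats differently (for example by inspecting content) and may handle edge cases, such as `#` lines inside code blocks, that this sketch ignores.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// detectFormat maps a file extension to one of the three formats listed
// above. Hypothetical helper: the tool's actual detection may differ.
func detectFormat(path string) (string, bool) {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".html", ".htm":
		return "html", true
	case ".md":
		return "markdown", true
	case ".rst":
		return "rst", true
	}
	return "", false
}

// markdownTitle returns the text of the first line that starts with "# ".
// Assumption: a plain line scan is enough; fenced code is not skipped.
func markdownTitle(doc string) (string, bool) {
	for _, line := range strings.Split(doc, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "# ") {
			return strings.TrimSpace(strings.TrimPrefix(line, "# ")), true
		}
	}
	return "", false
}

func main() {
	format, ok := detectFormat("guide/intro.MD")
	fmt.Println(format, ok)

	title, found := markdownTitle("intro text\n# Getting Started\nbody")
	fmt.Println(title, found)
}
```

Lower-casing the extension before matching means `intro.MD` and `intro.md` are treated alike, which is one reasonable choice for files that come from case-insensitive file systems.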
Key Features
- Automatic document format detection
- Conversion to Markdown format
- Metadata extraction (title, filename, timestamps)
- Flexible column mapping
- Support for single files, directories, and glob patterns
- Update or insert mode (upsert functionality)
- Transaction-based processing with automatic rollback on errors
- Configuration file support for reusable setups
- Secure password handling (environment variable, .pgpass, or interactive)
Quick Start
- Install the tool:
    make install
- Create a database table (see Database Setup)
- Run the tool:
    pgedge-docloader \
      --source ./docs \
      --db-host localhost \
      --db-name mydb \
      --db-user myuser \
      --db-table documents \
      --col-doc-content content \
      --col-file-name filename
Documentation
License
This project is licensed under the PostgreSQL License. See LICENCE.md for details.