Using Git Repository Sources
As an alternative to local files, pgEdge Document Loader can clone and process documentation directly from Git repositories. This is useful for:
- Loading documentation from remote repositories without manual cloning
- Processing specific branches or tags (e.g., versioned documentation)
- Automated pipelines that fetch and load docs from source control
Git Source Options
| Option | Required | Description |
|---|---|---|
| `--git-url` | Yes* | Git repository URL to clone |
| `--git-branch` | No | Branch to check out (default: the repository's default branch) |
| `--git-tag` | No | Tag to check out (mutually exclusive with `--git-branch`) |
| `--git-doc-path` | No | Path or glob pattern within the repository to process |
| `--git-clone-dir` | No | Directory in which to store cloned repositories (default: a temporary directory) |
| `--git-keep-clone` | No | Keep the cloned repository after processing |
| `--git-skip-fetch` | No | Skip fetching if the repository has already been cloned |

*Either `--source` or `--git-url` is required, but not both.
Basic Usage
Clone a repository and process all supported files from the root:
```bash
pgedge-docloader \
  --git-url https://github.com/org/docs-repo.git \
  --db-host localhost \
  --db-name mydb \
  --db-user myuser \
  --db-table documents \
  --col-doc-content content \
  --col-file-name filename
```
Processing a Specific Directory
Use --git-doc-path to process files from a specific directory within the
repository:
```bash
pgedge-docloader \
  --git-url https://github.com/org/project.git \
  --git-doc-path docs/api \
  --db-host localhost \
  --db-name mydb \
  --db-user myuser \
  --db-table documents \
  --col-doc-content content
```
The --git-doc-path option supports glob patterns:
```bash
# Process only markdown files in the docs directory
pgedge-docloader \
  --git-url https://github.com/org/project.git \
  --git-doc-path "docs/**/*.md" \
  --config config.yml
```
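Before a load, it can help to preview which files a pattern will select. The function below is a minimal sketch that expands a pattern against a local checkout using bash's `globstar` option (bash 4 or later). It assumes pgedge-docloader interprets `**` the way bash does (zero or more directories), which you should confirm against a real run; `preview_doc_glob` is a hypothetical helper, not part of the tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the files a glob such as "docs/**/*.md" matches
# inside a local checkout. Assumes the loader's "**" behaves like bash's
# globstar; verify this against an actual pgedge-docloader run.
preview_doc_glob() {
  local repo_dir="$1" pattern="$2"
  (
    cd "$repo_dir" || exit 1
    # globstar: "**" spans directories; nullglob: no match -> no output
    shopt -s globstar nullglob
    local f
    for f in $pattern; do   # left unquoted on purpose so the glob expands
      if [ -f "$f" ]; then
        printf '%s\n' "$f"
      fi
    done
  )
}
```

For example, `preview_doc_glob ~/src/project "docs/**/*.md"` prints one matching path per line, relative to the repository root.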
Working with Branches and Tags
Check Out a Specific Branch
```bash
pgedge-docloader \
  --git-url https://github.com/org/docs.git \
  --git-branch main \
  --git-doc-path docs \
  --config config.yml
```
Check Out a Specific Tag
Use tags for versioned documentation:
```bash
pgedge-docloader \
  --git-url https://github.com/org/project.git \
  --git-tag v2.0.0 \
  --git-doc-path docs \
  --set-column version="2.0.0" \
  --config config.yml
```
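To load several releases in one pass, you can wrap a command like the one above in a loop. This is a sketch that assumes tags follow a `vX.Y.Z` naming scheme and that each release should be stored with a `version` column; `load_tagged_versions` and the `DRY_RUN` variable are hypothetical names for this example, not pgedge-docloader features.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run one pgedge-docloader load per tag, deriving the
# version column from the tag name (assumes tags look like "v2.0.0").
# With DRY_RUN=1 the commands are printed instead of executed.
load_tagged_versions() {
  local repo_url="$1"; shift
  local tag version
  for tag in "$@"; do
    version="${tag#v}"   # strip a leading "v": "v2.0.0" -> "2.0.0"
    local -a cmd=(pgedge-docloader
      --git-url "$repo_url"
      --git-tag "$tag"
      --git-doc-path docs
      --set-column "version=$version"
      --config config.yml)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}" || return 1   # stop at the first failed load
    fi
  done
}
```

Run it with `DRY_RUN=1 load_tagged_versions https://github.com/org/project.git v1.0.0 v2.0.0` first to inspect the generated commands. Whether rows from different versions can coexist in one table depends on your table's keys and on how the loader treats file names it has already seen, so check that before loading multiple versions together.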
Note: `--git-branch` and `--git-tag` are mutually exclusive. You cannot specify both options at the same time.
Persistent Clone Directory
By default, repositories are cloned to a temporary directory and removed after processing. For repeated runs, you can specify a persistent clone directory:
```bash
pgedge-docloader \
  --git-url https://github.com/org/docs.git \
  --git-clone-dir /var/cache/docloader/repos \
  --git-keep-clone \
  --config config.yml
```
On subsequent runs with --git-skip-fetch, the tool will reuse the existing
clone without fetching updates:
```bash
pgedge-docloader \
  --git-url https://github.com/org/docs.git \
  --git-clone-dir /var/cache/docloader/repos \
  --git-keep-clone \
  --git-skip-fetch \
  --config config.yml
```
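For scheduled runs, you may want to reuse the clone most of the time but still pick up upstream changes periodically. The sketch below adds `--git-skip-fetch` only when a stamp file, written after the last run that did fetch, is younger than `MAX_AGE_MIN` minutes. The stamp file, its default path, and all variable names are assumptions of this script; pgedge-docloader does not create or read them.

```bash
#!/usr/bin/env bash
# Hypothetical scheduling wrapper: fetch at most once per MAX_AGE_MIN minutes.
# STAMP_FILE is this script's own bookkeeping, kept outside the clone directory.
CLONE_DIR="${CLONE_DIR:-/var/cache/docloader/repos}"
STAMP_FILE="${STAMP_FILE:-/var/cache/docloader/last-fetch}"
MAX_AGE_MIN="${MAX_AGE_MIN:-60}"

run_cached_load() {
  local -a cmd=(pgedge-docloader
    --git-url https://github.com/org/docs.git
    --git-clone-dir "$CLONE_DIR"
    --git-keep-clone
    --config config.yml)
  local fresh=0
  # find prints the stamp path only if it was modified within MAX_AGE_MIN minutes
  if [ -f "$STAMP_FILE" ] &&
     [ -n "$(find "$STAMP_FILE" -mmin -"$MAX_AGE_MIN" 2>/dev/null)" ]; then
    fresh=1
    cmd+=(--git-skip-fetch)   # fetched recently: reuse the clone as-is
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  "${cmd[@]}" || return 1
  # Only a run that actually fetched refreshes the stamp
  if [ "$fresh" = 0 ]; then
    mkdir -p "$(dirname "$STAMP_FILE")" && touch "$STAMP_FILE"
  fi
}
```

Because the first run has no stamp file, it always fetches (or clones), and later runs within the interval reuse the existing clone.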
Configuration File Example
Git source options can also be specified in a configuration file:
```yaml
# Git source configuration
git-url: https://github.com/org/docs-repo.git
git-branch: main
git-doc-path: docs
git-clone-dir: /var/cache/docloader/repos
git-keep-clone: true

# Database configuration
db-host: localhost
db-name: mydb
db-user: myuser
db-table: documents

# Column mappings
col-doc-content: content
col-file-name: filename
col-doc-title: title

# Custom metadata
custom-columns:
  source: "git-repo"
  project: "my-project"
```
Then run with:
```bash
pgedge-docloader --config config.yml
```
Authentication
HTTPS URLs
For public repositories, use the HTTPS URL directly:
```bash
--git-url https://github.com/org/public-repo.git
```
For private repositories, you can use a personal access token in the URL:
```bash
--git-url https://<token>@github.com/org/private-repo.git
```
Or configure Git credential helpers before running the tool.
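If you do put a token in the URL, avoid hard-coding it in scripts or configuration files. One option, sketched below, is to build the URL from an environment variable at run time. `GIT_TOKEN` and `authenticated_url` are hypothetical names, and the host is assumed to be github.com as in the examples above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: assemble an HTTPS clone URL from a token held in the
# GIT_TOKEN environment variable (an assumed name) and a repository path.
authenticated_url() {
  local repo_path="$1"   # e.g. "org/private-repo.git"
  if [ -z "${GIT_TOKEN:-}" ]; then
    echo "GIT_TOKEN is not set" >&2
    return 1
  fi
  printf 'https://%s@github.com/%s\n' "$GIT_TOKEN" "$repo_path"
}
```

You would then run, for example, `pgedge-docloader --git-url "$(authenticated_url org/private-repo.git)" --config config.yml`. The token is still visible in the process's argument list while the tool runs, so on shared machines a credential helper is the safer choice.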
SSH URLs
For SSH authentication, ensure your SSH keys are configured:
```bash
--git-url git@github.com:org/repo.git
```
Error Handling
The tool will fail with a clear error message if:
- Git is not installed on the system
- The repository URL is invalid or inaccessible
- The specified branch or tag does not exist
- The path given in `--git-doc-path` does not exist in the repository
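To catch most of these problems before a long run, you can check the source with standard Git commands first. This is a sketch, not part of pgedge-docloader: `preflight_git_source` is a hypothetical helper that uses `git ls-remote` to confirm that Git is installed, the repository is reachable, and the requested branch or tag exists. It does not check `--git-doc-path`, which would require a clone.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check mirroring the failure cases listed above.
# Usage: preflight_git_source URL [branch|tag] [NAME]
preflight_git_source() {
  local url="$1" ref_kind="${2:-}" ref_name="${3:-}"
  if ! command -v git >/dev/null 2>&1; then
    echo "git is not installed" >&2; return 1
  fi
  # Listing remote refs fails if the URL is invalid or inaccessible
  if ! git ls-remote "$url" >/dev/null 2>&1; then
    echo "cannot access repository: $url" >&2; return 1
  fi
  if [ -z "$ref_name" ]; then
    return 0   # no branch or tag requested
  fi
  local flag
  case "$ref_kind" in
    branch) flag=--heads ;;
    tag)    flag=--tags ;;
    *) echo "ref kind must be 'branch' or 'tag'" >&2; return 1 ;;
  esac
  # --exit-code makes ls-remote exit with status 2 when no ref matches
  if ! git ls-remote --exit-code "$flag" "$url" "$ref_name" >/dev/null 2>&1; then
    echo "$ref_kind '$ref_name' not found in $url" >&2; return 1
  fi
}
```

For example: `preflight_git_source https://github.com/org/project.git tag v2.0.0 && pgedge-docloader --git-url https://github.com/org/project.git --git-tag v2.0.0 --config config.yml`.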
Best Practices
- Use tags for versioned docs: When loading documentation for specific software versions, use `--git-tag` to ensure consistency.
- Cache clones for repeated runs: Use `--git-clone-dir` and `--git-keep-clone` to avoid re-cloning on every run.
- Use `--git-skip-fetch` carefully: Only skip fetching when you're sure the local clone is up to date.
- Set version metadata: Use `--set-column` to add version information when processing tagged releases, for example `--git-tag v1.2.3 --set-column version="1.2.3"`.