Converting and Loading SGML/DocBook Documents
Extensions: .sgml, .sgm, .xml
During an SGML conversion:
- documents are converted to Markdown format.
- the title is extracted from <title> or <refentrytitle> tags (PostgreSQL-style reference pages use <refentrytitle>).
- DocBook section tags are converted to Markdown headings:
  - <chapter>, <appendix>, <article>, <book> → # (level 1)
  - <sect1>, <refsect1>, <refsynopsisdiv>, <section> → ## (level 2)
  - <sect2>, <refsect2> → ### (level 3)
  - <sect3>, <refsect3> → #### (level 4)
  - <sect4> → ##### (level 5)
  - <sect5> → ###### (level 6)
- Inline code elements are converted to backticks: <literal>, <command>, <filename>, <function>, <type>, <varname>, <option>, <parameter>, <constant>, <replaceable>
- <programlisting> and <screen> converted to fenced code blocks
- <emphasis> converted to italic (*text*)
- Lists (<itemizedlist>, <orderedlist>) are converted to Markdown lists.
- Links (<ulink>) are converted to Markdown link format.
- Cross-references (<xref>) are converted to inline code with the linkend.
- HTML entities are automatically decoded.
- Comments and DOCTYPE declarations are stripped.
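The mapping rules above can be illustrated with a short Python sketch. This is not the converter's actual implementation: the names `HEADING_LEVELS`, `INLINE_CODE_TAGS`, `heading_prefix`, and `convert_inline` are hypothetical, and the regex-based approach is only an assumption about how such rules could be applied to a single text fragment.

```python
import html
import re

# Hypothetical lookup table mirroring the heading levels listed above.
HEADING_LEVELS = {
    "chapter": 1, "appendix": 1, "article": 1, "book": 1,
    "sect1": 2, "refsect1": 2, "refsynopsisdiv": 2, "section": 2,
    "sect2": 3, "refsect2": 3,
    "sect3": 4, "refsect3": 4,
    "sect4": 5,
    "sect5": 6,
}

# Inline elements that are rendered as backtick-quoted code.
INLINE_CODE_TAGS = (
    "literal", "command", "filename", "function", "type",
    "varname", "option", "parameter", "constant", "replaceable",
)


def heading_prefix(tag):
    """Return the Markdown heading prefix for a DocBook section tag."""
    return "#" * HEADING_LEVELS[tag]


def convert_inline(text):
    """Convert inline DocBook markup in one text fragment to Markdown."""
    # Inline code elements become `code`.
    code_tags = "|".join(INLINE_CODE_TAGS)
    text = re.sub(rf"<({code_tags})\b[^>]*>(.*?)</\1>", r"`\2`",
                  text, flags=re.DOTALL)
    # <emphasis> becomes *italic*.
    text = re.sub(r"<emphasis\b[^>]*>(.*?)</emphasis>", r"*\1*",
                  text, flags=re.DOTALL)
    # <ulink url="..."> becomes a Markdown link.
    text = re.sub(r'<ulink\s+url="([^"]*)"\s*>(.*?)</ulink>', r"[\2](\1)",
                  text, flags=re.DOTALL)
    # <xref linkend="..."/> becomes inline code holding the linkend.
    text = re.sub(r'<xref\s+linkend="([^"]*)"\s*/?>', r"`\1`", text)
    # Decode entities last so decoded "<" and ">" are not parsed as tags.
    return html.unescape(text)
```

For instance, `convert_inline('Use the <command>psql</command> command to connect.')` returns ``Use the `psql` command to connect.``, and `heading_prefix("sect1")` returns `##`. Decoding entities as the final step is a design choice of this sketch, so that text such as `&lt;literal&gt;` is not mistaken for markup.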
Example
Input DocBook:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN">
<book>
  <title>PostgreSQL Guide</title>
  <chapter>
    <title>Getting Started</title>
    <para>Use the <command>psql</command> command to connect.</para>
    <sect1>
      <title>Installation</title>
      <para>Download from <ulink url="https://postgresql.org">the website</ulink>.</para>
    </sect1>
  </chapter>
</book>
Extracted title: PostgreSQL Guide
Converted Markdown:
# PostgreSQL Guide
# Getting Started
Use the `psql` command to connect.
## Installation
Download from [the website](https://postgresql.org).
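The title extraction and declaration stripping shown in this example could be sketched as follows. This is a minimal illustration, not the converter's own code: `strip_declarations` and `extract_title` are hypothetical names, and the sketch assumes that the first `<title>` or `<refentrytitle>` in the document wins, which matches the example above but is not confirmed by it. Removing the XML declaration is likewise an assumption based on the sample output.

```python
import re


def strip_declarations(source):
    """Remove comments, the DOCTYPE, and the XML declaration."""
    source = re.sub(r"<!--.*?-->", "", source, flags=re.DOTALL)
    # Simple DOCTYPE only: an internal subset in [...] is not handled here.
    source = re.sub(r"<!DOCTYPE[^>]*>", "", source, flags=re.IGNORECASE)
    # Assumption: the XML declaration is dropped as well.
    source = re.sub(r"<\?xml[^>]*\?>", "", source)
    return source.strip()


def extract_title(source):
    """Return the text of the first <title> or <refentrytitle>, or None."""
    match = re.search(r"<(title|refentrytitle)\b[^>]*>(.*?)</\1>",
                      source, flags=re.DOTALL)
    return match.group(2).strip() if match else None


sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN">\n'
    "<book>\n"
    "<title>PostgreSQL Guide</title>\n"
    "<chapter><title>Getting Started</title></chapter>\n"
    "</book>\n"
)

cleaned = strip_declarations(sample)
print(extract_title(cleaned))  # PostgreSQL Guide
```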
PostgreSQL Reference Pages
The converter includes special handling for PostgreSQL-style reference pages using <refentry>:
<refentry>
  <refmeta><refentrytitle>SELECT</refentrytitle></refmeta>
  <refnamediv>
    <refname>SELECT</refname>
    <refpurpose>retrieve rows from a table</refpurpose>
  </refnamediv>
  <refsect1>
    <title>Description</title>
    <para>SELECT retrieves rows from tables.</para>
  </refsect1>
</refentry>
This converts to:
# SELECT
## SELECT
retrieve rows from a table
## Description
SELECT retrieves rows from tables.
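One way to reproduce the output above is sketched below with Python's standard `xml.etree.ElementTree` module. The function name `convert_refentry` is hypothetical, and the sketch assumes the input is well-formed XML, as in this example. Real SGML sources that omit closing tags would need a more tolerant parser, and the converter itself may work quite differently.

```python
import xml.etree.ElementTree as ET


def convert_refentry(source):
    """Convert a well-formed <refentry> fragment to Markdown lines."""
    root = ET.fromstring(source)
    lines = []

    # <refentrytitle> becomes the level 1 heading.
    title = root.findtext("refmeta/refentrytitle")
    if title:
        lines.append(f"# {title.strip()}")

    # <refname> becomes a level 2 heading, <refpurpose> a paragraph.
    namediv = root.find("refnamediv")
    if namediv is not None:
        name = namediv.findtext("refname")
        purpose = namediv.findtext("refpurpose")
        if name:
            lines.append(f"## {name.strip()}")
        if purpose:
            lines.append(purpose.strip())

    # Each <refsect1> contributes a level 2 heading and its paragraphs.
    for sect in root.findall("refsect1"):
        heading = sect.findtext("title")
        if heading:
            lines.append(f"## {heading.strip()}")
        for para in sect.findall("para"):
            lines.append("".join(para.itertext()).strip())

    return "\n".join(lines)


sample = (
    "<refentry>"
    "<refmeta><refentrytitle>SELECT</refentrytitle></refmeta>"
    "<refnamediv>"
    "<refname>SELECT</refname>"
    "<refpurpose>retrieve rows from a table</refpurpose>"
    "</refnamediv>"
    "<refsect1>"
    "<title>Description</title>"
    "<para>SELECT retrieves rows from tables.</para>"
    "</refsect1>"
    "</refentry>"
)

print(convert_refentry(sample))
```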
Limitations
- Not all DocBook elements are fully supported
- Complex nested structures may not convert perfectly
- Only basic conversion is performed for most elements
- Tables and complex formatting may require manual adjustment