3. DocBook - What's That?

3.1. Some IPCop History

When the IPCop documentation project was started, the first question the original authors faced was which tool to use to create the documentation.

One obvious suggestion was Microsoft Word. Unfortunately, some of the developers did not own a copy of Word. In addition, several were opposed to making an Open Source project dependent on a relatively expensive commercial product. Finally, there was always the problem of which version of Word to use.

The initial solution seized upon was to use TWiki, a web collaboration platform. With TWiki, anyone can update the manuals. Unfortunately, TWiki has a few drawbacks. First, without a lot of work, it produces one giant web page per manual. This is not much of a problem for those with high-speed connections to the Internet, but for those less fortunate, just looking up a question in an online manual can take a very long time. It is also relatively difficult to edit large manuals using TWiki's built-in editor.

The decision was reached to move most of the documentation off TWiki, except for the FAQ. The FAQ will benefit from the way modifications and additions can be made quickly and universally using TWiki.

Another problem was generating offline manuals. During installation, and sometimes during administration tasks, online manuals are not available. The obvious solution was to provide these manuals in PDF format. Unfortunately, TWiki doesn't provide a way to convert the data to PDF. In the end, Microsoft Word was used to create the PDF files for release 0.1.

3.2. DocBook

The most popular method of writing documentation for open source projects is DocBook. The DocBook tool chain can convert a single source file into HTML, PDF, PostScript and plain text, among other formats. Initially, DocBook source files were written in SGML. More recently, XML has been used, and SGML is being abandoned.

3.2.1. DocBook - Pros

As far as the IPCop project is concerned, there are several advantages to DocBook.

  • All necessary output formats can be generated from the same file.

  • The source can easily be checked into and maintained via CVS.

  • DocBook is open source, as are the tools to create output from it.

  • The DocBook tool chain runs on most common operating systems. Much of the tool set is written in Java.

  • DocBook XML uses Unicode, which XML encodes as UTF-8 by default. This makes national language support and translations a lot easier.

  • Once you have a syntactically correct document in DocBook XML, every one of DocBook's many output formats will be generated with no errors or surprises.

3.2.2. DocBook - Cons

  • Few folks are familiar with DocBook. There is a steep learning curve.

  • The DocBook tool chain may not be installed on your computer.

    Most Linux distributions come with DocBook available for installation, although you may have to specify a documentation feature to get it. It is difficult to put together a working tool chain on your own.

    Recently, E-Novative has put together a DocBook distribution for Windows systems that is extremely easy to install.

  • DocBook is structure-oriented or function-oriented, not format-oriented. An author has to identify portions of the document as sections, appendices, preface, etc. It has many software-oriented elements such as guimenu, guimenuitem, guilabel and keycap. Other elements include author, firstname and surname. DocBook's designers hope that database engines will eventually be able to provide detailed knowledge of DocBook documentation using the structural elements.

    This makes it difficult for programs to convert other formats of data such as RTF or HTML files to DocBook. Even if these files were generated from DocBook originally, too much data has been lost.

  • Because of the above problem, there is no WYSIWYG editor for DocBook.

    Most folks use XML editors to create and maintain their DocBook files. James Brice has found two. The freeware XML Cooktop editor for Windows has been used to generate HTML. Another editor, TextPad, is not freeware, but demoware. It even has a DocBook plug-in.

3.2.3. DocBook Files

A DocBook file is a text file. It usually has a file name ending in .xml or .sgml.

Think of DocBook source as similar to C source: just as a compiler turns C source into a program, the DocBook tool chain converts the source into various output formats.

DocBook uses the facilities of XML: its DTD defines which elements are allowed, and its style sheets control the transformation to output. The DocBook DTD must not be changed, or the language is no longer DocBook. The XSL style sheets can be changed by brave souls. We are using the Linux Documentation Project's XSLs for HTML.

For XML files, the first line consists of the usual XML declaration line:

<?xml version="1.0" encoding="UTF-8"?>

This is followed by a document type declaration, as follows:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
    "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
    <!ENTITY imagepath "../images/install/" >
    <!ENTITY fdlfile SYSTEM "./fdlappendix.xml" >
    ]>

This declaration states that the root element of this file will be a <book>. Many other elements can serve as the root element, including part, article and chapter. Next, it specifies the public identifier of the document type definition (DTD):

"-//OASIS//DTD DocBook XML V4.2//EN"

The next line specifies the location of the DTD. This is a URI, and can be a URL or a file: location. The two lines that start with <!ENTITY are entity declarations, similar to #define statements in C. They may be referenced as &imagepath; and &fdlfile; elsewhere in the document. In this document they represent the directory path to the graphic images (&imagepath;) and the file containing the GNU Free Documentation License (&fdlfile;). Once declared in the document type declaration, these “entities” can be referenced anywhere in the document source. When the processor encounters a reference, it either expands it to the declared text or, for a SYSTEM entity, includes the named file.
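
As a rough illustration of how an internal entity such as &imagepath; is expanded, the following Python sketch parses a small, made-up document with the standard library's xml.etree.ElementTree module. It is not part of the DocBook tool chain. The chapter title, the file name boot.png and the shortened DOCTYPE (no public identifier, so no DTD is consulted and nothing is validated) are assumptions made for the example, and the external &fdlfile; entity is left out to keep the sketch self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document. Only the internal entity from the
# declaration above is kept; "boot.png" is an invented file name.
SOURCE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book [
    <!ENTITY imagepath "../images/install/" >
]>
<book>
  <chapter>
    <title>Installation</title>
    <para>Images live in &imagepath;</para>
    <mediaobject>
      <imageobject>
        <imagedata fileref="&imagepath;boot.png"/>
      </imageobject>
    </mediaobject>
  </chapter>
</book>
"""


def expand_entities(source):
    """Parse the document and return (paragraph text, image path).

    The parser replaces each &imagepath; reference with the declared
    text, both in element content and in attribute values.
    """
    root = ET.fromstring(source.encode("utf-8"))
    para = root.find(".//para")
    image = root.find(".//imagedata")
    return para.text, image.get("fileref")


if __name__ == "__main__":
    text, fileref = expand_entities(SOURCE)
    print(text)
    print(fileref)
```

Changing the single ENTITY line is enough to move every image reference in the document, which is the same convenience a #define gives in C.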

In addition to any entities you define, DocBook has many predefined entities. Most of these should be familiar to anyone who has looked at HTML or XML. These include &lt; for the < sign and &nbsp; for a non-breaking space.
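
One detail worth knowing is where these entities come from: &lt; is built into XML itself, while &nbsp; is supplied by the entity sets that the DocBook DTD pulls in. The sketch below, which uses Python's standard xml.etree.ElementTree module purely for illustration (the helper name parsed_text is made up), shows a parser that never reads the DTD accepting the first and rejecting the second.

```python
import xml.etree.ElementTree as ET


def parsed_text(fragment):
    """Return the element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(fragment).text
    except ET.ParseError:
        # Raised here for an entity the parser has no declaration for.
        return None


if __name__ == "__main__":
    # &lt; is predefined by XML, so no DTD is needed.
    print(parsed_text("<para>1 &lt; 2</para>"))
    # &nbsp; is only declared by the DocBook DTD, which is absent here.
    print(parsed_text("<para>a&nbsp;b</para>"))
```

This is why the document type declaration matters: without it, entities such as &nbsp; are undefined.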

The rest of the document consists of elements, comments and document text. Element tags start with the less-than sign, <, and end with the greater-than sign, >. Every element opened with <element> must be closed with a matching </element>; an element with no content may instead be written as <element/>. For example,

<para>
    This is a paragraph.
    It contains a couple of sentences.
</para>

There are about 400 of these elements. Don't worry, though: you will only need a few of them. Look at some existing documentation for examples.
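
The rule that every element must be terminated can be checked mechanically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, not the DocBook tool chain; the function name is_well_formed is invented for the example. It only tests that tags are properly opened and closed, not that the document is valid DocBook.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if every element in the fragment is properly closed."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    good = "<para>This is a paragraph.</para>"
    unclosed = "<para>This is a paragraph."
    wrong_case = "<para>This is a paragraph.</Para>"  # XML is case-sensitive
    for fragment in (good, unclosed, wrong_case):
        print(is_well_formed(fragment), fragment)
```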

Some elements carry additional information, called attributes. For example, the <filename> element has an optional class attribute, whose values include "directory", "headerfile" and "symlink". So a DocBook file might contain:

<filename class="directory">
    /var/ipcop/
</filename>
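
To see how a program views an element and its attribute, here is a small sketch that reads the fragment above with Python's standard xml.etree.ElementTree module. The function name read_filename and the "role" lookup are illustrative choices, not something the DocBook tools require.

```python
import xml.etree.ElementTree as ET

SOURCE = """<filename class="directory">
    /var/ipcop/
</filename>"""


def read_filename(source):
    """Return (element name, class attribute, content without padding)."""
    elem = ET.fromstring(source)
    # The line breaks and indentation inside the element are part of
    # its text, so they are stripped here.
    return elem.tag, elem.get("class"), elem.text.strip()


if __name__ == "__main__":
    print(read_filename(SOURCE))
    # An optional attribute that is not written simply comes back as None.
    print(ET.fromstring(SOURCE).get("role"))
```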

Comments appear only in the XML source, never in the output. They start with the characters “<!--” and end with the characters “-->”. Comments can span lines. This makes it easy to comment out an area of a document by inserting a “<!--” line before the desired area and a “-->” line after it.
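
The effect of commenting out an area can be sketched with Python's standard xml.etree.ElementTree module, whose default parser happens to discard comments in much the same way that they never reach DocBook's output. The sample paragraph and the helper visible_text are made up for the illustration.

```python
import xml.etree.ElementTree as ET

SOURCE = """<para>
    This sentence is kept.
    <!--
    This sentence is commented out
    and spans two lines.
    -->
    This sentence is kept too.
</para>"""


def visible_text(source):
    """Return the text a reader would see, with whitespace collapsed."""
    root = ET.fromstring(source)
    # itertext() yields only real character data; the comment is gone.
    return " ".join("".join(root.itertext()).split())


if __name__ == "__main__":
    print(visible_text(SOURCE))
```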

If you are still confused, look at the XML input file for this Guide, IPCopDoc/Authors-Guide.xml, in the IPCop CVS repository. Or use one of the many tutorials available on the net. A few links are mentioned below.