About six years ago, my company got a new gig to create an HTML version of an API specification using DocBook. I was new to DocBook, but not to XML… how hard could it be? Not so hard, if you have an entire weekend to give up to get it done!

What is DocBook?

DocBook is an XML schema plus a complex set of XSLT stylesheets that let you write your content in XML and then, depending on your output method, publish it as HTML, PDF, and likely many other formats. If you'd like more detail, head over to DocBook.org, run by the folks who maintain DocBook.
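To make "write your content in XML" concrete, here is a tiny DocBook 5 article plus a quick Python check (standard library only) that it is well-formed and uses the DocBook 5 namespace. The sample text and the function name `is_docbook5` are illustrative, and this is only a sanity check, not validation against the DocBook schema.

```python
import xml.etree.ElementTree as ET

# Namespace shared by all DocBook 5.x elements.
DOCBOOK_NS = "http://docbook.org/ns/docbook"

# A tiny, illustrative DocBook 5 article: structure and content, no layout.
SAMPLE = """<?xml version="1.0"?>
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Hello, DocBook</title>
  <para>Write it once; the stylesheets decide how it looks.</para>
</article>"""


def is_docbook5(xml_text: str) -> bool:
    """True if xml_text is well-formed and its root is in the DocBook 5 namespace.

    A quick sanity check only; it does not validate against the schema.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    # ElementTree spells namespaced tags as "{namespace}localname".
    return root.tag.startswith("{%s}" % DOCBOOK_NS)


if __name__ == "__main__":
    print(is_docbook5(SAMPLE))  # True
```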

DocBook is one of a handful of tools that let you enter your content once, then output it in a variety of ways without too much of a headache. Theoretically, once you get everything set up, you can execute one command to get a lovely PDF, or another command to get a navigable HTML version of your document. Shiny, right?

To work in DocBook, your first step is to set up the toolchain. For those new to technical communications, a toolchain is the set of programs, utilities, compilers, etc. that work together to provide you with what you need to get the job done. It’s a chain of tools — get it? I found the process much easier said than done, and so decided to carefully document each step along the journey.
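One way to picture the chain is as a list of commands that must all be reachable from a command prompt. This Python sketch (standard library only) reports which ones aren't on the PATH; the command names in the example list are assumptions based on the tools discussed in this post, so swap in whatever your chain actually uses.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(commands: Iterable[str],
                  finder: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the commands that can't be found on the PATH.

    `finder` defaults to shutil.which; it is a parameter so it can be swapped out.
    """
    return [cmd for cmd in commands if finder(cmd) is None]


if __name__ == "__main__":
    # Assumed command names for the links in this post's chain.
    chain = ["java", "Xalan", "fop"]
    gaps = missing_tools(chain)
    print("missing: " + (", ".join(gaps) if gaps else "nothing"))
```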

Installing the toolchain

I turned to Bob Stayton’s book “DocBook XSL: The Complete Guide” to bone up on the intricacies of XSL/DocBook, specifically to guide me through setting up an XML/DocBook development environment (toolchain). Although the book helped me feel much more confident about my looming DocBook project, the author forgot that he is ensconced in a rarified world where people intuitively grok the difference between XML, XSL, and XSLT and how all this fits in with DTDs or schemas. And, oh yes, a world where everyone is comfortable using a command-line interface (CLI — yet another TLA).

Side note: nowadays I believe DocBook 5: The Definitive Guide would be the first book I'd turn to.

As a first step in my toolchain project, I went to the root of my hard drive and created a folder called xmldev. (I was using Windows 7 at the time, BTW.) I followed the directions in Stayton’s book and took copious notes along the way, downloading and installing one component at a time. Here are the steps I followed down the path toward toolchain gold.
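If you'd rather script the folder setup, this hypothetical helper creates the same kind of layout this post uses (c:/xmldev plus one subfolder per component). It is a sketch, not part of Stayton's instructions; the component list simply mirrors the folders named in this post.

```python
from pathlib import Path
from typing import List

# One subfolder per component, matching the folders used in this post.
COMPONENTS = ["docbook", "saxon", "xalan", "xerces", "fop"]


def make_layout(root: str) -> List[Path]:
    """Create root/<component> for each component and return the folder paths."""
    base = Path(root)
    folders = []
    for name in COMPONENTS:
        folder = base / name
        folder.mkdir(parents=True, exist_ok=True)  # harmless if it already exists
        folders.append(folder)
    return folders


# Example: make_layout(r"C:\xmldev")
```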

1. Install the DocBook DTD

The URL http://www.oasis-open.org/docbook/xml takes you to a file listing which isn’t all that easy to figure out. I mean, sure — click on the latest version — but which files do you download? I instead went to the docbook.org site, learned that the current version was now 5.0, and was taken to a page with the files (http://www.docbook.org/xml/5.0/), including a zip that seemed like “the DTD.” There was also a nice list that told me about the files in the zip. I extracted the zip into c:/xmldev/docbook. Done!
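Extracting the download can also be scripted. This sketch uses Python's `zipfile` module to list what's inside the archive and then unpack it; the archive name in the commented example is hypothetical, so use whatever file the download page actually gives you.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_and_list(zip_path: str, dest: str) -> List[str]:
    """Unpack zip_path into dest and return the names of the archive's entries."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()  # see what's inside before unpacking
        archive.extractall(dest)
    return names


# Example (hypothetical archive name):
# extract_and_list("docbook-5.0.zip", r"C:\xmldev\docbook")
```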

2a. Install XSLT Processor (part 1): Saxon

I confess to being a GUI-dependent computer user. Yes, back in the days before Windows I was perfectly comfortable, even quite adept, getting around at the C:\> prompt. But after so many years as a Windows user, I barely know how to deal with CLI systems anymore. Stayton described several reliable XSLT processors in his book, and I chose Saxon, somehow concluding that it would be forgiving to a GUI-gal like myself.
I went to http://sourceforge.net/, typed “saxon” in the search box, and was taken to a page with a nice, big Download button. Ahh — sweet, GUI comfort! I extracted the zip into c:/xmldev/saxon.

Here's where things got sticky. Stayton's book listed some JAR (Java archive) files that I needed to locate, and of course, they weren't there. For one thing, the JAR files in the current version of Saxon have different names from the ones in the book; for another, some of the files Stayton referenced simply weren't there, even allowing for the new naming conventions. However, Stayton indicates in his book that the files I wasn't finding weren't necessarily *required*, so after searching for them for a while, I finally left well enough alone.

Next, Stayton says to locate the DocBook extension for Saxon, which, again, I couldn't find anywhere. After a lot of googling and searching and wiki-ing, I finally found a place to download a file that had a slightly different name but that I figured was probably the correct one.

Next I needed to update my CLASSPATH to tell Java where to find the JAR files. Stayton’s directions on how to update the CLASSPATH were direct and simple to follow, and made me feel like a proud übernerd.
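For reference, a CLASSPATH is just a list of JAR files (or folders) joined by the platform's path separator, which is `;` on Windows. This Python sketch builds one from every `.jar` in a folder. The folder name is an assumption, and setting `os.environ` this way only affects the current process and the programs it launches, not the system-wide setting.

```python
import os
from pathlib import Path


def build_classpath(*jar_dirs: str, existing: str = "") -> str:
    """Join every .jar in jar_dirs (plus an existing CLASSPATH) into one string.

    os.pathsep is ';' on Windows and ':' elsewhere, the separator Java expects.
    """
    entries = []
    for folder in jar_dirs:
        entries.extend(str(p) for p in sorted(Path(folder).glob("*.jar")))
    if existing:
        entries.append(existing)
    return os.pathsep.join(entries)


# Example (hypothetical folder). This changes CLASSPATH only for this Python
# process and programs it starts, not the Windows system setting:
# os.environ["CLASSPATH"] = build_classpath(
#     r"C:\xmldev\saxon", existing=os.environ.get("CLASSPATH", ""))
```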

With my shiny new cocky attitude, I read the next line in Stayton's directions: execute the command java com.icl.saxon.StyleSheet -t. Yeah, just execute that command. No more GUI for me. I assumed this meant using Run > cmd to open a DOS box and run the command there, but that produced the error "'java' is not recognized as an internal or external command." Back in Windows Explorer, I clicked around to find where Java was installed (in "Program Files (x86)\Java\jre6\bin", but don't assume it's in the same place on your computer!), then went back to the DOS box, cd'd to that folder, and tried again. The result? An error that seemed to stem from a missing stylesheet. This time, no amount of googling, searching, or wiki-ing could tell me how to fix it, or even how big a problem the error was in the first place. So I decided to uninstall Saxon and try another processor.
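The "not recognized" error means the command prompt searched the folders on the PATH and found no `java` program, which is why cd-ing into Java's bin folder got it to run. Here is a small sketch of the same lookup using Python's `shutil.which`; the function name `locate_java` is illustrative.

```python
import shutil
from typing import Optional


def locate_java(search_path: Optional[str] = None) -> Optional[str]:
    """Return the java program a command prompt would find, or None.

    shutil.which walks the folders in search_path (default: the PATH
    environment variable) in order, roughly as the command prompt does.
    None corresponds to the "'java' is not recognized" error.
    """
    return shutil.which("java", path=search_path)


if __name__ == "__main__":
    found = locate_java()
    print(found or "java is not on PATH; add its bin folder to PATH")
```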

All of the above "Install XSLT Processor" stuff (admittedly including the time spent taking notes for this missive) burned almost 2 hours, and the results were about to be deleted.

What I learned from this exercise is that Java is a mysterious chimera that lurks in the nether-region between Windows and the silicon, flitting about and avoiding direct access by its human users.

2b. Install XSLT Processor (part 2): xsltproc

The other two choices in Stayton’s primary list are Xalan and xsltproc. Nothing in the descriptions made one look any better or worse than the other, so I decided on xsltproc, simply because I recognized the author’s name as someone I’ve seen here and there in XML discussion forums. But after following links and exploring and redirecting and poking here and there, I simply could not find a straightforward place to go download it.

2c. Install XSLT Processor (part 2 (again)): Xalan

Did I say I selected xsltproc? I meant to say that I selected Xalan. It was easy. I googled for Xalan, followed the link for the C version (because I'd had enough of Java for one day, thank you very much), and downloaded the two binary files I needed: one for Xalan, and one for Xerces, the XML parser. I extracted Xalan into c:/xmldev/xalan and Xerces into c:/xmldev/xerces.

Next I had to dink with the PATH, much as I edited the CLASSPATH above. The directions on the Apache site (where I got the Xalan and Xerces downloads) looked like Greek on the first and second readings, but they actually worked fine. To test that everything was installed, I opened a DOS box (Start menu > Run > cmd) and entered the following command:

xalan -?

It worked! It found Xalan and listed all the command line options. Nice! Now, Stayton’s book gives some extra steps for setting things up if you install the Java version, but since I went the C route I’m not sure what I should do. I’ll just slip on these here blinders and hope it doesn’t matter!
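Here is a sketch of the same `xalan -?` smoke test driven from Python. It searches an extra folder first (for this one check only), then tries to launch the command. `can_launch` and the folder in the example are hypothetical, and the function only reports whether the program could be started, not whether its output looks right.

```python
import os
import shutil
import subprocess
from typing import Optional, Sequence


def can_launch(argv: Sequence[str], extra_dir: Optional[str] = None) -> bool:
    """Return True if argv[0] can be found and started; its exit code is ignored.

    extra_dir (e.g. a hypothetical Xalan bin folder) is searched first, for
    this check only; the PATH setting itself is not changed.
    """
    search_path = os.environ.get("PATH", "")
    if extra_dir:
        search_path = extra_dir + os.pathsep + search_path
    exe = shutil.which(argv[0], path=search_path)
    if exe is None:
        return False  # the "is not recognized" case
    try:
        subprocess.run([exe, *argv[1:]], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return True


# Example: can_launch(["xalan", "-?"], extra_dir=r"C:\xmldev\xalan\bin")
```

Resolving the full path with `shutil.which` before calling `subprocess.run` keeps the lookup under the script's control instead of relying on how each platform searches for executables.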

3a. Install XSL-FO Processor (part 1): FOP

My project involves outputting HTML files as well as PDFs, so I know that I will also need an FO Processor. I wanted to use FOP because that is what my client used before, but after a fairly straightforward download, an easy install (extracted into c:\xmldev\fop\), and updating CLASSPATH and PATH accordingly — it wouldn’t work. It’s down to the same Java demon I encountered before. Yes, it is probably operator error, but without help, I have no clue how to fix it. So on to the only C-version of a free FO Processor in the list: xmlroff.

[Correction: this project was redefined to output the doc only in HTML… which made me grumble and wonder why we were using DocBook at all. I still wonder that.]

3b. Install XSL-FO Processor (part 2): xmlroff

Sadly, it turns out that xmlroff, while available as a C app, is distributed only as uncompiled source code, which for me means it might as well be written in a Swahili-Sanskrit patois. But after spending nearly 4 hours of my Saturday on this project, I'm not about to call it quits!

4. Reinstall Java and re-test.

It looks like the only way I'm going to get a fully functioning toolchain is to take a step back and figure out why I can't get these Java-based apps to run. So I went to http://java.com/en/download/manual.jsp, where Sun provides a great link to test your installed version. When I ran the test, it didn't pass. I downloaded the latest version, reran the test to make sure it was installed and running, and success! Java runs!
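Sun's web page is one way to check the installation. Another is the command `java -version`, which prints the installed version on stderr rather than stdout. This Python sketch runs it and pulls out the version string. The regular expression assumes the usual first line, such as `java version "1.6.0_45"` or `openjdk version "17.0.2"`, and the function names are illustrative.

```python
import re
import subprocess
from typing import Optional

# First line of `java -version` output looks like: java version "1.6.0_45"
# (newer builds may say: openjdk version "17.0.2" 2022-01-18).
_VERSION_RE = re.compile(r'version "([^"]+)"')


def parse_java_version(text: str) -> Optional[str]:
    """Extract the quoted version string, or None if there isn't one."""
    match = _VERSION_RE.search(text)
    return match.group(1) if match else None


def installed_java_version() -> Optional[str]:
    """Run `java -version` and return its version string, or None if java won't run."""
    try:
        proc = subprocess.run(["java", "-version"], capture_output=True,
                              text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    # java prints its version banner on stderr, not stdout.
    return parse_java_version(proc.stderr + proc.stdout)
```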

After re-running the FOP test from above, I got the error "'java' is not recognized as an internal or external command." Google to the rescue again: I found a support forum where someone suggested copying the Java folder from the "Program Files (x86)" folder into the "Program Files" folder, then putting the new location into the Path environment variable. Once I did that, the FOP test ran fine.
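Since the fix hinged on which Program Files folder Java lived in, a sketch like this can list every Java `bin` folder under both. It assumes the common `<Program Files>/Java/<jre or jdk>/bin/java.exe` layout and the standard Windows `ProgramFiles` and `ProgramFiles(x86)` environment variables. It only finds candidates; deciding which folder goes on the Path is still up to you.

```python
from pathlib import Path
from typing import List, Mapping

# Environment variables 64-bit Windows uses for its two Program Files folders.
PROGRAM_FILES_VARS = ("ProgramFiles", "ProgramFiles(x86)")


def find_java_installs(env: Mapping[str, str]) -> List[Path]:
    """List Java bin folders (those holding java.exe) under each Program Files folder.

    Assumes the common <Program Files>/Java/<jre or jdk>/bin/java.exe layout.
    """
    found = []
    for var in PROGRAM_FILES_VARS:
        root = env.get(var)
        if not root:
            continue  # e.g. no "(x86)" folder on 32-bit Windows
        found.extend(sorted(p.parent for p in Path(root).glob("Java/*/bin/java.exe")))
    return found


# Example: import os; print(find_java_installs(os.environ))
```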

5. Mission Accomplished.

By now I've burned about ten hours on this toolchain, but in the end, due more to stubbornness and dumb luck than skill, I have a full toolchain.

Here’s what I have:

1. DTD: DocBook
2. XSLT Processor: Xalan (to create HTML and FO files)
3. FO Processor: FOP (to create PDFs)
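Put together, the chain runs as three commands: XML to HTML, XML to XSL-FO, then FO to PDF. The sketch below only builds those command lines. The stylesheet location is an assumption (the DocBook XSL stylesheets, with their `html/docbook.xsl` and `fo/docbook.xsl` entry points, have to live somewhere, and this post doesn't say where), and the Xalan arguments follow the Xalan-C `-o outfile source stylesheet` form, so check `Xalan -?` for your build.

```python
from pathlib import Path, PureWindowsPath
from typing import List

# Hypothetical install location for the DocBook XSL stylesheets; adjust to yours.
XSL_ROOT = PureWindowsPath(r"C:\xmldev\docbook-xsl")
HTML_XSL = str(XSL_ROOT / "html" / "docbook.xsl")
FO_XSL = str(XSL_ROOT / "fo" / "docbook.xsl")


def pipeline(source: str) -> List[List[str]]:
    """Build the command lines for XML -> HTML, XML -> FO, and FO -> PDF."""
    src = Path(source)
    html, fo, pdf = (str(src.with_suffix(ext)) for ext in (".html", ".fo", ".pdf"))
    return [
        # Xalan-C form: Xalan -o <output> <source> <stylesheet>
        ["Xalan", "-o", html, str(src), HTML_XSL],
        ["Xalan", "-o", fo, str(src), FO_XSL],
        # FOP turns the intermediate XSL-FO file into a PDF.
        ["fop", "-fo", fo, "-pdf", pdf],
    ]


if __name__ == "__main__":
    for argv in pipeline("spec.xml"):
        print(" ".join(argv))
```

To actually run them, you could pass each list in order to `subprocess.run(argv, check=True)`, which stops at the first step that fails.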

And here's the finished output (which has been updated numerous times by me and others since I originally wrote this post): Khronos: OpenCL Reference Pages. (Important note: the PDF of the OpenCL specification was not created using DocBook.)

The moral of the story

I wrote the text for this post something like 6 years ago and am only now sharing it. Since then I have become a Mac user and have played around with other tools. Would it be easier now to install a DocBook toolchain? Perhaps; I really don't know. If anybody out there has helpful tidbits I should add to make this more useful (or up to date), I'd love to see them, and I'll add them here.

 