Chapter 3. DocBook Indexing Guidelines

These guidelines are specific to indexing in DocBook. Note that O’Reilly books are generally indexed by professionals during Production, and this document is what we usually give to vendors. We’ve stored it here for cases where authors add their own index terms, provided that Editorial and Production have approved this first.

Tools and Validating Markup

You can choose the tools you use to enter the index tags, but you must use an XML editor of some kind (XMLmind, oXygen, etc.). Do NOT attempt to index in MS Word because Word mangles DocBook (despite claims to the contrary). We have a simple indexing macro for XMLmind that makes the indexing process easier. See the XMLmind Editor Tutorial for info on getting the macro.

Whatever method you decide to use to index the book, the XML files that you return to us must be well-formed and valid DocBook 4.5. Most XML editors will check validity for you, and there are several other tools (e.g., xmllint) that can do so as well. If you are unsure about your tools, please ask.

  • If you are working on a DocBook book in Atlas, you’ll need to use one of the validity tools mentioned above. Atlas does not check validity for you.
  • If you are working on a DocBook book using the older Subversion toolchain, you can check validity by using the SVN commit hook (orm:commitpdf). This method uses xmllint, and you can see the log that xmllint generates when you try to create a PDF in the repo (pdf/.buildlog). Please see the DocBook Authoring Guidelines for more info on using the SVN commit hook.
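
DTD-based validators generally rely on a document type declaration that points at the DocBook 4.5 DTD. A minimal sketch of such a file header follows. The <book> root element, the id, and the titles are assumptions for illustration, so keep whatever root element and declaration your files already have (the name after <!DOCTYPE must match the root element). With a declaration like this in place, a validating parser (for example, xmllint run with its --valid and --noout options) can report both well-formedness and validity errors.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Public and system identifiers for the DocBook XML V4.5 DTD. -->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<!-- Hypothetical root element; yours may be <chapter>, <appendix>, etc. -->
<book>
  <title>Example Book</title>
  <chapter id="ch01_example">
    <title>Example Chapter</title>
    <para>Index terms go inside paragraphs<indexterm><primary>index terms</primary><secondary>placement of</secondary></indexterm>.</para>
  </chapter>
</book>
```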

Proper Markup

Avoid inserting <indexterm>s inside elements such as <literal>, <emphasis>, <title>, <sectN>. Generally, they should be at the end of <para>s, before the final period.

  • NEVER, EVER insert <indexterm>s between a closing </para> and the next opening <para>. They should almost always be inserted at the end of <para>s (before the closing tag).
  • Do not insert <indexterm>s inside <title> or <sectN> tags.
  • Do not create a new <para> to insert an <indexterm>. This goes for endofrange terms as well.
  • Do not insert <indexterm>s inside <screen>, <programlisting>, <table>, <figure>, etc. Insert them at the end of the preceding <para> instead.
  • However, if you encounter a situation where there is no <para> in which to insert an <indexterm>, you can insert the <indexterm> elsewhere, such as directly after the last line of a code block (but not inside the code block) or within a <para> in a <table>.
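
The fallback in the last bullet might look like the following sketch, in which a section contains only a code listing and no <para>. The section id, title, and query are hypothetical. The <indexterm> sits directly after the closing </programlisting> tag, not inside the listing.

```xml
<sect1 id="ch03_hypothetical_listing">
  <title>Finding Rows with Missing Values</title>
  <!-- No <para> is available here, so the term follows the code block. -->
  <programlisting>SELECT name FROM pets WHERE owner IS NULL;</programlisting><indexterm><primary>NULL values</primary><secondary>testing for</secondary></indexterm>
</sect1>
```

In a table, the same idea applies: put the term inside a <para> within an <entry> rather than directly inside <table>.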

Here are some examples of what to do and what not to do.

Do this:

Use <literal>SELECT</literal><indexterm><primary>SELECT statements</primary></indexterm> statements to show...

Not this:

Use <literal><indexterm><primary>SELECT statements</primary></indexterm>SELECT</literal> statements to show...

Do this:

<para><literal>NULL</literal> values also behave
specially with respect to sorting and
summary operations<indexterm class="endofrange"
startref="ch03_nullvalues"></indexterm>.</para>

Not this:

<para><literal>NULL</literal> values also behave
specially with respect to sorting and
summary operations.</para>

<para>
<indexterm class="endofrange" startref="ch03_nullvalues"></indexterm>
</para>

Indexing Syntax

Basic index entry:

<indexterm><primary>index entry syntax, level 1</primary></indexterm>

Secondary entry (subentry):

 <indexterm>
    <primary>index entry syntax</primary>
    <secondary>for a subentry</secondary>
 </indexterm>

Tertiary entry (sub-subentry):

 <indexterm>
    <primary>index entry syntax</primary>
    <secondary>for a subentry</secondary>
    <tertiary>with a subentry</tertiary>
 </indexterm>

An index entry with a range:

This book is full of geeky text with DocBook XML markup, which starts here:
<indexterm class="startofrange" id="geekytext"><primary>geeky DocBook XML text</primary></indexterm>blah blah blah Ajax
blah blah blah Ruby on Rails
blah blah blah spreading the knowledge of innovators
...
...
and ends here<indexterm class="endofrange" startref="geekytext"></indexterm>

The endofrange entry does not contain a <primary> or <secondary> tag. It only has a startref attribute that references the startofrange entry. Do not place the endofrange entry on its own line or the processor will add excess whitespace in the PDF.
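
Written out as complete paragraphs, a range might look like this sketch. The id value and the paragraph text are made up for illustration. Note that the closing term shares a line with the end of the sentence and sits before the final period.

```xml
<para>NULL values need special handling<indexterm class="startofrange" id="ch03_hypothetical_range"><primary>NULL values</primary><secondary>handling</secondary></indexterm> in comparisons.</para>

<para>They also behave specially in sorting and summary
operations<indexterm class="endofrange" startref="ch03_hypothetical_range"></indexterm>.</para>
```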

An index entry with a "(see)" and no page reference:

 <indexterm>
   <primary>geeky DocBook XML text</primary>
   <see>even more geeky DocBook XML text</see>
 </indexterm>

Or, with a subentry:

 <indexterm>
   <primary>DocBook XML text</primary>
   <secondary>geeky</secondary>
   <see>even more geeky DocBook XML text</see>
 </indexterm>

Changing how an entry is alphabetized:

  <indexterm>
    <primary sortas="elite">l33t</primary>
  </indexterm>
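
In DocBook 4.5 the sortas attribute is also available on <secondary> and <tertiary>, so a subentry can be sorted independently of its displayed text. A small sketch with made-up entries:

```xml
<!-- Displayed as ".htaccess files"; the sort key omits the leading dot. -->
<indexterm>
  <primary sortas="htaccess files">.htaccess files</primary>
</indexterm>

<!-- sortas on a subentry: the sort key is "underscore", not the symbol. -->
<indexterm>
  <primary>special characters</primary>
  <secondary sortas="underscore">_ (underscore)</secondary>
</indexterm>
```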

A "(see also)" entry:

  <indexterm>
    <primary>foo</primary>
    <seealso>bar</seealso>
  </indexterm>
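
The DocBook 4.5 content model also allows a <seealso> after a <secondary>, and more than one <seealso> in a single entry. A sketch with placeholder terms:

```xml
<indexterm>
  <primary>foo</primary>
  <secondary>configuring</secondary>
  <seealso>bar</seealso>
  <seealso>baz</seealso>
</indexterm>
```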