By running ocrodjvu on the djvu files I've managed to make searchable versions of some of the ICL manuals.
To run OCR on a .djvu file the following command can be used:
$ ocrodjvu --save-script xxx.djvused xxx.djvu
(This produces a djvused script that can be used to include the OCR text in the .djvu file. Alternatively, use the --in-place option to modify the .djvu file directly.)
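For more than a handful of files the same command can be wrapped in a loop. The following is only a sketch: the function names and the DRY_RUN switch are inventions of this example (not ocrodjvu options), and it assumes each script should sit next to its .djvu file with the same base name.

```shell
#!/bin/sh
# Sketch: run ocrodjvu over every .djvu file in a directory, saving a
# djvused script beside each one. DRY_RUN is a hypothetical switch of
# this sketch; when set, the commands are printed instead of executed.

# Derive the script name from the document name: manual.djvu -> manual.djvused
script_for() {
    printf '%s\n' "${1%.djvu}.djvused"
}

ocr_all() {
    dir=$1
    for f in "$dir"/*.djvu; do
        # Skip the unexpanded pattern when the directory has no .djvu files.
        [ -e "$f" ] || continue
        if [ -n "${DRY_RUN:-}" ]; then
            printf 'ocrodjvu --save-script %s %s\n' "$(script_for "$f")" "$f"
        else
            ocrodjvu --save-script "$(script_for "$f")" "$f"
        fi
    done
}
```

Calling `DRY_RUN=1 ocr_all .` prints the commands for the current directory without running anything, which is a cheap check before committing to a long OCR run.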
Further information can be extracted from the OCRed text, notably the document outline (making it easy to jump to specific pages) and a clickable contents list.
The Debian versions of ocropus and tesseract are slightly out of date and work very badly on 64-bit systems (Debian bug #590672). The results on 32-bit systems are acceptable.
The old version of ocropus has some problems recognising text that touches the edge of the page (Debian bug #575484). The Debian bug report contains a couple of minor patches for ocropus/ocrodjvu that work around this bug.
ocropus seems to have problems recognising the large bold words with which many manual pages start:
Branch on Double Indexing
...
This makes automatic generation of bookmarks something of a pig.
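Where the headings have to be collected some other way (by hand, say), the bookmarks can still be written out mechanically. The sketch below assumes the outline syntax that `djvused -e print-outline` prints, namely a `(bookmarks ...)` list of `("title" "#page")` entries; the function name and the tab-separated input format are hypothetical choices of this example, and nested entries are not handled.

```shell
#!/bin/sh
# Sketch: turn "page<TAB>title" lines on stdin into a flat djvused outline.
# Assumed output format: (bookmarks ("title" "#page") ...), as printed by
# "djvused -e print-outline".
make_outline() {
    printf '(bookmarks\n'
    while IFS="$(printf '\t')" read -r page title; do
        # Ignore blank lines.
        [ -n "$page" ] || continue
        # Escape backslashes and double quotes inside the title.
        esc=$(printf '%s' "$title" | sed 's/\\/\\\\/g; s/"/\\"/g')
        printf ' ("%s" "#%s")\n' "$esc" "$page"
    done
    printf ')\n'
}
```

The result can be saved to a file and, under the same assumption about the format, loaded with the djvused `set-outline` command.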
The GNOME document viewing program, evince, knows how to select text from an OCRed djvu file, but doesn't show the selection area on the screen: GNOME bug 448739 - Evince cannot select text in djvu documents.
Evince doesn't show the document outline in the "Index" sidebar: Bug 592806 - empty index for .djvu file.
Results
The raw OCR output for the files I've done can be found in OCR. These are formatted as "djvused" scripts, which are used with the djvused command from the djvulibre package to include the text in the base djvu files:
$ djvused -f xxx.djvused -s xxx.djvu