Saxon-B index extension


Saxon-B is an open-source XQuery processor that is, unlike its closed-source equivalent Saxon-SA, not heavily optimized. When dealing with larger data sets, this results in long query times, especially due to the implicit sequence scanning needed for simple XPath expressions like $document/node[@id = 5]. In XSLT such problems are dealt with by allowing the user to define indexes on node sets using the <xsl:key> element. Later, nodes can be looked up quickly using the key() function.
Unfortunately, in XQuery no equivalent construction is defined. The commercial version of Saxon does however provide an XQuery extension allowing the user to do essentially the same. It consists of two functions, saxon:index() and saxon:find(), that respectively create an index and use it for fast lookup.
We implemented a similar extension for Saxon-B. This extension consists of an implementation of the IndexedSequence class, using a hash table and the index and find functions. The implementation is compatible with Saxon-SA as far as it is documented, thus it can be used as in the following example.
 declare namespace saxon="";
declare namespace java="";
declare variable $indexedNodes as java:net.sf.saxon.extra.IndexedSequence
:= saxon:index($document/node, saxon:expression("@id")); (: index nodes by id :)

(: ... :)
let $node := saxon:find($indexedNodes, 5) (: find node with id 5 :)
(: ... :)


Some complex queries on OpenStreetMap XML data nicely show the performance gain from using indexes with the extension.

Query:Without indexWith indexFile size
Extract motorway junctions in the Netherlands1465 sec.123 sec.12 MB
Find dual carriageways around Eindhoven76 sec.6 sec.880 kB


Patch for the current svn version of Saxon-B (verified with r233 and r263).

Further information

Written by in joint work with , 2008. The program is available under the terms of the Mozilla Public License Version 1.0 (same as Saxon-B).
I do not plan to actively maintain this extension, but questions are welcome (freek -at- vanwal -dot- nl).