Go forward to Description.
Go backward to Motivation.
Go up to Top.

Static search structures and GNU `gperf'
****************************************

   A "static search structure" is an Abstract Data Type with certain
fundamental operations, *e.g.*, *initialize*, *insert*, and *retrieve*.
Conceptually, all insertions occur before any retrievals.(1) It is a
useful data structure for representing *static search sets*.  Static
search sets occur frequently in software system applications.  Typical
static search sets include compiler reserved words, assembler
instruction opcodes, and built-in shell interpreter commands.  Search
set members, called "keywords", are inserted into the structure only
once, usually during program initialization, and are not generally
modified at run-time.

   Numerous static search structure implementations exist, *e.g.*,
arrays, linked lists, binary search trees, digital search tries, and
hash tables.  Different approaches offer trade-offs between space
utilization and search time efficiency.  For example, an $n$ element
sorted array is space efficient, though the average-case time
complexity for retrieval operations using binary search is proportional
to $\log n$.  Conversely, hash table implementations often locate a
table entry in constant time, but typically impose additional memory
overhead and exhibit poor worst case performance `aho, etc.'.

   *Minimal perfect hash functions* provide an optimal solution for a
particular class of static search sets.  A minimal perfect hash
function is defined by two properties:

   * It allows keyword recognition in a static search set using at most
     *one* probe into the hash table.  This represents the "perfect"
     property.

   * The actual memory allocated to store the keywords is precisely
     large enough for the keyword set, and *no larger*.  This is the
     "minimal" property.

   For most applications it is far easier to generate *perfect* hash
functions than *minimal perfect* hash functions `many bozos'.
Moreover, non-minimal perfect hash functions frequently execute faster
than minimal ones in practice `cichelli'.  This phenomena occurs since
searching a sparse keyword table increases the probability of locating
a "null" entry, thereby reducing string comparisons.  `gperf''s default
behavior generates *near-minimal* perfect hash functions for keyword
sets.  However, `gperf' provides many options that permit user control
over the degree of minimality and perfection.

   Static search sets often exhibit relative stability over time.  For
example, Ada's 63 reserved words have remained constant for nearly a
decade.  It is therefore frequently worthwhile to expend concerted
effort building an optimal search structure *once*, if it subsequently
receives heavy use multiple times.  `gperf' removes the drudgery
associated with constructing time- and space-efficient search
structures by hand.  It has proven a useful and practical tool for
serious programming projects.  Output from `gperf' is currently used in
several production and research compilers, including GNU C, GNU C++,
GNU Pascal, and GNU Modula 3.  (2) Each compiler utilizes `gperf' to
automatically generate static search structures that efficiently
identify their respective reserved keywords.

   ---------- Footnotes ----------

   (1)  In practice, `gperf' generates a `static' array containing
search set keywords and any associated attributes specified by the
user.  Thus, there is essentially no execution-time cost for the
insertions.

   (2)  The latter two compilers are not yet part of the official GNU
distribution.