packages icon



 QDBM(3)                          Man Page                           QDBM(3)
 Quick Database Manager                               Quick Database Manager

                                 2004-04-22



 NAME
      QDBM - quick database manager


 OVERVIEW
      QDBM is a library of routines for managing a database.  The database
      is a simple data file containing records, each is a pair of a key and
      a value.  Every key and value is serial bytes with variable length.
      Both binary data and character string can be used as a key and a
      value.  There is neither concept of data tables nor data types.
      Records are organized in hash table or B+ tree.

      As for database of hash table, each key must be unique within a
      database, so it is impossible to store two or more records with a key
      overlaps.  The following access methods are provided to the database:
      storing a record with a key and a value, deleting a record by a key,
      retrieving a record by a key.  Moreover, traversal access to every key
      are provided, although the order is arbitrary.  These access methods
      are similar to ones of DBM (or its followers: NDBM and GDBM) library
      defined in the UNIX standard.  QDBM is an alternative for DBM because
      of its higher performance.

      As for database of B+ tree, records whose keys are duplicated can be
      stored.  Access methods of storing, deleting, and retrieving are
      provided as with the database of hash table.  Records are stored in
      order by a comparing function assigned by a user.  It is possible to
      access each record with the cursor in ascending or descending order.
      According to this mechanism, forward matching search for strings and
      range search for integers are realized.  Moreover, transaction is
      available in database of B+ tree.


 EFFECTIVE IMPLEMENTATION OF HASH DATABASE
      QDBM is developed referring to GDBM for the purpose of the following
      three points: higher processing speed, smaller size of a database
      file, and simpler API.  They have been achieved.  Moreover, the
      following three restrictions of traditional DBM: a process can handle
      only one database, the size of a key and a value is bounded, a
      database file is sparse, are cleared.

      QDBM uses hash algorithm to retrieve records.  If a bucket array has
      sufficient number of elements, the time complexity of retrieval is
      `O(1)'.  That is, time required for retrieving a record is constant,
      regardless of the scale of a database.  It is also the same about
      storing and deleting.  Collision of hash values is managed by separate
      chaining.  Data structure of the chains is binary search tree.  Even
      if a bucket array has unusually scarce elements, the time complexity
      of retrieval is `O(log n)'.




                                    - 1 -      Formatted:  November 14, 2024






 QDBM(3)                          Man Page                           QDBM(3)
 Quick Database Manager                               Quick Database Manager

                                 2004-04-22



      QDBM attains improvement in retrieval by loading RAM with the whole of
      a bucket array.  If a bucket array is on RAM, it is possible to access
      a region of a target record by about one path of file operations.  A
      bucket array saved in a file is not read into RAM with the `read' call
      but directly mapped to RAM with the `mmap' call.  Therefore,
      preparation time on connecting to a database is very short, and two or
      more processes can share the same memory map.

      If the number of elements of a bucket array is about half of records
      stored within a database, although it depends on characteristic of the
      input, the probability of collision of hash values is about 56.7%
      (36.8% if the same, 21.3% if twice, 11.5% if four times, 6.0% if eight
      times).  In such case, it is possible to retrieve a record by two or
      less paths of file operations.  If it is made into a performance
      index, in order to handle a database containing one million of
      records, a bucket array with half a million of elements is needed.
      The size of each element is 4 bytes.  That is, if 2M bytes of RAM is
      available, a database containing one million records can be handled.

      QDBM provides two modes to connect to a database: `reader' and
      `writer'.  A reader can perform retrieving but neither storing nor
      deleting.  A writer can perform all access methods.  Exclusion control
      between processes is performed when connecting to a database by file
      locking.  While a writer is connected to a database, neither readers
      nor writers can be connected.  While a reader is connected to a
      database, other readers can be connect, but writers can not.
      According to this mechanism, data consistency is guaranteed with
      simultaneous connections in multitasking environment.

      Traditional DBM provides two modes of the storing operations: `insert'
      and `replace'.  In the case a key overlaps an existing record, the
      insert mode keeps the existing value, while the replace mode
      transposes it to the specified value.  In addition to the two modes,
      QDBM provides `concatenate' mode.  In the mode, the specified value is
      concatenated at the end of the existing value and stored.  This
      feature is useful when adding a element to a value as an array.
      Moreover, although DBM has a method to fetch out a value from a
      database only by reading the whole of a region of a record, QDBM has a
      method to fetch out a part of a region of a value.  When a value is
      treated as an array, this feature is also useful.

      Generally speaking, while succession of updating, fragmentation of
      available regions occurs, and the size of a database grows rapidly.
      QDBM deal with this problem by coalescence of dispensable regions and
      reuse of them, and featuring of optimization of a database.  When
      overwriting a record with a value whose size is greater than the
      existing one, it is necessary to remove the region to another position
      of the file.  Because the time complexity of the operation depends on
      the size of the region of a record, extending values successively is



                                    - 2 -      Formatted:  November 14, 2024






 QDBM(3)                          Man Page                           QDBM(3)
 Quick Database Manager                               Quick Database Manager

                                 2004-04-22



      inefficient.  However, QDBM deal with this problem by alignment.  If
      increment can be put in padding, it is not necessary to remove the
      region.

      As for many file systems, it is impossible to handle a file whose size
      is more than 2GB.  To deal with this problem, QDBM provides a
      directory database containing multiple database files.  Due to this
      feature, it is possible to handle a database whose total size is up to
      1TB in theory.  Moreover, because database files can be deployed on
      multiple disks, the speed of updating operations can be improved as
      with RAID-0 (striping).  It is also possible for the database files to
      deploy on multiple file servers using NFS and so on.


 USEFUL IMPLEMENTATION OF B+ TREE DATABASE
      Although B+ tree database is slower than hash database, it features
      ordering access to each record.  The order can be assigned by users.
      Records of B+ tree are sorted and arranged in logical pages.  Sparse
      index organized in B tree that is multiway balanced tree are
      maintained for each page.  Thus, the time complexity of retrieval and
      so on is `O(log n)'.  Cursor is provided to access each record in
      order.  The cursor can jump to a position specified by a key and can
      step forward or backward from the current position.  Because each page
      is arranged as double linked list, the time complexity of stepping
      cursor is `O(1)'.

      B+ tree database is implemented, based on above hash database.
      Because each page of B+ tree is stored as each record of hash
      database, B+ tree database inherits efficiency of storage management
      of hash database.  Because the header of each record is smaller and
      alignment of each page is calculated statistically, in most cases, the
      size of database file is cut by half compared to one of hash database.
      Although operation of many pages are required to update B+ tree, QDBM
      expedites the process by caching pages and reducing file operations.
      In most cases, because whole of the sparse index is cached on memory,
      it is possible to retrieve a record by one or less path of file
      operations.

      B+ tree database features transaction mechanism.  It is possible to
      commit a series of operations between the beginning and the end of the
      transaction in a lump, or to abort the transaction and perform
      rollback to the state before the transaction.  Even if the process of
      an application is crushed while the transaction, the database file is
      not broken.

      In case that QDBM is built with ZLIB, LZO, or BZIP2 enabled, a
      lossless data-compression library, the content of each page of B+ tree
      is compressed and stored in a file.  Because each record in a page has
      similar patterns, high efficiency of compression is expected due to



                                    - 3 -      Formatted:  November 14, 2024






 QDBM(3)                          Man Page                           QDBM(3)
 Quick Database Manager                               Quick Database Manager

                                 2004-04-22



      the Lempel-Ziv algorithm and the like.  In case handling text data,
      the size of a database is reduced to about 25%.  If the scale of a
      database is large and disk I/O is the bottleneck, featuring
      compression makes the processing speed improved to a large extent.


 SIMPLE BUT VARIOUS INTERFACES
      QDBM provides very simple APIs.  You can perform database I/O as usual
      file I/O with `FILE' pointer defined in ANSI C.  In the basic API of
      QDBM, entity of a database is recorded as one file.  In the extended
      API, entity of a database is recorded as several files in one
      directory.  Because the two APIs are very similar with each other,
      porting an application from one to the other is easy.

      APIs which are compatible with NDBM and GDBM are also provided.  As
      there are a lot of applications using NDBM or GDBM, it is easy to port
      them onto QDBM.  In most cases, it is completed only by replacement of
      header including (#include) and re-compiling.  However, QDBM can not
      handle database files made by the original NDBM or GDBM.

      In order to handle records on memory easily, the utility API is
      provided.  It implements memory allocating functions, sorting
      functions, extensible datum, array list, hash map, and so on.  Using
      them, you can handle records in C language cheaply as in such script
      languages as Perl or Ruby.

      B+ tree database is used with the advanced API.  The advanced API is
      implemented using the basic API and the utility API.  Because the
      advanced API is also similar to the basic API and the extended API, it
      is easy to learn how to use it.

      In order to handle an inverted index which is used by full-text search
      systems, the inverted API is provided.  If it is easy to handle an
      inverted index of documents, an application can focus on text
      processing and natural language processing.  Because this API does not
      depend on character codes nor languages, it is possible to implement a
      full-text search system which can respond to various requests from
      users.

      Along with APIs for C, QDBM provides APIs for C++, Java, Perl, and
      Ruby.  APIs for C are composed of seven kinds: the basic API, the
      extended API, the NDBM-compatible API, the GDBM-compatible API, the
      utility API, the advanced API, and the inverted API.  Command line
      interfaces corresponding to each API are also provided.  They are
      useful for prototyping, testing, debugging, and so on.  The C++ API
      encapsulates database handling functions of the basic API, the
      extended API, and the advanced API with class mechanism of C++.  The
      Java API has native methods calling the basic API, the extended API,
      and the advanced API with Java Native Interface.  The Perl API has



                                    - 4 -      Formatted:  November 14, 2024






 QDBM(3)                          Man Page                           QDBM(3)
 Quick Database Manager                               Quick Database Manager

                                 2004-04-22



      methods calling the basic API, the extended API, and the advanced API
      with XS language.  The Ruby API has method calling the basic API, the
      extended API, and the advanced API as modules of Ruby.  Moreover, CGI
      scripts for administration of databases and full-text search are
      provided.


 WIDE PORTABILITY
      QDBM is implemented being based on syntax of ANSI C (C89) and using
      only APIs defined in ANSI C or POSIX.  Thus, QDBM works on most UNIX
      and its compatible OSs.  As for C API, checking operations have been
      done at least on Linux 2.2, Linux 2.4, FreeBSD 4.8, FreeBSD 5.0, SunOS
      5.7, SunOS 5.8, SunOS 5.9, HP-UX 11.00, Cygwin 1.3.10, Mac OS X 10.2,
      and RISC OS 5.03.  Although a database file created by QDBM depends on
      byte order of the processor, to do with it, utilities to dump data in
      format which is independent to byte orders are provided.


 BUILDING
      For building a program using QDBM, the program should be linked with a
      library file `libqdbm.a' or `libqdbm.so'.  For example, the following
      command is executed to build `sample' from `sample.c'.

           gcc -I/usr/local/include -o sample sample.c -L/usr/local/lib


 AUTHOR
      QDBM is written by Mikio Hirabayashi.  You can contact the author by
      e-mail to <mikio@users.sourceforge.net>.  Any suggestion or bug report
      is welcome to the author.


 COPYRIGHT
      Copyright(c) 2000-2003 Mikio Hirabayashi

      QDBM is free software; you can redistribute it and/or modify it under
      the terms of the GNU Lesser General Public License as published by the
      Free Software Foundation; either version 2 of the License, or any
      later version.

      QDBM is distributed in the hope that it will be useful, but WITHOUT
      ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
      FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
      License for more details.

      You should have received a copy of the GNU Lesser General Public
      License along with QDBM; if not, write to the Free Software
      Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
      USA.



                                    - 5 -      Formatted:  November 14, 2024






 QDBM(3)                          Man Page                           QDBM(3)
 Quick Database Manager                               Quick Database Manager

                                 2004-04-22



 SEE ALSO
      depot(3), curia(3), relic(3), hovel(3), cabin(3), villa(3), odeum(3),
      ndbm(3), gdbm(3)

















































                                    - 6 -      Formatted:  November 14, 2024