ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Dictionary-Based Order-Preserving String Compression.

Gennady Antoshenkov: Dictionary-Based Order-Preserving String Compression. VLDB J. 6(1): 26-39(1997)
@article{DBLP:journals/vldb/Antoshenkov97,
  author    = {Gennady Antoshenkov},
  title     = {Dictionary-Based Order-Preserving String Compression},
  journal   = {VLDB J.},
  volume    = {6},
  number    = {1},
  year      = {1997},
  pages     = {26-39},
  ee        = {db/journals/vldb/Antoshenkov97.html},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

As no database exists without indexes, no index implementation exists without order-preserving key compression, in particular, without prefix and tail compression. However, despite the great potentials of making indexes smaller and faster, application of general compression methods to ordered data sets has advanced very little. This paper demonstrates that the fast dictionary-based methods can be applied to order-preserving compression almost with the same freedom as in the general case. The proposed new technology has the same speed and a compression rate only marginally lower than the traditional order-indifferent dictionary encoding. Procedures for encoding and generating the encode tables are described covering such order-related features as ordered data set restrictions, sensitivity and insensitivity to a character position, and one-symbol encoding of each frequent trailing character sequence. The experimental results presented demonstrate five-folded compression on real-life data sets and twelve-folded compression on Wisconsin benchmark text fields.

Key Words

Indexing, Order-preserving key compression

Copyright © 1997 by Springer, Berlin, Heidelberg. Permission to make digital or hard copies of the abstract is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice along with the full citation.


Online Edition (Springer)

Citation Page

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 4 Issue 1, Books, VLDB-j, TODS, ..." and ... DVD Version: Load ACM SIGMOD Anthology DVD 2" and ...

References

[1]
...
[2]
Jean-Loup Baer, Yi-Bing Lin: Improving Quicksort Performance with a Codewort Data Structure. IEEE Trans. Software Eng. 15(5): 622-631(1989) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[3]
Rudolf Bayer, Karl Unterauer: Prefix B-Trees. ACM Trans. Database Syst. 2(1): 11-26(1977) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[4]
...
[5]
Mike W. Blasgen, Richard G. Casey, Kapali P. Eswaran: An Encoding Method for Multifield Sorting and Indexing. Commun. ACM 20(11): 874-878(1977) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[6]
...
[7]
Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2): 73-170(1993) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[8]
Jim Gray (Ed.): The Benchmark Handbook for Database and Transaction Systems (2nd Edition). Morgan Kaufmann 1993, ISBN 1-55860-292-5
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[9]
...
[10]
...
[11]
Alistair Moffat, Justin Zobel: Coding for Compression in Full-Text Retrieval Systems. Data Compression Conference 1992: 72-81 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[12]
A. Zandi, Balakrishna R. Iyer, Glen G. Langdon Jr.: Sort Order Preserving Data Compression for Extended Alphabets. Data Compression Conference 1993: 330-339 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[13]
Jacob Ziv, Abraham Lempel: Compression of Individual Sequences via Variable-Rate Coding. IEEE Transactions on Information Theory 24(5): 530-536(1978) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Tue Dec 1 16:38:30 2009 by Michael Ley (ley@uni-trier.de)