
Compression of Indexes with Full Positional Information in Very Large Text Databases.

Gordon Linoff, Craig Stanfill: Compression of Indexes with Full Positional Information in Very Large Text Databases. SIGIR 1993: 88-95
@inproceedings{DBLP:conf/sigir/LinoffS93,
  author    = {Gordon Linoff and
               Craig Stanfill},
  editor    = {Robert Korfhage and
               Edie M. Rasmussen and
               Peter Willett},
  title     = {Compression of Indexes with Full Positional Information in Very
               Large Text Databases},
  booktitle = {Proceedings of the 16th Annual International ACM-SIGIR Conference
               on Research and Development in Information Retrieval. Pittsburgh,
               PA, USA, June 27 - July 1, 1993},
  publisher = {ACM},
  year      = {1993},
  isbn      = {0-89791-605-0},
  pages     = {88-95},
  ee        = {db/conf/sigir/LinoffS93.html},
  crossref  = {DBLP:conf/sigir/93},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

This paper describes a combination of compression methods which may be used to reduce the size of inverted indexes for very large text databases. These methods are Prefix Omission, Run-Length Encoding, and a novel family of numeric representations called n-s coding. Using these compression methods on two different text sources (the King James Version of the Bible and a sample of Wall Street Journal Stories), the compressed index occupies less than 40% of the size of the original text, even when both stopwords and numbers are included in the index. The decreased time required for I/O can almost fully compensate for the time needed to uncompress the postings. This research is part of an effort to handle very large text databases on the CM-5, a massively parallel MIMD supercomputer.
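As a rough illustration of two of the techniques the abstract names, the following Python sketch applies prefix omission and run-length encoding to a toy postings list. It assumes postings are sorted (document id, word position) pairs and that the shared prefix is the document id; both are assumptions made for this example, and the function names are hypothetical. It is a generic sketch and does not reproduce the paper's encoding or its n-s coding.

```python
from itertools import groupby

def omit_prefixes(postings):
    """Store each doc_id once, followed by its positions.

    Assumes `postings` is a list of (doc_id, position) pairs sorted by doc_id.
    """
    return [(doc_id, [pos for _, pos in group])
            for doc_id, group in groupby(postings, key=lambda p: p[0])]

def restore_prefixes(grouped):
    """Inverse of omit_prefixes: expand back to (doc_id, position) pairs."""
    return [(doc_id, pos) for doc_id, positions in grouped for pos in positions]

def run_length_encode(values):
    """Collapse each run of equal values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def run_length_decode(runs):
    """Inverse of run_length_encode."""
    return [value for value, count in runs for _ in range(count)]

# Toy data, invented for illustration only.
postings = [(1, 4), (1, 9), (1, 17), (2, 3), (2, 8)]
grouped = omit_prefixes(postings)

flags = [0, 0, 0, 1, 1, 0]
runs = run_length_encode(flags)
```

In this toy case the document id is written twice instead of five times, and the six flag values collapse into three runs. Both transformations are lossless, so the original postings can be rebuilt exactly.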

Copyright © 1993 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.



Printed Edition

Robert Korfhage, Edie M. Rasmussen, Peter Willett (Eds.): Proceedings of the 16th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Pittsburgh, PA, USA, June 27 - July 1, 1993. ACM 1993, ISBN 0-89791-605-0

Online Edition: ACM Digital Library


Copyright © Fri Dec 11 20:18:08 2009 by Michael Ley (ley@uni-trier.de)