
Pivoted Document Length Normalization.

Amit Singhal, Chris Buckley, Mandar Mitra: Pivoted Document Length Normalization. SIGIR 1996: 21-29
@inproceedings{DBLP:conf/sigir/SinghalBM96,
  author    = {Amit Singhal and
               Chris Buckley and
               Mandar Mitra},
  editor    = {Hans-Peter Frei and
               Donna Harman and
               Peter Sch{\"a}uble and
               Ross Wilkinson},
  title     = {Pivoted Document Length Normalization},
  booktitle = {Proceedings of the 19th Annual International ACM SIGIR Conference
               on Research and Development in Information Retrieval, SIGIR'96,
               August 18-22, 1996, Zurich, Switzerland (Special Issue of the
               SIGIR Forum)},
  publisher = {ACM},
  year      = {1996},
  isbn      = {0-89791-792-8},
  pages     = {21-29},
  ee        = {db/conf/sigir/SinghalBM96.html},
  crossref  = {DBLP:conf/sigir/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Automatic information retrieval systems have to deal with documents of varying lengths in a text collection. Document length normalization is used to retrieve documents of all lengths fairly. In this study, we observe that a normalization scheme that retrieves documents of all lengths with chances similar to their likelihood of relevance will outperform a scheme that retrieves documents with chances very different from their likelihood of relevance. We show that the retrieval probabilities for a particular normalization method deviate systematically from the relevance probabilities across different collections. We present pivoted normalization, a technique that can be used to modify any normalization function, thereby reducing the gap between the relevance and the retrieval probabilities. By training pivoted normalization on one collection, we can successfully use it on other (new) text collections, yielding a robust, collection-independent normalization technique. We apply the idea of pivoting to the well-known cosine normalization function. We point out some shortcomings of the cosine function and present two new normalization functions: pivoted unique normalization and pivoted byte size normalization.
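The abstract describes pivoting only in words. The Python sketch below shows the idea. It assumes the form in which pivoted normalization is commonly stated, new factor = (1 - slope) * pivot + slope * old factor. It also assumes the pivot is the collection's mean old factor, whereas the paper trains pivot and slope on a collection. The function names and the dict-of-term-weights document representation are hypothetical. "Unique" normalization is shown only in its simplest form, with the number of distinct terms as the old factor. The paper's term-frequency weighting and byte size variant are omitted.

```python
import math
from statistics import mean


def pivoted_factor(old_norm, pivot, slope):
    # Rotate the old normalization factor around the pivot:
    # (1 - slope) * pivot + slope * old_norm. slope = 1 recovers old_norm.
    return (1.0 - slope) * pivot + slope * old_norm


def cosine_factor(doc):
    # Cosine normalization factor: Euclidean length of the term-weight vector.
    return math.sqrt(sum(w * w for w in doc.values()))


def unique_factor(doc):
    # Simplified "unique" factor: number of distinct terms in the document.
    return float(len(doc))


def normalize_collection(docs, norm_fn, slope):
    # docs: list of {term: weight} dicts (hypothetical representation).
    old = [norm_fn(d) for d in docs]
    # Assumption: the pivot is the average old factor over the collection.
    pivot = mean(old)
    out = []
    for doc, o in zip(docs, old):
        f = pivoted_factor(o, pivot, slope)
        out.append({t: w / f for t, w in doc.items()})
    return out
```

With `docs = [{"a": 3.0, "b": 4.0}, {"a": 1.0}]` and cosine factors 5 and 1 (pivot 3), plain cosine (`slope=1.0`) gives weights 0.6/0.8 and 1.0. With `slope=0.5`, the longer document's weights rise to 0.75/1.0 and the shorter document's weight falls to 0.5. A slope below 1 therefore shifts retrieval chances toward longer documents relative to cosine normalization.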

Copyright © 1996 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 2 Issue 3, SIGIR, DASFAA'97, OODBS'86" and ...
DVD Version: Load "ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Hans-Peter Frei, Donna Harman, Peter Schäuble, Ross Wilkinson (Eds.): Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'96, August 18-22, 1996, Zurich, Switzerland (Special Issue of the SIGIR Forum). ACM 1996, ISBN 0-89791-792-8

Online Edition: ACM Digital Library


Copyright © Tue Dec 8 20:18:58 2009 by Michael Ley (ley@uni-trier.de)