ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Accurate Estimation of the Number of Tuples Satisfying a Condition.

Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276
@inproceedings{DBLP:conf/sigmod/Piatetsky-ShapiroC84,
  author    = {Gregory Piatetsky-Shapiro and
               Charles Connell},
  editor    = {Beatrice Yormark},
  title     = {Accurate Estimation of the Number of Tuples Satisfying a Condition},
  booktitle = {SIGMOD'84, Proceedings of Annual Meeting, Boston, Massachusetts,
               June 18-21, 1984},
  publisher = {ACM Press},
  year      = {1984},
  pages     = {256-276},
  ee        = {http://doi.acm.org/10.1145/602259.602294, db/conf/sigmod/Piatetsky-ShapiroC84.html},
  crossref  = {DBLP:conf/sigmod/84},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

We present a new method for estimating the number of tuples satisfying a condition of the type attribute rel constant, where rel is one of "=", ">", "<", ">=", "<=". Our method gives highly accurate, yet easy to compute, estimates. We store information about attribute values as a list of distribution steps (histograms where buckets, instead of having equal width, have equal height). These distribution steps provide an upper bound on the error when estimating the number of tuples satisfying a condition. The estimation error can be arbitrarily reduced by increasing the number of steps. We analyze desirable conditions that such estimates should satisfy. Based on the distribution steps, we derive a set of estimation formulas which minimize the worst-case error. We also present another set of formulas which reduce the average-case error. Finally, we show how to use sampling to compute a close approximation of the distribution steps very quickly. The major applications of our method are in query optimization and in answering statistical queries.

Copyright © 1984 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

Online Version (ACM WWW Account required): Full Text in PDF Format

CDROM Version: Load the CDROM "Volume 1 Issue 2, SIGMOD '75-'92" and ...

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Beatrice Yormark (Ed.): SIGMOD'84, Proceedings of Annual Meeting, Boston, Massachusetts, June 18-21, 1984. ACM Press 1984 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML, SIGMOD Record 14(2)
Contents

Online Edition: ACM Digital Library


References

[Blasgen 77]
Mike W. Blasgen, Kapali P. Eswaran: Storage and Access in Relational Data Bases. IBM Systems Journal 16(4): 362-377(1977) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Christodoulakis 81]
...
[Christodoulakis 83]
Stavros Christodoulakis: Estimating Block Transfers and Join Sizes. SIGMOD Conference 1983: 40-54 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Demolombe 80]
Robert Demolombe: Estimation of the Number of Tuples Satisfying a Query Expressed in Predicate Calculus Language. VLDB 1980: 55-63 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Dixon 79]
...
[Luk 83]
W. S. Luk: On Estimating Block Accesses in Database Organizations. Commun. ACM 26(11): 945-947(1983) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Piatetsky 84]
...
[Rowe 83]
Neil C. Rowe: Top-Down Statistical Estimation on a Database. SIGMOD Conference 1983: 135-145 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Samson 83]
W. B. Samson, A. Bendell: Rank Order Distributions and Secondary Key Indexing. Comput. J. 28(3): 309-312(1985) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Selinger 79]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Whang 83]
Kyu-Young Whang, Gio Wiederhold, Daniel Sagalowicz: Estimating Block Accesses in Database Organizations: A Closed Noniterative Formula. Commun. ACM 26(11): 940-944(1983) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Yao 77]
S. Bing Yao: Approximating the Number of Accesses in Database Organizations. Commun. ACM 20(4): 260-261(1977) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Yao 79]
S. Bing Yao: Optimization of Query Evaluation Algorithms. ACM Trans. Database Syst. 4(2): 133-155(1979) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Yu 78]
Clement T. Yu, W. S. Luk, M. K. Siu: On the Estimation of the Number of Desired Records with Respect to a Given Query. ACM Trans. Database Syst. 3(1): 41-56(1978) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Zipf 49]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Wed Dec 23 21:45:11 2009 by Michael Ley (ley@uni-trier.de)