ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Accurate Estimation of the Number of Tuples Satisfying a Condition.

Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276
@inproceedings{DBLP:conf/sigmod/Piatetsky-ShapiroC84,
  author    = {Gregory Piatetsky-Shapiro and
               Charles Connell},
  editor    = {Beatrice Yormark},
  title     = {Accurate Estimation of the Number of Tuples Satisfying a Condition},
  booktitle = {SIGMOD'84, Proceedings of Annual Meeting, Boston, Massachusetts,
               June 18-21, 1984},
  publisher = {ACM Press},
  year      = {1984},
  pages     = {256-276},
  ee        = {http://doi.acm.org/10.1145/602259.602294, db/conf/sigmod/Piatetsky-ShapiroC84.html},
  crossref  = {DBLP:conf/sigmod/84},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

We present a new method for estimating the number of tuples satisfying a condition of the type attribute rel constant, where rel is one of "=", ">", "<", ">=", "<=". Our method gives highly accurate, yet easy to compute, estimates. We store information about attribute values as a list of distribution steps (histograms where buckets, instead of having equal width, have equal height). These distribution steps provide an upper bound on the error when estimating the number of tuples satisfying a condition. The estimation error can be arbitrarily reduced by increasing the number of steps. We analyze desirable conditions that such estimates should satisfy. Based on the distribution steps, we derive a set of estimation formulas which minimize the worst-case error. We also present another set of formulas which reduce the average-case error. Finally, we show how to use sampling to compute a close approximation of the distribution steps very quickly. The major applications of our method are in query optimization and in answering statistical queries.

Copyright © 1984 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

Online Version (ACM WWW Account required): Full Text in PDF Format

CDROM Version: Load the CDROM "Volume 1 Issue 2, SIGMOD '75-'92" and ...

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Beatrice Yormark (Ed.): SIGMOD'84, Proceedings of Annual Meeting, Boston, Massachusetts, June 18-21, 1984. ACM Press 1984 BibTeX , SIGMOD Record 14(2)
Contents

Online Edition: ACM Digital Library


References

[Blasgen 77]
Mike W. Blasgen, Kapali P. Eswaran: Storage and Access in Relational Data Bases. IBM Systems Journal 16(4): 362-377(1977) BibTeX
[Christodoulakis 81]
...
[Christodoulakis 83]
Stavros Christodoulakis: Estimating Block Transfers and Join Sizes. SIGMOD Conference 1983: 40-54 BibTeX
[Demolombe 80]
Robert Demolombe: Estimation of the Number of Tuples Satisfying a Query Expressed in Predicate Calculus Language. VLDB 1980: 55-63 BibTeX
[Dixon 79]
...
[Luk 83]
W. S. Luk: On Estimating Block Accesses in Database Organizations. Commun. ACM 26(11): 945-947(1983) BibTeX
[Piatetsky 84]
...
[Rowe 83]
Neil C. Rowe: Top-Down Statistical Estimation on a Database. SIGMOD Conference 1983: 135-145 BibTeX
[Samson 83]
W. B. Samson, A. Bendell: Rank Order Distributions and Secondary Key Indexing. Comput. J. 28(3): 309-312(1985) BibTeX
[Selinger 79]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 BibTeX
[Whang 83]
Kyu-Young Whang, Gio Wiederhold, Daniel Sagalowicz: Estimating Block Accesses in Database Organizations: A Closed Noniterative Formula. Commun. ACM 26(11): 940-944(1983) BibTeX
[Yao 77]
S. Bing Yao: Approximating the Number of Accesses in Database Organizations. Commun. ACM 20(4): 260-261(1977) BibTeX
[Yao 79]
S. Bing Yao: Optimization of Query Evaluation Algorithms. ACM Trans. Database Syst. 4(2): 133-155(1979) BibTeX
[Yu 78]
Clement T. Yu, W. S. Luk, M. K. Siu: On the Estimation of the Number of Desired Records with Respect to a Given Query. ACM Trans. Database Syst. 3(1): 41-56(1978) BibTeX
[Zipf 49]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
BibTeX
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Wed Jul 23 16:19:01 2008