Large-Sample and Deterministic Confidence Intervals for Online Aggregation.

Peter J. Haas: Large-Sample and Deterministic Confidence Intervals for Online Aggregation. SSDBM 1997: 51-63
@inproceedings{DBLP:conf/ssdbm/Haas97,
  author    = {Peter J. Haas},
  editor    = {Yannis E. Ioannidis and
               David M. Hansen},
  title     = {Large-Sample and Deterministic Confidence Intervals for Online
               Aggregation},
  booktitle = {Ninth International Conference on Scientific and Statistical
               Database Management, Proceedings, August 11-13, 1997, Olympia,
               Washington, USA},
  publisher = {IEEE Computer Society},
  year      = {1997},
  isbn      = {0-8186-7952-2},
  pages     = {51-63},
  ee        = {db/conf/ssdbm/Haas97.html},
  crossref  = {DBLP:conf/ssdbm/97},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

The online aggregation system recently proposed by Hellerstein et al. permits interactive exploration of large, complex datasets stored in relational database management systems. Running confidence intervals are an important component of an online aggregation system and indicate to the user the estimated proximity of each running aggregate to the corresponding final result. Large-sample confidence intervals contain the final result with a prespecified probability and rest on central limit theorems, while deterministic confidence intervals contain the final query result with probability 1. In this paper we show how new and existing central limit theorems, simple bounding arguments, and the delta method can be used to derive formulas for both large-sample and deterministic confidence intervals. To illustrate these techniques, we obtain formulas for running confidence intervals in the case of single-table and multi-table AVG, COUNT, SUM, VARIANCE, and STDEV queries with join and selection predicates. Duplicate-elimination and GROUP-BY operations are also considered. We then provide numerically stable algorithms for computing the confidence intervals and analyze the complexity of these algorithms.

Copyright © 1997 by The Institute of Electrical and Electronics Engineers, Inc. (IEEE). Abstract used with permission.
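
The distinction the abstract draws between large-sample and deterministic intervals can be illustrated for the simplest case, a single-table AVG with no predicates. The sketch below is not taken from the paper: the class name `RunningAvg`, its parameters, and the textbook formulas it uses are assumptions made for illustration. It assumes rows arrive in random order, that the table size `total_rows` is known, and that every value lies in a known range `[lower, upper]`. The variance is maintained with Welford's update, a standard numerically stable method, which may differ from the algorithms the paper gives.

```python
import math
from statistics import NormalDist


class RunningAvg:
    """Running AVG with two kinds of interval (illustrative sketch only)."""

    def __init__(self, total_rows, lower, upper):
        self.total_rows = total_rows  # assumed: table size is known up front
        self.lower = lower            # assumed: a priori bounds on every value
        self.upper = upper
        self.n = 0                    # rows seen so far
        self.mean = 0.0               # running mean of the rows seen
        self.m2 = 0.0                 # sum of squared deviations from the mean

    def add(self, x):
        # Welford's update: avoids the cancellation in sum(x^2) - n * mean^2.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def large_sample_interval(self, confidence=0.95):
        # CLT-based interval: mean +/- z * s / sqrt(n). Requires n >= 2.
        # No finite-population correction is applied in this sketch.
        z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
        s = math.sqrt(self.m2 / (self.n - 1))
        half = z * s / math.sqrt(self.n)
        return self.mean - half, self.mean + half

    def deterministic_interval(self):
        # Bounding argument: each unseen row contributes at least `lower`
        # and at most `upper` to the final sum, so the final AVG is
        # guaranteed to lie in this range.
        seen_sum = self.mean * self.n
        unseen = self.total_rows - self.n
        lo = (seen_sum + unseen * self.lower) / self.total_rows
        hi = (seen_sum + unseen * self.upper) / self.total_rows
        return lo, hi
```

Under these assumptions the deterministic interval always contains the final answer but is typically much wider early in the scan, while the large-sample interval is narrower and holds only approximately, with the stated probability.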


Printed Edition

Yannis E. Ioannidis, David M. Hansen (Eds.): Ninth International Conference on Scientific and Statistical Database Management, Proceedings, August 11-13, 1997, Olympia, Washington, USA. IEEE Computer Society 1997, ISBN 0-8186-7952-2

References

[1]
...
[2]
...
[3]
William G. Cochran: Sampling Techniques, 3rd Edition. John Wiley 1977, ISBN 0-471-16240-X
[4]
...
[5]
Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Arun N. Swami: Selectivity and Cost Estimation for Joins Based on Random Sampling. J. Comput. Syst. Sci. 52(3): 550-569 (1996)
[6]
...
[7]
Joseph M. Hellerstein, Peter J. Haas, Helen J. Wang: Online Aggregation. SIGMOD Conference 1997: 171-182
[8]
...
[9]
...
[10]
...
