dblp.uni-trier.de www.dagstuhl.de www.uni-trier.de

New Sampling-Based Summary Statistics for Improving Approximate Query Answers.

Phillip B. Gibbons, Yossi Matias: New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998: 331-342
@inproceedings{DBLP:conf/sigmod/GibbonsM98,
  author    = {Phillip B. Gibbons and
               Yossi Matias},
  editor    = {Laura M. Haas and
               Ashutosh Tiwary},
  title     = {New Sampling-Based Summary Statistics for Improving Approximate
               Query Answers},
  booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-995-5},
  pages     = {331-342},
  ee        = {http://doi.acm.org/10.1145/276304.276334},
  crossref  = {DBLP:conf/sigmod/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

In large data recording and warehousing environments, it is often advantageous to provide fast, approximate answers to queries, whenever possible. Before DBMSs providing highly-accurate approximate answers can become a reality, many new techniques for summarizing data and for estimating answers from summarized data must be developed. This paper introduces two new sampling-based summary statistics, concise samples and counting samples, and presents new techniques for their fast incremental maintenance regardless of the data distribution. We quantify their advantages over standard sample views in terms of the number of additional sample points for the same view size, and hence in providing more accurate query answers. Finally, we consider their application to providing fast approximate answers to hot list queries. Our algorithms maintain their accuracy in the presence of ongoing insertions to the data warehouse.

Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ... Online Version (ACM WWW Account required): Full Text in PDF Format

ACM SIGMOD Anthology

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Laura M. Haas, Ashutosh Tiwary (Eds.): SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA. ACM Press 1998, ISBN 0-89791-995-5 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML, SIGMOD Record 27(2), June 1998
Contents

Online Edition: ACM SIGMOD

[Abstract]
[Full Text (Postscript)]

References

[AMS96]
...
[Ant92]
Gennady Antoshenkov: Random Sampling from Pseudo-Ranked B+ Trees. VLDB 1992: 375-382 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[AS94]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[AZ96]
Gennady Antoshenkov, Mohamed Ziauddin: Query Processing and Optimization in Oracle Rdb. VLDB J. 5(4): 229-237(1996) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[BDF+97]
Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[BM96]
Roberto J. Bayardo Jr., Daniel P. Miranker: Processing Queries for First Few Answers. CIKM 1996: 45-52 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[BMUT97]
Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur: Dynamic Itemset Counting and Implication Rules for Market Basket Data. SIGMOD Conference 1997: 255-264 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[FJS97]
Christos Faloutsos, H. V. Jagadish, Nikolaos Sidiropoulos: Recovering Information from Summary Data. VLDB 1997: 36-45 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Fla85]
Philippe Flajolet: Approximate Counting: A Detailed Analysis. BIT 25(1): 113-134(1985) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[FM83]
Philippe Flajolet, G. Nigel Martin: Probabilistic Counting. FOCS 1983: 76-82 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[FM85]
Philippe Flajolet, G. Nigel Martin: Probabilistic Counting Algorithms for Data Base Applications. J. Comput. Syst. Sci. 31(2): 182-209(1985) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[GM95]
...
[GM97]
...
[GMP97a]
...
[GMP97b]
Phillip B. Gibbons, Yossi Matias, Viswanath Poosala: Fast Incremental Maintenance of Approximate Histograms. VLDB 1997: 466-475 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[GPA+98]
...
[HHW97]
Joseph M. Hellerstein, Peter J. Haas, Helen J. Wang: Online Aggregation. SIGMOD Conference 1997: 171-182 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HK95]
...
[HNSS95]
Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Lynne Stokes: Sampling-Based Estimation of the Number of Distinct Values of an Attribute. VLDB 1995: 311-322 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[IC93]
Yannis E. Ioannidis, Stavros Christodoulakis: Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results. ACM Trans. Database Syst. 18(4): 709-748(1993) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Ioa93]
Yannis E. Ioannidis: Universality of Serial Histograms. VLDB 1993: 256-267 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[IP95]
Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Mat92]
...
[Mor78]
Robert Morris: Counting Large Numbers of Events in Small Registers. Commun. ACM 21(10): 840-842(1978) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MSY96]
...
[MVN93]
...
[MVY94]
...
[OR89]
Frank Olken, Doron Rotem: Random Sampling from B+ Trees. VLDB 1989: 269-277 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[OR92]
Frank Olken, Doron Rotem: Maintenance of Materialized Views of Sampling Queries. ICDE 1992: 632-641 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PIHS96]
Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Pre97]
...
[Vit85]
Jeffrey Scott Vitter: Random Sampling with a Reservoir. ACM Trans. Math. Softw. 11(1): 37-57(1985) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[VL93]
Susan V. Vrbsky, Jane W.-S. Liu: APPROXIMATE - A Query Processor that Produces Monotonically Improving Approximate Answers. IEEE Trans. Knowl. Data Eng. 5(6): 1056-1068(1993) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[WVZT90]
Kyu-Young Whang, Brad T. Vander Zanden, Howard M. Taylor: A Linear-Time Probabilistic Counting Algorithm for Database Applications. ACM Trans. Database Syst. 15(2): 208-229(1990) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Last update Tue Sep 18 00:25:17 2012 CET by the DBLP TeamThis material is Open Data Data released under the ODC-BY 1.0 license — See also our legal information page