Estimating Alphanumeric Selectivity in the Presence of Wildcards.
P. Krishnan, Jeffrey Scott Vitter, Balakrishna R. Iyer:
Estimating Alphanumeric Selectivity in the Presence of Wildcards.
SIGMOD Conference 1996: 282-293@inproceedings{DBLP:conf/sigmod/KrishnanVI96,
author = {P. Krishnan and
Jeffrey Scott Vitter and
Balakrishna R. Iyer},
editor = {H. V. Jagadish and
Inderpal Singh Mumick},
title = {Estimating Alphanumeric Selectivity in the Presence of Wildcards},
booktitle = {Proceedings of the 1996 ACM SIGMOD International Conference on
Management of Data, Montreal, Quebec, Canada, June 4-6, 1996},
publisher = {ACM Press},
year = {1996},
pages = {282-293},
ee = {http://doi.acm.org/10.1145/233269.233341},
crossref = {DBLP:conf/sigmod/96},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Success of commercial query optimizers and database management systems
(object-oriented or relational) depend on accurate cost estimation of
various query reorderings [BGI].
Estimating predicate selectivity, or the fraction of rows in a database
that satisfy a selection predicate, is key to determining the optimal
join order. Previous work has concentrated on estimating selectivity for
numeric fields [ASW, HaSa, IoP, LNS, SAC, WVT]. With the popularity of textual data being
stored in databases, it has become important to estimate selectivity
accurately for alphanumeric fields. A particularly problematic predicate used
against alphanumeric fields is the SQL like predicate
[Dat].
Techniques used for estimating numeric selectivity are not suited for
estimating alphanumeric selectivity.
In this paper, we study for the first time the problem of estimating
alphanumeric selectivity in the presence of wildcards.
Based on the intuition that the model built by a data compressor on an
input text encapsulates information about common substrings in the text, we develop a technique based on the suffix tree data structure to estimate
alphanumeric selectivity. In a statistics generation pass over the database, we construct a compact suffix tree-based structure from the columns of
the database. We then look at three families of methods that utilize this
structure to estimate selectivity during query plan costing, when a query
with predicates on alphanumeric attributes contains wildcards in
the predicate.
We evaluate our methods empirically in the context of the TPC-D benchmark.
We study our methods experimentally against a variety of query patterns
and identify five techniques that hold promise.
Copyright © 1996 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
Printed Edition
H. V. Jagadish, Inderpal Singh Mumick (Eds.):
Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4-6, 1996.
ACM Press 1996
, SIGMOD Record 25(2), June 1996
Contents
[Index Terms]
[Full Text in PDF Format, 1285 KB]
References
- [ASW]
- Morton M. Astrahan, Mario Schkolnick, Kyu-Young Whang:
Approximating the number of unique values of an attribute without sorting.
Inf. Syst. 12(1): 11-15(1987)

- [BGI]
- Gautam Bhargava, Piyush Goel, Balakrishna R. Iyer:
Hypergraph Based Reorderings of Outer Join Queries with Complex Predicates.
SIGMOD Conference 1995: 304-315

- [BDG]
- Sergey Brin, James Davis, Hector Garcia-Molina:
Copy Detection Mechanisms for Digital Documents.
SIGMOD Conference 1995: 398-409

- [BuB]
- ...
- [CKV]
- Kenneth M. Curewitz, P. Krishnan, Jeffrey Scott Vitter:
Practical Prefetching via Data Compression.
SIGMOD Conference 1993: 257-266

- [Dat]
- ...
- [FiG]
- Edward R. Fiala, Daniel H. Greene:
Data Compression with Finite Windows.
Commun. ACM 32(4): 490-505(1989)

- [HaSa]
- Peter J. Haas, Arun N. Swami:
Sequential Sampling Procedures for Query Size Estimation.
SIGMOD Conference 1992: 341-350

- [HaSb]
- Peter J. Haas, Arun N. Swami:
Sampling-Based Selectivity Estimation for Joins Using Augmented Frequent Value Statistics.
ICDE 1995: 522-531

- [HeS]
- Mauricio A. Hernández, Salvatore J. Stolfo:
The Merge/Purge Problem for Large Databases.
SIGMOD Conference 1995: 127-138

- [IoP]
- Yannis E. Ioannidis, Viswanath Poosala:
Balancing Histogram Optimality and Practicality for Query Result Size Estimation.
SIGMOD Conference 1995: 233-244

- [Knu]
- Donald E. Knuth:
The Art of Computer Programming, Volume III: Sorting and Searching.
Addison-Wesley 1973, ISBN 0-201-03803-X

- [Kri]
- ...
- [LNS]
- Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider:
Practical Selectivity Estimation through Adaptive Sampling.
SIGMOD Conference 1990: 1-11

- [LiN]
- Richard J. Lipton, Jeffrey F. Naughton:
Query Size Estimation by Adaptive Sampling.
J. Comput. Syst. Sci. 51(1): 18-25(1995)

- [McC]
- Edward M. McCreight:
A Space-Economical Suffix Tree Construction Algorithm.
J. ACM 23(2): 262-272(1976)

- [Mor]
- Donald R. Morrison:
PATRICIA - Practical Algorithm To Retrieve Information Coded in Alphanumeric.
J. ACM 15(4): 514-534(1968)

- [SAC]
- Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price:
Access Path Selection in a Relational Database Management System.
SIGMOD Conference 1979: 23-34

- [TPC]
- ...
- [Vap]
- ...
- [WCM]
- Jason Tsong-Li Wang, Gung-Wei Chirn, Thomas G. Marr, Bruce A. Shapiro, Dennis Shasha, Kaizhong Zhang:
Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results.
SIGMOD Conference 1994: 115-125

- [Wan]
- ...
- [Wei]
- Peter Weiner:
Linear Pattern Matching Algorithms.
SWAT (FOCS) 1973: 1-11

- [WVT]
- Kyu-Young Whang, Brad T. Vander Zanden, Howard M. Taylor:
A Linear-Time Probabilistic Counting Algorithm for Database Applications.
ACM Trans. Database Syst. 15(2): 208-229(1990)

- [ZiLa]
- Jacob Ziv, Abraham Lempel:
A Universal Algorithm for Sequential Data Compression.
IEEE Transactions on Information Theory 23(3): 337-343(1977)

- [ZiLb]
- Jacob Ziv, Abraham Lempel:
Compression of Individual Sequences via Variable-Rate Coding.
IEEE Transactions on Information Theory 24(5): 530-536(1978)

Last update Tue Sep 18 00:25:14 2012
CET by the DBLP Team —
Data released under the ODC-BY 1.0 license — See also our legal information page