dblp.uni-trier.de www.dagstuhl.de www.uni-trier.de

Practical Skew Handling in Parallel Joins.

David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri: Practical Skew Handling in Parallel Joins. VLDB 1992: 27-40
@inproceedings{DBLP:conf/vldb/DeWittNSS92,
  author    = {David J. DeWitt and
               Jeffrey F. Naughton and
               Donovan A. Schneider and
               S. Seshadri},
  editor    = {Li-Yan Yuan},
  title     = {Practical Skew Handling in Parallel Joins},
  booktitle = {18th International Conference on Very Large Data Bases, August
               23-27, 1992, Vancouver, Canada, Proceedings},
  publisher = {Morgan Kaufmann},
  year      = {1992},
  isbn      = {1-55860-151-1},
  pages     = {27-40},
  ee        = {http://www.vldb.org/conf/1992/P027.PDF},
  crossref  = {DBLP:conf/vldb/92},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on non-skewed data. The main idea is to use multiple algorithms, each specialized for a different degree of skew, and to use a small sample of the relations being joined to determine which algorithm is appropriate. We developed, implemented, and experimented with four new skew-handling parallel join algorithms; one, which we call virtual processor range partitioning, was the clear winner in high skew cases, while traditional hybrid hash joinwas the clear winner in lower skew or no skew cases. We present experimental results from an implementation of all four algorithms on the Gamma parallel database machine. To our knowledge, these are the first reported skew-handling numbers from an actual implementation.

Copyright © 1992 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Printed Edition

Li-Yan Yuan (Ed.): 18th International Conference on Very Large Data Bases, August 23-27, 1992, Vancouver, Canada, Proceedings. Morgan Kaufmann 1992, ISBN 1-55860-151-1
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

References

[BFKS87]
Chaitanya K. Baru, Ophir Frieder, Dilip D. Kandlur, Mark E. Segal: Join on a Cube: Analysis, Simulation, and Implementation. IWDM 1987: 61-74 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[BGMP79]
Mike W. Blasgen, Jim Gray, Michael F. Mitoma, Thomas G. Price: The Convoy Phenomenon. Operating Systems Review 13(2): 20-25(1979) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Bra87]
Kjell Bratbergsengen: Algebra Operations on a Parallel Computer - Performance Evaluation. IWDM 1987: 415-428 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[CDKK85]
Hong-Tai Chou, David J. DeWitt, Randy H. Katz, Anthony C. Klug: Design and Implementation of the Wisconsin Storage System. Softw., Pract. Exper. 15(10): 943-962(1985) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Coc77]
William G. Cochran: Sampling Techniques, 3rd Edition. John Wiley 1977, ISBN 0-471-16240-X
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[CW79]
Larry Carter, Mark N. Wegman: Universal Classes of Hash Functions. J. Comput. Syst. Sci. 18(2): 143-154(1979) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[DG85]
David J. DeWitt, Robert H. Gerber: Multiprocessor Hash-Based Join Algorithms. VLDB 1985: 151-164 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[DG92]
David J. DeWitt, Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems. Commun. ACM 35(6): 85-98(1992) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[DGG++86]
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[DGS88]
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider: A Performance Analysis of the Gamma Database Machine. SIGMOD Conference 1988: 350-360 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[DGS+90]
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen: The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng. 2(1): 44-62(1990) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[DNS91a]
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: An Evaluation of Non-Equijoin Algorithms. VLDB 1991: 443-452 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[DNS91b]
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting. PDIS 1991: 280-291 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[ESW78]
Robert S. Epstein, Michael Stonebraker, Eugene Wong: Distributed Query Processing in a Relational Data Base System. SIGMOD Conference 1978: 169-180 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Gra69]
Ronald L. Graham: Bounds on Multiprocessing Timing Anomalies. SIAM Journal of Applied Mathematics 17(2): 416-429(1969) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HL91]
Kien A. Hua, Chiang Lee: Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning. VLDB 1991: 525-535 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[KO90]
Masaru Kitsuregawa, Yasushi Ogawa: Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC). VLDB 1990: 210-221 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[KTMo83]
Masaru Kitsuregawa, Hidehiko Tanaka, Tohru Moto-Oka: Application of Hash to Data Base Machine and Its Architecture. New Generation Comput. 1(1): 63-74(1983) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[LY90]
M. Seetha Lakshmi, Philip S. Yu: Effectiveness of Parallel Joins. IEEE Trans. Knowl. Data Eng. 2(4): 410-424(1990) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Omi91]
Edward Omiecinski: Performance Analysis of a Load Balancing Hash-Join Algorithm for a Shared Memory Multiprocessor. VLDB 1991: 375-385 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[OR89]
Frank Olken, Doron Rotem: Random Sampling from B+ Trees. VLDB 1989: 269-277 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[ORX90]
Frank Olken, Doron Rotem, Ping Xu: Random Sampling from Hash Files. SIGMOD Conference 1990: 375-386 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[SD89]
Donovan A. Schneider, David J. DeWitt: A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment. SIGMOD Conference 1989: 110-121 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[SN92]
S. Seshadri, Jeffrey F. Naughton: Sampling Issues in Parallel Database Systems. EDBT 1992: 328-343 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Sto86]
Michael Stonebraker: The Case for Shared Nothing. IEEE Database Eng. Bull. 9(1): 4-9(1986) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[WDJ91]
Christopher B. Walton, Alfred G. Dale, Roy M. Jenevein: A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins. VLDB 1991: 537-548 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[WDYT90]
Joel L. Wolf, Daniel M. Dias, Philip S. Yu, John Turek: An Effective Algorithm for Parallelizing Hash Joins in the Presence of Data Skew. ICDE 1991: 200-209 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Last update Mon Sep 17 22:00:51 2012 CET by the DBLP TeamThis material is Open Data Data released under the ODC-BY 1.0 license — See also our legal information page