Data Placement In Bubba.
George P. Copeland, William Alexander, Ellen E. Boughter, Tom W. Keller:
Data Placement In Bubba.
SIGMOD Conference 1988: 99-108
@inproceedings{DBLP:conf/sigmod/CopelandABK88,
author = {George P. Copeland and
William Alexander and
Ellen E. Boughter and
Tom W. Keller},
editor = {Haran Boral and
Per-{\AA}ke Larson},
title = {Data Placement In Bubba},
booktitle = {Proceedings of the 1988 ACM SIGMOD International Conference on
Management of Data, Chicago, Illinois, June 1-3, 1988},
publisher = {ACM Press},
year = {1988},
pages = {99-108},
ee = {http://doi.acm.org/10.1145/50202.50213, db/conf/sigmod/CopelandABK88.html},
crossref = {DBLP:conf/sigmod/88},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
This paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications being developed at MCC. "Highly-parallel" implies that load balancing is a critical performance issue. "Data-intensive" means data is so large that operations should be executed where the data resides. As a result, data placement becomes a critical performance issue. In general, determining the optimal placement of data across processing nodes for performance is a difficult problem.
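Because an exact optimum is hard to compute, a greedy heuristic is a common baseline for this kind of placement problem. The following is a minimal sketch of a generic heaviest-first greedy assignment. It is not the heuristic the paper describes.

It rests on these assumptions:

* Each relation has a known access rate and a chosen degree of declustering.
* Each relation is split into equal-load fragments.
* No two fragments of one relation share a node.

The function `place` and its parameters are hypothetical names.

```python
from typing import Dict, List, Tuple


def place(relations: Dict[str, Tuple[float, int]],
          num_nodes: int) -> Tuple[Dict[int, List[str]], List[float]]:
    """Greedy declustered placement (hypothetical sketch).

    `relations` maps a relation name to (access_rate, degree). Each relation
    is split into `degree` fragments of equal load, and fragments are placed,
    heaviest first, on the least-loaded node that does not already hold a
    fragment of the same relation.
    """
    fragments: List[Tuple[float, str]] = []
    for name, (access_rate, degree) in relations.items():
        if not 1 <= degree <= num_nodes:
            raise ValueError(f"degree of {name!r} must be in [1, {num_nodes}]")
        fragments.extend([(access_rate / degree, name)] * degree)
    fragments.sort(reverse=True)  # heaviest fragments first

    load = [0.0] * num_nodes
    placement: Dict[int, List[str]] = {n: [] for n in range(num_nodes)}
    for frag_load, name in fragments:
        # Fragments of one relation must land on distinct nodes.
        candidates = [n for n in range(num_nodes) if name not in placement[n]]
        # Pick the least-loaded candidate; break ties by lowest node number.
        node = min(candidates, key=lambda n: (load[n], n))
        placement[node].append(name)
        load[node] += frag_load
    return placement, load


placement, load = place({"A": (8.0, 4), "B": (4.0, 2), "C": (2.0, 1)}, num_nodes=4)
print(placement)
print(load)
```

In this example all seven fragments carry the same load (2.0), so four nodes cannot be balanced exactly. The greedy pass gives three nodes two fragments each and one node a single fragment, which is the best achievable here.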
We describe our heuristic approach to solving the data placement problem in Bubba. We then present experimental results using a specific workload to provide insight into the problem. Several researchers have argued for the benefits of declustering (i.e., spreading each base relation over many nodes). We show that as declustering is increased, load balancing continues to improve. However, for transactions involving complex joins, further declustering reduces throughput because of communications, startup and termination overhead.
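The tension between spreading work and paying per-node overhead can be illustrated with a toy cost model. This model is not taken from the paper.

It assumes that:

* A transaction's work W is split evenly over D nodes.
* Each participating node adds a fixed overhead o for startup, termination and communication, paid serially by a coordinator.

Under these assumptions, elapsed time is T(D) = W/D + o·D. Setting dT/dD = -W/D² + o = 0 gives a continuous optimum at D* = √(W/o). Total resource consumption, W + o·D, grows with every added node. The function names are hypothetical.

```python
def elapsed(degree: int, work: float, overhead: float) -> float:
    """Toy elapsed time: `work` split evenly over `degree` nodes, plus a fixed
    per-node `overhead` that the coordinating node pays for every participant."""
    return work / degree + overhead * degree


def total_cost(degree: int, work: float, overhead: float) -> float:
    """Resources consumed across all nodes; grows with degree, so a saturated
    system completes fewer such transactions per unit time."""
    return work + overhead * degree


def best_degree(work: float, overhead: float, max_nodes: int) -> int:
    """Degree in [1, max_nodes] that minimizes the toy elapsed time."""
    return min(range(1, max_nodes + 1),
               key=lambda d: (elapsed(d, work, overhead), d))


for d in (1, 5, 10, 20, 40):
    print(d, elapsed(d, 100.0, 1.0), total_cost(d, 100.0, 1.0))
print("best:", best_degree(100.0, 1.0, max_nodes=40))
```

With W = 100 and o = 1, the discrete minimum falls at D = 10, matching √(W/o). The model deliberately omits the load-balancing benefit of declustering. It captures only the overhead side of the trade-off.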
We argue that data placement, especially declustering, in a highly-parallel system must be considered early in the design, so that mechanisms can be included for supporting variable declustering, for minimizing the most significant overheads associated with large-scale declustering, and for gathering the required statistics.
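Variable declustering depends on per-relation access statistics. The following hypothetical illustration, which is not Bubba's mechanism, works in two steps:

1. It turns an access trace into per-relation access rates.
2. It gives each relation a degree of declustering proportional to its share of the total rate, clamped to the range [1, N].

The trace format, the function names and the proportional rule are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable


def access_rates(trace: Iterable[str], window_seconds: float) -> Dict[str, float]:
    """Accesses per second for each relation name observed in a trace window."""
    counts = Counter(trace)
    return {rel: n / window_seconds for rel, n in counts.items()}


def declustering_degrees(rates: Dict[str, float], num_nodes: int) -> Dict[str, int]:
    """Assumed rule: degree proportional to a relation's share of the total
    access rate, rounded (Python's round-half-to-even) and clamped to
    [1, num_nodes]."""
    total = sum(rates.values())
    return {rel: max(1, min(num_nodes, round(num_nodes * r / total)))
            for rel, r in rates.items()}


trace = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
rates = access_rates(trace, window_seconds=10.0)
degrees = declustering_degrees(rates, num_nodes=10)
print(rates)
print(degrees)
```

A real system would also have to decide how often to refresh these statistics and when a change in degree justifies the cost of moving data.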
Copyright © 1988 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
CDROM Version: Load the CDROM "Volume 1 Issue 2, SIGMOD '75-'92" and ...
DVD Version: Load "ACM SIGMOD Anthology DVD 1" and ...
Printed Edition
Haran Boral, Per-Åke Larson (Eds.):
Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, June 1-3, 1988.
ACM Press 1988,
SIGMOD Record 17(2), June 1988
References
- [Ale87]
- William Alexander, Tom W. Keller, Ellen E. Boughter:
A Workload Characterization Pipeline for Models of Parallel Systems.
SIGMETRICS 1987: 186-194

- [Ale88]
- William Alexander, George P. Copeland:
Comparison of Dataflow Control Techniques In Distributed Data-Intensive Systems.
SIGMETRICS 1988: 157-166

- [AlC88]
- William Alexander, George P. Copeland:
Process And Dataflow Control In Distributed Data-Intensive Systems.
SIGMOD Conference 1988: 90-98

- [Ano85]
- ...
- [Att84]
- Rony Attar, Philip A. Bernstein, Nathan Goodman:
Site Initialization, Recovery, and Backup in a Distributed Database System.
IEEE Trans. Software Eng. 10(6): 645-650(1984)

- [Bat82]
- Don S. Batory:
Optimal File Designs and Reorganization Points.
ACM Trans. Database Syst. 7(1): 60-81(1982)

- [Bou87]
- ...
- [Bun84]
- Richard B. Bunt, Jennifer M. Murphy, Shikharesh Majumdar:
A Measure of Program Locality and Its Application.
SIGMETRICS 1984: 28-40

- [Chu69]
- ...
- [Cve87]
- Zarka Cvetanovic:
The Effects of Problem Partitioning, Allocation, and Granularity on the Performance of Multiple-Processor Systems.
IEEE Trans. Computers 36(4): 421-432(1987)

- [Den78]
- Peter J. Denning, Jeffrey P. Buzen:
The Operational Analysis of Queueing Network Models.
ACM Comput. Surv. 10(3): 225-261(1978)

- [DeW86]
- David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna:
GAMMA - A High Performance Dataflow Database Machine.
VLDB 1986: 228-237

- [DeW87]
- David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Rajiv Jauhari, M. Muralikrishna, Anoop Sharma:
A Single User Evaluation of the Gamma Database Machine.
IWDM 1987: 370-386

- [Eas74]
- Kapali P. Eswaran:
Placement of Records in a File and File Allocation in a Computer.
IFIP Congress 1974: 304-307

- [Flo78]
- André Flory, J. Gunther, Jacques Kouloumdjian:
Data Base Reorganization by Clustering Methods.
Inf. Syst. 3(1): 59-62(1978)

- [Gra78]
- Jim Gray:
Notes on Data Base Operating Systems.
Advanced Course: Operating Systems 1978: 393-481

- [Gra87]
- Jim Gray, Gianfranco R. Putzolu:
The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time.
SIGMOD Conference 1987: 395-398

- [Hwa84]
- ...
- [Jak80]
- Matti Jakobsson:
Reducing block accesses in inverted files by partial clustering.
Inf. Syst. 5(1): 1-5(1980)

- [Kat78]
- ...
- [Laz84]
- ...
- [Liv87]
- Miron Livny, Setrag Khoshafian, Haran Boral:
Multi-Disk Management Algorithms.
SIGMETRICS 1987: 69-77

- [Mah76]
- Samy A. Mahmoud, J. Spruce Riordon:
Optimal Allocation of Resources in Distributed Information Networks.
ACM Trans. Database Syst. 1(1): 66-78(1976)

- [Mar76]
- K. Maruyama, S. E. Smith:
Optimal Reorganization of Distributed Space Disk Files.
Commun. ACM 19(11): 634-642(1976)

- [Muk87]
- Ravi Mukkamala, Steven C. Bruell, Roger K. Shultz:
Design of Partially Replicated Distributed Database Systems: An Integrated Methodology.
SIGMETRICS 1988: 187-196

- [Omi83]
- ...
- [Sam87]
- ...
- [Shn73]
- Ben Shneiderman:
Optimum Data Base Reorganization Points.
Commun. ACM 16(6): 362-365(1973)

- [Soc79]
- Gary H. Sockut, Robert P. Goldberg:
Database Reorganization - Principles and Practice.
ACM Comput. Surv. 11(4): 371-395(1979)

- [Sto86]
- Michael Stonebraker:
The Case for Shared Nothing.
IEEE Database Eng. Bull. 9(1): 4-9(1986)

- [Tan87]
- Tandem Database Group:
NonStop SQL: A Distributed, High-Performance, High-Availability Implementation of SQL.
HPTS 1987: 60-104

- [Ter85]
- ...
- [Tue78]
- William G. Tuel Jr.:
Optimum Reorganization Points for Linearly Growing Files.
ACM Trans. Database Syst. 3(1): 32-40(1978)

- [Vrs85]
- ...
- [Yao76]
- S. Bing Yao, K. Sundar Das, Toby J. Teorey:
A Dynamic Database Reorganization Algorithm.
ACM Trans. Database Syst. 1(2): 159-174(1976)

- [Yu85]
- Clement T. Yu, Cheing-Mei Suen, K. Lam, M. K. Siu:
Adaptive Record Clustering.
ACM Trans. Database Syst. 10(2): 180-204(1985)

Copyright © Sun Nov 15 05:11:45 2009
by Michael Ley (ley@uni-trier.de)