BIRCH: An Efficient Data Clustering Method for Very Large Databases.
Tian Zhang, Raghu Ramakrishnan, Miron Livny:
BIRCH: An Efficient Data Clustering Method for Very Large Databases.
SIGMOD Conference 1996: 103-114
@inproceedings{DBLP:conf/sigmod/ZhangRL96,
author = {Tian Zhang and
Raghu Ramakrishnan and
Miron Livny},
editor = {H. V. Jagadish and
Inderpal Singh Mumick},
title = {BIRCH: An Efficient Data Clustering Method for Very Large Databases},
booktitle = {Proceedings of the 1996 ACM SIGMOD International Conference on
Management of Data, Montreal, Quebec, Canada, June 4-6, 1996},
publisher = {ACM Press},
year = {1996},
pages = {103-114},
ee = {http://doi.acm.org/10.1145/233269.233324, db/conf/sigmod/ZhangRL96.html},
crossref = {DBLP:conf/sigmod/96},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Finding useful patterns in large datasets has attracted considerable interest
recently, and one of the most widely studied problems in this area is the
identification of clusters, or densely populated regions, in a
multi-dimensional dataset. Prior work does not adequately address the problem
of large datasets and minimization of I/O costs.
This paper presents a data clustering method named BIRCH
(Balanced Iterative Reducing and Clustering using Hierarchies),
and demonstrates that it is especially suitable for very large databases.
BIRCH incrementally and dynamically clusters incoming multi-dimensional
metric data points in an attempt to produce the best-quality clustering
achievable with the available resources (i.e., available memory and time constraints).
BIRCH can typically find a good clustering with a single scan of the data,
and improve the quality further with a few additional scans. BIRCH is also
the first clustering algorithm proposed in the database area to handle "noise"
(data points that are not part of the underlying pattern) effectively.
We evaluate BIRCH's time/space efficiency, data input
order sensitivity, and clustering quality through several experiments.
We also present a performance comparison of BIRCH versus
CLARANS, a clustering method recently proposed for large datasets,
and show that BIRCH is consistently superior.
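The abstract does not spell out how a single scan can be enough. The full paper does this by summarizing each subcluster as a clustering feature CF = (N, LS, SS): the point count, the linear sum of the points, and the sum of their squared norms. Because CFs are additive, a subcluster can absorb a new point by updating only its summary. The Python sketch below is a simplified, flat illustration of that idea, assuming a radius threshold. It omits the CF tree itself, its branching factor, threshold-driven rebuilding, outlier handling, and the later global-clustering phases. The names `CF`, `single_scan`, and `threshold` are hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CF:
    """Clustering feature: count N, linear sum LS, and square sum SS of a subcluster."""
    n: int
    ls: List[float]
    ss: float

    @staticmethod
    def from_point(x: Sequence[float]) -> "CF":
        # A single point is a subcluster with N = 1, LS = x, SS = ||x||^2.
        return CF(1, list(x), sum(v * v for v in x))

    def __add__(self, other: "CF") -> "CF":
        # CF additivity: summaries of disjoint subclusters combine component-wise.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # R^2 = SS/N - ||LS/N||^2; clamp tiny negatives caused by rounding error.
        c = self.centroid()
        r2 = self.ss / self.n - sum(v * v for v in c)
        return math.sqrt(max(r2, 0.0))


def single_scan(points, threshold):
    """Hypothetical flat version of leaf-level insertion: one pass, no CF tree."""
    subclusters: List[CF] = []
    for p in points:
        point_cf = CF.from_point(p)
        if subclusters:
            # Choose the subcluster whose centroid is closest to the new point.
            nearest = min(range(len(subclusters)),
                          key=lambda i: math.dist(subclusters[i].centroid(), p))
            merged = subclusters[nearest] + point_cf
            if merged.radius() <= threshold:
                subclusters[nearest] = merged  # absorb: only the summary changes
                continue
        subclusters.append(point_cf)  # too far away: start a new subcluster
    return subclusters
```

For example, `single_scan([(0, 0), (0, 1), (10, 10), (10, 11)], threshold=1.0)` yields two subclusters with centroids `[0.0, 0.5]` and `[10.0, 10.5]`. Each point is read once, and memory grows with the number of subclusters rather than the number of points. This is the property that makes a CF-based summary attractive for very large datasets.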
Copyright © 1996 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
Online Version (ACM WWW Account required): Full Text in PDF Format
CDROM Version: Load the CDROM "Volume 1 Issue 1, SIGMOD '93-'97" and ...
DVD Version: Load "ACM SIGMOD Anthology DVD 1" and ...
Printed Edition
H. V. Jagadish, Inderpal Singh Mumick (Eds.):
Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4-6, 1996.
ACM Press 1996,
SIGMOD Record 25(2), June 1996
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Fri May 16 16:37:20 2008