dblp.uni-trier.de www.dagstuhl.de www.uni-trier.de

BIRCH: An Efficient Data Clustering Method for Very Large Databases.

Tian Zhang, Raghu Ramakrishnan, Miron Livny: BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD Conference 1996: 103-114
@inproceedings{DBLP:conf/sigmod/ZhangRL96,
  author    = {Tian Zhang and
               Raghu Ramakrishnan and
               Miron Livny},
  editor    = {H. V. Jagadish and
               Inderpal Singh Mumick},
  title     = {BIRCH: An Efficient Data Clustering Method for Very Large Databases},
  booktitle = {Proceedings of the 1996 ACM SIGMOD International Conference on
               Management of Data, Montreal, Quebec, Canada, June 4-6, 1996},
  publisher = {ACM Press},
  year      = {1996},
  pages     = {103-114},
  ee        = {http://doi.acm.org/10.1145/233269.233324},
  crossref  = {DBLP:conf/sigmod/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs.
This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., availbale memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle "noise" (data points that are not part of the underlying pattern) effectively.
We evaluate BIRCH's time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a performance comparisons of BIRCH versus CLARANS, a clustering method proposed recently for large datasets, and show that BIRCH is consistently superior.

Copyright © 1996 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

Online Version (ACM WWW Account required): Full Text in PDF Format

CDROM Version: Load the CDROM "Volume 1 Issue 1, SIGMOD '93-'97" and ...

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

H. V. Jagadish, Inderpal Singh Mumick (Eds.): Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4-6, 1996. ACM Press 1996 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML, SIGMOD Record 25(2), June 1996
Contents

Online Edition: ACM Digital Library

[Index Terms]
[Full Text in PDF Format, 1451 KB]

References

[CKS88]
...
[DH73]
...
[DJ80]
...
[EKX95a]
Martin Ester, Hans-Peter Kriegel, Xiaowei Xu: A Database Interface for Clustering in Large Spatial Databases. KDD 1995: 94-99 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[EKX95b]
Martin Ester, Hans-Peter Kriegel, Xiaowei Xu: Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification. SSD 1995: 67-82 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Fis87]
Douglas H. Fisher: Knowledge Acquisition via Incremental Conceptual Clustering. Machine Learning 2(2): 139-172(1987) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Fis95]
...
[GG92]
...
[KR90]
...
[Leb87]
...
[Lee81]
...
[Mur83]
Fionn Murtagh: A Survey of Recent Advances in Hierarchical Clustering Algorithms. Comput. J. 26(4): 354-359(1983) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[NH94]
Raymond T. Ng, Jiawei Han: Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994: 144-155 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Ols93]
...
[ZRL95]
...

Last update Tue Sep 18 00:25:13 2012 CET by the DBLP TeamThis material is Open Data Data released under the ODC-BY 1.0 license — See also our legal information page