Clustering Categorical Data: An Approach Based on Dynamical Systems.

David Gibson, Jon M. Kleinberg, Prabhakar Raghavan: Clustering Categorical Data: An Approach Based on Dynamical Systems. VLDB 1998: 311-322
@inproceedings{DBLP:conf/vldb/GibsonKR98,
  author    = {David Gibson and
               Jon M. Kleinberg and
               Prabhakar Raghavan},
  editor    = {Ashish Gupta and
               Oded Shmueli and
               Jennifer Widom},
  title     = {Clustering Categorical Data: An Approach Based on Dynamical Systems},
  booktitle = {VLDB'98, Proceedings of 24th International Conference on Very
               Large Data Bases, August 24-27, 1998, New York City, New York,
               USA},
  publisher = {Morgan Kaufmann},
  year      = {1998},
  isbn      = {1-55860-566-5},
  pages     = {311-322},
  ee        = {http://www.vldb.org/conf/1998/p311.pdf},
  crossref  = {DBLP:conf/vldb/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By "categorical data," we mean tables with fields that cannot be naturally ordered by a metric - e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems. We discuss experiments on a variety of tables of synthetic and real data; we find that our iterative methods converge quickly to prominently correlated values of various categorical fields.

Copyright © 1998 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Printed Edition

Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.): VLDB'98, Proceedings of 24th International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA. Morgan Kaufmann 1998, ISBN 1-55860-566-5

Last update Mon Sep 17 22:01:03 2012 CET by the DBLP Team. This material is Open Data released under the ODC-BY 1.0 license.