Efficient Parallel Data Mining for Association Rules.

Jong Soo Park, Ming-Syan Chen, Philip S. Yu: Efficient Parallel Data Mining for Association Rules. CIKM 1995: 31-36
@inproceedings{DBLP:conf/cikm/ParkCY95,
  author    = {Jong Soo Park and
               Ming-Syan Chen and
               Philip S. Yu},
  title     = {Efficient Parallel Data Mining for Association Rules},
  booktitle = {CIKM '95, Proceedings of the 1995 International Conference on
               Information and Knowledge Management, November 28 - December
               2, 1995, Baltimore, Maryland, USA},
  publisher = {ACM},
  year      = {1995},
  pages     = {31-36},
  ee        = {db/conf/cikm/ParkCY95.html, http://doi.acm.org/10.1145/221270.221320},
  crossref  = {DBLP:conf/cikm/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

In this paper, we develop an algorithm, called PDM, to conduct parallel data mining for association rules. A transaction is a collection of items, and a large itemset is a set of items such that the number of transactions containing it exceeds a pre-specified threshold. PDM is designed so that the global set of large itemsets can be identified efficiently and the amount of inter-node data exchange is minimized. Specifically, given a database partition, each processing node collects (counts) information on each itemset from its local database efficiently via a hashing method. The information discovered by each node is then shared with the other nodes via a communication scheme. PDM then employs a technique, called clue-and-poll, to address the uncertainty due to the partial knowledge collected at each node: it judiciously selects a small fraction of the itemsets for the exchange of count information among nodes, thus reducing the communication cost. The global set of large itemsets can hence be determined from the aggregate counts of itemsets. Experiments show that PDM not only attains very good parallelization efficiency but also provides robust performance for various input patterns.
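
The definition of a large itemset and the idea of exchanging counts for only a fraction of the itemsets can be illustrated with a small sketch. This is not the PDM algorithm: the hashing method is omitted, and the candidate selection below is a simplified stand-in for clue-and-poll, relying only on the fact that an itemset whose global count exceeds the threshold must exceed its proportional share of that threshold on at least one node. The function names `local_counts` and `global_large_itemsets`, the parameters `k` and `min_count`, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, k):
    """Count every k-itemset occurring in one node's local partition."""
    counts = Counter()
    for transaction in transactions:
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def global_large_itemsets(partitions, k, min_count):
    """Return the k-itemsets whose aggregate count exceeds min_count.

    Simplified stand-in for the exchange step (not the paper's
    clue-and-poll): each node nominates only its locally promising
    itemsets, and all nodes are then polled for exact counts of that
    reduced candidate set.
    """
    per_node = [local_counts(p, k) for p in partitions]
    total = sum(len(p) for p in partitions)
    if total == 0:
        return {}

    # If an itemset's global count exceeds min_count, its local count must
    # exceed min_count * len(p) / total on at least one node.
    candidates = set()
    for p, counts in zip(partitions, per_node):
        local_threshold = min_count * len(p) / total
        candidates.update(i for i, c in counts.items() if c > local_threshold)

    # Poll every node for the candidates and sum the counts.
    aggregate = {i: sum(c[i] for c in per_node) for i in candidates}
    return {i: n for i, n in aggregate.items() if n > min_count}


# Two hypothetical nodes, each holding three transactions.
partitions = [
    [["a", "b", "c"], ["a", "b"], ["a", "c"]],
    [["a", "b"], ["b", "c"], ["a", "b", "c"]],
]
result = global_large_itemsets(partitions, k=2, min_count=3)
print(result)
```

In this sample, each node nominates two of the three possible pairs, and only the pair `("a", "b")`, contained in four transactions, exceeds the threshold of three.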

Copyright © 1995 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Printed Edition

CIKM '95, Proceedings of the 1995 International Conference on Information and Knowledge Management, November 28 - December 2, 1995, Baltimore, Maryland, USA. ACM 1995

References

[1] Rakesh Agrawal, Christos Faloutsos, Arun N. Swami: Efficient Similarity Search In Sequence Databases. FODO 1993: 69-84
[2] Rakesh Agrawal, Sakti P. Ghosh, Tomasz Imielinski, Balakrishna R. Iyer, Arun N. Swami: An Interval Classifier for Database Mining Applications. VLDB 1992: 560-573
[3] Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216
[4] Rakesh Agrawal, Ramakrishnan Srikant: Mining Sequential Patterns. ICDE 1995: 3-14
[5] Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[6] ...
[7] Jiawei Han, Yandong Cai, Nick Cercone: Knowledge Discovery in Databases: An Attribute-Oriented Approach. VLDB 1992: 547-559
[8] Raymond T. Ng, Jiawei Han: Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994: 144-155
[9] ...
[10] Jong Soo Park, Ming-Syan Chen, Philip S. Yu: An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995: 175-186
