ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

An Effective Hash Based Algorithm for Mining Association Rules.

Jong Soo Park, Ming-Syan Chen, Philip S. Yu: An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995: 175-186
@inproceedings{DBLP:conf/sigmod/ParkCY95,
  author    = {Jong Soo Park and
               Ming-Syan Chen and
               Philip S. Yu},
  editor    = {Michael J. Carey and
               Donovan A. Schneider},
  title     = {An Effective Hash Based Algorithm for Mining Association Rules},
  booktitle = {Proceedings of the 1995 ACM SIGMOD International Conference on
               Management of Data, San Jose, California, May 22-25, 1995},
  publisher = {ACM Press},
  year      = {1995},
  pages     = {175-186},
  ee        = {http://doi.acm.org/10.1145/223784.223813, db/conf/sigmod/sigmod95-13.html},
  crossref  = {DBLP:conf/sigmod/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the problem of discovering large itemsets where a large itemset is a group of items which appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by constructing a candidate set of itemsets first and then, identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally this is done iteratively for each large k-itemset in increasing order of k where a large k-itemset is a large itemset with k items. To determine large itemsets from a huge number of candidate large itemsets in early iterations is usually the dominating factor for the overall data mining performance. To address this issue, we propose in this paper an effective algorithm for the candidate set generation. It is a hash based algorithm and is especially effective for the generation of candidate set for large 2-itemsets. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is, in orders of magnitude, smaller than that by previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, i.e., right after the generation of large 2-itemsets, thereby reducing the computational cost for later iterations significantly. Extensive simulation study is conducted to evaluate performance of the proposed algorithm.

Copyright © 1995 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

Online Version (ACM WWW Account required): Full Text in PDF Format

CDROM Version: Load the CDROM "Volume 1 Issue 1, SIGMOD '93-'97" and ...

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Michael J. Carey, Donovan A. Schneider (Eds.): Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, California, May 22-25, 1995. ACM Press 1995 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML, SIGMOD Record 24(2), June 1995
Contents

Online Edition: ACM Digital Library

[Index Terms]
[Full Text in PDF Format, 1129 KB]

References

[1]
Rakesh Agrawal, Christos Faloutsos, Arun N. Swami: Efficient Similarity Search In Sequence Databases. FODO 1993: 69-84 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[2]
Rakesh Agrawal, Sakti P. Ghosh, Tomasz Imielinski, Balakrishna R. Iyer, Arun N. Swami: An Interval Classifier for Database Mining Applications. VLDB 1992: 560-573 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[3]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[4]
Rakesh Agrawal, Ramakrishnan Srikant: Mining Sequential Patterns. ICDE 1995: 3-14 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[5]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[6]
Tarek M. Anwar, Howard W. Beck, Shamkant B. Navathe: Knowledge Mining by Imprecise Querying: A Classification-Based Approach. ICDE 1992: 622-630 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[7]
Jiawei Han, Yandong Cai, Nick Cercone: Knowledge Discovery in Databases: An Attribute-Oriented Approach. VLDB 1992: 547-559 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[8]
...
[9]
...
[10]
Raymond T. Ng, Jiawei Han: Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994: 144-155 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[11]
...
[12]
...

Copyright © Mon Dec 21 00:18:15 2009 by Michael Ley (ley@uni-trier.de)