
Efficient Construction of Regression Trees with Range and Region Splitting.

Yasuhiko Morimoto, Hiromu Ishii, Shinichi Morishita: Efficient Construction of Regression Trees with Range and Region Splitting. VLDB 1997: 166-175
@inproceedings{DBLP:conf/vldb/MorimotoIM97,
  author    = {Yasuhiko Morimoto and
               Hiromu Ishii and
               Shinichi Morishita},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {Efficient Construction of Regression Trees with Range and Region
               Splitting},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {166-175},
  ee        = {db/conf/vldb/MorimotoIM97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

We propose an efficient way of constructing regression trees in order to predict the objective numeric attribute values of given tuples. A regression tree is a rooted binary tree such that each internal node contains a test, which can be expressed as an RDB query, for splitting tuples into two disjoint classes and passing data in each class down to the left or right subtree. The mean of the objective attribute values at the leaf is used as the predicted value of the tuple.
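The structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Leaf`, `Node`, `predict`, and the attribute `age` are hypothetical and do not come from the paper. Each internal node holds a boolean test on a tuple, and a leaf predicts the mean of the objective values it stores.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Union

Row = Dict[str, float]  # one tuple: attribute name -> numeric value


@dataclass
class Leaf:
    # Objective attribute values of the training tuples that reached this leaf.
    targets: List[float]


@dataclass
class Node:
    # A boolean test on a tuple; tuples passing it go left, the others go right.
    test: Callable[[Row], bool]
    left: "Tree"
    right: "Tree"


Tree = Union[Node, Leaf]


def predict(tree: Tree, row: Row) -> float:
    """Follow the tests from the root to a leaf, then return the leaf mean."""
    while isinstance(tree, Node):
        tree = tree.left if tree.test(row) else tree.right
    return mean(tree.targets)


# Hypothetical one-level tree. The test plays the role of a query condition
# such as "WHERE age < 40"; the attribute name is an assumption.
tree = Node(
    test=lambda r: r["age"] < 40,
    left=Leaf([10.0, 20.0]),
    right=Leaf([50.0, 70.0, 60.0]),
)

young = predict(tree, {"age": 30})
old = predict(tree, {"age": 45})
```

Here `young` is the mean of the left leaf and `old` is the mean of the right leaf.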

To test a numeric attribute, traditional approaches use a guillotine-cut splitting that classifies data into those below a given value and the others. Instead, we consider a family ℛ of grid-regions in the plane associated with two given numeric attributes. We propose to use a test that splits data into those that lie inside a region R ∈ ℛ and those that lie outside.
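The difference between the two kinds of test can be illustrated with a small sketch. The representation below is an assumption made for illustration, not the paper's definition: the plane is bucketed into square cells of a fixed size, and a grid-region is modelled as a set of cell indices. All function names are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Point = Tuple[float, float]  # values of the two numeric attributes (x, y)
Cell = Tuple[int, int]       # index of a grid cell


def guillotine_test(threshold: float) -> Callable[[Point], bool]:
    """Traditional test: is the first attribute below a given value?"""
    return lambda p: p[0] < threshold


def cell_of(p: Point, cell_size: float = 1.0) -> Cell:
    """Map a point to the grid cell that contains it (assumed uniform grid)."""
    return (int(p[0] // cell_size), int(p[1] // cell_size))


def region_test(region: Set[Cell], cell_size: float = 1.0) -> Callable[[Point], bool]:
    """Region test: does the point fall in a cell belonging to the region?"""
    return lambda p: cell_of(p, cell_size) in region


def split(points: List[Point], test: Callable[[Point], bool]):
    """Partition points into those that pass the test and those that do not."""
    inside = [p for p in points if test(p)]
    outside = [p for p in points if not test(p)]
    return inside, outside


points = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
l_shape = {(0, 0), (1, 0), (0, 1)}  # an L-shaped region made of three cells

by_value = split(points, guillotine_test(1.0))
by_region = split(points, region_test(l_shape))
```

The guillotine cut can only separate the points by a line parallel to an axis, while the region test isolates the single point in cell (1, 1), a partition that no single threshold on either attribute produces.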

The contributions of this paper are as follows. We present an efficient algorithm for computing the region R of this family that minimizes the mean squared error after the introduction of the test with the region R. Experiments confirmed that region splitting gives regression trees with a smaller mean squared error. Our approach can also generate smaller regression trees.
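To make the optimization criterion concrete, the following sketch computes the mean squared error after a split and searches for the best region by brute force. It is explicitly not the efficient algorithm of the paper: as a simplifying assumption, the family of regions is restricted to axis-parallel rectangles of grid cells, and every rectangle is enumerated. The data and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def mse_after_split(inside: List[float], outside: List[float]) -> float:
    """MSE when each side of the split is predicted by its own mean."""
    return (sse(inside) + sse(outside)) / (len(inside) + len(outside))


def best_rectangle(cells: Dict[Cell, List[float]], n: int):
    """Brute force over all rectangles (i1..i2, j1..j2) of an n x n grid."""
    best_err, best_rect = float("inf"), None
    for i1 in range(n):
        for i2 in range(i1, n):
            for j1 in range(n):
                for j2 in range(j1, n):
                    inside, outside = [], []
                    for (i, j), vals in cells.items():
                        hit = i1 <= i <= i2 and j1 <= j <= j2
                        (inside if hit else outside).extend(vals)
                    if not inside or not outside:
                        continue  # a test must split the data into two classes
                    err = mse_after_split(inside, outside)
                    if err < best_err:
                        best_err, best_rect = err, (i1, i2, j1, j2)
    return best_err, best_rect


def best_guillotine(cells: Dict[Cell, List[float]], n: int) -> float:
    """Best MSE reachable with one threshold on either attribute."""
    best = float("inf")
    for axis in (0, 1):
        for cut in range(1, n):
            below = [v for c, vals in cells.items() if c[axis] < cut for v in vals]
            above = [v for c, vals in cells.items() if c[axis] >= cut for v in vals]
            if below and above:
                best = min(best, mse_after_split(below, above))
    return best


# Hypothetical 3 x 3 grid: objective values are high only in the centre cell.
cells = {(0, 0): [1.0], (0, 2): [1.0], (2, 0): [1.0], (2, 2): [1.0],
         (1, 1): [10.0, 10.0]}

region_err, region_rect = best_rectangle(cells, 3)
cut_err = best_guillotine(cells, 3)
```

On this toy data a rectangle around the centre cell separates the high values exactly, so `region_err` is zero, while every single-threshold cut leaves high and low values mixed on one side and `cut_err` stays positive. The enumeration is far too slow for real data and only illustrates the quantity being minimized.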

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.



Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7


