5. ICDM 2005: Houston, Texas, USA
Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005), 27-30 November 2005, Houston, Texas, USA. IEEE Computer Society 2005 ISBN 0-7695-2278-5
Introduction
Welcome Message from the Conference Chairs.
Welcome to ICDM 2005.
Conference Organization.
Steering Committee.
Program Committee.
Non-PC Reviewers.
Invited Talks.
Tutorials.
Workshops.
Panel Session.
Regular Papers
Alan S. Abrahams, Adrian Becker, Daniel Fleder, Ian C. MacMillan: Handling Generalized Cost Functions in the Partitioning Optimization Problem through Sequential Binary Programming. 3-9
Elke Achtert, Christian Böhm, Hans-Peter Kriegel, Peer Kröger: Online Hierarchical Clustering in a Data Warehouse Environment. 10-17
Manu Aery, Sharma Chakravarthy: eMailSift: Email Classification Based on Structure and Content. 18-25
Deepak K. Agarwal: An Empirical Bayes Approach to Detect Anomalies in Dynamic Multidimensional Arrays. 26-33
Costin Barbu, Raja Tanveer Iqbal, Jing Peng: Classifier Fusion Using Shared Sampling Distribution for Boosting. 34-41
Steven M. Beitzel, Eric C. Jensen, Ophir Frieder, David D. Lewis, Abdur Chowdhury, Aleksander Kolcz: Improving Automatic Query Classification via Semi-Supervised Learning. 42-49
Arnab Bhattacharya, Vebjorn Ljosa, Jia-Yu Pan, Mark R. Verardo, Hyung-Jeong Yang, Christos Faloutsos, Ambuj K. Singh: ViVo: Visual Vocabulary Construction for Mining Biomedical Images. 50-57
Mikhail Bilenko, Sugato Basu, Mehran Sahami: Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping. 58-65
Julien Blanchard, Fabrice Guillet, Régis Gras, Henri Briand: Using Information-Theoretic Measures to Assess Association Rule Interestingness. 66-73
Huiping Cao, Nikos Mamoulis, David W. Cheung: Mining Frequent Spatio-Temporal Sequential Patterns. 82-89
Varun Chandola, Vipin Kumar: Summarization - Compressing Data into an Informative Representation. 98-105
Hung-Leng Chen, Kun-Ta Chuang, Ming-Syan Chen: Labeling Unclustered Categorical Data into Clusters Based on the Important Attribute Values. 106-113
Jason R. Chen: Making Subsequence Time Series Clustering Meaningful. 114-121
Anne Denton: Kernel-Density-Based Clustering of Time Series Subsequences Using a Continuous Random-Walk Noise Model. 122-129
Mohamed G. Elfeky, Walid G. Aref, Ahmed K. Elmagarmid: WARP: Time Warping for Periodicity Detection. 138-145
Mohammad El-Hajj, Osmar R. Zaïane, Paul Nalos: Bifold Constraint-Based Mining by Simultaneous Monotone and Anti-Monotone Checking. 146-153
Wei Fan, Ed Greengrass, Joe McCloskey, Philip S. Yu, Kevin Drummey: Effective Estimation of Posterior Probabilities: Explaining the Accuracy of Randomized Decision Tree Approaches. 154-161
Frédéric Flouvat, Fabien De Marchi, Jean-Marc Petit: A Thorough Experimental Study of Datasets for Frequent Itemsets. 162-169
Shohei Hido, Hiroyuki Kawano: AMIOT: Induced Ordered Tree Mining in Tree-Structured Databases. 170-177
Yi Huang, Kai Yu, Matthias Schubert, Shipeng Yu, Volker Tresp, Hans-Peter Kriegel: Hierarchy-Regularized Latent Semantic Indexing. 178-185
Koji Iwanuma, Ryuichi Ishihara, Yo Takano, Hidetomo Nabeshima: Extracting Frequent Subsequences from a Single Long Data Sequence: A Novel Anti-Monotonic Measure and a Simple On-Line Algorithm. 186-193
Xiaonan Ji, James Bailey, Guozhu Dong: Mining Minimal Distinguishing Subsequence Patterns with Gap Constraints. 194-201
Ruoming Jin, Gagan Agrawal: An Algorithm for In-Core Frequent Itemset Mining on Streaming Data. 210-217
Alexandros Kalousis, Julien Prados, Melanie Hilario: Stability of Feature Selection Algorithms. 218-225
Eamonn J. Keogh, Jessica Lin, Ada Wai-Chee Fu: HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence. 226-233
Tamara G. Kolda, Brett W. Bader, Joseph P. Kenny: Higher-Order Web Link Analysis Using Multilinear Algebra. 242-249
Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Sebastian H. R. Wurst: A Generic Framework for Efficient Subspace Clustering of High-Dimensional Data. 250-257
Hans-Peter Kriegel, Peer Kröger, Alexey Pryakhin, Matthias Schubert: Effective and Efficient Distributed Model-Based Clustering. 258-265
Daesu Lee, Wonsuk Lee: Finding Maximal Frequent Itemsets over Online Data Streams Adaptively. 266-273
Carson Kai-Sang Leung, Quamrul I. Khan, Tariqul Hoque: CanTree: A Tree Structure for Efficient Incremental Mining of Frequent Patterns. 274-281
Bo Long, Zhongfei (Mark) Zhang, Philip S. Yu: Combining Multiple Clusterings by Soft Correspondence. 282-289
Anna M. Manning, David J. Haglin: A New Algorithm for Finding Minimal Sample Uniques for Use in Statistical Disclosure Assessment. 290-297
Keith Marsolo, Srinivasan Parthasarathy: Alternate Representation of Distance Matrices for Characterization of Protein Structure. 298-305
Shawn Martin: Training Support Vector Machines Using Gilbert's Algorithm. 306-313
Steven Minton, Claude Nanjo, Craig A. Knoblock, Martin Michalowski, Matthew Michelson: A Heterogeneous Field Matching Method for Record Linkage. 314-321
Jennifer Neville, David Jensen: Leveraging Relational Autocorrelation with Latent Group Models. 322-329
Thomas Takeo Osugi, Kun Deng, Stephen D. Scott: Balancing Exploration and Exploitation: A New Algorithm for Active Machine Learning. 330-337
Feng Pan, Wei Wang, Anthony K. H. Tung, Jiong Yang: Finding Representative Set from Massive Data. 338-345
Spiros Papadimitriou, Aristides Gionis, Panayiotis Tsaparas, Risto A. Väisänen, Heikki Mannila, Christos Faloutsos: Parameter-Free Spatial Data Mining Using MDL. 346-353
Panagiotis Papapetrou, George Kollios, Stan Sclaroff, Dimitrios Gunopulos: Discovering Frequent Arrangements of Temporal Intervals. 354-361
Marcelino Pereira dos Santos Silva, Gilberto Câmara, Ricardo Cartaxo Modesto de Souza, Dalton M. Valeriano, Maria Isabel Sobral Escada: Mining Patterns of Change in Remote Sensing Image Databases. 362-369
Saharon Rosset, Claudia Perlich, Bianca Zadrozny: Ranking-Based Evaluation of Regression Models. 370-377
Lars Schmidt-Thieme: Compound Classification Models for Recommender Systems. 378-385
Ted E. Senator: Multi-Stage Classification. 386-393
Wing-Ho Shum, Kwong-Sak Leung, Man Leung Wong: Learning Functional Dependency Networks Based on Genetic Programming. 394-401
Jonathan Stoeckel, Glenn Fung: SVM Feature Selection for Classification of SPECT Images of Alzheimer's Disease Using Spatial Information. 410-417
Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, Christos Faloutsos: Neighborhood Formation and Anomaly Detection in Bipartite Graphs. 418-425
Zoltán Szamonek, Csaba Szepesvári: X-mHMM: An Efficient Algorithm for Training Mixtures of HMMs When the Number of Mixtures Is Unknown. 434-441
Raz Tamir: A Random Walk through Human Associations. 442-449
Dacheng Tao, Xuelong Li, Weiming Hu, Stephen J. Maybank, Xindong Wu: Supervised Tensor Learning. 450-457
Gang Wang, Hui Zhang, Zhihua Zhang, Frederick H. Lochovsky: A Bernoulli Relational Model for Nonlinear Embedding. 458-465
Ke Wang, Benjamin C. M. Fung, Philip S. Yu: Template-Based Privacy Preservation in Classification Problems. 466-473
Peng Wang, Haixun Wang, Xiaochen Wu, Wei Wang, Baile Shi: On Reducing Classifier Granularity in Mining Concept-Drifting Data Streams. 474-481
Yongge Wang, Xintao Wu: Approximate Inverse Frequent Itemset Mining: Privacy, Complexity, and Approximation. 482-489
Li Wei, Eamonn J. Keogh, Helga Van Herle, Agenor Mafra-Neto: Atomic Wedgie: Efficient Query Filtering for Streaming Times Series. 490-497
Oksana Yakhnenko, Adrian Silvescu, Vasant Honavar: Discriminatively Trained Markov Model for Sequence Classification. 498-505
Jie Yin, Qiang Yang: Integrating Hidden Markov Models and Spectral Analysis for Sensory Time Series Clustering. 506-513
Yi Zhang, W. Nick Street, Samuel Burer: Sharing Classifiers among Ensembles from Related Problem Domains. 522-529
Kaidi Zhao, Bing Liu, Thomas M. Tirpak, Weimin Xiao: A Visual Data Mining Framework for Convenient Identification of Useful Knowledge. 530-537
Dong Zhuang, Benyu Zhang, Qiang Yang, Jun Yan, Zheng Chen, Ying Chen: Efficient Text Classification by Weighted Proximal SVM. 538-545
Short Papers
Hidenao Abe, Shusaku Tsumoto, Miho Ohsaki, Takahira Yamaguchi: A Rule Evaluation Support Method with Learning Models Based on Objective Rule Evaluation Indexes. 549-552
Foto N. Afrati, Gautam Das, Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas: Mining Chains of Relations. 553-556
Fabio Aiolli: A Preference Model for Structured Supervised Learning Tasks. 557-560
Maurizio Atzori, Francesco Bonchi, Fosca Giannotti, Dino Pedreschi: Blocking Anonymity Threats Raised by Frequent Itemset Mining. 561-564
Abraham Bagherjeiran, Christoph F. Eick, Chun-Sheng Chen, Ricardo Vilalta: Adaptive Clustering: Obtaining Better Clusters Using Feedback and Past Experience. 565-568
Jinbo Bi, Glenn Fung, Murat Dundar, R. Bharat Rao: Semi-Supervised Mixture of Kernels via LPBoost Methods. 569-572
Robin D. Burke, Bamshad Mobasher, Runa Bhaumik, Chad Williams: Segment-Based Injection Attacks against Collaborative Filtering Recommender Systems. 577-580
Richard Butterworth, Gregory Piatetsky-Shapiro, Dan A. Simovici: On Feature Selection through Clustering. 581-584
Yixin Chen, Henry L. Bart Jr., Shuqing Huang, Huimin Chen: A Computational Framework for Taxonomic Research: Diagnosing Body Shape within Fish Species Complexes. 593-596
Federico Di Palma, Giuseppe De Nicolao, Guido Miraglia, Oliver M. Donzelli: Process Diagnosis via Electrical-Wafer-Sorting Maps Classification. 601-604
Wei Fan, Ian Davidson, Bianca Zadrozny, Philip S. Yu: An Improved Categorization of Classifier's Sensitivity on Sample Selection Bias. 605-608
Johannes Fischer, Volker Heun, Stefan Kramer: Fast Frequent String Mining Using Suffix Arrays. 609-612
Ada Wai-Chee Fu, Raymond Chi-Wing Wong, Ke Wang: Privacy-Preserving Frequent Pattern Mining across Private Databases. 613-616
Jie Gao, Jörg Denzinger, Robert C. James: CoLe: A Cooperative Data Mining Approach and Its Application to Early Diabetes Detection. 617-620
Like Gao, Xiaoyang Sean Wang: Feature Selection for Building Cost-Effective Data Stream Classifiers. 621-624
Thomas George, Srujana Merugu: A Scalable Collaborative Filtering Framework Based on Co-Clustering. 625-628
Shantanu Godbole, Ganesh Ramakrishnan, Sunita Sarawagi: Text Classification with Evolving Label-Sets. 629-632
Ningthoujam Gourakishwar Singh, Sanasam Ranbir Singh, Anjana K. Mahanta: CloseMiner: Discovering Frequent Closed Itemsets Using Frequent Closed Tidsets. 633-636
Maria Halkidi, Dimitrios Gunopulos, Nitin Kumar, Michalis Vazirgiannis, Carlotta Domeniconi: A Framework for Semi-Supervised Learning Based on Subjective and Objective Clustering Criteria. 637-640
Ayça Azgin Hintoglu, Ali Inan, Yücel Saygin, Mehmet Keskinöz: Suppressing Data Sets to Prevent Discovery of Association Rules. 645-648
Tsuyoshi Idé: Pairwise Symmetry Decomposition Method for Generalized Covariance Analysis. 657-660
Vandana Pursnani Janeja, Vijayalakshmi Atluri: FS3: A Random Walk Based Free-Form Spatial Scan Statistic for Anomalous Window Detection. 661-664
Yuelong Jiang, Ke Wang, Alexander Tuzhilin, Ada Wai-Chee Fu: Mining Patterns That Respond to Actions. 669-672
Toshihiro Kamishima, Hideto Kazawa, Shotaro Akaho: Supervised Ordering - An Empirical Survey. 673-676
Ning Kang, Carlotta Domeniconi, Daniel Barbará: Categorization and Keyword Identification of Unlabeled Documents. 677-680
Paul Komarek, Andrew W. Moore: Making Logistic Regression a Core Data Mining Tool with TR-IRLS. 685-688
Hans-Peter Kriegel, Martin Pfeifle: Hierarchical Density-Based Clustering of Uncertain Data. 689-692
Nimit Kumar, Krishna Kummamuru, Deepa Paranjpe: Semi-Supervised Clustering with Metric Learning Using Relative Comparisons. 693-696
Krishna Kummamuru, Raghu Krishnapuram, Rakesh Agrawal: On Learning Asymmetric Dissimilarity Measures. 697-700
Longin Jan Latecki, Vasileios Megalooikonomou, Qiang Wang, Rolf Lakämper, Chotirat (Ann) Ratanamahatana, Eamonn J. Keogh: Partial Elastic Matching of Time Series. 701-704
In-Yee Lee, Jan-Ming Ho, Ming-Syan Chen: CLUGO: A Clustering Algorithm for Automated Functional Annotations Based on Gene Ontology. 705-708
Daniel Lemire, Martin Brooks, Yuhong Yan: An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation. 709-712
Loïck Lhote, François Rioult, Arnaud Soulet: Average Number of Frequent (Closed) Patterns in Bernouilli and Markovian Databases. 713-716
Charles X. Ling, Shengli Sheng, Tilmann F. W. Bruckhaus, Nazim H. Madhavji: Predicting Software Escalations with Maximum ROI. 717-720
Jinze Liu, Susan Paulsen, Wei Wang, Andrew B. Nobel, Jan Prins: Mining Approximate Frequent Itemsets from Noisy Data. 721-724
Ning Liu, Benyu Zhang, Jun Yan, Zheng Chen, Wenyin Liu, Fengshan Bai, Leefeng Chien: Text Representation: From Vector to Tensor. 725-728
Elio Lozano, Edgar Acuña: Parallel Algorithms for Distance-Based and Density-Based Outliers. 729-732
Tong Luo, Lawrence O. Hall, Dmitry B. Goldgof, Andrew Remsen: Bit Reduction Support Vector Machine. 733-736
Sandeep Mane, Carson Murray, Shashi Shekhar, Jaideep Srivastava, Anne Pusey: Spatial Clustering of Chimpanzee Locations for Neighborhood Identification. 737-740
Prem Melville, Foster J. Provost, Raymond J. Mooney: An Expected Utility Approach to Active Feature-Value Acquisition. 745-748
Dheerendranath Mundluru, Jayasimha Reddy Katukuri, Saygin Celebi: Automatically Mining Result Records from Search Engine Response Pages. 749-752
Jian Pei, Jian Liu, Haixun Wang, Ke Wang, Philip S. Yu, Jianyong Wang: Efficiently Mining Frequent Closed Partial Orders. 753-756
Kunal Punera, Joydeep Ghosh: CLUMP: A Scalable and Robust Framework for Structure Discovery. 757-760
Martin Scholz: On the Tractability of Rule Discovery from Distributed Data. 761-764
Jiazheng Shi, Ashok Samal, David Marx: Face Recognition Using Landmark-Based Bidimensional Regression. 765-768
Arno Siebes, Muhammad Subianto, A. J. Feelders: Instability of Classifiers on Categorical Data. 769-772
Lisa Singh, Lise Getoor, Louis Licamele: Pruning Social Networks Using Structural Properties and Descriptive Attributes. 773-776
Arnaud Soulet, Bruno Crémilleux: Optimizing Constraint-Based Mining by Automatically Relaxing Constraints. 777-780
Alexandre Termier, Marie-Christine Rousset, Michèle Sebag, Kouzou Ohara, Takashi Washio, Hiroshi Motoda: Efficient Mining of High Branching Factor Attribute Trees. 785-788
Chi-Ho Tsang, Sam Kwong, Hanli Wang: Anomaly Intrusion Detection Using Multi-Objective Genetic Fuzzy System and Agent-Based Evolutionary Computation Framework. 789-792
Takashi Washio, Yuki Mitsunaga, Hiroshi Motoda: Mining Quantitative Frequent Itemsets Using Adaptive Density-Based Subspace Clustering. 793-796
Wensheng Wu, AnHai Doan, Clement T. Yu: Merging Interface Schemas on the Deep Web via Clustering Aggregation. 801-804
Kiyoung Yang, Cyrus Shahabi: On the Stationarity of Multivariate Time Series for Correlation-Based Data Analysis. 805-808
Sandeep Yaramakala, Dimitris Margaritis: Speculative Markov Blanket Discovery for Optimal Feature Selection. 809-812
Jin Soung Yoo, Shashi Shekhar, Mete Celik: A Join-Less Approach for Co-Location Pattern Mining: A Summary of Results. 813-816
Kun Zhang, Zujia Xu, Jing Peng, Bill P. Buckles: Learning through Changes: An Empirical Study of Dynamic Behaviors of Probability Estimation Trees. 817-820
Xiaofeng Zhang, William K. Cheung: Visualizing Global Manifold Based on Distributed Local Data Abstractions. 821-824
Cui Zhu, Hiroyuki Kitagawa, Christos Faloutsos: Example-Based Robust Outlier Detection in High Dimensional Datasets. 829-832