10. IPPS 1996: Honolulu, Hawaii, USA
Proceedings of IPPS '96, The 10th International Parallel Processing Symposium, April 15-19, 1996, Honolulu, Hawaii, USA. IEEE Computer Society 1996 ISBN 0-8186-7255-2
Keynote Address
Charles E. Leiserson: Can Multithreaded Programming Save Massively Parallel Computing? 2-3
Session 1 - Compiler Optimization

Martin C. Rinard, Pedro C. Diniz: Commutativity Analysis: A Technique for Automatically Parallelizing Pointer-Based Computations. 14-22
Shaw-Yen Tseng, Chung-Ta King, Chuan Yi Tang: Profiling Dependence Vectors for Loop Parallelization. 23-27
David J. Kolson, Alexandru Nicolau, Nikil D. Dutt, Ken Kennedy: A Method for Register Allocation to Loops in Multiple Register File Architectures. 28-33
Jingling Xue: Affine-by-Statement Transformations of Imperfectly Nested Loops. 34-38
Rafael H. Saavedra-Barrera, Weihua Mao, Daeyeon Park, Jacqueline Chame, Sungdo Moon: The Combined Effectiveness of Unimodular Transformations, Tiling, and Software Prefetching. 39-45
Session 2 - Scientific/Engineering Applications
Ka-Cheong Leung, Ishfaq Ahmad, Hsiao-Ming Hsu: Ocean Circulation on the Intel Paragon: Modeling and Implementation. 47-54
Wei-keng Liao, Chao-Wei Ou, Sanjay Ranka: Dynamic Alignment and Distribution of Irregularly Coupled Data Arrays for Scalable Parallelization of Particle-in-Cell Problems. 57-61
Hiroaki Kobayashi, Hitoshi Yamauchi, Yuichiro Toh, Tadao Nakamura: A Hierarchical Parallel Processing System for the Multipass-Rendering Method. 62-67
Steve G. Steinberg, Jun Yang, Katherine A. Yelick: Performance Modeling and Composition: A Case Study in Cell Simulation. 68-74
Session 3 - Distributed Memory Systems
Hideki Murayama, Satoshi Yoshizawa, Takeshi Aimoto, Hidenori Inouchi, Shooichi Murase, Takehisa Hayashi, Hiroshi Iwamoto: A Study of High-Performance Communication Mechanism for Multicomputer Systems. 76-83
Timothy G. Mattson, David Scott, Stephen R. Wheat: A TeraFLOP Supercomputer in 1996: The ASCI TFLOP System. 84-93
Daniel J. Scales, Michael Burrows, Chandramohan A. Thekkath: Experience with Parallel Computing on the AN2 Network. 94-103
Thomas L. Sterling, Donald J. Becker, Chance Reschke, Daniel Savarese, Michael R. Berry: Achieving a Balanced Low-Cost Architecture for Mass Storage Management through Multiple Fast Ethernet Channels on the Beowulf Parallel Workstation. 104-108
Klaus E. Schauser, Chris J. Scheiman, J. Mitchell Ferguson, Paul Z. Kolano: Exploiting the Capabilities of Communications Co-Processors. 109-115
Andrew Sohn, Mitsuhisa Sato, Namhoon Yoo, Jean-Luc Gaudiot: Effects of Multithreading on Data and Workload Distribution for Distributed-Memory Multiprocessors. 116-122
Session 4 - Shared Memory Systems

Robert D. Blumofe, Matteo Frigo, Christopher F. Joerg, Charles E. Leiserson, Keith H. Randall: Dag-Consistent Distributed Shared Memory. 132-141
Ricardo Bianchini, Thomas J. LeBlanc, Jack E. Veenstra: Categorizing Network Traffic in Update-Based Protocols on Scalable Multiprocessors. 142-151
Henk L. Muller, Paul W. A. Stallard, David H. D. Warren: Implementing the Data Diffusion Machine Using Crossbar Routers. 152-158
Sally A. McKee, William A. Wulf: A Memory Controller for Improved Performance of Streamed Computations on Symmetric Multiprocessors. 159-165
Stefanos Kaxiras: Kiloprocessor Extensions to SCI. 166-172
Session 5 - Algorithms
Miroslaw Kutylowski, Tomasz Wierzbicki: Approximate Compaction and Padded-Sorting on Exclusive Write PRAMs. 174-181
Maria Cristina Pinotti, Vincenzo A. Crupi, Sajal K. Das: A Parallel Solution to the Extended Set Union Problem with Unlimited Backtracking. 182-186
Xiaotie Deng, Binhai Zhu: A Randomized Algorithm for Voronoi Diagram of Line Segments on Coarse-Grained Multiprocessors. 192-198
Shuvra S. Bhattacharyya, Sundararajan Sriram, Edward A. Lee: Self-Timed Resynchronization: A Post-Optimization for Static Multiprocessor Schedules. 199-205
Session 6 - Programming Languages
Laxmikant V. Kalé, Milind A. Bhandarkar, Narain Jagathesan, Sanjeev Krishnan, Josh Yelon: Converse: An Interoperable Framework for Parallel Programming. 212-217
Jose Nagib Cotrim Árabe, Adam Beguelin, Bruce Lowekamp, Erik Seligman, Mike Starkey, Peter Stephan: Dome: Parallel Programming in a Distributed Computing Environment. 218-224
Yair I. Friedman, Dror G. Feitelson, Iaakov Exman: The Parallel Break Construct, or How to Kill an Activity Tree. 230-234
Xingbin Zhang, Vijay Karamcheti, Tony Ng, Andrew A. Chien: Optimizing COOP Languages: Study of a Protein Dynamics Program. 235-240
Raju Pandey, James C. Browne: Support for Extensibility and Reusability in a Concurrent Object-Oriented Programming Language. 241-247
Session 7 - Communication I
Gheith A. Abandah, Edward S. Davidson: Modeling the Communication Performance of the IBM SP2. 249-257
Yucel Aydogan, Craig B. Stunkel, Cevdet Aykanat, Bülent Abali: Adaptive Source Routing in Multistage Interconnection Networks. 258-267
Sherry Moore, Lionel M. Ni: The Effects of Network Contention on Processor Allocation Strategies. 268-273
Robert W. Horst: ServerNet Deadlock Avoidance and Fractahedral Topologies. 274-280
Sajal K. Das, Sanjoy K. Sen: Analysis of Memory Interference in Buffered Multiprocessor Systems in Presence of Hot Spots and Favorite Memories. 281-285
Debashis Basak, Dhabaleswar K. Panda, Mohammad Banikazemi: Benefits of Processor Clustering in Designing Large Parallel Systems: When and How? 286-290
Session 8 - Implementation of Primitive Operations
David A. Bader, Joseph JáJá: Practical Parallel Algorithms for Dynamic Data Redistribution, Median Finding, and Selection. 292-301
Sun Chung, Anne Condon: Parallel Implementation of Borůvka's Minimum Spanning Tree Algorithm. 302-308
Ibraheem Al-Furaih, Srinivas Aluru, Sanjay Goil, Sanjay Ranka: Practical Algorithms for Selection on Coarse-Grained Parallel Computers. 309-313
Seungjo Bae, Sanjay Ranka: PACK/UNPACK on Coarse-Grained Distributed Memory Parallel Machines. 320-324
Session 9 - Resource Allocation and Management

Yiqun Ge, David Y. Y. Yun: Simultaneous Compression of Makespan and Number of Processors Using CRP. 332-338
Bodhisattwa Mukherjee, Karsten Schwan: Implementation of Scalable Blocking Locks Using an Adaptive Thread Scheduler. 339-343
Samuel H. Russ, Brian K. Flachs, Jonathan Robinson, Bjørn Heckel: Hector: Automated Task Allocation for MPI. 344-348
Dwip Banerjee, James C. Browne: Complete Parallelization of Computations: Integration of Data Partitioning and Functional Parallelism for Dynamic Data Structures. 354-360
Keynote Address
Charles L. Seitz: MPPs versus Clusters. 362
Session 10 - Communication II
Tsunehiko Kamachi, Kazuhiro Kusano, Kenji Suehiro, Yoshiki Seo: Generating Realignment-Based Communication for HPF Programs. 364-371
Cezary Dubnicki, Liviu Iftode, Edward W. Felten, Kai Li: Software Support for Virtual Memory-Mapped Communication. 372-381
Jan Jonsson, Jonas Vasell: A Comparative Study of Methods for Time-Deterministic Message Delivery in a Multiprocessor Architecture. 392-398
Bruce Lowekamp, Adam Beguelin: ECO: Efficient Collective Operations for Communication on Heterogeneous Networks. 399-405
Eric A. Brewer, Paul Gauthier, Armando Fox, Angela Schuett: Software Techniques for Improving MPP Bulk-Transfer Performance. 406-412
Session 11 - Algorithms: Implementation
David A. Bader, Joseph JáJá, David Harwood, Larry S. Davis: Parallel Algorithms for Image Enhancement and Segmentation by Region Growing with an Experimental Study. 414-423
Yu-Hua Lee, Shi-Jinn Horng: The Chessboard Distance Transform and the Medial Axis Transform are Interchangeable. 424-428
Armin Bäumker, Wolfgang Dittrich: Parallel Algorithms for Image Processing: Practical Algorithms with Experiments. 429-433
Bongki Moon, Anurag Acharya, Joel H. Saltz: Study of Scalable Declustering Algorithms for Parallel Grid Files. 434-440
Session 12 - Performance Evaluation and Prediction
Kelvin K. Yue, David J. Lilja: Efficient Execution of Parallel Applications in Multiprogrammed Multiprocessor Systems. 448-456
Xian-He Sun: The Relation of Scalability and Execution Time. 457-462
Thu D. Nguyen, Raj Vaswani, John Zahorjan: Maximizing Speedup through Self-Tuning of Processor Allocation. 463-468
Shaun Kaneshiro, Tatsuya Shindo: Profiling Optimized Code: A Profiling System for an HPF Compiler. 469-473
Thomas Fahringer: Toward Symbolic Performance Prediction of Parallel Programs. 474-478
Sivan Toledo: Performance Prediction with Benchmaps. 479-485
Industrial Track - Invited Presentations
Session-I: Parallel Architectures - Implementation, Programming, and Performance
Jeffrey M. Nick, Jen-Yao Chung, Nicholas S. Bowen: IBM System/390 Division: Overview of IBM System/390 Parallel Sysplex - A Commercial Parallel Processing System. 488-495
Alan L. Smeyne: Litton Guidance and Control Systems, Inc.: Implementing Parallel Processing in a Rugged Embeddable Environment. 496-501
Gerard Vichniac, Barry Isenstein, Craig Lund, Arlan Pool: Mercury Computer Systems, Inc.: Planned Direct Transfers: A Programming Model for Real-Time Applications. 502-505
Session-II: Networking and Distributed Computing
Yogindra Abhyankar, Anil Degwekar, Abhay Karandikar: Centre for Development of Advanced Computing: DS-Link over Fiber: A High-Speed Interconnect for Cluster Computing. 507-511
Woo-Jong Hahn, Ando Ki, Kee-Wook Rim, Soo-Won Kim: Electronics and Telecommunications Research Institute: A Multiprocessor Server with a New Highly Pipelined Bus. 512-517
Robert W. Horst, Doug Jewett, William J. Watson, L. Young, Dimiter R. Avresky, R. Wilkinson, Chris M. Cunningham: Tandem Computers Incorporated: Performance Modeling of ServerNet™ Topologies. 518-523
Session 13 - Synchronization, Virtual Memory, and Runtime System Support
Georg Stellner: CoCheck: Checkpointing and Process Migration for MPI. 526-531
Peter H. Beckman, Dennis Gannon: Tulip: A Portable Run-Time System for Object-Parallel Systems. 532-536
Reiner W. Hartenstein, Jürgen Becker, Michael Herz, Rainer Kress, Ulrich Nageldinger: A Partitioning Programming Environment for a Novel Parallel Architecture. 544-548
Martin C. Rinard: An Integrated Synchronization and Consistency Protocol for the Implementation of a High-Level Parallel Programming Language. 549-553
Meenakshi Arunachalam, Alok N. Choudhary, Brad Rullman: Implementation and Evaluation of Prefetching in the Intel Paragon Parallel File System. 554-559
Session 14 - Arrays and Hypercubes
Qian-Ping Gu, Hisao Tamaki: Routing a Permutation in the Hypercube by Two Sets of Edge-Disjoint Paths. 561-567
Bogdan S. Chlebus, José D. P. Rolim, Giora Slutzki: Distributing Tokens on a Hypercube without Error Accumulation. 573-578
Hari Krishna Tadepalli, Errol L. Lloyd: An Improved Approximation Algorithm for Scheduling Task Trees on Linear Arrays. 584-590
Session 15 - Mathematical Methods
Bing Bing Zhou, Richard P. Brent: Jacobi-like Algorithms for Eigenvalue Decomposition of a Real Normal Matrix Using Real Arithmetic. 593-600
Hong Q. Ding, Robert D. Ferraro: An Element-Based Concurrent Partitioner for Unstructured Finite Element Meshes. 601-605
William E. Hart, Scott B. Baden, Richard K. Belew, Scott R. Kohn: Analysis of the Numerical Effects of Parallelism on a Parallel Genetic Algorithm. 606-612
Shankar Ramaswamy, Eugene W. Hodges IV, Prithviraj Banerjee: Compiling MATLAB Programs to ScaLAPACK: Exploiting Task and Data Parallelism. 613-619
Eugene V. Zima, Karthi R. Vadivelu, Thomas L. Casavant: Mapping Techniques for Parallel Evaluation of Chains of Recurrences. 620-624
Adrian Moga, Michel Dubois: Performance of Asynchronous Linear Iterations with Random Delays. 625-629
Panel
William M. Farmer, Richard F. Freund, Mark Furtney, Paul Messina, Lionel M. Ni, Charles L. Seitz, Marc Snir: For a Massive Number of Massively Parallel Machines: What are the Target Applications, Who are the Target Users, and What New R&D is Needed to Hit the Target? 631-634
Keynote Address
Gregory F. Pfister: Clusters for Commercial Computing: An Invisible Architecture. 636
Session 16 - Interconnection Networks

Yeimkuan Chang: Partitionability of the Multistage Interconnection Networks. 644-649
Mounir Hamdi, Siang W. Song: On Embedding Various Networks into the Hypercube Using Matrix Transformations. 650-654
Baback A. Izadi, Füsun Özgüner: Optimal Subcube Fault Tolerance in a Circuit-Switched Hypercube. 655-659
Mongkol Raksapatcharawong, Timothy Mark Pinkston: An Optical Interconnect Model for k-ary n-cube Wormhole Networks. 666-672
Session 17 - Bus-Based Algorithms
Ramachandran Vaidyanathan, Sudharani Nadella: Fault-Tolerant Multiple Bus Networks for Fan-In Algorithms. 674-681
Peter Damaschke: Coping with Sparse Inputs on Enhanced Meshes - Semigroup Computation with COMMON CRCW Buses. 682-686
Koji Nakano, Stephan Olariu: An Optimal Algorithm for the Angle-Restricted All Nearest Neighbor Problem on the Reconfigurable. 687-691
Sandy Pavel, Selim G. Akl: Efficient Algorithms for the Hough Transform on Arrays with Reconfigurable Optical Buses. 697-701
Jerry L. Trahan, Chun-ming Lu, Ramachandran Vaidyanathan: Integer and Floating Point Matrix-Vector Multiplication on the Reconfigurable Mesh. 702-706
Session 18 - Image and Radar Processing
Shung-Shing Lee, Shi-Jinn Horng, Horng-Ren Tsai, Yu-Hua Lee: Some Image Processing Algorithms on a RAP with Wider Bus Networks. 708-715
Peter G. Meisl, Mabo Robert Ito, Ian G. Cumming: Parallel Synthetic Aperture Radar Processing on Workstation Networks. 716-723
Alberto Broggi: The Evolution of a Massively Parallel Vision System for Real-Time Automotive Image Processing. 724-728
Concettina Guerra: 2D Object Recognition on a Reconfigurable Mesh. 729-733
Janice S. McMahon, Ken Teitelbaum: Space-Time Adaptive Processing on the Mesh Synchronous Processor. 734-740
Michael R. Berry, Tarek A. El-Ghazawi: An Experimental Study of Input/Output Characteristics of NASA Earth and Space Sciences Applications. 741-747
Session 19 - Special-Purpose Applications
Beverly Gocal: Bitonic Sorting on Beneš Networks. 749-753
Célio Estevan Morón: Designing Adaptable Real-Time Fault-Tolerant Parallel Systems. 754-758
James D. Allen, David E. Schimmel: Improving Memory Performance for Indirect Accesses on SIMD Computers. 759-765

Bernardo Rodriguez, Harry F. Jordan, Gita Alaghband: Temporal Characterization of Demands for Data Movement on Parallel Programs. 776-779
Session 20 - Communication III

Yuanyuan Yang, Gerald M. Masson: The Necessary Conditions for Clos-Type Nonblocking Multicast Networks. 789-795
Yuanyuan Yang: A Class of Interconnection Networks for Multicasting. 796-802
Young-Joo Suh, Sudhakar Yalamanchili: Algorithms for All-to-All Personalized Exchange in 2D and 3D Tori. 808-814
Anjan K. Venkatramani, Timothy Mark Pinkston, José Duato: Generalized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent. 815-821
Session 21 - Clusters and Domain Decomposition

Joseph Gil, Alan S. Wagner: A New Technique for 3-D Domain Decomposition on Multicomputers which Reduces Message-Passing. 831-835
Patrick W. Dowd, Todd M. Carrozzi, Frank A. Pellegrino, Amy Xin Chen: Native ATM Application Programmer Interface Testbed for Cluster-Based Computing. 843-849
Daniel Andresen, Tao Yang, Vegard Holmedahl, Oscar H. Ibarra: SWEB: Towards a Scalable World Wide Web Server on Multicomputers. 850-856
Additional Papers
Kannappan Palaniappan, Mohammad Faisal, Chandra Kambhamettu, A. Frederick Haslert: Implementation of an Automatic Semi-Fluid Motion Analysis Algorithm on a Massively Parallel Computer. 864-877
Subhash Saini: NAS Experiences of Porting CM Fortran Codes to on IBM SP2 and SGI Power Challenge. 878-880
Nihar R. Mahapatra, Shantanu Dutt: Random Seeking: A General, Efficient, and Informed Randomized Scheme for Dynamic Load Balancing. 881-885
Marián Vajtersic: A Direct Block-Five-Diagonal System Solver for the VLSI Parallel Model. 886-890
Ladan Kazerouni, Basant Rajan, R. K. Shyamasundar: Mapping Linear Recurrences onto Systolic Arrays. 891-897



