ICS 2010: Tsukuba, Ibaraki, Japan
Taisuke Boku, Hiroshi Nakashima, Avi Mendelson (Eds.): Proceedings of the 24th International Conference on Supercomputing, 2010, Tsukuba, Ibaraki, Japan, June 2-4, 2010. ACM 2010 ISBN 978-1-4503-0018-6
Keynotes
Stephen S. Pawlowski: Exascale science: the next frontier in high performance computing. 1
William J. Dally: Throughput computing. 2
Kimihiko Hirao: The next-generation supercomputer project and a plan for the advanced institute for computational science. 3
MPI
Vladimir Marjanovic, Jesús Labarta, Eduard Ayguadé, Mateo Valero: Overlapping communication and computation by using a hybrid MPI/SMPSs approach. 5-16
Sreeram Potluri, Ping Lai, Karen A. Tomko, Sayantan Sur, Yifeng Cui, Mahidhar Tatineni, Karl W. Schulz, William L. Barth, Amitava Majumdar, Dhabaleswar K. Panda: Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application. 17-25
Nikhil Jain, Yogish Sabharwal: Optimal bucket algorithms for large MPI collectives on torus interconnects. 27-36
Cache and transaction memory
Javier Lira, Carlos Molina, Antonio González: The auction: optimizing banks usage in Non-Uniform Cache Architectures. 37-47
Robert Strzodka, Mohammed Shaheen, Dawid Pajak, Hans-Peter Seidel: Cache oblivious parallelograms in iterative stencil computations. 49-59
Woongki Baek, Nathan Grasso Bronson, Christos Kozyrakis, Kunle Olukotun: Making nested parallel transactions practical using lightweight hardware support. 61-71
Applications (1)
Atabak Mahram, Martin C. Herbordt: Fast and accurate NCBI BLASTP: acceleration with multiphase FPGA-based prefiltering. 73-82
Narges Bani Asadi, Christopher W. Fletcher, Greg Gibeling, John Wawrzynek, Wing H. Wong, Garry P. Nolan: ParaLearn: a massively parallel, scalable system for learning interaction networks on FPGAs. 83-94
Michael D. Linderman, Robert V. Bruggner, Vivek Athalye, Teresa H. Y. Meng, Narges Bani Asadi, Garry P. Nolan: High-throughput Bayesian network learning using heterogeneous multicore computers. 95-104
Chi Ching Chi, Ben H. H. Juurlink, Cor Meenderinck: Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine. 105-114
GPGPU and accelerators (1)
Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Xipeng Shen: Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping. 115-126
Allen D. Malony, Scott Biersdorff, Wyatt Spear, Shangkar Mayanglambam: An experimental approach to performance measurement of heterogeneous parallel applications using CUDA. 127-136
Vignesh T. Ravi, Wenjing Ma, David Chiu, Gagan Agrawal: Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. 137-146
Architecture
Ramon Bertran, Marc González, Xavier Martorell, Nacho Navarro, Eduard Ayguadé: Decomposable and responsive power models for multicore processors using performance counters. 147-158
Lixin Zhang, Evan Speight, Ramakrishnan Rajamony, Jiang Lin: Enigma: architectural and operating system support for reducing the impact of address translation. 159-168
Huaiyu Zhu, Yong Chen, Xian-He Sun: Timing local streams: improving timeliness in data prefetching. 169-178
Chunyang Gou, Georgi Kuzmanov, Georgi Gaydadjiev: SAMS multi-layout memory: providing multiple views of data to boost SIMD performance. 179-188
System and IO issues

Adam J. Oliner, Alex Aiken: A query language for understanding component interactions in production systems. 201-210
Ramya Prabhakar, Shekhar Srikantaiah, Mahmut T. Kandemir, Christina M. Patrick: Adaptive multi-level cache allocation in distributed storage architectures. 211-221
Xuechen Zhang, Song Jiang: InterferenceRemoval: removing interference of disk access for MPI programs through data replication. 223-232
Applications (2)
Keith R. Bisset, Jiangzhuo Chen, Xizhou Feng, Yifei Ma, Madhav V. Marathe: Indemics: an interactive data intensive framework for high performance epidemic simulation. 233-242
Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Robert J. Fowler, Daniel A. Reed: Clustering performance data efficiently at massive scales. 243-252
Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul F. Fischer, Paul D. Hovland: Speeding up Nek5000 with autotuning and specialization. 253-262
Compilers
Josep M. Pérez, Rosa M. Badia, Jesús Labarta: Handling task dependencies under strided and aliased references. 263-274
Harmen L. A. van der Spek, C. W. Mattias Holm, Harry A. G. Wijshoff: How to unleash array optimizations on code using recursive data structures. 275-284
Lixia Liu, Zhiyuan Li: A compiler-automated array compression scheme for optimizing memory intensive programs. 285-294
Arun Chauhan, Chun-Yu Shei: Static reuse distances for locality-based optimizations in MATLAB. 295-304
GPGPU and accelerators (2)
Liang Gu, Xiaoming Li, Jakob Siegel: An empirically tuned 2D and 3D FFT library on CUDA GPU. 305-314
Yong Dou, Yuanwu Lei, Guiming Wu, Song Guo, Jie Zhou, Li Shen: FPGA accelerating double/quad-double high precision floating-point applications for ExaScale computing. 325-336
Jamin Naghmouchi, Daniele Paolo Scarpazza, Mladen Berekovic: Small-ruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization. 337-348