17. PPOPP 2012: New Orleans, LA, USA
J. Ramanujam, P. Sadayappan (Eds.): Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, February 25-29, 2012. ACM 2012, ISBN 978-1-4503-1160-1
GPU tools
Huynh Phung Huynh, Andrei Hagiescu, Weng-Fai Wong, Rick Siow Mong Goh: Scalable framework for mapping streaming applications onto multi-GPU systems. 1-10
Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, Richard W. Vuduc: A performance analysis framework for identifying potential benefits in GPGPU applications. 11-22
Sara S. Baghsorkhi, Isaac Gelado, Matthieu Delahaye, Wen-mei W. Hwu: Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors. 23-34
Communication & SIMD optimization
Grey Ballard, James Demmel, Nicholas Knight: Communication avoiding successive band reduction. 35-44
Paul Sack, William Gropp: Faster topology-aware collective algorithms through non-minimal communication. 45-54
Roland Leißa, Sebastian Hack, Ingo Wald: Extending a C-like language for portable SIMD programming. 65-74
Programming models
Okwan Kwon, Fahed Jubair, Rudolf Eigenmann, Samuel P. Midkiff: A hybrid approach of OpenMP for clusters. 75-84
Yong Hun Eom, Stephen Yang, James Christopher Jenista, Brian Demsky: DOJ: dynamically parallelizing object-oriented programs. 85-96
Daniele Bonetta, Achille Peternier, Cesare Pautasso, Walter Binder: S: a scripting language for high-performance RESTful web services. 97-106
GPU algorithms
Mario Méndez-Lojo, Martin Burtscher, Keshav Pingali: A GPU implementation of inclusion-based points-to analysis. 107-116
Yuan Zu, Ming Yang, Zhonghu Xu, Lin Wang, Xin Tian, Kunyang Peng, Qunfeng Dong: GPU-based NFA implementation for memory efficient high speed regular expression matching. 129-140
Concurrent data structures

Aleksandar Prokopec, Nathan Grasso Bronson, Phil Bagwell, Martin Odersky: Concurrent tries with efficient non-blocking snapshots. 151-160
Yifeng Chen, Xiang Cui, Hong Mei: PARRAY: a unifying array representation for heterogeneous parallelism. 171-180
Parallel algorithms
Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Julian Shun: Internally deterministic parallel algorithms can be fast. 181-192
Charles E. Leiserson, Tao B. Schardl, Jim Sukha: Deterministic parallel random-number generation for dynamic-multithreading platforms. 193-204
Sadegh Nobari, Thanh-Tung Cao, Panagiotis Karras, Stéphane Bressan: Scalable parallel minimum spanning forest computation. 205-214
Correctness and fault tolerance
Guodong Li, Peng Li, Geoffrey Sawaya, Ganesh Gopalakrishnan, Indradeep Ghosh, Sreeranga P. Rajan: GKLEE: concolic verification and test generation for GPUs. 215-224
Peng Du, Aurelien Bouteiller, George Bosilca, Thomas Hérault, Jack Dongarra: Algorithm-based fault tolerance for dense matrix factorizations. 225-234
Jeremy D. Buhler, Kunal Agrawal, Peng Li, Roger D. Chamberlain: Efficient deadlock avoidance for streaming computation with filtering. 235-246
Scheduling and synchronization
David Dice, Virendra J. Marathe, Nir Shavit: Lock cohorting: a general technique for designing NUMA locks. 247-256
Panagiota Fatourou, Nikolaos D. Kallimanis: Revisiting the combining synchronization technique. 257-266
Olivier Tardieu, Haichuan Wang, Haibo Lin: A work-stealing scheduler for X10's task parallelism with suspension. 267-276
Poster session 1 (Monday)
Muthu Manikandan Baskaran, Nicolas Vasilache, Benoît Meister, Richard Lethin: Automatic communication optimizations through memory reuse strategies. 277-278
Gu Liu, Hong An, Wenting Han, Xiaoqiang Li, Tao Sun, Wei Zhou, Xuechao Wei, Xulong Tang: FlexBFS: a parallelism-aware implementation of breadth-first search on GPU. 279-280
Michael Andersch, Chi Ching Chi, Ben H. H. Juurlink: Programming parallel embedded and consumer applications in OpenMP superscalar. 281-282
Christophe Alias, Alain Darte, Alexandru Plesco: Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA. 285-286
Jian Tao, Marek Blazewicz, Steven R. Brandt: Using GPU's to accelerate stencil-based computation kernels for the development of large scale scientific applications on heterogeneous systems. 287-288
Bryan Marker, Andy Terrel, Jack Poulson, Don S. Batory, Robert A. van de Geijn: Mechanizing the expert dense linear algebra developer. 289-290
Cedric Nugteren, Henk Corporaal: The boat hull model: adapting the roofline model to enable performance prediction for parallel computing. 291-292
Alexandra Jimborean, Philippe Clauss, Benoît Pradelle, Luis Mastrangelo, Vincent Loechner: Adapting the polyhedral model as a framework for efficient speculative parallelization. 295-296
Yifan Gong, Bingsheng He, Jianlong Zhong: An overview of CMPI: network performance aware MPI in the cloud. 297-298
Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee: OpenCL as a unified programming model for heterogeneous CPU/GPU clusters. 299-300
George Tzenakis, Angelos Papatriantafyllou, John Kesapides, Polyvios Pratikakis, Hans Vandierendonck, Dimitrios S. Nikolopoulos: BDDT: block-level dynamic dependence analysis for deterministic task-based parallelism. 301-302
Shoaib Kamil, Derrick Coetzee, Scott Beamer, Henry Cook, Ekaterina Gonina, Jonathan Harper, Jeffrey Morlan, Armando Fox: Portable parallel performance from sequential, productive, embedded domain-specific languages. 303-304
Torsten Hoefler, Timo Schneider: Communication-centric optimizations by dynamically detecting collective operations. 305-306
Poster session 2 (Tuesday)


Minh Ngoc Dinh, David Abramson, Chao Jin, Andrew Gontarek, Bob Moench, Luiz De Rose: Scalable parallel debugging with statistical assertions. 311-312
Anshul Mittal, Nikhil Jain, Thomas George, Yogish Sabharwal, Sameer Kumar: Collective algorithms for sub-communicators. 315-316
Zviad Metreveli, Nickolai Zeldovich, M. Frans Kaashoek: CPHASH: a cache-partitioned hash table. 319-320
John Robert Wernsing, Greg Stitt: RACECAR: a heuristic for automatic function specialization on multi-core heterogeneous systems. 321-322
Albert Noll, Thomas R. Gross: An infrastructure for dynamic optimization of parallel programs. 325-326
Fredrik Kjolstad, Torsten Hoefler, Marc Snir: Automatic datatype generation and optimization. 327-328
Jacob Burnim, Tayfun Elmas, George C. Necula, Koushik Sen: NDetermin: inferring nondeterministic sequential specifications for parallelism correctness. 329-330
Andrew Stone, John Dennis, Michelle Strout: Establishing a Miniapp as a programmability proxy. 333-334
Lei Jiang, Pragneshkumar B. Patel, George Ostrouchov, Ferdinand Jamitzky: OpenMP-style parallelism in data-centered multicore computing with R. 335-336
Yves Caniou, Daniel Diaz, Florian Richoux, Philippe Codognet, Salvador Abreu: Performance analysis of parallel constraint-based local search. 337-338