| Key | Publication | |
|---|---|---|
| 2013 | | |
| j9 | Markus Wittmann, Thomas Zeiser, Georg Hager, Gerhard Wellein: Comparison of different propagation steps for lattice Boltzmann methods. Computers & Mathematics with Applications 65(6): 924-935 (2013) | |
| i27 | Markus Wittmann, Georg Hager, Thomas Zeiser, Gerhard Wellein: Asynchronous MPI for the Masses. CoRR abs/1302.4280 (2013) | |
| i26 | Tobias Scharpff, Klaus Iglberger, Georg Hager, Ulrich Rüde: Model-guided Performance Analysis of the Sparse Matrix-Matrix Multiplication. CoRR abs/1303.1651 (2013) | |
| i25 | Christoph Scheit, Georg Hager, Jan Treibig, Stefan Becker, Gerhard Wellein: Optimization of FASTEST-3D for Modern Multicore Systems. CoRR abs/1303.4538 (2013) | |
| i24 | Markus Wittmann, Georg Hager, Thomas Zeiser, Gerhard Wellein: An analysis of energy-optimized lattice-Boltzmann CFD simulations from the chip to the highly parallel level. CoRR abs/1304.7664 (2013) | |
| 2012 | ||
| j8 | Klaus Iglberger, Georg Hager, Jan Treibig, Ulrich Rüde: Expression Templates Revisited: A Performance Analysis of Current Methodologies. SIAM J. Scientific Computing 34(2) (2012) | |
| c15 | ||
| c14 | Jan Treibig, Georg Hager, Gerhard Wellein: Performance Patterns and Hardware Metrics on Modern Multicore Processors: Best Practices for Performance Engineering. Euro-Par Workshops 2012: 451-460 | |
| c13 | Klaus Iglberger, Georg Hager, Jan Treibig, Ulrich Rüde: High performance smart expression template math libraries. HPCS 2012: 367-373 | |
| c12 | Moritz Kreutzer, Georg Hager, Gerhard Wellein, Holger Fehske, Achim Basermann, Alan R. Bishop: Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation. IPDPS Workshops 2012: 1696-1702 | |
| i23 | Jan Treibig, Georg Hager, Gerhard Wellein: Best practices for HPM-assisted performance engineering on modern multicore processors. CoRR abs/1206.3738 (2012) | |
| i22 | Georg Hager, Jan Treibig, Johannes Habich, Gerhard Wellein: Exploring performance and power properties of modern multicore chips via simple machine models. CoRR abs/1208.2908 (2012) | |
| 2011 | ||
| j7 | Johannes Habich, Thomas Zeiser, Georg Hager, Gerhard Wellein: Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA. Advances in Engineering Software 42(5): 266-272 (2011) | |
| j6 | Jan Treibig, Gerhard Wellein, Georg Hager: Efficient multicore-aware parallelization strategies for iterative stencil computations. J. Comput. Science 2(2): 130-137 (2011) | |
| j5 | Christian Feichtinger, Johannes Habich, Harald Köstler, Georg Hager, Ulrich Rüde, Gerhard Wellein: A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters. Parallel Computing 37(9): 536-549 (2011) | |
| j4 | Gerald Schubert, Holger Fehske, Georg Hager, Gerhard Wellein: Hybrid-Parallel Sparse Matrix-Vector Multiplication with Explicit Communication Overlap on Current Multicore-Based Systems. Parallel Processing Letters 21(3): 339-358 (2011) | |
| c11 | Gerald Schubert, Georg Hager, Holger Fehske, Gerhard Wellein: Parallel Sparse Matrix-Vector Multiplication as a Test Case for Hybrid MPI+OpenMP Programming. IPDPS Workshops 2011: 1751-1758 | |
| c10 | Jan Treibig, Georg Hager, Gerhard Wellein, Michael Meier: Poster: LIKWID: lightweight performance tools. SC Companion 2011: 29-30 | |
| i21 | Gerald Schubert, Georg Hager, Holger Fehske, Gerhard Wellein: Parallel sparse matrix-vector multiplication as a test case for hybrid MPI+OpenMP programming. CoRR abs/1101.0091 (2011) | |
| i20 | Markus Wittmann, Georg Hager: Optimizing ccNUMA locality for task-parallel execution under OpenMP and TBB on multicore-based systems. CoRR abs/1101.0093 (2011) | |
| i19 | Klaus Iglberger, Georg Hager, Jan Treibig, Ulrich Rüde: Expression Templates Revisited: A Performance Analysis of the Current ET Methodology. CoRR abs/1104.1729 (2011) | |
| i18 | Jan Treibig, Georg Hager, Gerhard Wellein: LIKWID: Lightweight Performance Tools. CoRR abs/1104.4874 (2011) | |
| i17 | Jan Treibig, Georg Hager, Hannes G. Hofmann, Joachim Hornegger, Gerhard Wellein: Pushing the limits for medical image reconstruction on recent standard multicore processors. CoRR abs/1104.5243 (2011) | |
| i16 | Gerald Schubert, Holger Fehske, Georg Hager, Gerhard Wellein: Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems. CoRR abs/1106.5908 (2011) | |
| i15 | Markus Wittmann, Thomas Zeiser, Georg Hager, Gerhard Wellein: Comparison of different Propagation Steps for the Lattice Boltzmann Method. CoRR abs/1111.0922 (2011) | |
| i14 | Markus Wittmann, Thomas Zeiser, Georg Hager, Gerhard Wellein: Domain decomposition and locality optimization for large-scale lattice Boltzmann simulations. CoRR abs/1111.1129 (2011) | |
| i13 | Johannes Habich, Christian Feichtinger, Harald Köstler, Georg Hager, Gerhard Wellein: Performance engineering for the Lattice Boltzmann method on GPGPUs: Architectural requirements and performance results. CoRR abs/1112.0850 (2011) | |
| i12 | Moritz Kreutzer, Georg Hager, Gerhard Wellein, Holger Fehske, Achim Basermann, Alan R. Bishop: Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation. CoRR abs/1112.5588 (2011) | |
| 2010 | ||
| j3 | Markus Wittmann, Georg Hager, Jan Treibig, Gerhard Wellein: Leveraging Shared Caches for Parallel Temporal Blocking of Stencil Codes on Multicore Processors and Clusters. Parallel Processing Letters 20(4): 359-376 (2010) | |
| c9 | Jan Treibig, Georg Hager, Gerhard Wellein: LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments. ICPP Workshops 2010: 207-216 | |
| c8 | Markus Wittmann, Georg Hager, Gerhard Wellein: Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory. IPDPS Workshops 2010: 1-7 | |
| i11 | Jan Treibig, Gerhard Wellein, Georg Hager: Efficient multicore-aware parallelization strategies for iterative stencil computations. CoRR abs/1004.1741 (2010) | |
| i10 | Jan Treibig, Georg Hager, Gerhard Wellein: LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. CoRR abs/1004.4431 (2010) | |
| i9 | Markus Wittmann, Georg Hager, Jan Treibig, Gerhard Wellein: Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters. CoRR abs/1006.3148 (2010) | |
| i8 | Christian Feichtinger, Johannes Habich, Harald Köstler, Georg Hager, Ulrich Rüde, Gerhard Wellein: A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters. CoRR abs/1007.1388 (2010) | |
| 2009 | ||
| j2 | Thomas Zeiser, Georg Hager, Gerhard Wellein: Benchmark Analysis and Application Results for Lattice Boltzmann Simulations on NEC SX Vector and Intel Nehalem Systems. Parallel Processing Letters 19(4): 491-511 (2009) | |
| c7 | Gerhard Wellein, Georg Hager, Thomas Zeiser, Markus Wittmann, Holger Fehske: Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization. COMPSAC (1) 2009: 579-586 | |
| c6 | Thomas Zeiser, Georg Hager, Gerhard Wellein: The world's fastest CPU and SMP node: Some performance results from the NEC SX-9. IPDPS 2009: 1-8 | |
| c5 | Rolf Rabenseifner, Georg Hager, Gabriele Jost: Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes. PDP 2009: 427-436 | |
| c4 | Jan Treibig, Georg Hager: Introducing a Performance Model for Bandwidth-Limited Loop Kernels. PPAM (1) 2009: 615-624 | |
| i7 | Markus Wittmann, Georg Hager: A Proof of Concept for Optimizing Task Parallelism by Locality Queues. CoRR abs/0902.1884 (2009) | |
| i6 | Jan Treibig, Georg Hager: Introducing a Performance Model for Bandwidth-Limited Loop Kernels. CoRR abs/0905.0792 (2009) | |
| i5 | Gerald Schubert, Georg Hager, Holger Fehske: Performance limitations for sparse matrix-vector multiplications on current multicore environments. CoRR abs/0910.4836 (2009) | |
| i4 | Jan Treibig, Georg Hager, Gerhard Wellein: Multi-core architectures: Complexities of performance prediction and the impact of cache topology. CoRR abs/0910.4865 (2009) | |
| i3 | Markus Wittmann, Georg Hager, Gerhard Wellein: Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory. CoRR abs/0912.4506 (2009) | |
| 2008 | ||
| j1 | Georg Hager, Thomas Zeiser, Gerhard Wellein: Data Access Characteristics and Optimizations for Sun UltraSPARC T2 and T2+ Systems. Parallel Processing Letters 18(4): 471-490 (2008) | |
| c3 | Georg Hager, Thomas Zeiser, Gerhard Wellein: Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers. IPDPS 2008: 1-7 | |
| 2007 | ||
| i2 | Georg Hager, Thomas Zeiser, Gerhard Wellein: Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers. CoRR abs/0712.2302 (2007) | |
| i1 | Georg Hager, Holger Stengel, Thomas Zeiser, Gerhard Wellein: RZBENCH: Performance evaluation of current HPC architectures using low-level and application benchmarks. CoRR abs/0712.3389 (2007) | |
| 2006 | ||
| c2 | Rolf Rabenseifner, Georg Hager, Gabriele Jost, Rainer Keller: Hybrid MPI and OpenMP Parallel Programming. PVM/MPI 2006: 11 | |
| 2002 | ||
| c1 | Gerhard Wellein, Georg Hager, Achim Basermann, Holger Fehske: Fast Sparse Matrix-Vector Multiplication for TeraFlop/s Computers. VECPAR 2002: 287-301 | |
Data released under the ODC-BY 1.0 license.