Efficient and Accurate Runtime Prediction of Fused Linear Algebra Kernels
SESSION: Doctoral Research Showcase I (Autotuning and Performance Engineering on Emerging and Scalable Systems)
EVENT TYPE: Doctoral Research Showcase
TIME: 2:42PM - 3:00PM
SESSION CHAIR: Sadaf R. Alam
ABSTRACT: Data movement limits the performance of many scientific computing applications. For these programs, runtime is most accurately expressed in terms of memory traffic. Techniques such as loop fusion decrease data movement, often producing speedups proportional to the resulting reduction in memory accesses. However, loop fusion sometimes decreases performance by causing capacity misses in caches and registers. Whether fusion causes such misses depends on both hardware and routine characteristics. Finding the optimal amount of fusion by exhaustive search requires trying every possible fusion strategy at every problem size of interest, which is often infeasible. In this talk, we describe a model that accurately narrows the large number of candidate versions of a routine down to a small collection that is practical to test. The model includes only the most distinguishing machine and routine features, allowing for an economical comparison while maintaining accuracy. We integrate the model into a compilation framework, where it reduces compile times without sacrificing kernel efficiency.
Sadaf R. Alam (Chair) - Swiss National Supercomputing Centre
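
The abstract's central idea, that a memory-bound kernel's runtime tracks its memory traffic and that fusion pays off only while the fused working set still fits in fast memory, can be illustrated with a minimal sketch. This is not the model presented in the talk; the `Loop` class, the `traffic`, `fuse`, and `fits` helpers, the stream-counting cost, and the 32 KiB cache size are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loop:
    """A streaming loop over length-n vectors (hypothetical abstraction)."""
    reads: frozenset
    writes: frozenset


def traffic(loops, n, elem_bytes=8):
    """Bytes moved, assuming each loop streams every operand once."""
    return sum((len(l.reads) + len(l.writes)) * n * elem_bytes for l in loops)


def fuse(a, b, temporaries=frozenset()):
    """Fuse loop b into loop a.

    Values written by a and read by b are forwarded in registers, so b no
    longer reads them from memory. Forwarded values that are listed as
    temporaries are never written back at all.
    """
    forwarded = a.writes & b.reads
    reads = a.reads | (b.reads - forwarded)
    writes = (a.writes | b.writes) - (forwarded & temporaries)
    return Loop(frozenset(reads), frozenset(writes))


def fits(loop, block_elems, cache_bytes, elem_bytes=8):
    """Crude capacity check: one block of every stream must fit in cache."""
    streams = len(loop.reads | loop.writes)
    return streams * block_elems * elem_bytes <= cache_bytes


# Example: w = x + y, then z = w * v, with w used nowhere else.
add = Loop(frozenset({"x", "y"}), frozenset({"w"}))
mul = Loop(frozenset({"w", "v"}), frozenset({"z"}))
fused = fuse(add, mul, temporaries=frozenset({"w"}))

n = 1_000_000
unfused_bytes = traffic([add, mul], n)   # 6 streams of n doubles
fused_bytes = traffic([fused], n)        # 4 streams of n doubles
predicted_speedup = unfused_bytes / fused_bytes

# Assumed 32 KiB cache: fusion is accepted only if the block size fits.
small_block_ok = fits(fused, block_elems=1024, cache_bytes=32 * 1024)
large_block_ok = fits(fused, block_elems=2048, cache_bytes=32 * 1024)
```

Under these assumptions the fused version moves two thirds of the bytes of the unfused pair, so the sketch predicts a speedup of 1.5 as long as the capacity check passes. A pruning step in this spirit would discard fusion candidates that fail `fits` before any of them are compiled and timed.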