CUDA-CHiLL: Using Compiler-Based Autotuning to Generate High-Performance GPU Libraries
SESSION: Research Poster Reception
EVENT TYPE: Poster
TIME: 5:15PM - 7:00PM
AUTHOR(S): Malik Khan, Gabe Rudy, Chun Chen, Mary Hall, Jacqueline Chame
ABSTRACT: This poster presents CUDA-CHiLL, a compiler-based transformation, code generation and auto-tuning system that generates high-performance library code targeting GPUs. A high-level transformation recipe interface allows a programmer or a compiler algorithm to control the mapping of computation to the GPU, and the mapping is applied using an underlying polyhedral code generation framework. For the results presented, the compiler generated the recipes automatically.
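The general idea of empirical auto-tuning can be illustrated with a minimal sketch: generate several variants of a kernel, time each one, and keep the fastest. The Python code below is only an illustration of that search loop on the CPU. It is not CUDA-CHiLL's recipe interface, and the names `blocked_matmul` and `autotune`, the candidate tile sizes, and the use of tile size as the sole tuning parameter are all assumptions made for this example.

```python
import time


def blocked_matmul(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles.

    The tile size is the tunable parameter in this hypothetical example.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Work on one tile; min() handles n not divisible by tile.
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def autotune(n, candidate_tiles):
    """Time each candidate tile size and return (best_tile, timings)."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in candidate_tiles:
        start = time.perf_counter()
        blocked_matmul(a, b, n, tile)
        timings[tile] = time.perf_counter() - start
    # The variant with the smallest measured time wins.
    best_tile = min(timings, key=timings.get)
    return best_tile, timings


if __name__ == "__main__":
    best, timings = autotune(64, [4, 8, 16, 32])
    for tile, seconds in sorted(timings.items()):
        print(f"tile={tile:3d}  time={seconds:.4f} s")
    print("fastest tile size:", best)
```

Which tile size wins depends on the machine, which is the reason for measuring rather than predicting. A GPU-targeted system would search over a richer space, such as thread-block shapes and data placement, but the measure-and-select structure is the same.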
We report results for three single-precision functions from BLAS: matrix-matrix multiply (MM), matrix-vector multiply (MV) and transposed matrix-vector multiply (TMV). Our automatically generated code achieves up to 435 Gflops for MM on a GTX280, with an average speedup of 1.5X (as high as 2.5X) over CUBLAS 2.2 for square matrices with dimensions ranging from 128 to 8K. MV and TMV reach up to 64 and 18 Gflops, respectively, with MV yielding an average speedup of 1.8X (up to 2.7X), and TMV an average speedup of 1.05X (up to 1.7X), over CUBLAS 2.2.
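For readers unfamiliar with how such figures are derived, the sketch below shows the conventional arithmetic. It assumes the usual operation counts of 2N^3 floating-point operations for an N x N matrix-matrix multiply and 2N^2 for a matrix-vector multiply. The abstract does not state which counts the authors used, and the helper names and the timing in the usage line are hypothetical.

```python
def mm_flops(n):
    """Floating-point operations for an n x n matrix-matrix multiply (assumed 2*n^3)."""
    return 2 * n ** 3


def mv_flops(n):
    """Floating-point operations for an n x n matrix-vector multiply (assumed 2*n^2)."""
    return 2 * n ** 2


def gflops(flops, seconds):
    """Achieved rate in Gflops given an operation count and a wall-clock time."""
    return flops / seconds / 1e9


def speedup(baseline_seconds, tuned_seconds):
    """Speedup of the tuned code over a baseline, as a ratio of run times."""
    return baseline_seconds / tuned_seconds


if __name__ == "__main__":
    n = 8192  # assumes "8K" means 8192
    hypothetical_seconds = 2.0  # illustrative timing, not a measured result
    print(f"MM flops at n={n}: {mm_flops(n)}")
    print(f"rate: {gflops(mm_flops(n), hypothetical_seconds):.1f} Gflops")
```

Under these assumptions, an 8192 x 8192 multiply performs 2^40 (about 1.1 trillion) operations, so sustaining 435 Gflops at that size would correspond to a run time of roughly 2.5 seconds.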