Modeling the performance of geometric multigrid stencils on multicore computer architectures
Faculty of Sciences. Mathematics and Computer Science
SIAM journal on scientific computing. - Philadelphia, Pa
, p. C194-C216
University of Antwerp
The basic building blocks of the classic geometric multigrid algorithm all have a low ratio of executed floating point operations per byte fetched from memory. On modern computer architectures, such computational kernels are typically bound by memory traffic and achieve only a small percentage of the theoretical peak floating point performance of the underlying hardware. We suggest the use of state-of-the-art (stencil) compiler techniques to improve the flop per byte ratio, also called the arithmetic intensity, of the steps in the algorithm. Our focus will be on the smoother which is a repeated stencil application. With a tiling approach based on the polyhedral loop optimization framework, data reuse in the smoother can be improved, leading to a higher effective arithmetic intensity. For an academic constant coefficient Poisson problem, we present a performance model for the multigrid $V$-cycle solver based on the tiled smoother. For increasing numbers of smoothing steps, there is a trade-off between the improved efficiency due to better data reuse and the additional flops required for extra smoothing steps. Our performance model predicts time to solution by linking convergence rate to arithmetic intensity via the roofline model. We show results for two-dimensional (2D) and three-dimensional (3D) simulations on Intel Sandy Bridge and for 2D simulations on Intel Xeon Phi architectures. The actual performance is compared with the theoretical predictions.