Publication
Title
Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm
Author
Abstract
Scalability of Krylov subspace methods suffers from costly global synchronization steps that arise in dot-products and norm calculations on parallel machines. In this work, a modified preconditioned Conjugate Gradient (CG) method is presented that removes the costly global synchronization steps from the standard CG algorithm by only performing a single non-blocking reduction per iteration. This global communication phase can be overlapped by the matrixvector product, which typically only requires local communication. The resulting algorithm will be referred to as pipelined CG. An alternative pipelined method, mathematically equivalent to the Conjugate Residual (CR) method that makes different trade-offs with regard to scalability and serial runtime is also considered. These methods are compared to a recently proposed asynchronous CG algorithm by Gropp. Extensive numerical experiments demonstrate the numerical stability of the methods. Moreover, it is shown that hiding the global synchronization step improves scalability on distributed memory machines using the message passing paradigm and leads to significant speedups compared to standard preconditioned CG.
Language
English
Source (journal)
Parallel computing. - Amsterdam
Publication
Amsterdam : 2014
ISSN
0167-8191
DOI
10.1016/J.PARCO.2013.06.001
Volume/pages
40 :7 (2014) , p. 224-238
ISI
000339598400007
Full text (Publisher's DOI)
Full text (publisher's version - intranet only)
UAntwerpen
Faculty/Department
Research group
Publication type
Subject
Affiliation
Publications with a UAntwerp address
External links
Web of Science
Record
Identifier
Creation 05.09.2013
Last edited 02.10.2024
To cite this reference