Memory access optimization for iterative tomography on many-core architectures
Faculty of Sciences. Physics
Faculty of Sciences. Mathematics and Computer Science
The 12th International Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine
University of Antwerp
Iterative tomographic reconstruction methods, despite their virtues, are known to be slow compared to analytic reconstruction methods, mainly because of the computationally intensive forward and backward projection operations. By relying on many-core architectures with large vector registers, modern high performance computing (HPC) systems can offer relief. However, to benefit optimally from such systems, the performance of the algorithms should not be bound by the memory bandwidth. In this work, a strategy is proposed that improves the performance of the tomographic forward projection by optimizing its memory accesses. Data locality is exploited to hide data access latency, and knowledge of the cache architecture is used to optimally distribute the projection operation over many computing cores. Experiments performed on the recently introduced Intel® Xeon Phi™ architecture confirm a substantial boost in projection performance.
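The abstract does not describe the implementation, so the following is only an illustrative sketch of the general idea of exploiting data locality in a forward projection, not the authors' method. It assumes a simple pixel-driven, parallel-beam projector with nearest-bin accumulation; the function name `forward_project` and the `tile` parameter are hypothetical. The image is visited in square tiles, and each tile is projected over all angles before moving on, so that a tile small enough to fit in cache is reused many times once loaded.

```python
import numpy as np

def forward_project(image, angles, n_bins, tile=None):
    """Pixel-driven parallel-beam projection with nearest-bin accumulation.

    Illustrative assumption, not the method of the paper: the image is
    processed tile by tile, and every tile is projected over all angles
    before the next tile is touched.
    """
    n_rows, n_cols = image.shape
    tile = tile or max(n_rows, n_cols)  # tile=None: one tile covering the image
    sino = np.zeros((len(angles), n_bins))
    cos_a, sin_a = np.cos(angles), np.sin(angles)

    for r0 in range(0, n_rows, tile):
        for c0 in range(0, n_cols, tile):
            block = image[r0:r0 + tile, c0:c0 + tile]
            rows, cols = np.mgrid[r0:r0 + block.shape[0],
                                  c0:c0 + block.shape[1]]
            # Pixel coordinates relative to the image centre.
            x = cols - (n_cols - 1) / 2.0
            y = rows - (n_rows - 1) / 2.0
            # Reuse the same tile for every projection angle.
            for a in range(len(angles)):
                t = x * cos_a[a] + y * sin_a[a]
                bins = np.rint(t + (n_bins - 1) / 2.0).astype(int)
                valid = (bins >= 0) & (bins < n_bins)
                sino[a] += np.bincount(bins[valid],
                                       weights=block[valid],
                                       minlength=n_bins)
    return sino

# Example usage: tiled and untiled traversal give the same sinogram.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
angles = np.linspace(0.0, np.pi, 30, endpoint=False)
sino_full = forward_project(img, angles, n_bins=96)
sino_tiled = forward_project(img, angles, n_bins=96, tile=16)
```

In NumPy the tiling mainly demonstrates that reordering the traversal leaves the result unchanged; any actual speed-up from cache reuse would have to be measured in a compiled, vectorized implementation on the target hardware, with the tile size chosen to match its cache sizes.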