Publication
Title
Hard faults and soft-errors : possible numerical remedies in linear algebra solvers
Author
Abstract
On future large-scale systems, the mean time between failures (MTBF) of the system is expected to decrease so that many faults could occur during the solution of large problems. Consequently, it becomes critical to design parallel numerical linear algebra kernels that can survive faults. In that framework, we investigate the relevance of approaches relying on numerical techniques, which might be combined with more classical techniques for real large-scale parallel implementations. Our main objective is to provide robust resilient schemes so that the solver may keep converging in the presence of hard faults without restarting the calculation from scratch. For this purpose, we study interpolation-restart (IR) strategies. For a given numerical scheme, the IR strategies consist of extracting relevant information from available data after a fault. After data extraction, a well-selected part of the missing data is regenerated through interpolation strategies to constitute a meaningful input to restart the numerical algorithm. In this paper, we revisit a few state-of-the-art methods in numerical linear algebra in the light of our IR strategies. Through a few numerical experiments, we qualitatively illustrate the respective robustness of the resulting resilient schemes with respect to the MTBF.
Language
English
Source (journal)
Lecture notes in computer science. - Berlin, 1973, currens
Source (book)
12th International Conference on High-Performance Computing for Computational Science (VECPAR), June 28-30, 2016, Porto, Portugal
Publication
Cham : Springer International Publishing AG, 2017
ISBN
978-3-319-61981-1
978-3-319-61982-8
DOI
10.1007/978-3-319-61982-8_3
Volume/pages
10150 (2017), p. 11-18
ISI
000441374400003
Full text (Publisher's DOI)
UAntwerpen
Faculty/Department
Research group
Project info
HPC iterative solvers for multi-particle physics simulation.
Publication type
Subject
Affiliation
Publications with a UAntwerp address
Record
Identifier
Creation 07.09.2018
Last edited 09.10.2023