Title
Hard faults and soft-errors: possible numerical remedies in linear algebra solvers
Abstract
On future large-scale systems, the mean time between failures (MTBF) of the system is expected to decrease, so that many faults could occur during the solution of large problems. Consequently, it becomes critical to design parallel numerical linear algebra kernels that can survive faults. In that framework, we investigate the relevance of approaches relying on numerical techniques, which might be combined with more classical techniques for real large-scale parallel implementations. Our main objective is to provide robust resilient schemes so that the solver may keep converging in the presence of hard faults without restarting the calculation from scratch. For this purpose, we study interpolation-restart (IR) strategies. For a given numerical scheme, the IR strategies consist of extracting relevant information from the data still available after a fault. After data extraction, a well-selected part of the missing data is regenerated through interpolation strategies to constitute a meaningful input to restart the numerical algorithm. In this paper, we revisit a few state-of-the-art methods in numerical linear algebra in the light of our IR strategies. Through a few numerical experiments, we qualitatively illustrate the robustness of the resulting resilient schemes with respect to the MTBF.
Language
English
Source (journal)
Lecture notes in computer science. - Berlin, 1973, currens
Source (book)
12th International Conference on High-Performance Computing for Computational Science (VECPAR), June 28-30, 2016, Porto, Portugal
Publication
Cham: Springer International Publishing AG, 2017
ISBN
978-3-319-61982-8
978-3-319-61981-1
DOI
10.1007/978-3-319-61982-8_3
Volume/pages
10150 (2017), p. 11-18
ISI
000441374400003