Title




Hard faults and soft-errors: possible numerical remedies in linear algebra solvers
 
Author




 
Abstract




On future large-scale systems, the mean time between failures (MTBF) of the system is expected to decrease so that many faults could occur during the solution of large problems. Consequently, it becomes critical to design parallel numerical linear algebra kernels that can survive faults. In that framework, we investigate the relevance of approaches relying on numerical techniques, which might be combined with more classical techniques for real large-scale parallel implementations. Our main objective is to provide robust resilient schemes so that the solver may keep converging in the presence of the hard fault without restarting the calculation from scratch. For this purpose, we study interpolation-restart (IR) strategies. For a given numerical scheme, the IR strategies consist of extracting relevant information from available data after a fault. After data extraction, a well-selected part of the missing data is regenerated through interpolation strategies to constitute a meaningful input to restart the numerical algorithm. In this paper, we revisit a few state-of-the-art methods in numerical linear algebra in the light of our IR strategies. Through a few numerical experiments, we illustrate the respective robustness of the resulting resilient schemes with respect to the MTBF via qualitative illustrations.
 
Language




English
 
Source (journal)




Lecture notes in computer science.  Berlin, 1973, currens
 
Source (book)




12th International Conference on High-Performance Computing for Computational Science (VECPAR), JUN 28-30, 2016, Porto, PORTUGAL
 
Publication




Cham: Springer international publishing ag, 2017
 
ISBN




9783319619828
9783319619811
 
DOI




10.1007/978-3-319-61982-8_3
 
Volume/pages




10150 (2017), p. 11-18
 
ISI




000441374400003
 
