LOW-LATENCY, CONCURRENT CHECKPOINTING FOR PARALLEL PROGRAMS

Citation
K. Li et al., Low-Latency, Concurrent Checkpointing for Parallel Programs, IEEE Transactions on Parallel and Distributed Systems, 5(8), 1994, pp. 874-879
Number of citations
30
Subject Categories
System Science; Engineering, Electrical & Electronic; Computer Science Theory & Methods
Journal ISSN
1045-9219
Volume
5
Issue
8
Year of publication
1994
Pages
874 - 879
Database
ISI
SICI code
1045-9219(1994)5:8<874:LCCFPP>2.0.ZU;2-Y
Abstract
This short note presents the results of an implementation of several algorithms for checkpointing and restarting parallel programs on shared-memory multiprocessors. The algorithms are compared according to the metrics of overall checkpointing time, overhead imposed by the checkpointer on the target program, and amount of time during which the checkpointer interrupts the target program. The best algorithm measured achieves its efficiency through a variation of copy-on-write, which allows the most time-consuming operations of the checkpoint to be overlapped with the running of the program being checkpointed.
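
The copy-on-write idea mentioned in the abstract can be illustrated with a small toy model. The sketch below is not the authors' implementation: the class name, method names, and page size are hypothetical, and the "write protection" is simulated with a Python set rather than real virtual-memory page protection. It only shows why the program needs to pause briefly (to mark pages as protected) while the bulk of the copying proceeds alongside normal execution.

```python
class CowCheckpointer:
    """Toy model of copy-on-write checkpointing over fixed-size pages.

    Hypothetical illustration only; a real checkpointer would rely on
    the operating system's page protection and fault handling.
    """

    def __init__(self, num_pages, page_size=4):
        self.pages = [bytearray(page_size) for _ in range(num_pages)]
        self.pending = set()   # pages still "write-protected"
        self.snapshot = {}     # page index -> bytes saved for the checkpoint

    def begin_checkpoint(self):
        # The only step that would interrupt the program: protect every page.
        self.snapshot = {}
        self.pending = set(range(len(self.pages)))

    def _save(self, idx):
        # Copy the page into the checkpoint and drop its protection.
        self.snapshot[idx] = bytes(self.pages[idx])
        self.pending.discard(idx)

    def write(self, idx, offset, value):
        # Program write: a protected page is saved first (the "fault" path).
        if idx in self.pending:
            self._save(idx)
        self.pages[idx][offset] = value

    def copier_step(self):
        # Background copier: save one still-protected page, if any remain.
        if self.pending:
            self._save(min(self.pending))
        return not self.pending  # True once the checkpoint is complete

    def checkpoint_image(self):
        if self.pending:
            raise RuntimeError("checkpoint still in progress")
        return [self.snapshot[i] for i in range(len(self.pages))]


def demo():
    mem = CowCheckpointer(num_pages=3)
    mem.write(0, 0, 1)
    mem.write(2, 0, 7)
    mem.begin_checkpoint()
    mem.write(2, 0, 9)   # page 2 is saved with its old value before the write
    mem.copier_step()    # copier saves page 0
    mem.write(0, 0, 5)   # page 0 is already saved, so no extra copy happens
    while not mem.copier_step():
        pass
    return mem.checkpoint_image(), [bytes(p) for p in mem.pages]
```

Calling demo() returns the checkpoint image together with the live pages. The image reflects memory as it was when begin_checkpoint() was called, even though the program kept writing while the copy was in progress.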