K. Li et al., LOW-LATENCY, CONCURRENT CHECKPOINTING FOR PARALLEL PROGRAMS, IEEE transactions on parallel and distributed systems, 5(8), 1994, pp. 874-879
Number of citations
30
Subject Categories
System Science; Engineering, Electrical & Electronic; Computer Science Theory & Methods
This short note presents the results of an implementation of several
algorithms for checkpointing and restarting parallel programs on
shared-memory multiprocessors. The algorithms are compared according to
the metrics of overall checkpointing time, overhead imposed by the
checkpointer on the target program, and amount of time during which the
checkpointer interrupts the target program. The best algorithm measured
achieves its efficiency through a variation of copy-on-write, which
allows the most time-consuming operations of the checkpoint to be
overlapped with the running of the program being checkpointed.
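
The abstract does not spell out the algorithm, so the following is only a toy illustration of the general copy-on-write idea it mentions, not the authors' implementation. It simulates memory as a list of pages in pure Python. The class name CowCheckpointer and all of its methods are hypothetical and chosen for illustration. Starting a checkpoint only marks pages as protected; the expensive copying is done afterwards, either by a background copy step or when the program first writes to a still-protected page.

```python
class CowCheckpointer:
    """Toy page-level copy-on-write checkpoint (hypothetical sketch)."""

    def __init__(self, num_pages, page_size=4):
        self.pages = [bytearray(page_size) for _ in range(num_pages)]
        self.protected = set()  # pages not yet saved for the current checkpoint
        self.snapshot = {}      # page index -> contents as of checkpoint start

    def begin_checkpoint(self):
        # The only step that interrupts the program: protect every page.
        self.protected = set(range(len(self.pages)))
        self.snapshot = {}

    def _save(self, idx):
        # Copy a page into the snapshot once, then unprotect it.
        if idx in self.protected:
            self.snapshot[idx] = bytes(self.pages[idx])
            self.protected.discard(idx)

    def write(self, idx, offset, value):
        # Program write: a "fault" on a protected page saves the old page first.
        self._save(idx)
        self.pages[idx][offset] = value

    def copy_step(self):
        # Background copier: save one protected page; False when none remain.
        if not self.protected:
            return False
        self._save(min(self.protected))
        return True

    def image(self):
        # Full checkpoint image; only valid after all pages have been saved.
        assert not self.protected, "checkpoint still in progress"
        return [self.snapshot[i] for i in range(len(self.pages))]


mem = CowCheckpointer(num_pages=3)
mem.write(0, 0, 1)        # state before the checkpoint
mem.begin_checkpoint()
mem.write(0, 0, 9)        # the program keeps running during the checkpoint
mem.write(2, 1, 7)
while mem.copy_step():    # remaining pages are copied concurrently in spirit
    pass
checkpoint = mem.image()  # reflects memory as it was at begin_checkpoint()
```

In a real system the protection would presumably be enforced by the virtual-memory hardware and the copier would run in parallel with the program; here both are interleaved by hand so the behavior stays deterministic.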