An implementation of using remote memory to checkpoint processes

Authors
Citation
St. Hsu and Rc. Chang, An implementation of using remote memory to checkpoint processes, SOFTW PR EX, 29(11), 1999, pp. 985-1004
Number of citations
30
Subject Categories
Computer Science & Engineering
Journal title
SOFTWARE-PRACTICE & EXPERIENCE
Journal ISSN
0038-0644
Volume
29
Issue
11
Year of publication
1999
Pages
985 - 1004
Database
ISI
SICI code
0038-0644(199909)29:11<985:AIOURM>2.0.ZU;2-M
Abstract
Process checkpointing is a procedure which periodically saves the process states into stable storage. Most checkpointing facilities select hard disks for archiving. However, the disk seek time is limited by the speed of the read-write heads, thus checkpointing a process to a local disk requires extensive disk bandwidth. In this paper, we propose an approach that exploits the memory on idle workstations as a faster storage for checkpointing. In our scheme, autonomous machines which submit jobs to the computation server offer their physical memory to the server for job checkpointing. Eight applications are used to measure the remote memory performance under four checkpointing policies. Experimental results show that remote memory reduces the overhead by at least 34.5 per cent for sequential checkpointing and by at least 32.1 per cent for incremental checkpointing. Additionally, checkpointing a running process into remote memory requires only 60 per cent of the local disk checkpoint latency time. Copyright (C) 1999 John Wiley & Sons, Ltd.
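
The difference between the sequential and incremental policies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: RemoteMemoryStore, IncrementalCheckpointer, the 4096-byte page size and the use of page hashes to detect changes are all assumptions made for illustration (a real system would more likely send pages over a network and track dirty pages through the virtual memory system). Here the "remote memory" is simply an in-process dictionary.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size, for illustration only


class RemoteMemoryStore:
    """Hypothetical stand-in for memory offered by an idle workstation.

    A real system would transfer pages over the network; here they are
    kept in a local dictionary so the example stays self-contained.
    """

    def __init__(self):
        self.pages = {}
        self.bytes_received = 0

    def put_page(self, index, data):
        self.pages[index] = bytes(data)
        self.bytes_received += len(data)

    def restore(self, num_pages):
        # Rebuild the process image from the stored pages.
        return bytearray(b"".join(self.pages[i] for i in range(num_pages)))


def split_pages(state):
    """Cut a process image into fixed-size pages."""
    return [bytes(state[i:i + PAGE_SIZE]) for i in range(0, len(state), PAGE_SIZE)]


def sequential_checkpoint(state, store):
    """Sequential policy (as sketched here): send every page each time."""
    pages = split_pages(state)
    for index, page in enumerate(pages):
        store.put_page(index, page)
    return len(pages)


class IncrementalCheckpointer:
    """Incremental policy (as sketched here): send only changed pages.

    Change detection by hashing is an assumption of this sketch.
    """

    def __init__(self, store):
        self.store = store
        self.digests = {}

    def checkpoint(self, state):
        sent = 0
        for index, page in enumerate(split_pages(state)):
            digest = hashlib.sha256(page).digest()
            if self.digests.get(index) != digest:
                self.store.put_page(index, page)
                self.digests[index] = digest
                sent += 1
        return sent


if __name__ == "__main__":
    store = RemoteMemoryStore()
    checkpointer = IncrementalCheckpointer(store)
    state = bytearray(8 * PAGE_SIZE)       # toy process image of 8 pages
    print(checkpointer.checkpoint(state))  # first checkpoint sends all pages
    state[5000] = 1                        # touch a byte in the second page
    print(checkpointer.checkpoint(state))  # only the changed page is sent
```

In this toy run the first checkpoint transfers all eight pages and the second transfers only the page that was modified, which is the saving the incremental policy aims for.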