Designing fault-tolerant distributed archives for picture archiving and communication systems

Citation
R. Mendenhall et al., Designing fault-tolerant distributed archives for picture archiving and communication systems, J DIGIT IM, 14(2), 2001, pp. 80-83
Subject Categories
Radiology, Nuclear Medicine & Imaging
Journal title
JOURNAL OF DIGITAL IMAGING
ISSN journal
0897-1889
Volume
14
Issue
2
Year of publication
2001
Supplement
1
Pages
80 - 83
Database
ISI
SICI code
0897-1889(200106)14:2<80:DFDAFP>2.0.ZU;2-A
Abstract
Purpose: Distributed archives in a picture archiving and communication system (PACS) environment can provide added fault tolerance and fail-over capability, as well as increased load capacity, at a more economical price than traditional "high-availability" systems. Systems can be configured with varying levels of fault tolerance, depending on the amount of redundancy desired. There is, however, a direct correlation between the level of hardware redundancy and the cost of implementation. This presentation details the system design for fault-tolerant distributed archives as well as several options for redundancy, referencing the implementation of a fault-tolerant archive system at the University of Utah.
Methods: The distributed archive system described here is based on Image Devices' image archive software, which can be implemented on multiple individual archive servers in order to distribute archive functionality and operational load. The configuration and implementation of the individual servers together make up the distributed archive system and do not impact the ability of the system to be scaled to meet future requirements. Several implementation and configuration options exist, including the ability for servers to maintain replicated databases containing patient and image information. Thus, each archive can be aware of all information and the location of this information within the distributed archive system.
Results: The goal is to produce systems that will still be operational in the event of any single point of failure, ie, a network connection failure between facilities or the failure of a single archive server within the distributed system. During normal operation, workload for image acquisition, image routing, and image query requests will be distributed between the archive servers.
If the system is deployed in a multifacility environment, each archive server can be configured to be responsible for the acquisition and image distribution management within that server's local facility. If the system is deployed in a single-facility environment, load can be distributed evenly between the archive servers based on an understanding of the workload requirements generated by each acquisition and display device in the system. In the event that an archive server fails, other archive servers within the system will have the ability to provide some or all of the failed server's functionality. The degree of fail-over capability is dependent on the archive server's configuration as well as the hardware redundancy employed. Three levels of fault-tolerant design can be achieved with this system architecture: (1) duplicate work capability only; (2) duplicate work capability and short-term image cache; (3) duplicate work capability, short-term image cache, and long-term image archival. Using the basic fault-tolerant design above, we have implemented a multifacility distributed archive system at the University of Utah. This system was implemented at a fraction of the cost of true "high-availability" archive architectures yet provides constant uptime for the PACS system. If the network connection between the two locations goes down, each site is still fully functional for soft copy read, as well as image acquisition and distribution. If either of the archive servers goes down, the image sources are redirected to the other archive server. The operational server then handles image distribution for both locations. Access to images in the short-term image cache is available to both archive servers and is not affected by loss of the network connection or remote server. Because there is only one long-term archive device, the ability to retrieve images from long-term storage is the only function compromised by a network or server failure.
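The fail-over behavior and the three fault-tolerance levels described above can be illustrated with a minimal Python sketch. It is not the Image Devices software or its API: the names `Level`, `ArchiveServer`, `route_source`, and `surviving_functions`, as well as the function labels such as "acquisition", are hypothetical and chosen only for illustration. The sketch assumes two archive servers, one per site, as in the deployment described.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Set


class Level(IntEnum):
    """The three fault-tolerance levels listed in the abstract (names are illustrative)."""
    WORK_ONLY = 1           # (1) duplicate work capability only
    WORK_AND_CACHE = 2      # (2) plus short-term image cache
    WORK_CACHE_ARCHIVE = 3  # (3) plus long-term image archival


@dataclass
class ArchiveServer:
    """Hypothetical stand-in for one archive server in the distributed system."""
    name: str
    up: bool = True


def route_source(home: ArchiveServer, peer: ArchiveServer) -> ArchiveServer:
    """Send an image source to its home server, or redirect it to the peer on failure."""
    if home.up:
        return home
    if peer.up:
        return peer
    # Two simultaneous failures are outside the single-failure design goal.
    raise RuntimeError("no archive server available")


def surviving_functions(level: Level) -> Set[str]:
    """Functions assumed to remain available after a single server failure."""
    functions = {"acquisition", "routing", "query"}  # duplicated work capability
    if level >= Level.WORK_AND_CACHE:
        functions.add("short_term_cache")
    if level >= Level.WORK_CACHE_ARCHIVE:
        functions.add("long_term_archive")
    return functions


site_a = ArchiveServer("site_a")
site_b = ArchiveServer("site_b")
site_a.up = False  # simulate the failure of one archive server
print(route_source(site_a, site_b).name)
print(sorted(surviving_functions(Level.WORK_AND_CACHE)))
```

Under these assumptions, a configuration with a shared short-term cache but a single long-term archive device behaves like level 2: acquisition, routing, query, and cache access survive a single failure, while long-term retrieval may not.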
Conclusion: By implementing distributed archives in a PACS environment, it is possible to achieve a highly fault-tolerant system without the expense of high-availability hardware and software. The design concepts outlined here can be applied to any PACS system that supports distributed archive functionality. Copyright (C) 2001 by W.B. Saunders Company.