Detailed Information


Athanasia: A User-Transparent and Fault-Tolerant System for Parallel Applications

Full metadata record
DC Field | Value | Language
dc.contributor.author | Jung, Hyungsoo | -
dc.contributor.author | Han, Hyuck | -
dc.contributor.author | Yeom, Heon Y. | -
dc.contributor.author | Kang, Sooyong | -
dc.date.accessioned | 2022-07-16T18:52:30Z | -
dc.date.available | 2022-07-16T18:52:30Z | -
dc.date.created | 2021-05-12 | -
dc.date.issued | 2011-10 | -
dc.identifier.issn | 1045-9219 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/167475 | -
dc.description.abstract | This article presents Athanasia, a user-transparent and fault-tolerant system, for parallel applications running on large-scale cluster systems. Cluster systems have been regarded as a de facto standard to achieve multitera-flop computing power. These cluster systems, as we know, have an inherent failure factor that can cause computation failure. The reliability issue in parallel computing systems, therefore, has been studied for a relatively long time in the literature, and we have seen many theoretical promises arise from the extensive research. However, despite the rigorous studies, practical and easily deployable fault-tolerant systems have not been successfully adopted commercially. Athanasia is a user-transparent checkpointing system for a fault-tolerant Message Passing Interface (MPI) implementation that is primarily based on the sync-and-stop protocol. Athanasia supports three critical functionalities that are necessary for fault tolerance: a light-weight failure detection mechanism, dynamic process management that includes process migration, and a consistent checkpoint and recovery mechanism. The main features of Athanasia are that it does not require any modifications to the application code and that it preserves many of the high performance characteristics of high-speed networks. Experimental results show that Athanasia can be a good candidate for practically deployable fault-tolerant systems in very-large and high-performance clusters and that its protocol can be applied to a variety of parallel communication libraries easily. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | IEEE COMPUTER SOC | -
dc.title | Athanasia: A User-Transparent and Fault-Tolerant System for Parallel Applications | -
dc.type | Article | -
dc.contributor.affiliatedAuthor | Kang, Sooyong | -
dc.identifier.doi | 10.1109/TPDS.2011.63 | -
dc.identifier.scopusid | 2-s2.0-80052314359 | -
dc.identifier.wosid | 000294162500006 | -
dc.identifier.bibliographicCitation | IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, v.22, no.10, pp.1653 - 1668 | -
dc.relation.isPartOf | IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | -
dc.citation.title | IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | -
dc.citation.volume | 22 | -
dc.citation.number | 10 | -
dc.citation.startPage | 1653 | -
dc.citation.endPage | 1668 | -
dc.type.rims | ART | -
dc.type.docType | Article | -
dc.description.journalClass | 1 | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | -
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | -
dc.subject.keywordPlus | Cluster analysis | -
dc.subject.keywordPlus | Cluster computing | -
dc.subject.keywordPlus | Fault detection | -
dc.subject.keywordPlus | Fault tolerant computer systems | -
dc.subject.keywordPlus | Message passing | -
dc.subject.keywordPlus | Parallel architectures | -
dc.subject.keywordAuthor | User transparency | -
dc.subject.keywordAuthor | fault tolerance | -
dc.subject.keywordAuthor | message passing interface | -
dc.subject.keywordAuthor | parallel systems | -
dc.subject.keywordAuthor | Myrinet | -
dc.subject.keywordAuthor | InfiniBand | -
dc.subject.keywordAuthor | ch_p4 | -
dc.identifier.url | https://ieeexplore.ieee.org/document/5710900 | -
Appears in Collections
Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kang, Soo yong
COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)