Targeting exascale supercomputers, a failure prediction data collection framework (FPDC) is introduced to comprehensively collect data on the health status of compute nodes. An adaptive multi-layer data aggregation method is presented to aggregate the collected data with low overhead. Extensive experiments with an implementation of FPDC on TH-1A indicate that the framework is highly efficient and scales well.
HU Wei, JIANG Yanhuang, LIU Guangming, DONG Wenrui, CUI Xinwu. Data collection for failure prediction toward exascale supercomputers[J]. Journal of National University of Defense Technology, 2016, 38(1): 93-100.