Data collection for failure prediction toward exascale supercomputers

    Abstract:

    Targeting exascale supercomputers, a failure prediction data collection framework (FPDC) is introduced to comprehensively collect data related to the health state of compute nodes. An adaptive multi-layer data aggregation method is presented to aggregate the data with lower overhead. Extensive experiments with an implementation of FPDC on TH-1A indicate that FPDC is highly efficient and scales well.

Get Citation

HU Wei, JIANG Yanhuang, LIU Guangming, DONG Wenrui, CUI Xinwu. Data collection for failure prediction toward exascale supercomputers[J]. Journal of National University of Defense Technology, 2016, 38(1): 93-100.

History
  • Received: April 09, 2015
  • Online: March 07, 2016