Technologies for memory optimization for large model training on domestic platforms
Author: LI Dongsheng, TANG Yu, QIAO Linbo, et al.
Affiliation: 1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; 2. National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China

CLC Number: TP302.7


Abstract:

In current large-scale model training, the tension between the exponential growth of model parameters and the slow growth of GPU memory capacity has become increasingly prominent. Among memory optimization techniques, recomputation and computational offloading reduce GPU memory overhead by trading time for space. This article analyzes the development trends of recomputation and computational offloading, then examines the hardware bandwidth bottlenecks and software-ecosystem adaptation challenges that memory optimization faces, with a focus on the heterogeneous architecture of domestic artificial intelligence platforms. It further investigates memory optimization technologies for large model training on domestic platforms such as the MT-3000, with the aim of providing technical references for large model training on domestic platforms.
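As a minimal sketch of the recomputation (activation checkpointing) idea summarized above: each block's intermediate activations are discarded after the forward pass and recomputed during the backward pass, trading extra compute time for lower peak memory. The PyTorch framework, module names, and tensor sizes below are illustrative assumptions, not taken from the article; computational offloading would analogously move tensors to host memory between uses instead of recomputing them.

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    """A simple feed-forward block standing in for one layer of a large model."""
    def __init__(self, dim: int):
        super().__init__()
        self.ff = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim),
            torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim),
        )

    def forward(self, x):
        return x + self.ff(x)

class Model(torch.nn.Module):
    def __init__(self, dim: int, depth: int, use_recompute: bool):
        super().__init__()
        self.blocks = torch.nn.ModuleList(Block(dim) for _ in range(depth))
        self.use_recompute = use_recompute

    def forward(self, x):
        for blk in self.blocks:
            if self.use_recompute:
                # Recomputation: activations inside the block are not stored;
                # they are rebuilt during backward (more compute, less memory).
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x

# Hypothetical sizes, for illustration only.
model = Model(dim=1024, depth=8, use_recompute=True)
x = torch.randn(4, 128, 1024, requires_grad=True)
model(x).sum().backward()
```

With `use_recompute=True`, peak activation memory grows with one block rather than the full depth, at the cost of roughly one extra forward pass during backward.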

Citation:

LI Dongsheng, TANG Yu, QIAO Linbo, et al. Technologies for memory optimization for large model training on domestic platforms[J]. Journal of National University of Defense Technology, 2026, 48(2): 284-295.

History
  • Received: December 06, 2025
  • Online: April 08, 2026