Cite this article: GAO Ying, WANG Longxin, TIAN Lei, et al. Fast software implementation of the block cipher uBlock algorithm[J]. Journal of National University of Defense Technology, 2024, 46(6): 96-106.
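Fast software implementation of the block cipher uBlock algorithm
GAO Ying, WANG Longxin, TIAN Lei, HU Yang, ZHANG Yupeng, YAN Yu, WU Qianhong
(School of Cyber Science and Technology, Beihang University, Beijing 100191, China)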
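Abstract:
To optimize the software implementation of the Chinese domestic block cipher uBlock, the AVX2 instruction set, which supports a 256-bit data width, was selected; the compiler's automatic optimization level was raised; the function-call process and the data storage structure were optimized; and high-level parallelism and low-latency instruction logic optimization were combined to achieve parallel computation in a single thread. With this effective combination of methods, single-key short-message encryption with uBlock-128/128, uBlock-128/256 and uBlock-256/256 ran 269%, 182% and 49% faster, respectively, than the original code. Based on these optimizations, all three versions (uBlock-128/128, uBlock-128/256 and uBlock-256/256) were implemented for both the single-key and multi-key scenarios.
Keywords: uBlock algorithm; AVX2 instruction set; parallel operation; low latency; fast software implementation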
DOI: 10.11887/j.cn.202406010
Submitted: 2022-04-06
Funding: National Key Research and Development Program of China (2022YFB2701600); National Natural Science Foundation of China (61932011, 61972017); Beijing Natural Science Foundation (M21033)
Fast software implementation of the block cipher uBlock algorithm
GAO Ying, WANG Longxin, TIAN Lei, HU Yang, ZHANG Yupeng, YAN Yu, WU Qianhong
(School of Cyber Science and Technology, Beihang University, Beijing 100191, China)
Abstract:
To optimize the software implementation of the domestic block cipher uBlock, the AVX2 instruction set, which supports a 256-bit data width, was adopted. The compiler's automatic optimization level was raised, the function-call process and the data storage structure were optimized, and high-level parallelism was combined with low-latency instruction logic optimization to achieve parallel computation within a single thread. With this combination of methods, single-key short-message encryption with uBlock-128/128, uBlock-128/256 and uBlock-256/256 is 269%, 182% and 49% faster, respectively, than the original code. Based on these optimizations, single-key and multi-key implementations are given for all three versions: uBlock-128/128, uBlock-128/256 and uBlock-256/256.
Keywords: uBlock algorithm; AVX2 instruction set; parallel operation; low latency; fast software implementation
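
The abstract does not show code, so the following is only an illustrative sketch of what single-thread parallelism over a 256-bit AVX2 register can look like. It is not the authors' implementation. It applies a 4-bit S-box to 32 nibbles at once with a byte shuffle, assuming one nibble is stored per byte. The table `DEMO_SBOX` is a placeholder permutation (5x + 3 mod 16), not the uBlock S-box, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define DEMO_HAVE_AVX2_PATH 1
#else
#define DEMO_HAVE_AVX2_PATH 0
#endif

/* Placeholder 4-bit S-box, S[x] = (5x + 3) mod 16. NOT the uBlock S-box. */
static const uint8_t DEMO_SBOX[16] = {
    3, 8, 13, 2, 7, 12, 1, 6, 11, 0, 5, 10, 15, 4, 9, 14
};

/* Scalar reference: one table lookup per byte (low nibble only). */
static void sbox_layer_scalar(uint8_t *nibbles, size_t n) {
    for (size_t i = 0; i < n; i++)
        nibbles[i] = DEMO_SBOX[nibbles[i] & 0x0F];
}

#if DEMO_HAVE_AVX2_PATH
/* AVX2 path: 32 lookups per shuffle instruction. */
__attribute__((target("avx2")))
static void sbox_layer_avx2(uint8_t *nibbles, size_t n) {
    /* The byte shuffle works per 128-bit lane, so copy the table into both. */
    const __m256i table = _mm256_broadcastsi128_si256(
        _mm_loadu_si128((const __m128i *)DEMO_SBOX));
    const __m256i low4 = _mm256_set1_epi8(0x0F);
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i x = _mm256_loadu_si256((const __m256i *)(nibbles + i));
        x = _mm256_and_si256(x, low4);      /* keep indices in 0..15 */
        x = _mm256_shuffle_epi8(table, x);  /* x[j] = table[x[j]] */
        _mm256_storeu_si256((__m256i *)(nibbles + i), x);
    }
    for (; i < n; i++)                      /* leftover bytes */
        nibbles[i] = DEMO_SBOX[nibbles[i] & 0x0F];
}
#endif

/* Pick the AVX2 path at run time when the CPU supports it. */
void sbox_layer(uint8_t *nibbles, size_t n) {
    assert(nibbles != NULL || n == 0);
#if DEMO_HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) {
        sbox_layer_avx2(nibbles, n);
        return;
    }
#endif
    sbox_layer_scalar(nibbles, n);
}

/* Returns 1 if the dispatched path agrees with the scalar reference. */
int demo_vector_matches_scalar(void) {
    uint8_t a[70], b[70];
    for (size_t i = 0; i < sizeof a; i++)
        a[i] = (uint8_t)(i * 7 + 1);
    memcpy(b, a, sizeof a);
    sbox_layer_scalar(a, sizeof a);
    sbox_layer(b, sizeof b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Because the AVX2 routine carries a `target("avx2")` attribute and is selected at run time, this sketch builds with GCC or Clang without `-mavx2` and falls back to the scalar loop on other CPUs. A real cipher implementation would also need the actual S-box, the linear layers and the key schedule, none of which are shown here.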