Abstract:To optimize the software implementation of the domestic block cipher uBlock algorithm, the AVX2 instruction set supporting 256 bit data width was implemented, the automatic optimization level of the compiler was increased, optimizing the calling process of functions, and the methods of data storage structure optimization, high-level parallelism and low latency instruction logic optimization were used in order to implement parallel computing under the single-thread condition. Using this efficient combination method, the speed of single key short message encryption of uBlock-128/128 algorithm, uBlock-128/256 algorithm and uBlock-256/256 algorithm are 269%, 182% and 49% higher than the original code. Based on these optimization methods,the implementation of single-key scenario and multi-key scenario are given for three algorithm versions of uBlock-128/128, uBlock-128/256 and uBlock-256/256.