Cite this article: CHEN Xijiang, LIANG Quanen, HAN Xianquan, et al. Video behavior recognition network using multi time-scale convolution[J]. Journal of National University of Defense Technology, 2023, 45(3): 136-145.
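Video behavior recognition network using multi time-scale convolution
CHEN Xijiang1, LIANG Quanen1, HAN Xianquan2, AN Qing3
(1. School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430070, Hubei, China; 2. Changjiang River Scientific Research Institute, Wuhan 430010, Hubei, China; 3. School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, Hubei, China)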
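
Abstract:
2D behavior recognition networks usually recognize behaviors by fusing the classification results of multiple video frames, but they extract no spatiotemporal features during convolution. To address this problem, a set of multi time-scale convolutions was designed based on the idea of the temporal shift module (TSM); it contains differently designed convolution kernels that extract and fuse spatiotemporal information at different time scales. By varying the position at which the multi time-scale convolution is embedded in the ResNet50 network and the parameter settings of the module, the optimal behavior recognition network based on multi time-scale convolution was sought. The model was trained with the PyTorch deep learning framework, and experiments were carried out on the large open-source dataset Something-Something v2. The results show that the behavior recognition network based on multi time-scale convolution reaches a recognition accuracy of 59.47%, outperforming TSM and other networks.
Keywords: behavior recognition; convolutional neural network; classification; residual neural network; PyTorch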
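DOI: 10.11887/j.cn.202303016
Received: 2021-06-01
Funding: National Natural Science Foundation of China (42171428); Chongqing Technology Innovation and Application Development Special General Project (cstc2019jscx-msxmX0051); Open Research Fund of Changjiang River Scientific Research Institute (CKWV2019758/KY)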
|
Video behavior recognition network using multi time-scale convolution
CHEN Xijiang1, LIANG Quanen1, HAN Xianquan2, AN Qing3
(1. School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430070, China;2. Changjiang River Scientific Research Institute, Wuhan 430010, China;3. School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China)
|
Abstract:
Behavior recognition networks based on 2D convolution usually fuse the classification results of multiple video frames to recognize different behaviors, but 2D convolution kernels alone cannot extract spatiotemporal features. To solve this problem, multi time-scale convolution (MTSC) was proposed based on the temporal shift module (TSM); it contains convolution kernels of different scales that fuse spatiotemporal features from different time scales. By varying the position at which MTSC is inserted into the ResNet50 network and the parameter settings of MTSC, the optimal MTSC-based behavior recognition network was sought. The model was trained with PyTorch, and an experimental study was conducted on a large open-source dataset, Something-Something v2. The results show that the MTSC-based behavior recognition network achieves 59.47% Top-1 accuracy and outperforms TSM and other behavior recognition networks.
Keywords: behavior recognition; convolution neural network; classification; residual neural network; PyTorch
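
The abstract builds MTSC on the idea of TSM, so a small illustration of the temporal shift itself may help readers who have not met it. The sketch below is not the authors' MTSC (the abstract does not specify its kernels); it only shows the channel shift along the time axis that TSM is known for. The function name `temporal_shift`, the `fold_div` parameter value, and the use of plain Python lists with the spatial dimensions omitted (instead of PyTorch tensors) are assumptions made to keep the example dependency-free.

```python
def temporal_shift(clip, fold_div=8):
    """Shift a fraction of channels along the time axis, TSM-style.

    clip: list of T frames, each a list of C channel values
          (spatial dimensions are omitted to keep the sketch small).
    The first C // fold_div channels take their value from the next frame,
    the following C // fold_div channels from the previous frame, and the
    remaining channels stay in place. Out-of-range frames contribute zero.
    """
    num_frames = len(clip)
    num_channels = len(clip[0])
    fold = num_channels // fold_div
    shifted = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1  # pull from the following frame
            elif c < 2 * fold:
                src = t - 1  # pull from the preceding frame
            else:
                src = t      # unshifted channel
            if 0 <= src < num_frames:
                shifted[t][c] = clip[src][c]
    return shifted


# Toy clip (hypothetical data): 3 frames, 8 channels,
# every channel of frame t holds the value t + 1.
clip = [[float(t + 1)] * 8 for t in range(3)]
out = temporal_shift(clip, fold_div=4)  # 2 channels shifted per direction
for frame in out:
    print(frame)
```

After the shift, each frame mixes channels from its neighbours, so a subsequent 2D convolution over one frame already sees information from adjacent time steps. A multi time-scale design would presumably combine several such temporal receptive fields, but the exact kernels are described in the full paper rather than in this abstract.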
|
|
|
|
|