Video behavior recognition network using multi time-scale convolution

DOI: 10.11887/j.cn.202303016

Authors: CHEN Xijiang, LIANG Quanen, HAN Xianquan, et al.
Affiliations: (1. School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430070, China; 2. Changjiang River Scientific Research Institute, Wuhan 430010, China; 3. School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China)
About the author: CHEN Xijiang (born 1985), male, from Huainan, Anhui; associate professor, Ph.D., master's supervisor. E-mail: cxj_0421@163.com
Published in: Journal of National University of Defense Technology, 2023, 45(3): 136-145
CLC number: TP391.4
Fund projects: National Natural Science Foundation of China (42171428); Chongqing Technology Innovation and Application Development Special General Project (cstc2019jscx-msxmX0051); Open Research Fund of the Changjiang River Scientific Research Institute (CKWV2019758/KY)

Abstract:

Behavior recognition networks based on 2D convolution usually recognize behaviors by fusing the classification results of multiple video frames, but their 2D convolution kernels do not extract spatiotemporal features. To address this problem, a set of multi time-scale convolutions (MTSC) was designed based on the idea of the temporal shift module (TSM). It contains differently designed convolution kernels that extract and fuse spatiotemporal information from different time scales. By varying the positions at which MTSC is embedded in the ResNet50 network and the parameter settings of the module, the optimal MTSC-based behavior recognition network was sought. The models were trained with the PyTorch deep learning framework, and experiments were conducted on the large open-source dataset Something-Something v2. The results show that the MTSC-based behavior recognition network achieves 59.47% Top-1 accuracy, outperforming TSM and other behavior recognition networks.
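As a rough illustration of the temporal-shift idea that the abstract builds on, the following framework-free Python sketch moves a fraction of each frame's channels to neighboring frames and then averages the shifted features over several step sizes. The function names, the `fold_div` and `steps` parameters, and the averaging fusion are assumptions made for illustration only. The abstract does not specify the MTSC kernels, so this is not the authors' implementation.

```python
def temporal_shift(frames, fold_div=4, step=1):
    """Shift part of the channels along the time axis, TSM-style.

    frames: list of T per-frame feature vectors, each a list of C floats
            (spatial dimensions are omitted to keep the sketch small).
    The first C // fold_div channels are taken from the frame `step`
    positions later, the next C // fold_div from the frame `step`
    positions earlier, and the rest are left in place. Positions that
    fall outside the clip are zero-padded.
    """
    t = len(frames)
    c = len(frames[0])
    fold = c // fold_div
    out = [[0.0] * c for _ in range(t)]
    for i in range(t):
        for ch in range(c):
            if ch < fold:
                src = i + step      # read from a later frame
            elif ch < 2 * fold:
                src = i - step      # read from an earlier frame
            else:
                src = i             # channel is not shifted
            if 0 <= src < t:
                out[i][ch] = frames[src][ch]
    return out


def multi_scale_shift(frames, steps=(1, 2), fold_div=4):
    """Hypothetical fusion: average the shifted features over several steps."""
    shifted = [temporal_shift(frames, fold_div, s) for s in steps]
    t, c = len(frames), len(frames[0])
    return [
        [sum(s[i][ch] for s in shifted) / len(shifted) for ch in range(c)]
        for i in range(t)
    ]


if __name__ == "__main__":
    # Three frames with four channels each; values encode (frame, channel).
    clip = [[float(10 * i + ch) for ch in range(4)] for i in range(3)]
    for row in multi_scale_shift(clip):
        print(row)
```

In a real network the shift would be applied to 4D or 5D tensors inside residual blocks and followed by learned 2D convolutions; the nested lists here only show how information from frames at different temporal distances ends up in the same feature vector.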

Cite this article

CHEN Xijiang, LIANG Quanen, HAN Xianquan, et al. Video behavior recognition network using multi time-scale convolution[J]. Journal of National University of Defense Technology, 2023, 45(3): 136-145.

History

Received: 2021-06-01
Published online: 2023-06-07
Publication date: 2023-06-28
