Video behavior recognition network using multi time-scale convolution
Author:
Affiliation:

(1. School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430070, China; 2. Changjiang River Scientific Research Institute, Wuhan 430010, China; 3. School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China)

CLC Number:

TP391.4

    Abstract:

    Behavior recognition networks based on 2D convolution usually integrate the classification results of multiple video frames to recognize different behaviors, but 2D convolution kernels alone cannot extract spatio-temporal features. To solve this problem, MTSC (multi time-scale convolution) is proposed on the basis of TSM (temporal shift module); it contains convolution kernels of different scales to fuse spatio-temporal features from different time scales. By varying the position at which MTSC is inserted into the ResNet50 network and the parameter settings of MTSC, the optimal MTSC-based behavior recognition network is investigated. With models trained in PyTorch, experiments were conducted on Something-Something v2, a large open-source dataset. The results show that the MTSC-based behavior recognition network achieves 59.47% Top-1 accuracy and outperforms TSM and other behavior recognition networks.
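
    The abstract does not give the internal layout of MTSC, so the following is only a minimal PyTorch sketch of the general idea of fusing features over several time scales. It assumes the channels are split into equal groups and each group passes through a depthwise temporal convolution with a different kernel size. The class name MultiTimeScaleConv, the kernel sizes (1, 3, 5), and the channel-splitting scheme are illustrative assumptions, not the authors' design.

```python
import torch
import torch.nn as nn


class MultiTimeScaleConv(nn.Module):
    """Hypothetical sketch: temporal convolutions at several kernel sizes.

    Assumption (not from the paper): channels are split into equal groups
    and each group is filtered along time with its own kernel size.
    """

    def __init__(self, channels, n_segments, kernel_sizes=(1, 3, 5)):
        super().__init__()
        if channels % len(kernel_sizes) != 0:
            raise ValueError("channels must be divisible by the number of kernel sizes")
        if any(k % 2 == 0 for k in kernel_sizes):
            raise ValueError("kernel sizes must be odd to preserve the frame count")
        self.n_segments = n_segments
        self.split = channels // len(kernel_sizes)
        # One depthwise 1D convolution per time scale; padding keeps T unchanged.
        self.branches = nn.ModuleList([
            nn.Conv1d(self.split, self.split, kernel_size=k,
                      padding=k // 2, groups=self.split, bias=False)
            for k in kernel_sizes
        ])

    def forward(self, x):
        # x has shape (N*T, C, H, W), the layout used by 2D backbones on frames.
        nt, c, h, w = x.shape
        t = self.n_segments
        n = nt // t
        # Move time to the last axis: (N, T, C, H, W) -> (N*H*W, C, T).
        x = x.reshape(n, t, c, h, w).permute(0, 3, 4, 2, 1).reshape(n * h * w, c, t)
        chunks = torch.split(x, self.split, dim=1)
        # Each channel group sees a different temporal receptive field.
        out = torch.cat([conv(chunk) for conv, chunk in zip(self.branches, chunks)], dim=1)
        # Restore the original layout: (N*H*W, C, T) -> (N*T, C, H, W).
        out = out.reshape(n, h, w, c, t).permute(0, 4, 3, 1, 2).reshape(nt, c, h, w)
        return out


# Usage: 2 clips of 8 frames each, 6 channels, 4x4 feature maps.
module = MultiTimeScaleConv(channels=6, n_segments=8)
features = torch.randn(2 * 8, 6, 4, 4)
fused = module(features)  # same shape as the input: (16, 6, 4, 4)
```

    Because the output shape matches the input shape, a module of this kind could in principle be placed before a 2D convolution inside a ResNet50 residual block, which is the kind of insertion position the abstract describes exploring.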

History
  • Received: June 01, 2021
  • Online: June 07, 2023
  • Published: June 28, 2023