Using Scene Context to Improve Action Recognition

Action recognition has recently been used in a variety of applications, such as surveillance, smart homes, and in-home elder monitoring. Such applications usually focus on recognizing human actions without taking into account the scene in which the action occurs. In this paper, we propose a two-stream architecture that considers not only the movements that identify the action, but also the scene context in which the action is performed. Experiments show that scene context may improve the recognition of certain actions. We compare the proposed architecture against baselines and the standard two-stream network.

To bring contextual awareness to action recognition, our approach fuses information about the background with the information that identifies the action being performed, using a two-stream architecture. The architecture is composed of two streams: a CNN that identifies the action happening in the current frame, and a CNN that identifies where (the scene context) the action is happening in the current frame, as illustrated in the image below.

The idea of this architecture is that the CNN responsible for the action stream focuses on the movements being performed to identify an action, while the CNN responsible for the scene stream focuses on the background where the action happens. For example, consider two actions that contain similar movements, such as a baseball swing and a tennis swing. While the action stream may identify the swing itself, the scene stream identifies the context where the swing is happening, increasing the chance of correctly classifying the action. Features from both streams are combined using late fusion, and a classifier predicts the action performed in the input image.
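The late fusion step can be sketched in a few lines of dependency-free Python. This is only an illustrative sketch, not the implementation from the paper: it assumes fusion by concatenating the two feature vectors, uses a single linear layer with softmax as the classifier, and replaces the CNN outputs with tiny hand-written feature vectors. The names `late_fusion_predict`, `CLASSES`, `WEIGHTS`, and `BIASES`, as well as all numeric values, are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion_predict(action_features, scene_features, weights, biases):
    """Concatenate both streams' features and apply a linear classifier.

    `weights` holds one row per class; each row has
    len(action_features) + len(scene_features) entries.
    """
    # Late fusion (assumed here to be concatenation of the two vectors).
    fused = list(action_features) + list(scene_features)
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical toy features standing in for CNN outputs.
CLASSES = ["baseball_swing", "tennis_swing"]
action_features = [1.0]      # [swing-like motion], identical for both actions
baseball_scene = [1.0, 0.0]  # [baseball field, tennis court]
tennis_scene = [0.0, 1.0]

# Hand-picked illustrative weights (not learned): both classes respond
# equally to the swing, so only the scene features break the tie.
WEIGHTS = [
    [1.0, 2.0, -2.0],  # baseball_swing
    [1.0, -2.0, 2.0],  # tennis_swing
]
BIASES = [0.0, 0.0]

probs = late_fusion_predict(action_features, baseball_scene, WEIGHTS, BIASES)
predicted = CLASSES[probs.index(max(probs))]
print(predicted)
```

In this toy setup the action features alone cannot separate the two classes, since the swing looks the same in both; swapping `baseball_scene` for `tennis_scene` flips the prediction, which mirrors the baseball versus tennis example above.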

The complete paper can be seen here. When citing our work in academic papers, please use this BibTeX entry:

@inproceedings{MonteiroEtAl2018ciarp,
  author    = {Monteiro, Juarez and 
               Granada, Roger and 
               Meneguzzi, Felipe and 
               Barros, Rodrigo C},
  title     = {Using Scene Context to Improve Action Recognition},
  booktitle = {Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications},
  series    = {CIARP 2018},
  location  = {Madrid, Spain},
  pages     = {954--961},
  isbn      = {978-3-030-13469-3},
  doi       = {10.1007/978-3-030-13469-3_110},
  url       = {https://doi.org/10.1007/978-3-030-13469-3_110},
  month     = {November},
  year      = {2019},
  publisher = {Springer International Publishing}
}

Written on June 12, 2018