Going deeper into third-person action anticipation
Abstract
Analysing human actions in video is attracting growing interest in computer vision. This paper explores and reviews deep learning techniques for third-person action anticipation. In many architectures, the anticipation task is decomposed into two stages: a feature extractor that encodes the observed frames, and a predictive model that infers upcoming actions from those features. The paper also outlines a project plan for third-person action anticipation on step-based activities. We will compare several of these architectures across multiple datasets, evaluating their prediction accuracy and their ability to anticipate actions over varying time horizons.
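To make the two-stage decomposition concrete, the following minimal sketch pairs a per-frame CNN feature extractor with a recurrent predictive model. It is illustrative only, not an architecture from any of the reviewed papers; the ResNet-18 backbone, GRU, hidden size, and class count are assumptions chosen for brevity.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class ActionAnticipator(nn.Module):
    """Stage 1: per-frame feature extraction. Stage 2: predictive model."""
    def __init__(self, num_classes, hidden_size=256):
        super().__init__()
        backbone = resnet18(weights=None)   # per-frame feature extractor
        backbone.fc = nn.Identity()         # expose the 512-d pooled features
        self.extractor = backbone
        self.temporal = nn.GRU(512, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, frames):
        # frames: (batch, time, 3, H, W) -- the observed part of the video
        b, t = frames.shape[:2]
        feats = self.extractor(frames.flatten(0, 1)).view(b, t, -1)
        _, h = self.temporal(feats)         # summarise the observed frames
        return self.classifier(h[-1])       # logits over future action classes

# Example: anticipate the upcoming action from 8 observed 224x224 frames.
model = ActionAnticipator(num_classes=48)
logits = model(torch.randn(2, 8, 3, 224, 224))   # shape: (2, 48)

Under this framing, anticipation at varying time horizons can be evaluated by truncating the observed frames at different offsets before the labelled action begins.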
DOI: https://doi.org/10.23954/osj.v8i2.3437
This work is licensed under a Creative Commons Attribution 4.0 International License.