A review on deep-learning based egocentric action anticipation

Sareh Rowlands, Richard Wardle

Abstract


As autonomous systems become more embedded in our environments, their ability to anticipate the future actions of humans will become invaluable for providing assistance and safety measures. Egocentric action anticipation is the task of predicting a future activity from first-person footage. This survey aims to provide an updated view of advances in this task, to guide architecture design for future implementations. It reviews a range of publicly available egocentric action anticipation models.


Keywords


Action anticipation, Egocentric vision, Deep learning, Transformers


DOI: https://doi.org/10.23954/osj.v10i1.3696



This work is licensed under a Creative Commons Attribution 4.0 International License.

Open Science Journal (OSJ) is a multidisciplinary Open Access journal. We accept scientifically rigorous research, regardless of novelty. OSJ's broad scope provides a platform to publish original research in all areas of science, including interdisciplinary and replication studies as well as negative results.