Descripteur
Documents disponibles dans cette catégorie (6)
Analysis of pedestrian movements and gestures using an on-board camera to predict their intentions / Joseph Gesnouin (2022)
Titre : Analysis of pedestrian movements and gestures using an on-board camera to predict their intentions
Titre original : Analyse des mouvements et gestes des piétons via caméra embarquée pour la prédiction de leurs intentions
Type de document : Thèse/HDR
Auteurs : Joseph Gesnouin, Auteur ; Fabien Moutarde, Directeur de thèse
Editeur : Paris : Université Paris Sciences et Lettres
Année de publication : 2022
Importance : 171 p.
Format : 21 x 30 cm
Note générale : bibliographie
Thèse de doctorat de l'Université Paris Sciences et Lettres, préparée à MINES ParisTech, spécialité Informatique temps réel, robotique et automatique
Langues : Anglais (eng)
Descripteur : [Vedettes matières IGN] Intelligence artificielle
[Termes IGN] apprentissage profond
[Termes IGN] attention (apprentissage automatique)
[Termes IGN] classification par réseau neuronal convolutif
[Termes IGN] classification par réseau neuronal récurrent
[Termes IGN] estimation de pose
[Termes IGN] image RVB
[Termes IGN] instrument embarqué
[Termes IGN] navigation autonome
[Termes IGN] piéton
[Termes IGN] reconnaissance de gestes
[Termes IGN] réseau neuronal de graphes
[Termes IGN] squelettisation
[Termes IGN] trajectoire (véhicule non spatial)
[Termes IGN] vision par ordinateur
Index. décimale : THESE Thèses et HDR
Résumé : (auteur) The autonomous vehicle (AV) is a major challenge for the mobility of tomorrow. Progress is being made every day to achieve it; however, many problems remain to be solved to achieve a safe outcome for the most vulnerable road users (VRUs). One of the major challenges faced by AVs is the ability to drive efficiently in urban environments. Such a task requires interactions between autonomous vehicles and VRUs to resolve traffic ambiguities. In order to interact with VRUs, AVs must be able to understand their intentions and predict their upcoming actions. In this dissertation, our work revolves around machine learning technology as a way to understand and predict human behaviour from visual signals, and more specifically from pose kinematics. Our goal is to propose an assistance system for the AV that is lightweight and scene-agnostic and could easily be implemented in any embedded device with real-time constraints. Firstly, in the gesture and action recognition domain, we study and introduce different representations for pose kinematics, based on deep learning models, as a way to efficiently leverage their spatial and temporal components while staying in a Euclidean grid space. Secondly, in the autonomous driving domain, we show that it is possible to link the posture, the walking attitude and the future behaviours of the protagonists of a scene without using the contextual information of the scene (zebra crossing, traffic light...). This allowed us to reduce the inference time of existing approaches for pedestrian intention prediction by a factor of 20 while keeping the same prediction robustness. Finally, we assess the generalization capabilities of pedestrian crossing predictors and show that the classical train-test evaluation for pedestrian crossing prediction, i.e., models being trained and tested on the same dataset, is not sufficient to compare them or to conclude anything about their applicability in a real-world scenario. To make the research field more sustainable and more representative of the real advances to come, we propose new protocols and metrics based on uncertainty estimates under domain shift, in order to reach the end goal of pedestrian crossing behaviour predictors: vehicle implementation.
Note de contenu : 1- Introduction
2- Human activity recognition with pose-driven deep learning models
3- From action recognition to pedestrian discrete intention prediction
4- Assessing the generalization of pedestrian crossing predictors
5- Conclusion
Numéro de notice : 24066
Affiliation des auteurs : non IGN
Thématique : INFORMATIQUE
Nature : Thèse française
Note de thèse : Thèse de Doctorat : Informatique temps réel, robotique et automatique : Paris Sciences et Lettres : 2022
DOI : sans
En ligne : https://tel.hal.science/tel-03813520
Format de la ressource électronique : URL
Permalink : https://documentation.ensg.eu/index.php?lvl=notice_display&id=102091
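As a rough illustration of the pose-kinematics idea described in the abstract above (classifying a pedestrian's skeleton motion without any scene context), here is a minimal sketch. It assumes COCO-style 17-joint 2D poses and a GRU encoder; the class name, layer sizes and the cross/not-cross output are hypothetical and do not reproduce the thesis's architecture.

```python
# Minimal sketch: crossing-intention classification from a sequence of 2D skeletons.
import torch
import torch.nn as nn

class PoseIntentClassifier(nn.Module):
    def __init__(self, num_joints=17, coord_dim=2, hidden=64):
        super().__init__()
        # Each frame is a flattened set of 2D keypoints (assumed COCO 17-joint layout).
        self.encoder = nn.GRU(input_size=num_joints * coord_dim,
                              hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # cross / not-cross logits

    def forward(self, poses):
        # poses: (batch, time, num_joints * coord_dim)
        _, h_n = self.encoder(poses)      # last hidden state summarizes the motion
        return self.head(h_n[-1])

if __name__ == "__main__":
    model = PoseIntentClassifier()
    clip = torch.randn(4, 30, 17 * 2)     # 4 pedestrians, 30 frames of 2D poses
    print(model(clip).shape)              # torch.Size([4, 2])
```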
Scene understanding and gesture recognition for human-machine interaction / Naina Dhingra (2022)
Titre : Scene understanding and gesture recognition for human-machine interaction
Type de document : Thèse/HDR
Auteurs : Naina Dhingra, Auteur
Editeur : Zurich : Eidgenossische Technische Hochschule ETH - Ecole Polytechnique Fédérale de Zurich EPFZ
Année de publication : 2022
Note générale : Bibliographie
A dissertation submitted to attain the degree of Doctor of Sciences of ETH Zurich
Langues : Anglais (eng)
Descripteur : [Vedettes matières IGN] Intelligence artificielle
[Termes IGN] apprentissage profond
[Termes IGN] attention (apprentissage automatique)
[Termes IGN] classification orientée objet
[Termes IGN] classification par réseau neuronal convolutif
[Termes IGN] classification par séparateurs à vaste marge
[Termes IGN] compréhension de l'image
[Termes IGN] image RVB
[Termes IGN] interaction homme-machine
[Termes IGN] oculométrie
[Termes IGN] reconnaissance automatique
[Termes IGN] reconnaissance de formes
[Termes IGN] reconnaissance de gestes
[Termes IGN] réseau neuronal récurrent
[Termes IGN] scène
[Termes IGN] vision par ordinateur
Résumé : (auteur) Scene understanding and gesture recognition are useful for a myriad of applications such as human-robot interaction, assisting blind and visually impaired people, advanced driver assistance systems, and autonomous driving. To work autonomously in real-world environments, automatic systems need to deliver non-verbal information to enhance verbal communication, in particular for blind people. We explore a holistic approach for providing scene- as well as gesture-related information. We propose that incorporating attention mechanisms in neural networks, which behave similarly to attention in the human brain, and conducting an integrated study using neural networks in real time can yield significant improvements in scene and gesture understanding, thereby enhancing the user experience. In this thesis, we investigate the understanding of visual scenes and gestures. We explore these two areas, in particular, by proposing novel architectures, training methods, user studies, and thorough evaluations. We show that, for deep learning approaches, attention or self-attention mechanisms improve and push the boundaries of network performance for the different tasks under consideration. We suggest that the various kinds of gestures can complement and supplement each other's information to better understand non-verbal conversation; hence integrated gesture comprehension is useful. First, we focus on visual scene understanding using scene graph generation. We propose BGT-Net, a new network that uses an object detection model with 1) bidirectional gated recurrent units for object-object communication and 2) transformer encoders including self-attention to classify the objects and their relationships. We address the problem of bias caused by the long-tailed distribution in the dataset. This enables the network to perform even for objects or relationships unseen in the dataset. Second, we propose to learn hand gesture recognition from RGB and RGB-D videos using attention learning. We present a novel architecture based on residual connections and an attention mechanism. Our approach successfully detects hand gestures when evaluated on three open-source datasets. Third, we explore pointing gesture recognition and localization using open-source software, i.e., OpenPTrack, which uses a deep learning based network to track multiple persons in the scene. We use a Kinect sensor as an input device and conduct a user study with 26 users to evaluate the system using two setup types. Fourth, we propose a technique to perform eye-gaze tracking using OpenFace, which is based on a deep learning model and an RGB webcam. We use support vector machine regression to estimate the position of the eye gaze on the screen. In a study, we evaluate the system with 28 users and show that it can perform similarly to expensive commercial eye trackers. Finally, we focus on 3D head pose estimation using two models: 1) HeadPosr includes residual connections for the base network followed by a transformer encoder; it outperforms existing models but has the drawback of being computationally expensive; 2) LwPosr uses depthwise separable convolutions and transformer encoders; it is a two-stream network in a fine-grained fashion to estimate the three angles of the head pose. We demonstrate that this method is able to predict head poses better than state-of-the-art lightweight networks.
Note de contenu : 1- Introduction
2- Background
3- State of the art
4- Scene graph generation
5- 3D hand gesture recognition
6- Pointing gesture recognition
7- Eye-gaze tracking
8- Head pose estimation
9- Lightweight head pose estimation
10- Summary
Numéro de notice : 24039
Affiliation des auteurs : non IGN
Thématique : IMAGERIE/INFORMATIQUE
Nature : Thèse étrangère
Note de thèse : PhD Thesis : Sciences : ETH Zurich : 2022
DOI : sans
En ligne : https://www.research-collection.ethz.ch/handle/20.500.11850/559347
Format de la ressource électronique : URL
Permalink : https://documentation.ensg.eu/index.php?lvl=notice_display&id=101876
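To make the self-attention idea running through this thesis record concrete, the following sketch applies a transformer encoder over a sequence of per-frame features and classifies the clip. It is a generic illustration only: the class name, feature dimension, pooling choice and head are assumptions, not the BGT-Net, HeadPosr or LwPosr architectures themselves.

```python
# Minimal sketch: self-attention (transformer encoder) over per-frame features.
import torch
import torch.nn as nn

class AttentionGestureClassifier(nn.Module):
    def __init__(self, feat_dim=128, num_classes=10, nhead=4, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_features):
        # frame_features: (batch, time, feat_dim), e.g. CNN features per frame
        attended = self.encoder(frame_features)   # self-attention across time
        return self.head(attended.mean(dim=1))    # pool over time, then classify

if __name__ == "__main__":
    model = AttentionGestureClassifier()
    x = torch.randn(2, 16, 128)        # 2 clips, 16 frames, 128-d features each
    print(model(x).shape)              # torch.Size([2, 10])
```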
Semi-supervised joint learning for hand gesture recognition from a single color image / Chi Xu in Sensors, vol 21 n° 3 (February 2021)
[article]
Titre : Semi-supervised joint learning for hand gesture recognition from a single color image
Type de document : Article/Communication
Auteurs : Chi Xu, Auteur ; Yunkai Jiang, Auteur ; Jun Zhou, Auteur ; et al., Auteur
Année de publication : 2021
Article en page(s) : n° 1007
Note générale : bibliographie
Langues : Anglais (eng)
Descripteur : [Vedettes matières IGN] Traitement d'image optique
[Termes IGN] apprentissage profond
[Termes IGN] apprentissage semi-dirigé
[Termes IGN] détection d'objet
[Termes IGN] estimation de pose
[Termes IGN] image en couleur
[Termes IGN] jeu de données
[Termes IGN] reconnaissance de gestes
Résumé : (auteur) Hand gesture recognition and hand pose estimation are two closely correlated tasks. In this paper, we propose a deep-learning based approach which jointly learns an intermediate-level shared feature for these two tasks, so that the hand gesture recognition task can benefit from the hand pose estimation task. In the training process, a semi-supervised training scheme is designed to solve the problem of lacking proper annotation. Our approach detects the foreground hand, recognizes the hand gesture, and estimates the corresponding 3D hand pose simultaneously. To evaluate the hand gesture recognition performance of state-of-the-art methods, we propose a challenging hand gesture recognition dataset collected in unconstrained environments. Experimental results show that the gesture recognition accuracy of our approach is significantly boosted by leveraging the knowledge learned from the hand pose estimation task.
Numéro de notice : A2021-160
Affiliation des auteurs : non IGN
Thématique : IMAGERIE
Nature : Article
nature-HAL : ArtAvecCL-RevueIntern
DOI : 10.3390/s21031007
Date de publication en ligne : 02/02/2021
En ligne : https://doi.org/10.3390/s21031007
Format de la ressource électronique : url article
Permalink : https://documentation.ensg.eu/index.php?lvl=notice_display&id=97076
in Sensors > vol 21 n° 3 (February 2021) . - n° 1007 [article]
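The article above jointly learns a shared feature for gesture classification and 3D hand-pose estimation. The sketch below shows the general shape of such a multi-task network: one shared trunk with two heads. The backbone, layer sizes, joint count and names are assumptions for illustration, not the authors' network.

```python
# Minimal sketch: shared trunk with a gesture-classification head and a 3D-pose head.
import torch
import torch.nn as nn

class JointHandNet(nn.Module):
    def __init__(self, num_gestures=10, num_joints=21):
        super().__init__()
        # Tiny convolutional trunk shared by both tasks.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.gesture_head = nn.Linear(32, num_gestures)   # gesture logits
        self.pose_head = nn.Linear(32, num_joints * 3)    # 3D keypoint coordinates

    def forward(self, image):
        shared = self.trunk(image)                        # the jointly learned feature
        return self.gesture_head(shared), self.pose_head(shared)

if __name__ == "__main__":
    net = JointHandNet()
    img = torch.randn(2, 3, 128, 128)                     # 2 RGB hand crops
    gesture_logits, pose = net(img)
    print(gesture_logits.shape, pose.shape)               # [2, 10] and [2, 63]
```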
Space-time tree ensemble for action recognition and localization / Shugao Ma in International journal of computer vision, vol 126 n° 2-4 (April 2018)
[article]
Titre : Space-time tree ensemble for action recognition and localization
Type de document : Article/Communication
Auteurs : Shugao Ma, Auteur ; Jianming Zhang, Auteur ; Stan Sclaroff, Auteur ; et al., Auteur
Année de publication : 2018
Article en page(s) : pp 314 - 332
Note générale : Bibliographie
Langues : Anglais (eng)
Descripteur : [Vedettes matières IGN] Traitement d'image
[Termes IGN] arbre (mathématique)
[Termes IGN] géopositionnement
[Termes IGN] reconnaissance de gestes
Résumé : (Auteur) Human actions are, inherently, structured patterns of body movements. We explore ensembles of hierarchical spatio-temporal trees, discovered directly from training data, to model these structures for action recognition and spatial localization. Discovery of frequent and discriminative tree structures is challenging due to the exponential search space, particularly if one allows partial matching. We address this by first building a concise action word vocabulary via discriminative clustering of the hierarchical space-time segments, which is a two-level video representation that captures both static and non-static relevant space-time segments of the video. Using this vocabulary we then utilize tree mining with subsequent tree clustering and ranking to select a compact set of discriminative tree patterns. Our experiments show that these tree patterns, alone or in combination with shorter patterns (action words and pairwise patterns), achieve promising performance on three challenging datasets: UCF Sports, HighFive and Hollywood3D. Moreover, we perform cross-dataset validation, using trees learned on HighFive to recognize the same actions in Hollywood3D, and using trees learned on UCF Sports to recognize and localize the similar actions in JHMDB. The results demonstrate the potential for cross-dataset generalization of the trees our approach discovers.
Numéro de notice : A2018-407
Affiliation des auteurs : non IGN
Thématique : IMAGERIE
Nature : Article
nature-HAL : ArtAvecCL-RevueIntern
DOI : 10.1007/s11263-016-0980-8
Date de publication en ligne : 02/02/2017
En ligne : https://doi.org/10.1007/s11263-016-0980-8
Format de la ressource électronique : URL article
Permalink : https://documentation.ensg.eu/index.php?lvl=notice_display&id=90880
in International journal of computer vision > vol 126 n° 2-4 (April 2018) . - pp 314 - 332 [article]
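The first step the abstract above describes is building an "action word" vocabulary by clustering space-time segment descriptors and then quantizing new segments against it. The sketch below illustrates that step only, using plain k-means as a stand-in for the discriminative clustering the paper uses; descriptor sizes and counts are arbitrary.

```python
# Minimal sketch: an action-word vocabulary from clustered segment descriptors.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
segment_descriptors = rng.normal(size=(500, 64))    # descriptors of space-time segments

# Cluster descriptors into a small vocabulary of "action words".
vocab = KMeans(n_clusters=20, n_init=10, random_state=0).fit(segment_descriptors)

# Quantize the segments of a new video and represent it as a word histogram.
new_video_segments = rng.normal(size=(40, 64))
words = vocab.predict(new_video_segments)
histogram = np.bincount(words, minlength=20)
print(histogram)
```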
Tubelets : Unsupervised action proposals from spatiotemporal super-voxels / Mihir Jain in International journal of computer vision, vol 124 n° 3 (15 September 2017)
[article]
Titre : Tubelets : Unsupervised action proposals from spatiotemporal super-voxels
Type de document : Article/Communication
Auteurs : Mihir Jain, Auteur ; Jan van Gemert, Auteur ; Hervé Jégou, Auteur ; Patrick Bouthemy, Auteur ; Cees G. M. Snoek, Auteur
Année de publication : 2017
Article en page(s) : pp 287 - 311
Note générale : Bibliographie
Langues : Anglais (eng)
Descripteur : [Vedettes matières IGN] Traitement d'image
[Termes IGN] données spatiotemporelles
[Termes IGN] reconnaissance de gestes
[Termes IGN] rectangle englobant minimum
[Termes IGN] séquence d'images
[Termes IGN] voxel
Résumé : (Auteur) This paper considers the problem of localizing actions in videos as sequences of bounding boxes. The objective is to generate action proposals that are likely to include the action of interest, ideally achieving high recall with few proposals. Our contributions are threefold. First, inspired by selective search for object proposals, we introduce an approach to generate action proposals from spatiotemporal super-voxels in an unsupervised manner; we call them Tubelets. Second, along with the static features from individual frames, our approach advantageously exploits motion. We introduce independent motion evidence as a feature to characterize how the action deviates from the background and explicitly incorporate such motion information in various stages of the proposal generation. Finally, we introduce spatiotemporal refinement of Tubelets, for more precise localization of actions, and pruning to keep the number of Tubelets limited. We demonstrate the suitability of our approach by extensive experiments for action proposal quality and action localization on three public datasets: UCF Sports, MSR-II and UCF101. For action proposal quality, our unsupervised proposals beat all other existing approaches on the three datasets. For action localization, we show top performance on both the trimmed videos of UCF Sports and UCF101 as well as the untrimmed videos of MSR-II.
Numéro de notice : A2017-812
Affiliation des auteurs : non IGN
Thématique : IMAGERIE
Nature : Article
DOI : 10.1007/s11263-017-1023-9
En ligne : https://doi.org/10.1007/s11263-017-1023-9
Format de la ressource électronique : URL article
Permalink : https://documentation.ensg.eu/index.php?lvl=notice_display&id=89252
in International journal of computer vision > vol 124 n° 3 (15 September 2017) . - pp 287 - 311 [article]
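The Tubelets described above are action proposals expressed as a sequence of per-frame bounding boxes derived from spatiotemporal super-voxels. The sketch below shows only that final conversion, turning a per-frame binary super-voxel mask into a box per frame; the actual super-voxel grouping and motion cues of the method are not reproduced, and all function names are illustrative.

```python
# Minimal sketch: from a spatiotemporal super-voxel mask to a tubelet (boxes per frame).
import numpy as np

def mask_to_box(mask):
    """Tight bounding box (x_min, y_min, x_max, y_max) of a binary mask, or None if empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def supervoxel_to_tubelet(voxel_masks):
    """voxel_masks: (T, H, W) boolean array; returns one bounding box per frame."""
    return [mask_to_box(frame) for frame in voxel_masks]

if __name__ == "__main__":
    masks = np.zeros((3, 10, 10), dtype=bool)
    masks[0, 2:5, 3:7] = True    # the super-voxel occupies a region that drifts over time
    masks[1, 3:6, 4:8] = True
    masks[2, 4:7, 5:9] = True
    print(supervoxel_to_tubelet(masks))
```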