Descripteur
Termes IGN > informatique > intelligence artificielle > vision par ordinateur
Documents disponibles dans cette catégorie (92)
Titre : Robustness of visual SLAM techniques to light changing conditions : Influence of contrasted local features, multi-planar representations and multimodal image analysis
Type de document : Thèse/HDR
Auteurs : Xi Wang, Auteur ; Eric Marchand, Directeur de thèse
Editeur : Rennes : Université de Rennes 1
Année de publication : 2022
Importance : 153 p.
Format : 21 x 30 cm
Note générale : bibliographie
Thèse de Doctorat de l'Université de Rennes 1, Spécialité Informatique
Langues : Anglais (eng)
Descripteur : [Vedettes matières IGN] Traitement d'image optique
[Termes IGN] apprentissage profond
[Termes IGN] cartographie et localisation simultanées
[Termes IGN] éclairage
[Termes IGN] estimation de pose
[Termes IGN] information sémantique
[Termes IGN] primitive géométrique
[Termes IGN] programmation linéaire
[Termes IGN] robotique
[Termes IGN] vision par ordinateur
Index. décimale : THESE Thèses et HDR
Résumé : (auteur) The SLAM (Simultaneous Localization And Mapping) technique concentrates on localizing the agent and recovering the environment simultaneously, and is one of the core functionalities of many industrial products: augmented reality, where the device pose must be tracked in real time; autonomous driving, where the vehicle must be localized in a pre-generated map or an unknown environment; and even modern filmmaking workflows, where the relative camera position and orientation are critical for post-processing, or for real-time previsualization that lets directors and actors see the visual effects on stage. Difficulties at multiple levels can influence the final performance of a robot agent's SLAM task, as the pipeline from real-world physics to the required information (agent poses and a 3D map) is long and complicated; that information is what lets us visualize colourful graphics in AR devices or make hard decisions on the highway in autonomous driving. Many solutions have been proposed to address each problem, ranging from classic statistical probability models to modern data-driven deep neural networks. However, the quest to improve robot robustness in dynamic and complicated environments persists, and is becoming ever more significant and active in today's robotics research. Improving the robustness of robot agents is imminent and regarded as one of the most imperative factors for deploying robots ubiquitously in our daily lives. In this context, this thesis addresses a small drop in the ocean of the SLAM robustness problem, yet from a systematic viewpoint: we break the SLAM system down into distinct, mutually influential modules.
We then follow a "divide and conquer" strategy to answer the questions arising within each module, aiming to contribute to the community and help improve the robustness of SLAM systems under complicated conditions. With these objectives, the contributions of the thesis tackle the robustness problem from three angles. 1) From the image-feature angle, we propose a multi-layered image structure that improves the performance of traditional local image features under extreme conditions; furthermore, optimization methods based on linear search and mutual-information-assisted convex optimization are designed to tune the optimal parameters of the proposed structure. 2) From the geometric-primitive angle, we propose a relative pose estimation and SLAM framework under a multi-planar assumption, using keypoint-feature-based and template-tracker-based methods respectively; with the help of this more general planar assumption, we aim to improve mapping and tracking performance simultaneously. 3) From the angle of relocalization of the SLAM system, the idea is to recover locations the robot agent has already passed, in order to lower the overall estimation error or to recover when the robot is lost. We propose a binary graph structure that embeds spatial information and heterogeneous data formats such as depth images and semantic information. The proposed method enables robotic SLAM systems to relocalize with a higher success rate even under different lighting, weather and seasonal conditions. Note de contenu : 1- Introduction
2- Résumé
3- Background on visual SLAM techniques
4- Related work
5- Organisation
6- Multiple layers image
7- Multi-planar relative pose estimation via superpixel
8- TT-SLAM
9- Binary graph descriptor for robust relocalization on heterogeneous data
Conclusion
Numéro de notice : 24074
Affiliation des auteurs : non IGN
Thématique : IMAGERIE/INFORMATIQUE
Nature : Thèse française
Note de thèse : Thèse de Doctorat : Informatique : Rennes 1 : 2022
Organisme de stage : IRISA
DOI : sans
En ligne : https://www.theses.fr/2022REN1S022
Format de la ressource électronique : URL
Permalink : https://documentation.ensg.eu/index.php?lvl=notice_display&id=102162

Scaling up and evaluating surface reconstruction from point clouds of open scenes / Yanis Marchand (2022)
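Wang's thesis above tunes its multi-layered image structure using mutual-information-assisted optimization for multimodal image analysis. Purely as an illustration (not the thesis's code), mutual information between two images can be estimated from a joint intensity histogram:

```python
# Illustrative sketch: mutual information (MI) between two images' intensity
# distributions, a common criterion for comparing multimodal images.
# Function name and parameters are ours, not the thesis's.
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Estimate MI (in nats) from a joint intensity histogram."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()                # joint probability
    px = pxy.sum(axis=1, keepdims=True)      # marginal of img_a
    py = pxy.sum(axis=0, keepdims=True)      # marginal of img_b
    nz = pxy > 0                             # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# An image shares maximal information with itself, and very little with an
# independent noise image.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64)).astype(float)
noise = rng.integers(0, 256, (64, 64)).astype(float)
print(mutual_information(img, img) > mutual_information(img, noise))  # True
```

The histogram estimate is a KL divergence between the joint distribution and the product of its marginals, so it is always non-negative.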
Titre : Scaling up and evaluating surface reconstruction from point clouds of open scenes
Titre original : Passage à l'échelle et évaluation de la reconstruction de surface à partir de nuage de points de scènes ouvertes
Type de document : Thèse/HDR
Auteurs : Yanis Marchand, Auteur ; Bruno Vallet, Directeur de thèse ; Laurent Caraffa, Encadrant
Editeur : Champs-sur-Marne [France] : Université Gustave Eiffel
Année de publication : 2022
Note générale : bibliographie
Thèse de doctorat de l'Université Gustave Eiffel, Ecole doctorale n° 532, Mathématiques et Sciences et Technologies de l'Information et de la Communication (MSTIC), Spécialité de doctorat : Informatique - Unité de recherche : LASTIG (IGN)
Langues : Français (fre)
Descripteur : [Vedettes matières IGN] Lasergrammétrie
[Termes IGN] informatique
[Termes IGN] reconstruction d'objet
[Termes IGN] scène 3D
[Termes IGN] semis de points
[Termes IGN] traitement réparti
[Termes IGN] vision par ordinateur
Index. décimale : THESE Thèses et HDR
Résumé : (auteur) This doctoral thesis deals with two aspects of surface reconstruction from point clouds. First, it addresses the large-scale case, where a point cloud is too big to be stored in the memory of a single computer. We present an end-to-end distributed algorithm able to process arbitrarily large point clouds while guaranteeing that the produced surface is watertight. Second, this thesis contributes to the evaluation of surface reconstruction through the definition of two protocols. The first requires synthetic data, whereas the second can be set up using sensor data alone. These protocols, and the new metrics associated with them, make it possible to quantify the quality of reconstructions with less bias than previously used approaches.
Numéro de notice : 17739
Affiliation des auteurs : UGE-LASTIG (2020- )
Thématique : IMAGERIE/INFORMATIQUE
Nature : Thèse française
Note de thèse : Thèse : Informatique : Gustave Eiffel : 2022
Organisme de stage : LASTIG (IGN)
nature-HAL : Thèse
DOI : sans
En ligne : https://tel.hal.science/tel-04031734
Format de la ressource électronique : URL
Permalink : https://documentation.ensg.eu/index.php?lvl=notice_display&id=102877
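Marchand's thesis above defines protocols and metrics for evaluating surface reconstruction. Purely as an illustration of the kind of geometric score involved (not one of the thesis's metrics), a classic baseline for comparing a reconstructed surface's samples against reference points is the symmetric chamfer distance:

```python
# Illustrative sketch: symmetric chamfer distance between two point sets,
# a standard baseline for scoring surface reconstructions.
import numpy as np

def chamfer_distance(pts_a, pts_b):
    """Mean nearest-neighbour distance, averaged over both directions."""
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

# Identical clouds score 0; translating one cloud raises the score, but
# never beyond the length of the translation.
cloud = np.random.default_rng(1).random((200, 3))
shifted = cloud + np.array([0.1, 0.0, 0.0])
print(chamfer_distance(cloud, cloud))            # 0.0
print(chamfer_distance(cloud, shifted) <= 0.1)   # True
```

The brute-force pairwise distance matrix is O(n²); large-scale settings like the one the thesis targets would replace it with a spatial index.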
Titre : Scene understanding and gesture recognition for human-machine interaction
Type de document : Thèse/HDR
Auteurs : Naina Dhingra, Auteur
Editeur : Zurich : Eidgenossische Technische Hochschule ETH - Ecole Polytechnique Fédérale de Zurich EPFZ
Année de publication : 2022
Note générale : Bibliographie
A dissertation submitted to attain the degree of Doctor of Sciences of ETH Zurich
Langues : Anglais (eng)
Descripteur : [Vedettes matières IGN] Intelligence artificielle
[Termes IGN] apprentissage profond
[Termes IGN] attention (apprentissage automatique)
[Termes IGN] classification orientée objet
[Termes IGN] classification par réseau neuronal convolutif
[Termes IGN] classification par séparateurs à vaste marge
[Termes IGN] compréhension de l'image
[Termes IGN] image RVB
[Termes IGN] interaction homme-machine
[Termes IGN] oculométrie
[Termes IGN] reconnaissance automatique
[Termes IGN] reconnaissance de formes
[Termes IGN] reconnaissance de gestes
[Termes IGN] réseau neuronal récurrent
[Termes IGN] scène
[Termes IGN] vision par ordinateur
Résumé : (auteur) Scene understanding and gesture recognition are useful for a myriad of applications such as human-robot interaction, assisting blind and visually impaired people, advanced driver assistance systems, and autonomous driving. To work autonomously in real-world environments, automatic systems need to deliver non-verbal information that enhances verbal communication, in particular for blind people. We explore a holistic approach to providing both scene- and gesture-related information. We propose that incorporating attention mechanisms in neural networks, which behave similarly to attention in the human brain, and conducting an integrated study using neural networks in real time, can yield significant improvements in scene and gesture understanding, thereby enhancing the user experience. In this thesis, we investigate the understanding of visual scenes and gestures. We explore these two areas by proposing novel architectures, training methods, user studies, and thorough evaluations. We show that, for deep learning approaches, attention or self-attention mechanisms improve and push the boundaries of network performance for the different tasks under consideration. We suggest that the various kinds of gestures can complement and supplement each other's information for a better understanding of non-verbal conversation; hence integrated gesture comprehension is useful. First, we focus on visual scene understanding using scene graph generation. We propose BGT-Net, a new network that uses an object detection model with 1) bidirectional gated recurrent units for object-object communication and 2) transformer encoders including self-attention to classify the objects and their relationships. We address the problem of bias caused by the long-tailed distribution of the dataset, which enables the network to perform even for objects or relationships unseen in the dataset.
Second, we propose to learn hand gesture recognition from RGB and RGB-D videos using attention learning. We present a novel architecture based on residual connections and an attention mechanism; our approach successfully detects hand gestures when evaluated on three open-source datasets. Third, we explore pointing gesture recognition and localization using open-source software, i.e. OpenPtrack, which uses a deep-learning-based network to track multiple persons in the scene. We use a Kinect sensor as an input device and conduct a user study with 26 users to evaluate the system using two setup types. Fourth, we propose a technique to perform eye gaze tracking using OpenFace, which is based on a deep learning model and an RGB webcam. We use support vector machine regression to estimate the position of the eye gaze on the screen. In a study with 28 users, we show that this system can perform similarly to expensive commercial eye trackers. Finally, we focus on 3D head pose estimation using two models: 1) HeadPosr includes residual connections for the base network followed by a transformer encoder; it outperforms existing models but has the drawback of being computationally expensive. 2) LwPosr uses depthwise separable convolutions and transformer encoders; it is a two-stream network that estimates the three angles of the head pose in a fine-grained fashion. We demonstrate that this method predicts head poses better than state-of-the-art lightweight networks. Note de contenu : 1- Introduction
2- Background
3- State of the art
4- Scene graph generation
5- 3D hand gesture recognition
6- Pointing gesture recognition
7- Eye-gaze tracking
8- Head pose estimation
9- Lightweight head pose estimation
10- Summary
Numéro de notice : 24039
Affiliation des auteurs : non IGN
Thématique : IMAGERIE/INFORMATIQUE
Nature : Thèse étrangère
Note de thèse : PhD Thesis : Sciences : ETH Zurich : 2022
DOI : sans
En ligne : https://www.research-collection.ethz.ch/handle/20.500.11850/559347
Format de la ressource électronique : URL
Permalink : https://documentation.ensg.eu/index.php?lvl=notice_display&id=101876
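Dhingra's fourth contribution regresses the on-screen gaze position from OpenFace features with support vector machine regression. The sketch below substitutes ordinary least squares on synthetic data to illustrate the same feature-to-screen-coordinate mapping; all names and data here are illustrative, not from the thesis:

```python
# Illustrative sketch: regress 2D screen coordinates from gaze/head-pose
# features. The thesis uses SVR; plain least squares stands in here so the
# example needs only numpy, and the data is synthetic.
import numpy as np

rng = np.random.default_rng(2)
features = rng.random((200, 4))        # e.g. gaze angles + head pose
true_map = rng.random((4, 2))          # hidden linear feature-to-screen map
screen_xy = features @ true_map        # synthetic on-screen gaze positions

# Fit one linear model per screen axis (least squares in place of SVR).
coeffs, *_ = np.linalg.lstsq(features, screen_xy, rcond=None)
pred = features @ coeffs
print(np.allclose(pred, screen_xy, atol=1e-8))  # True: exact linear data
```

With real webcam features the mapping is noisy and non-linear, which is why a kernel method such as SVR (and a per-user calibration study like the thesis's 28-user one) is needed in practice.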
Titre : A world model enabling information integrity for autonomous vehicles
Type de document : Thèse/HDR
Auteurs : Corentin Sanchez, Auteur ; Philippe Bonnifait, Directeur de thèse ; Philippe Xu, Directeur de thèse
Editeur : Compiègne : Université de Technologie de Compiègne UTC
Année de publication : 2022
Importance : 198 p.
Format : 21 x 30 cm
Note générale : Bibliographie
Thèse de Doctorat de l'Université de Technologie de Compiègne, Spécialité Automatique et Robotique
Langues : Anglais (eng)
Descripteur : [Vedettes matières IGN] Intelligence artificielle
[Termes IGN] attention (apprentissage automatique)
[Termes IGN] carte routière
[Termes IGN] données multisources
[Termes IGN] information sémantique
[Termes IGN] intégrité des données
[Termes IGN] milieu urbain
[Termes IGN] navigation autonome
[Termes IGN] raisonnement
[Termes IGN] réseau routier
[Termes IGN] robot mobile
[Termes IGN] sécurité routière
[Termes IGN] véhicule sans pilote
[Termes IGN] vision par ordinateur
Index. décimale : THESE Thèses et HDR
Résumé : (auteur) To drive in complex urban environments, autonomous vehicles need to understand their driving context. This task, also known as situation awareness, relies on an internal virtual representation of the world made by the vehicle, called the world model. This representation is generally built from information provided by multiple sources. High-definition navigation maps supply prior information such as road network topology, a geometric description of the carriageway, and semantic information including traffic laws. The perception system provides a description of the space and of the road users evolving in the vehicle's surroundings. Together, they provide representations of the environment (static and dynamic) and allow interactions to be modelled. In complex situations, a reliable and non-misleading world model is mandatory to avoid inappropriate decision-making and to ensure safety. The goal of this PhD thesis is to propose a novel formalism for the concept of a world model that fulfills the situation-awareness requirements of an autonomous vehicle. This world model integrates prior knowledge of the road network topology, a lane-level grid representation, its prediction over time and, more importantly, a mechanism to control and monitor the integrity of information. The concept of a world model is present in many autonomous vehicle architectures but may take many forms, sometimes only implicitly: in some works it is part of the perception process, while in others it belongs to the decision-making process. The first contribution of this thesis is a survey of the concept of a world model for autonomous driving, covering different levels of abstraction for information representation and reasoning. A novel representation is then proposed for the world model at the tactical level, combining dynamic objects and spatial occupancy information.
First, a graph-based top-down approach using a high-definition map is proposed to extract the areas of interest relevant to the situation from the vehicle's perspective. It is then used to build a Lane Grid Map (LGM), an intermediate space-state representation from the ego-vehicle's point of view. A top-down approach is chosen to assess and characterize the relevant information of the situation. In addition to the classical free and occupied states, the unknown state is further characterized by the notions of neutralized and safe areas, which provide a deeper level of understanding of the situation. Another contribution to the world model is an integrity management mechanism built upon the LGM representation. It consists in managing the spatial sampling of the grid cells in order to take localization and perception errors into account and to avoid misleading information. Regardless of the confidence in the localization and perception information, the LGM is capable of providing reliable information to decision-making, so that hazardous decisions are not taken. The last part of the situation-awareness strategy is the prediction of the world model based on the LGM representation. The main contribution is to show how a classical object-level prediction fits this representation and that integrity can also be extended to the prediction stage. It is also shown how a neutralized area can be used in the prediction stage to provide a better situation prediction. The work relies on experimental data to demonstrate a real application of a complex situation-awareness representation. The approach is evaluated with real data obtained from several experimental vehicles equipped with LiDAR sensors and IMU with RTK corrections in the city of Compiègne. A high-definition map has also been used in the framework of the SIVALab joint laboratory between Renault and Heudiasyc CNRS-UTC.
The world model module has been implemented (with ROS software) in order to fulfill real-time requirements and is functional on the experimental vehicles for live demonstrations. Note de contenu : General introduction
1- World model for autonomous vehicles
2- An architecture for WM
3- A lane level world model
4- Set-based LGM prediction
General conclusion
Numéro de notice : 24089
Affiliation des auteurs : non IGN
Thématique : INFORMATIQUE
Nature : Thèse française
Note de thèse : Thèse de Doctorat : Automatique et Robotique : UTC Compiègne : 2022
Organisme de stage : Laboratoire Heudiasyc
DOI : sans
En ligne : https://www.theses.fr/2022COMP2683
Format de la ressource électronique : URL
Permalink : https://documentation.ensg.eu/index.php?lvl=notice_display&id=102509

Pose estimation and 3D reconstruction of vehicles from stereo-images using a subcategory-aware shape prior / Maximilian Alexander Coenen in ISPRS Journal of photogrammetry and remote sensing, Vol 181 (November 2021)
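Sanchez's abstract distinguishes free, occupied and unknown cells in the Lane Grid Map, refining unknown space into neutralized and safe areas so that decision-making is never misled. A toy sketch of that conservative reading (the state names follow the abstract's terminology; the rule and the grid are hypothetical):

```python
# Toy sketch of an integrity-first reading of a lane-level grid: a cell is
# only reported drivable when known FREE or characterised SAFE, so
# uncertain perception can never mislead the decision layer.
from enum import Enum

class Cell(Enum):
    FREE = "free"
    OCCUPIED = "occupied"
    UNKNOWN = "unknown"
    NEUTRALIZED = "neutralized"   # unknown, but no hidden object can reach it
    SAFE = "safe"                 # unknown, but harmless for the planned move

def drivable(cell: Cell) -> bool:
    # Conservative rule: plain UNKNOWN is treated like OCCUPIED.
    return cell in (Cell.FREE, Cell.SAFE)

lane = [Cell.FREE, Cell.FREE, Cell.UNKNOWN, Cell.SAFE, Cell.OCCUPIED]
print([drivable(c) for c in lane])   # [True, True, False, True, False]
```

The point of the refinement is visible here: without the SAFE characterisation, the fourth cell would be wasted as undrivable even though no hazard can occupy it.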
[article]
Titre : Pose estimation and 3D reconstruction of vehicles from stereo-images using a subcategory-aware shape prior
Type de document : Article/Communication
Auteurs : Maximilian Alexander Coenen, Auteur ; Franz Rottensteiner, Auteur
Année de publication : 2021
Article en page(s) : pp 27 - 47
Note générale : bibliographie
Langues : Anglais (eng)
Descripteur : [Vedettes matières IGN] Traitement d'image optique
[Termes IGN] classification par réseau neuronal convolutif
[Termes IGN] détection d'objet
[Termes IGN] estimation de pose
[Termes IGN] modèle stochastique
[Termes IGN] problème inverse
[Termes IGN] reconstruction 3D
[Termes IGN] reconstruction d'objet
[Termes IGN] robotique
[Termes IGN] véhicule automobile
[Termes IGN] vision par ordinateur
Résumé : (auteur) The 3D reconstruction of objects is a prerequisite for many highly relevant applications of computer vision, such as mobile robotics or autonomous driving. To deal with the inverse problem of reconstructing 3D objects from their 2D projections, a common strategy is to incorporate prior object knowledge into the reconstruction approach by establishing a 3D model and aligning it to the 2D image plane. However, current approaches are limited by inadequate shape priors and the insufficiency of the derived image observations for a reliable alignment with the 3D model. The goal of this paper is to show how 3D object reconstruction can profit from a more sophisticated shape prior and from the combined incorporation of different observation types inferred from the images. We introduce a subcategory-aware deformable vehicle model that makes use of a prediction of the vehicle type for a more appropriate regularisation of the vehicle shape. A multi-branch CNN is presented to derive predictions of the vehicle type and orientation; this information is also introduced as prior information for model fitting. Furthermore, the CNN extracts vehicle keypoints and wireframes, which are well suited for model-to-image association and model fitting. The task of pose estimation and reconstruction is addressed by a versatile probabilistic model. Extensive experiments are conducted on two challenging real-world data sets, on both of which the benefit of the developed shape prior can be shown. A comparison to state-of-the-art methods for vehicle pose estimation shows that the proposed approach performs on par or better, confirming the suitability of the developed shape prior and probabilistic model for vehicle reconstruction.
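The deformable vehicle model above regularises shape through a prior. As a generic illustration of the underlying idea (not the paper's subcategory-aware model), a linear deformable shape model represents a shape as a mean plus a combination of deformation modes, so fitting observed keypoints reduces to least squares on the mode coefficients:

```python
# Illustrative sketch of a linear deformable shape model: shape = mean +
# basis @ coefficients. The basis here is random, purely for illustration.
import numpy as np

rng = np.random.default_rng(3)
n_pts, n_modes = 12, 3
mean_shape = rng.random((n_pts, 3))          # mean 3D keypoint positions
basis = rng.random((n_pts * 3, n_modes))     # deformation modes

def synthesize(coeffs):
    """Instantiate a shape from deformation coefficients."""
    return mean_shape + (basis @ coeffs).reshape(n_pts, 3)

# Recover the coefficients from observed keypoints by least squares.
true_coeffs = np.array([0.5, -0.2, 0.1])
observed = synthesize(true_coeffs)
residual = (observed - mean_shape).reshape(-1)
est, *_ = np.linalg.lstsq(basis, residual, rcond=None)
print(np.allclose(est, true_coeffs))  # True: noise-free data is recovered
```

In the paper the fit is posed probabilistically and jointly with pose, and the prior on the coefficients depends on the predicted vehicle subcategory; the sketch only shows the linear-shape-space mechanics.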
Numéro de notice : A2021-772
Affiliation des auteurs : non IGN
Thématique : IMAGERIE
Nature : Article
nature-HAL : ArtAvecCL-RevueIntern
DOI : 10.1016/j.isprsjprs.2021.07.006
Date de publication en ligne : 14/09/2021
En ligne : https://doi.org/10.1016/j.isprsjprs.2021.07.006
Format de la ressource électronique : URL article
Permalink : https://documentation.ensg.eu/index.php?lvl=notice_display&id=98829
in ISPRS Journal of photogrammetry and remote sensing > Vol 181 (November 2021) . - pp 27 - 47 [article]

The integration of GPS/BDS real-time kinematic positioning and visual–inertial odometry based on smartphones / Zun Niu in ISPRS International journal of geo-information, vol 10 n° 10 (October 2021)
Unsupervised self-adaptive deep learning classification network based on the optic nerve microsaccade mechanism for unmanned aerial vehicle remote sensing image classification / Ming Cong in Geocarto international, vol 36 n° 18 ([01/10/2021])
3D map creation using crowdsourced GNSS data / Terence Lines in Computers, Environment and Urban Systems, vol 89 (September 2021)
GIScience integrated with computer vision for the examination of old engravings and drawings / Motti Zohar in International journal of geographical information science IJGIS, vol 35 n° 9 (September 2021)
Digital camera calibration for cultural heritage documentation: the case study of a mass digitization project of religious monuments in Cyprus / Evagoras Evagorou in European journal of remote sensing, vol 54 sup 1 (2021)
A shape transformation-based dataset augmentation framework for pedestrian detection / Zhe Chen in International journal of computer vision, vol 129 n° 4 (April 2021)
A skyline-based approach for mobile augmented reality / Mehdi Ayadi in The Visual Computer, vol 37 n° 4 (April 2021)
Visual positioning in indoor environments using RGB-D images and improved vector of local aggregated descriptors / Longyu Zhang in ISPRS International journal of geo-information, vol 10 n° 4 (April 2021)
Lightweight convolutional neural network-based pedestrian detection and re-identification in multiple scenarios / Xiao Ke in Machine Vision and Applications, vol 32 n° 2 (March 2021)
Unsupervised deep representation learning for real-time tracking / Ning Wang in International journal of computer vision, vol 129 n° 2 (February 2021)
Cartographie dense et compacte par vision RGB-D pour la navigation d'un robot mobile / Bruce Canovas (2021)
Deep convolutional neural networks for scene understanding and motion planning for self-driving vehicles / Abdelhak Loukkal (2021)
Exploration of reinforcement learning algorithms for autonomous vehicle visual perception and control / Florence Carton (2021)
Geometric computer vision: omnidirectional visual and remotely sensed data analysis / Pouria Babahajiani (2021)
Intelligent sensors for positioning, tracking, monitoring, navigation and smart sensing in smart cities / Li Tiancheng (2021)