Title:
Dynamic scene understanding using deep neural networks
Document type:
Thesis/HDR
Authors:
Ye Lyu, author; M. George Vosselman, thesis supervisor; Michael Ying Yang, thesis supervisor
Publisher:
Enschede [Netherlands]: International Institute for Geo-Information Science and Earth Observation ITC
Year of publication:
2021
Languages:
English (eng)
Descriptors:
[IGN subject headings] Image processing
[IGN terms] attention (machine learning)
[IGN terms] processing chain
[IGN terms] conditional random field
[IGN terms] image understanding
[IGN terms] object detection
[IGN terms] UAV image
[IGN terms] video frame
[IGN terms] target tracking
[IGN terms] regression
[IGN terms] semantic segmentation
Abstract:
(author) Scene understanding is an important and fundamental research field in computer vision, with many applications in photogrammetry and remote sensing. It focuses on locating and classifying objects in images and on understanding the relationships between them. The higher-level goal is to interpret what event happens in the scene, when and why it happens, and what should be done based on that information. Dynamic scene understanding uses information from different points in time to interpret scenes and answer these questions. Deep learning has shown great potential for such tasks; "deep" refers to the use of multiple layers in the neural networks. Deep neural networks are powerful because they are highly non-linear functions that, after proper training, can map from one domain to a quite different one, and they currently provide the best solutions for many fundamental scene understanding tasks. This PhD research likewise builds on deep learning for dynamic scene understanding.

Temporal information plays an important role in dynamic scene understanding. Compared with static scene understanding from single images, information distilled from the time dimension provides value in several ways. Consecutive frames are highly correlated, i.e., objects observed in one frame are very likely to be observed and identified in nearby frames as well. Such redundant observations can reduce the uncertainty of object recognition with deep learning based methods, resulting in more consistent inference, and improve the chance of recognizing objects correctly. If the camera or the object moves, the object is observed from multiple viewpoints with different poses and appearances; the information captured is then more diverse and complementary and can be aggregated to jointly infer the categories and properties of objects.

This PhD research addresses several tasks related to dynamic scene understanding in computer vision: semantic segmentation of aerial platform images (chapters 2 and 3), video object segmentation and video object detection for common objects in natural scenes (chapters 4 and 5), and multi-object tracking and segmentation for cars and pedestrians in driving scenes (chapter 6). Chapter 2 investigates how to establish a semantic segmentation benchmark for UAV images, covering data collection, data labeling, dataset construction, and performance evaluation with baseline deep neural networks and the proposed multi-scale dilation net; a conditional random field with feature-space optimization is used to obtain temporally consistent semantic segmentation in videos. Chapter 3 investigates how to better extract scene context information for improved object recognition by proposing novel bidirectional multi-scale attention networks, which achieve better performance by inferring features and attention weights for feature fusion from both higher-level and lower-level branches. Chapter 4 investigates how to simultaneously segment multiple objects across multiple frames by combining memory modules with instance segmentation networks. Our method learns to propagate the target object labels without auxiliary data, such as optical flow, which simplifies the model.
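As an editorial illustration of the label-propagation idea described for chapter 4, the following minimal sketch shows how soft object masks kept in a memory of past frames can be carried to the current frame through feature affinities. It is a simplified, generic example in PyTorch, not the thesis implementation; the function name, tensor shapes, and the single soft-mask setting are all assumptions made for clarity.

# Illustrative sketch only: a simplified memory-read step that propagates
# labels from past frames to the current frame via feature affinities.
# Names and shapes are assumptions, not the thesis code.
import torch
import torch.nn.functional as F

def propagate_labels(query_feat, memory_feat, memory_mask, temperature=1.0):
    """
    query_feat:  (C, H, W)    features of the current frame
    memory_feat: (T, C, H, W) features of past frames kept in memory
    memory_mask: (T, 1, H, W) soft object masks for those past frames
    returns:     (1, H, W)    propagated soft mask for the current frame
    """
    C, H, W = query_feat.shape
    T = memory_feat.shape[0]

    q = query_feat.reshape(C, H * W)                                              # (C, HW)
    m = memory_feat.reshape(T, C, H * W).permute(1, 0, 2).reshape(C, T * H * W)   # (C, THW)
    labels = memory_mask.reshape(T, 1, H * W).permute(1, 0, 2).reshape(1, T * H * W)  # (1, THW)

    # Affinity between every query location and every memory location.
    affinity = torch.einsum('ck,cn->kn', q, m) / (C ** 0.5 * temperature)         # (HW, THW)
    weights = F.softmax(affinity, dim=1)

    # Each query pixel takes a weighted average of the memory labels.
    propagated = labels @ weights.t()                                             # (1, HW)
    return propagated.reshape(1, H, W)

# Toy usage with random tensors.
if __name__ == '__main__':
    q = torch.randn(64, 30, 40)
    mem = torch.randn(3, 64, 30, 40)
    masks = torch.rand(3, 1, 30, 40)
    print(propagate_labels(q, mem, masks).shape)  # torch.Size([1, 30, 40])

In practice, such a read step would sit inside an instance segmentation network and be trained end to end, with the memory updated as new frames are processed.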
Chapter 5 investigates how to improve the performance of well-trained object detectors with a lightweight and efficient plug-and-play tracker for object detection in video; it also examines how the proposed model performs when video training data are lacking. Chapter 6 investigates how to improve the performance of detection, segmentation, and tracking by jointly considering top-down and bottom-up inference. The whole pipeline follows a multi-task design, i.e., a single feature extraction backbone with multiple heads for the different sub-tasks (a schematic sketch of this layout is given after the abstract).

Overall, this manuscript delves into several computer vision tasks that share fundamental research problems, including detection, segmentation, and tracking. Based on the experiments and on knowledge from the literature review, several reflections on dynamic scene understanding are discussed: the range of object context influences the quality of object recognition; the quality of the video data affects the choice of method for a specific computer vision task; and detection and tracking are complementary to each other. For future work, a unified dynamic scene understanding task could become a trend, and transformers combined with self-supervised learning are one promising research direction. Real-time processing for dynamic scene understanding requires further research before these methods can be used in real-world applications.
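The sketch referenced above illustrates the single-backbone, multiple-heads layout mentioned for chapter 6. It is a generic multi-task pattern in PyTorch; the module names, channel sizes, and head outputs are placeholders chosen for clarity and do not correspond to the architecture actually used in the thesis.

# Illustrative sketch only: one shared feature backbone with separate heads
# for detection, segmentation, and a tracking embedding. All sizes are
# placeholder assumptions, not the thesis architecture.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, num_classes=2, embed_dim=32):
        super().__init__()
        # Shared feature extraction backbone (kept deliberately tiny here).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Task-specific heads operating on the same feature map.
        self.det_head = nn.Conv2d(64, num_classes + 4, 1)  # class scores + box offsets per location
        self.seg_head = nn.Conv2d(64, num_classes, 1)      # per-pixel class logits
        self.emb_head = nn.Conv2d(64, embed_dim, 1)        # embeddings for associating objects over time

    def forward(self, x):
        feat = self.backbone(x)
        return {
            'detection': self.det_head(feat),
            'segmentation': self.seg_head(feat),
            'embedding': self.emb_head(feat),
        }

if __name__ == '__main__':
    net = MultiTaskNet()
    outputs = net(torch.randn(1, 3, 128, 128))
    for name, tensor in outputs.items():
        print(name, tuple(tensor.shape))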
Record number:
12984
Authors' affiliation:
Non-IGN
Theme:
IMAGERY/COMPUTER SCIENCE
Nature:
Foreign thesis
Thesis note:
PhD thesis: Geo-Information Science and Earth Observation: Enschede, University of Twente: 2021
DOI:
10.3990/1.9789036552233
Online publication date:
08/09/2021
Online:
https://library.itc.utwente.nl/papers_2021/phd/lyu.pdf
Electronic resource format:
URL
Permalink:
https://documentation.ensg.eu/index.php?lvl=notice_display&id=100962