Author details
Author: Mengxia Tang
Documents available by this author (1)
Encoder-decoder structure with multiscale receptive field block for unsupervised depth estimation from monocular video / Songnan Chen in Remote sensing, Vol 14 no. 12 (June-2 2022)
[article]
Title: Encoder-decoder structure with multiscale receptive field block for unsupervised depth estimation from monocular video
Document type: Article/Communication
Authors: Songnan Chen, Author; Junyu Han, Author; Mengxia Tang, Author; et al., Author
Publication year: 2022
Article page(s): no. 2906
General note: bibliography
Languages: English (eng)
Descriptor: [IGN subject headings] Optical image processing
[IGN terms] unsupervised learning
[IGN terms] convolutional neural network classification
[IGN terms] stereoscopic pair
[IGN terms] training data (machine learning)
[IGN terms] single image
[IGN terms] optimization (mathematics)
[IGN terms] depth
[IGN terms] image sequence
[IGN terms] structure-from-motion
Abstract (author): Monocular depth estimation is a fundamental yet challenging task in computer vision, because depth information is lost when 3D scenes are projected onto 2D images. Although deep learning-based methods have considerably improved single-image depth estimation, most existing approaches still fail to overcome this limitation. Supervised learning methods model depth estimation as a regression problem and therefore require large amounts of ground-truth depth data for training in real scenarios. Unsupervised learning methods treat depth estimation as the synthesis of a new disparity map, which means that rectified stereo image pairs must be used as the training dataset. To address these problems, we present an encoder-decoder framework that infers depth maps from monocular video snippets in an unsupervised manner. First, we design an unsupervised learning scheme for monocular depth estimation based on the basic principles of structure from motion (SfM); it uses only adjacent video clips, rather than paired training data, as supervision. Second, our method predicts two confidence masks that make the depth estimation model more robust to occlusion. Finally, we use a largest-scale, minimum depth loss instead of a multiscale, average loss to improve the accuracy of depth estimation. Experimental results on the KITTI depth estimation benchmark show that our method outperforms competing unsupervised methods.
Record number: A2022-563
Author affiliation: non-IGN
Theme: IMAGING
Nature: Article
DOI: 10.3390/rs14122906
Online: https://doi.org/10.3390/rs14122906
Electronic resource format: article URL
Permalink: https://documentation.ensg.eu/index.php?lvl=notice_display&id=101240
in Remote sensing > Vol 14 no. 12 (June-2 2022). - no. 2906 [article]
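The abstract contrasts a minimum loss with an average loss when a target frame is reconstructed from adjacent video frames. The following is a minimal sketch, not the authors' implementation, assuming the common formulation in which the target frame is compared with the previous and next frames after they have been warped into it, and the photometric error is a plain per-pixel L1 difference; the paper's exact loss may differ. The names `photometric_error`, `average_loss`, and `minimum_loss` are hypothetical.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of intensities in [0, 1].
Image = List[List[float]]


def photometric_error(target: Image, warped: Image) -> Image:
    """Per-pixel L1 difference between the target frame and a source frame
    warped into the target view (the warping itself is assumed done already)."""
    return [[abs(t - w) for t, w in zip(trow, wrow)]
            for trow, wrow in zip(target, warped)]


def average_loss(errors: List[Image]) -> float:
    """Average each pixel's error over all source frames, then over pixels."""
    h, w = len(errors[0]), len(errors[0][0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            total += sum(e[i][j] for e in errors) / len(errors)
    return total / (h * w)


def minimum_loss(errors: List[Image]) -> float:
    """Keep each pixel's smallest error over source frames, then average over pixels.
    A pixel occluded in one source frame can still be matched in another."""
    h, w = len(errors[0]), len(errors[0][0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            total += min(e[i][j] for e in errors)
    return total / (h * w)


if __name__ == "__main__":
    target = [[0.5, 0.5]]
    prev_warped = [[0.5, 0.9]]  # second pixel assumed occluded in the previous frame
    next_warped = [[0.6, 0.5]]  # first pixel slightly mismatched in the next frame
    errors = [photometric_error(target, prev_warped),
              photometric_error(target, next_warped)]
    print("average:", average_loss(errors))
    print("minimum:", minimum_loss(errors))  # prints "minimum: 0.0"
```

In this toy case the average loss penalizes the occluded pixel even though the next frame explains it well, whereas the per-pixel minimum does not; this is one plausible reading of why a minimum loss can help with occlusion.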