IGN online catalogue

Author details

Author: Edwin Carlinet

Documents written by this author (9)

A benchmark of nested named entity recognition approaches in historical structured documents / Solenn Tual (2023)

Title: A benchmark of nested named entity recognition approaches in historical structured documents
Document type: Article/Conference paper
Authors: Solenn Tual, Author; Nathalie Abadie, Author; Joseph Chazalon, Author; Bertrand Duménieu, Author; Edwin Carlinet, Author
Publisher: Champs-sur-Marne [France]: Université Gustave Eiffel
Publication year: 2023
Projects: SODUCO / Perret, Julien
Extent: 18 p.
Format: 21 x 30 cm
General note: Bibliography
Languages: English (eng)
Descriptors: [IGN subject headings] Geomatics
[IGN terms] natural language (computing)
[IGN terms] name recognition
[IGN terms] natural language processing

Abstract: (Author) Named Entity Recognition (NER) is a key step in the creation of structured data from digitised historical documents. Traditional NER approaches deal with flat named entities, whereas entities are often nested. For example, a postal address might contain a street name and a number. This work compares three nested NER approaches, including two state-of-the-art approaches using Transformer-based architectures. We introduce a new Transformer-based approach based on joint labelling and semantic weighting of errors, evaluated on a collection of 19th-century Paris trade directories. We evaluate the approaches with respect to the impact of supervised fine-tuning, unsupervised pre-training with noisy texts, and variation of IOB tagging formats. Our results show that while nested NER approaches enable extracting structured data directly, they do not benefit from the extra knowledge provided during training and reach a performance similar to the base approach on flat entities. Even though all three approaches perform well in terms of F1 scores, joint labelling is most suitable for hierarchically structured data. Finally, our experiments reveal the superiority of the IO tagging format on such data.
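To illustrate the tagging formats mentioned in this abstract, the following C++17 sketch encodes two-level nested entities over a tokenised directory entry. It is not the authors' code: the `Span` type, the functions `to_iob2`, `to_io` and `joint_labels`, the example labels (`PER`, `ACT`, `SPAT`, `LOC`, `CARDINAL`) and the `+` separator used for joint labels are all hypothetical choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical entity span over token indices [begin, end).
struct Span {
    std::size_t begin;
    std::size_t end;
    std::string label;
};

// IOB2: the first token of an entity gets "B-", the following ones "I-",
// tokens outside any entity get "O".
std::vector<std::string> to_iob2(std::size_t n_tokens, const std::vector<Span>& spans) {
    std::vector<std::string> tags(n_tokens, "O");
    for (const Span& s : spans)
        for (std::size_t i = s.begin; i < s.end; ++i)
            tags[i] = (i == s.begin ? "B-" : "I-") + s.label;
    return tags;
}

// IO: every token inside an entity gets "I-"; the boundary between two
// adjacent entities of the same type is therefore lost.
std::vector<std::string> to_io(std::size_t n_tokens, const std::vector<Span>& spans) {
    std::vector<std::string> tags(n_tokens, "O");
    for (const Span& s : spans)
        for (std::size_t i = s.begin; i < s.end; ++i)
            tags[i] = "I-" + s.label;
    return tags;
}

// Joint labelling (one possible encoding): a single label per token that
// concatenates the tags of all nesting levels, e.g. "I-SPAT+I-LOC".
// Assumes at least one level and tag sequences of equal length.
std::vector<std::string> joint_labels(const std::vector<std::vector<std::string>>& levels) {
    std::vector<std::string> joint(levels.front().size());
    for (std::size_t i = 0; i < joint.size(); ++i)
        for (std::size_t l = 0; l < levels.size(); ++l)
            joint[i] += (l == 0 ? "" : "+") + levels[l][i];
    return joint;
}
```

For the tokens `Dupont , photographe , rue de Rivoli , 12`, with an address entity over the last five tokens and a nested street name over `rue de Rivoli`, the token `de` receives the joint label `I-SPAT+I-LOC`, while the comma before `12` receives `I-SPAT+O`. Under IO, two adjacent entities of the same type merge into one, which is exactly the information that the `B-` prefix of IOB preserves.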
Record number: P2023-001
Authors' affiliation: UGE-LASTIG+Ext (2020- )
Theme: GEOMATICS/TOPONYMY
Nature: Preprint
nature-HAL: Preprint
DOI: none
Online publication date: 20/02/2023
Online: https://hal.science/hal-03994759v1/document
Electronic resource format: URL article
Permalink: https://documentation.ensg.eu/index.php?lvl=notice_display&id=102602

Création d’un graphe de connaissances géohistorique à partir d’annuaires du commerce parisien du 19ème siècle : application aux métiers de la photographie / Solenn Tual (2023)

Title: Création d’un graphe de connaissances géohistorique à partir d’annuaires du commerce parisien du 19ème siècle : application aux métiers de la photographie [Building a geohistorical knowledge graph from 19th-century Parisian trade directories: application to photography trades]
Document type: Article/Conference paper
Authors: Solenn Tual, Author; Nathalie Abadie, Author; Bertrand Duménieu, Author; Joseph Chazalon, Author; Edwin Carlinet, Author
Publisher: Saint-Mandé: Institut national de l'information géographique et forestière - IGN (2012-)
Publication year: 2023
Projects: SODUCO / Perret, Julien
Conference: IC 2023, 34es journées francophones d'Ingénierie des connaissances (34th French-language Knowledge Engineering Conference), 03/07/2023 to 05/07/2023, Strasbourg, France
General note: bibliography
Languages: French (fre)
Descriptors: [IGN subject headings] Spatial analysis
[IGN terms] spatio-temporal analysis
[IGN terms] noise (signal theory)
[IGN terms] geographic entity
[IGN terms] semantic network
[IGN terms] 4D visualisation

Decimal index: 37.20 Spatial analysis and its tools
Abstract: (author) Historical professional directories, published at a steady pace in many European cities throughout the 19th and 20th centuries, form a corpus of sources that is unique in its volume and in the opportunity it offers to follow urban transformations through the lens of the inhabitants' professional activities, from the individual scale up to that of the whole city. However, the spatio-temporal analysis of a type of business through directory entries requires considerable manual work of inventorying, transcription and cross-checking. To overcome this difficulty, this article proposes an automatic approach to build and visualise a geohistorical knowledge graph of the businesses listed in historical directories. The approach is tested on 19th-century Parisian trade directories ranging from 1799 to 1908, on the case of photography trades.
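The graph-building step can be pictured with a small C++17 sketch under strong simplifying assumptions: each directory entry is already parsed into a name, an activity, an address and a year; every attribute becomes one subject-predicate-object triple; and two entries from different years with identical name and address are naively linked as the same business. The `ex:` predicates, the `Entry` and `Triple` types and this exact-match linking rule are hypothetical; the article's actual ontology and matching method are not described in this record.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One parsed directory entry (hypothetical fields).
struct Entry {
    std::string id;  // e.g. "entry_1850_42"
    std::string name;
    std::string activity;
    std::string address;
    int year;
};

// A subject-predicate-object statement of the knowledge graph.
struct Triple {
    std::string subject, predicate, object;
};

// Emit one triple per attribute, using made-up "ex:" predicate names.
std::vector<Triple> entry_triples(const Entry& e) {
    return {
        {e.id, "ex:name", e.name},
        {e.id, "ex:activity", e.activity},
        {e.id, "ex:address", e.address},
        {e.id, "ex:year", std::to_string(e.year)},
    };
}

// Naive linking rule: entries from different years with exactly the same
// name and address are assumed to describe the same business.
std::vector<Triple> link_entries(const std::vector<Entry>& entries) {
    std::vector<Triple> links;
    for (std::size_t i = 0; i < entries.size(); ++i)
        for (std::size_t j = i + 1; j < entries.size(); ++j) {
            const Entry& a = entries[i];
            const Entry& b = entries[j];
            if (a.year != b.year && a.name == b.name && a.address == b.address)
                links.push_back({a.id, "ex:sameBusinessAs", b.id});
        }
    return links;
}
```

Real directory entries would need normalisation (abbreviations such as `r.` for `rue`, OCR errors, spelling variants) before any such comparison; exact string equality is used here only to keep the example short.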
Record number: C2023-012
Authors' affiliation: UGE-LASTIG+Ext (2020- )
Theme: GEOMATICS
Nature: Conference paper
nature-HAL: ComAvecCL&ActesPubliésIntl
DOI: none
Online: https://hal.science/hal-04121643
Electronic resource format: URL article
Permalink: https://documentation.ensg.eu/index.php?lvl=notice_display&id=103319

Entry separation using a mixed visual and textual language model: Application to 19th century French trade directories / Bertrand Duménieu (2023)

Title: Entry separation using a mixed visual and textual language model: Application to 19th century French trade directories
Document type: Article/Conference paper
Authors: Bertrand Duménieu, Author; Edwin Carlinet, Author; Nathalie Abadie, Author; Joseph Chazalon, Author
Publisher: Champs-sur-Marne [France]: Université Gustave Eiffel
Publication year: 2023
Projects: SODUCO / Perret, Julien
Extent: 20 p.
Format: 21 x 30 cm
General note: Bibliography
Languages: English (eng)
Descriptors: [IGN subject headings] Geomatics
[IGN terms] directory
[IGN terms] nineteenth century
[IGN terms] language model
[IGN terms] name recognition

Abstract: (Author) When extracting structured data from repetitively organized documents, such as dictionaries, directories, or even newspapers, a key challenge is to correctly segment what constitutes the basic text regions for the target database. Traditionally, such a problem was tackled as part of the layout analysis and was mostly based on visual clues for dividing (top-down) approaches. Some agglomerating (bottom-up) approaches started to consider textual information to link similar contents, but they required a proper over-segmentation of fine-grained units. In this work, we propose a new pragmatic approach whose efficiency is demonstrated on 19th-century French trade directories. We propose to consider two sub-problems: coarse layout detection (text columns and reading order), which is assumed to be effective and is not detailed here, and a fine-grained entry separation stage for which we propose to adapt a state-of-the-art Named Entity Recognition (NER) approach. By injecting special visual tokens, coding, for instance, indentation or breaks, into the token stream of the language model used for NER purposes, we can leverage both textual and visual knowledge simultaneously. Code, data, results and models are available at https://github.com/soduco/paper-entryseg-icdar23-code and https://huggingface.co/HueyNemud/ (icdar23-entrydetector* variants).
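The idea of injecting visual tokens into the language model's input can be sketched in C++17 as follows. The token names `[BR]` (line break) and `[INDENT]` (indented line), the `OcrLine` type carrying a left x-coordinate per line, and the fixed indentation threshold are assumptions made for illustration, not the special tokens or features used in the paper. The resulting stream would then be tagged by a NER-style model that marks where each entry starts.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// An OCR text line with the x-coordinate of its left edge (hypothetical input).
struct OcrLine {
    std::string text;
    int x_left;
};

// Build a single token stream mixing words and special visual tokens:
// "[BR]" separates consecutive lines, "[INDENT]" precedes a line whose left
// edge lies more than indent_threshold pixels right of the column margin.
std::vector<std::string> tokens_with_visual_cues(const std::vector<OcrLine>& lines,
                                                 int column_margin, int indent_threshold) {
    std::vector<std::string> tokens;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (i > 0) tokens.push_back("[BR]");
        if (lines[i].x_left - column_margin > indent_threshold) tokens.push_back("[INDENT]");
        // Whitespace tokenisation stands in for the model's real tokenizer.
        std::istringstream words(lines[i].text);
        for (std::string w; words >> w;) tokens.push_back(w);
    }
    return tokens;
}
```

Keeping layout cues as ordinary tokens means the same sequence-labelling model can exploit both the wording of an entry and its visual shape, without a separate visual feature channel.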
Record number: P2023-002
Authors' affiliation: UGE-LASTIG+Ext (2020- )
Theme: GEOMATICS/COMPUTER SCIENCE/TOPONYMY
Nature: Preprint
nature-HAL: Preprint
DOI: none
Online publication date: 17/02/2023
Online: https://hal.science/hal-03994702v1/
Electronic resource format: URL article
Permalink: https://documentation.ensg.eu/index.php?lvl=notice_display&id=102609

A benchmark of named entity recognition approaches in historical documents: application to 19th century French directories / Nathalie Abadie (2022)

Title: A benchmark of named entity recognition approaches in historical documents: application to 19th century French directories
Document type: Article/Conference paper
Authors: Nathalie Abadie, Author; Edwin Carlinet, Author; Joseph Chazalon, Author; Bertrand Duménieu, Author
Publisher: Berlin, Heidelberg, Vienna, New York, ...: Springer
Publication year: 2022
Series: Lecture Notes in Computer Science, ISSN 0302-9743, no. 13237
Projects: SODUCO / Perret, Julien
Conference: DAS 2022, 15th IAPR International Workshop on Document Analysis Systems, 22/05/2022 to 25/05/2022, La Rochelle, France, Proceedings, Springer
Extent: pp. 445-460
General note: bibliography
Languages: English (eng)
Descriptors: [IGN subject headings] Geomatics
[IGN terms] convolutional neural network classification
[IGN terms] nineteenth century
[IGN terms] training data (machine learning)
[IGN terms] text mining
[IGN terms] geohistorical object
[IGN terms] name recognition
[IGN terms] natural language processing

Abstract: (author) Named entity recognition (NER) is a necessary step in many pipelines targeting historical documents. Indeed, such natural language processing techniques identify which class each text token belongs to, e.g. “person name”, “location”, “number”. Introducing a new public dataset built from 19th century French directories, we first assess how noisy the output of modern, off-the-shelf OCR systems is. Then, we compare modern CNN- and Transformer-based NER techniques which can reasonably be used in the context of historical document analysis. We measure their requirements in terms of training data and the effects of OCR noise on their performance, and show how Transformer-based NER can benefit from unsupervised pre-training and supervised fine-tuning on noisy data. Results can be reproduced using resources available at https://github.com/soduco/paper-ner-bench-das22 and https://zenodo.org/record/6394464.
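A common measure of OCR noise is the character error rate (CER): the Levenshtein edit distance between the OCR output and a reference transcription, divided by the length of the reference. The C++17 sketch below computes it. This record does not say which noise measure the paper uses, so treat it as a generic illustration; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance (insertions, deletions, substitutions all cost 1),
// computed with a single rolling row of the dynamic-programming table.
// Note: std::string compares bytes, so accented UTF-8 characters count as
// several units; decode to code points first for exact CER on French text.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> row(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) row[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        std::size_t diag = row[0];  // cell (i-1, j-1)
        row[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t up = row[j];  // cell (i-1, j)
            std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            row[j] = std::min({up + 1, row[j - 1] + 1, diag + cost});
            diag = up;
        }
    }
    return row[b.size()];
}

// Character error rate of an OCR hypothesis against a reference transcription.
// For an empty reference, this sketch returns 0 if the hypothesis is also
// empty and 1 otherwise (an arbitrary convention).
double character_error_rate(const std::string& reference, const std::string& hypothesis) {
    if (reference.empty()) return hypothesis.empty() ? 0.0 : 1.0;
    return static_cast<double>(levenshtein(reference, hypothesis)) / reference.size();
}
```

For example, reading `Rivoli` as `Rivol1` is one substitution over six reference characters, a CER of about 0.167.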
Record number: C2022-030
Authors' affiliation: UGE-LASTIG+Ext (2020- )
Other associated URL: link to HAL
Theme: GEOMATICS/COMPUTER SCIENCE
Nature: Conference paper
nature-HAL: ComAvecCL&ActesPubliésIntl
DOI: 10.1007/978-3-031-06555-2_30
Online: http://dx.doi.org/10.1007/978-3-031-06555-2_30
Electronic resource format: URL article
Permalink: https://documentation.ensg.eu/index.php?lvl=notice_display&id=101088

Generic programming in modern C++ for image processing / Michaël Roynard (2022)

Title: Generic programming in modern C++ for image processing
Document type: Thesis/HDR
Authors: Michaël Roynard, Author; Thierry Géraud, Thesis supervisor; Edwin Carlinet, Thesis supervisor
Publisher: Paris: Sorbonne Université
Publication year: 2022
Extent: 237 p.
Format: 21 x 30 cm
General note: bibliography
Doctoral thesis submitted to fulfil the requirements for the degree of Doctor of Sorbonne Université with the doctoral speciality of "Software Engineering and Image Processing"
Languages: English (eng)
Descriptors: [IGN subject headings] Computer languages
[IGN terms] C++
[IGN terms] programming language
[IGN terms] mathematical morphology
[IGN terms] computer programming
[IGN terms] taxonomy
[IGN terms] image processing

Decimal index: THESE Theses and HDR
Abstract: (author) C++ is a multi-paradigm language that enables the initiated programmer to set up efficient image processing algorithms. This language's strength comes from several aspects. C++ is high-level, which enables developing powerful abstractions and mixing different programming styles to ease development. At the same time, C++ is low-level and can take full advantage of the hardware to deliver the best performance. It is also very portable and highly compatible, which allows algorithms to be called from high-level, fast-prototyping languages such as Python or Matlab. One of the most fundamental aspects where C++ really shines is generic programming. Generic programming makes it possible to develop and reuse software bricks on objects (images) of different natures (types) without performance loss. Nevertheless, reconciling genericity, efficiency and simplicity is not trivial. Modern C++ (post-2011) has brought new features that made the language simpler and more powerful. In this thesis, we first explore one particular C++20 feature, concepts, in order to build a concrete taxonomy of image-related types and algorithms. Second, we explore another addition to C++20, ranges (and views), and we apply this design to image processing algorithms and image types in order to solve issues such as how hard it is to customize or tweak image processing algorithms. Finally, we explore how to offer a bridge between static (compile-time) generic C++ code and dynamic (runtime) Python code. We offer our own hybrid solution, benchmark its performance, and discuss what can be done in the future with JIT technologies. Considering these three axes, we address the question of how to reconcile generic programming, efficiency and ease of use.
Contents note: I Context and History of Generic programming
1- Introduction
2- Generic programming (genericity)
II Applying Generic programming for Image processing in the static world
3- Taxonomy for Image Processing: Image types and algorithms
4- Image views
III Bringing Generic programming to the dynamic world
5- A bridge between the static world and the dynamic world
6- Conclusion and continuation

Record number: 24083
Authors' affiliation: non-IGN
Theme: IMAGING/COMPUTER SCIENCE
Nature: French thesis
Thesis note: PhD thesis: Software Engineering and Image Processing: Sorbonne Université: 2022
Host institution: EPITA
DOI: none
Online: https://theses.hal.science/tel-03922670
Electronic resource format: URL
Permalink: https://documentation.ensg.eu/index.php?lvl=notice_display&id=102391

Combining deep learning and mathematical morphology for historical map segmentation / Yizi Chen (2021)

ICDAR 2021 competition on historical map segmentation / Joseph Chazalon (2021)

Introducing the boundary-aware loss for deep image segmentation / Minh On Vu Ngoc (2021)

Vectorization of historical maps using deep edge filtering and closed shape extraction / Yizi Chen (2021)


IGN

Scientific documentation centre

IGN / ENSG

2014-2022 IGN