Descriptor
IGN terms > humanities and social sciences > linguistics > computational linguistics > natural language processing
natural language processing
Synonym(s): automatic natural language processing
Documents available in this category (69)
A benchmark of nested named entity recognition approaches in historical structured documents / Solenn Tual (2023)
Title: A benchmark of nested named entity recognition approaches in historical structured documents
Document type: Article/Communication
Authors: Solenn Tual; Nathalie Abadie; Joseph Chazalon; Bertrand Duménieu; Edwin Carlinet
Publisher: Champs-sur-Marne [France]: Université Gustave Eiffel
Publication year: 2023
Projects: SODUCO / Perret, Julien
Extent: 18 p.
Format: 21 x 30 cm
General note: Bibliography
Languages: English (eng)
Descriptors: [IGN subject headings] Geomatics
[IGN terms] natural language (computing)
[IGN terms] name recognition
[IGN terms] natural language processing
Abstract: (Author) Named Entity Recognition (NER) is a key step in the creation of structured data from digitised historical documents. Traditional NER approaches deal with flat named entities, whereas entities often are nested. For example, a postal address might contain a street name and a number. This work compares three nested NER approaches, including two state-of-the-art approaches using Transformer-based architectures. We introduce a new Transformer-based approach based on joint labelling and semantic weighting of errors, evaluated on a collection of 19th-century Paris trade directories. We evaluate approaches regarding the impact of supervised fine-tuning, unsupervised pre-training with noisy texts, and variation of IOB tagging formats. Our results show that while nested NER approaches enable extracting structured data directly, they do not benefit from the extra knowledge provided during training and reach a performance similar to the base approach on flat entities. Even though all 3 approaches perform well in terms of F1 scores, joint labelling is most suitable for hierarchically structured data. Finally, our experiments reveal the superiority of the IO tagging format on such data.
Record number: P2023-001
Author affiliation: UGE-LASTIG+Ext (2020- )
Theme: GEOMATICS/TOPONYMY
Nature: Preprint
HAL nature: Preprint
DOI: none
Online publication date: 20/02/2023
Online: https://hal.science/hal-03994759v1/document
Electronic resource format: URL article
Permalink: https://documentation.ensg.eu/index.php?lvl=notice_display&id=102602

Geographic named entity recognition by employing natural language processing and an improved BERT model / Liufeng Tao in ISPRS International journal of geo-information, vol 11 n° 12 (December 2022)
[article]
Title: Geographic named entity recognition by employing natural language processing and an improved BERT model
Document type: Article/Communication
Authors: Liufeng Tao; Zhong Xie; Dexin Xu; et al.
Publication year: 2022
Article page(s): n° 598
General note: Bibliography
Languages: English (eng)
Descriptors: [IGN subject headings] Geomatics
[IGN terms] China
[IGN terms] supervised classification
[IGN terms] recurrent neural network classification
[IGN terms] social media data
[IGN terms] public data
[IGN terms] dataset
[IGN terms] character recognition
[IGN terms] name recognition
[IGN terms] performance test
[IGN terms] toponym
[IGN terms] natural language processing
Abstract: (Author) Toponym recognition, or the challenge of detecting place names that have a similar referent, is involved in a number of activities connected to geographical information retrieval and geographical information sciences. This research focuses on recognizing Chinese toponyms from social media communications. While broad named entity recognition methods are frequently used to locate places, their accuracy is hampered by the many linguistic abnormalities seen in social media posts, such as informal sentence constructions, name abbreviations, and misspellings. In this study, we describe a Chinese toponym identification model based on a hybrid neural network that was created with these linguistic inconsistencies in mind. Our method adds a number of improvements to a standard bidirectional recurrent neural network model to help with location detection in social media messages. We demonstrate the results of a wide-ranging evaluation of the performance of different supervised machine learning methods, which have the natural advantage of avoiding human design features. A set of controlled experiments with four test datasets (one constructed and three public datasets) demonstrates the performance of supervised machine learning that can achieve good results on the task, significantly outperforming seven baseline models.
Record number: A2022-945
Author affiliation: non-IGN
Theme: GEOMATICS
Nature: Article
DOI: 10.3390/ijgi11120598
Online publication date: 28/11/2022
Online: https://doi.org/10.3390/ijgi11120598
Electronic resource format: URL article
Permalink: https://documentation.ensg.eu/index.php?lvl=notice_display&id=102178
In: ISPRS International journal of geo-information, vol 11 n° 12 (December 2022), n° 598 [article]

Machine learning and natural language processing of social media data for event detection in smart cities / Andrei Hodorog in Sustainable Cities and Society, vol 85 (October 2022)
[article]
Title: Machine learning and natural language processing of social media data for event detection in smart cities
Document type: Article/Communication
Authors: Andrei Hodorog; Ioan Petri; Yacine Rezgui
Publication year: 2022
Article page(s): n° 104026
General note: Bibliography
Languages: English (eng)
Descriptors: [IGN subject headings] Web geomatics
[IGN terms] machine learning
[IGN terms] Bayesian classification
[IGN terms] event detection
[IGN terms] social media data
[IGN terms] decision support tool
[IGN terms] multiple regression
[IGN terms] taxonomy
[IGN terms] natural language processing
[IGN terms] smart city
Abstract: (Author) Social media data analysis in a smart city context can represent an efficacious instrument to inform decision making. The manuscript strives to leverage the power of Natural Language Processing (NLP) techniques applied to Twitter messages using supervised learning to achieve real-time automated event detection in smart cities. A semantic-based taxonomy of risks is devised to discover and analyse associated events from data streams, with a view to: (i) read and process, in real-time, published texts; (ii) classify each text into one representative real-world category; (iii) assign a citizen satisfaction value to each event. To select the language processing models striking the best balance between accuracy and processing speed, we conducted a pre-emptive evaluation, comparing several baseline language models formerly employed by researchers for event classification. A heuristic analysis of several smart cities and community initiatives was conducted, with a view to define real-world scenarios as basis for determining correlations between two or more co-occurring event types and their associated levels of citizen satisfaction, while further considering environmental factors. Based on Multiple Regression Analysis (MRA), we established the relationships between scenario variables, obtaining a variance of 60%–90% between the dependent and independent variables. The selected combination of supervised NLP techniques leverages an accuracy of 88.5%. We found that all regression models had at least one variable below the 0.05 threshold of the p-value, therefore at least one statistically significant independent variable. These findings ultimately illustrate how citizens, taking the role of active social sensors, can yield vital data that authorities can use to make educated decisions and sustainably construct smarter cities.
Record number: A2022-764
Author affiliation: non-IGN
Theme: GEOMATICS
Nature: Article
DOI: 10.1016/j.scs.2022.104026
Online publication date: 02/07/2022
Online: https://doi.org/10.1016/j.scs.2022.104026
Electronic resource format: URL article
Permalink: https://documentation.ensg.eu/index.php?lvl=notice_display&id=101785
In: Sustainable Cities and Society, vol 85 (October 2022), n° 104026 [article]

Deep learning method for Chinese multisource point of interest matching / Pengpeng Li in Computers, Environment and Urban Systems, vol 96 (September 2022)
[article]
Title: Deep learning method for Chinese multisource point of interest matching
Document type: Article/Communication
Authors: Pengpeng Li; Jiping Liu; An Luo; et al.
Publication year: 2022
Article page(s): n° 101821
General note: Bibliography
Languages: English (eng)
Descriptors: [IGN subject headings] Geomatics
[IGN terms] semantic matching
[IGN terms] deep learning
[IGN terms] multilayer perceptron classification
[IGN terms] convolutional neural network classification
[IGN terms] feature extraction
[IGN terms] semantic inference
[IGN terms] semantic information
[IGN terms] point of interest
[IGN terms] vector representation
[IGN terms] natural language processing
Abstract: (Author) Multisource point of interest (POI) matching refers to the pairing of POIs that refer to the same geographic entity in different data sources. This also constitutes the core issue in geospatial data fusion and update. The existing methods cannot effectively capture the complex semantic information from a text, and the manually defined rules largely affect matching results. This study developed a multisource POI matching method based on deep learning that transforms the POI pair matching problem into a binary classification problem. First, we used three different Chinese word segmentation methods to segment the POI text attributes and used the segmentation results to train the Word2Vec model to generate the corresponding word vector representation. Then, we used the text convolutional neural network (Text-CNN) and multilayer perceptron (MLP) to extract the POI attributes' features and generate the corresponding feature vector representation. Finally, we used the enhanced sequential inference model (ESIM) to perform local inference and inference combination on each attribute to realize the classification of POI pairs. We used the POI dataset containing Baidu Map, Tencent Map, and Gaode Map from Chengdu to train, verify, and test the model. The experimental results show that the matching precision, recall rate, and F1 score of the proposed method exceed 98% on the test set, and it is significantly better than the existing matching methods.
Record number: A2022-513
Author affiliation: non-IGN
Theme: GEOMATICS/COMPUTER SCIENCE
Nature: Article
DOI: 10.1016/j.compenvurbsys.2022.101821
Online publication date: 18/06/2022
Online: https://doi.org/10.1016/j.compenvurbsys.2022.101821
Electronic resource format: URL article
Permalink: https://documentation.ensg.eu/index.php?lvl=notice_display&id=101053
In: Computers, Environment and Urban Systems, vol 96 (September 2022), n° 101821 [article]

GIS-KG: building a large-scale hierarchical knowledge graph for geographic information science / Jiaxin Du in International journal of geographical information science IJGIS, vol 36 n° 5 (May 2022)
[article]
Title: GIS-KG: building a large-scale hierarchical knowledge graph for geographic information science
Document type: Article/Communication
Authors: Jiaxin Du; Shaohua Wang; Xinyue Ye; et al.
Publication year: 2022
Article page(s): pp 873 - 897
General note: Bibliography
Languages: English (eng)
Descriptors: [IGN subject headings] Geomatics
[IGN terms] deep learning
[IGN terms] hierarchical approach
[IGN terms] data mining
[IGN terms] knowledge engineering
[IGN terms] ontology
[IGN terms] geographic information retrieval
[IGN terms] semantic network
[IGN terms] natural language processing
Abstract: (Author) An organized knowledge base can facilitate the exploration of existing knowledge and the detection of emerging topics in a domain. Knowledge about and around Geographic Information Science and its associated system technologies (GIS) is complex, extensive and emerging rapidly. Taking the challenge, we built a GIS knowledge graph (GIS-KG) by (1) merging existing GIS bodies of knowledge to create a hierarchical ontology and then (2) applying deep-learning methods to map GIS publications to the ontology. We conducted several experiments on information retrieval to evaluate the novelty and effectiveness of the GIS-KG. Results showed the robust support of GIS-KG for knowledge search of existing GIS topics and potential to explore emerging research themes.
Record number: A2022-341
Author affiliation: non-IGN
Theme: GEOMATICS
Nature: Article
HAL nature: ArtAvecCL-RevueIntern
DOI: 10.1080/13658816.2021.2005795
Online publication date: 26/11/2021
Online: https://doi.org/10.1080/13658816.2021.2005795
Electronic resource format: URL article
Permalink: https://documentation.ensg.eu/index.php?lvl=notice_display&id=100515
In: International journal of geographical information science IJGIS, vol 36 n° 5 (May 2022), pp 873 - 897 [article]
Copies (1)
Barcode: 079-2022051
Call number: SL
Medium: Journal
Location: Documentation centre
Section: Journals in reading room
Availability: Available

Automated construction of a French Entity Linking dataset to geolocate social network posts in the context of natural disasters / Gaëtan Caillaut (2022)
A benchmark of named entity recognition approaches in historical documents : application to 19th century French directories / Nathalie Abadie (2022)
Caractérisation de la ville du futur dans des corpus de science-fiction et de fiction climatique / Sami Guembour (2022)
Generating geographical location descriptions with spatial templates: a salient toponym driven approach / Mark M. Hall in International journal of geographical information science IJGIS, vol 36 n° 1 (January 2022)
Le carrefour dont vous êtes le héros : description de carrefours pour les personnes déficientes visuelles / Jérémy Kalsron (2021)
Extracting event-related information from a corpus regarding soil industrial pollution / Chuanming Dong (2021)
Place names in Spanish republican life stories: spatial patterns in locations and perceptions / Laurence Jolivet (2021)
Social media as passive geo-participation in transportation planning – how effective are topic modeling & sentiment analysis in comparison with citizen surveys? / Oliver Lock in Geo-spatial Information Science, vol 23 n° 4 (December 2020)
A deep learning architecture for semantic address matching / Yue Lin in International journal of geographical information science IJGIS, vol 34 n° 3 (March 2020)
A framework for extracting urban functional regions based on multiprototype word embeddings using points-of-interest data / Sheng Hu in Computers, Environment and Urban Systems, vol 80 (March 2020)