Jump to one of these dates:
20 September 2007, Guillaume Pitel, CEA, Paris
4 October 2007, Benoit Crabbé, Paris 7
18 October 2007, Sylvain Schmitz, INRIA/LORIA, Nancy
25 October 2007, Sivaji Bandyopadhyay, Jadavpur University Kolkata (India)
26 October 2007, Sivaji Bandyopadhyay, Jadavpur University Kolkata (India)
8 November 2007, Didier Bourigault, CNRS/ERSS, Toulouse
15 November 2007, Lukasz Debowski, Warsaw (Poland)
22 November 2007, Thierry Declerck, DFKI, Saarbruecken (Germany), CANCELLED (SNCF strike)
13 December 2007, Rokia Bendaoud, Orpailleur/LORIA, Nancy
20 December 2007, Alexander Koller, U. of Edinburgh (UK)
10 January 2008, Nick Asher, CNRS/IRIT, Toulouse
17 January 2008, Sandrine Ollinger, CNRS/ATILF, Nancy
7 February 2008, Fabienne Venant, Université de Nancy 2
20 September; 14:00-15:00, Room A008
Guillaume Pitel, CEA/Paris, will talk about Automatic construction of lexical semantics resources: cross-lingual projection methods applied to FrameNet
4 October; 14:00-15:00, Room A006
Benoit Crabbé, U. Paris 7, will talk about A computer language for representing grammatical information: eXtensible MetaGrammar
Abstract: In this talk I will introduce the key notions of eXtensible MetaGrammar, a language for describing grammatical information for tree-based formalisms in the Tree Adjoining Grammar family and related formalisms. The talk will focus on implementation issues related to the design of a large-scale Tree Adjoining Grammar for French.
18 October; 14:00-15:00, Room A006
Sylvain Schmitz, INRIA/LORIA, will talk about Approximating Context-Free Grammars for Parsing and Verification
Abstract: The talk applies a grammar approximation technique to the nondeterminism and ambiguity issues in programming language parsing in two different ways. The first application is the generation of noncanonical parsers, which are less prone to nondeterminism, mostly thanks to their ability to exploit an unbounded context-free language as right context to guide their decisions. Such parsers enforce the unambiguity of the grammar and, furthermore, guarantee linear-time parsing complexity. The second application is ambiguity detection itself, with the assurance that a grammar reported as unambiguous is actually so, whatever level of approximation we might choose.
25 October; 14:00-15:00, Room A006
Sivaji Bandyopadhyay, Jadavpur University Kolkata (India), will talk about Generation of Referring Expression using Prefix Tree Structure
Abstract: Generation of referring expressions (GRE) is an important task in Natural Language Generation (NLG) systems. The existing GRE algorithms fall at two extremes. The Incremental Algorithm is simple and fast but less expressive, whereas the others are complex and exhaustive but more expressive. We propose a new Prefix Tree (Trie) based framework for modeling GRE problems. It incorporates intricate features of GRE (such as set and boolean descriptions, context sensitivity, relational descriptions, etc.) while retaining the attractive properties of the Incremental Algorithm (simplicity, speed, etc.). The prefix tree based algorithm works in two phases. First, it encodes the descriptions stored in the knowledge base in the form of a prefix tree structure. Second, it generates the referring expression identifying the target object, which is basically a node search problem in the tree. The edges in our encoded trie structure are labeled, and the path from the root to that node forms the distinguishing description for the target object. The significant achievement is that the incompleteness of the previous algorithms can be tackled in this model in a straightforward way. For example, in the case of vague descriptions (overlapping properties), the Incremental Algorithm and other algorithms are unable to find an unambiguous description even if one exists, but our prefix tree model takes the hearer model into account and generates a description identifying the target object. As for time complexity, in the case of simple non-overlapping properties we have achieved linear time complexity. Thus our model provides a simple and linguistically rich approach to GRE.
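The two phases described in this abstract can be illustrated with a small sketch. It is not the speaker's algorithm: the knowledge base format, the fixed attribute order, and the names `TrieNode`, `build_trie` and `describe` are assumptions made for illustration, and the sketch covers only simple, non-overlapping properties (no set, boolean or relational descriptions, and no hearer model).

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # edge label -> TrieNode
    objects: set = field(default_factory=set)     # objects whose path passes through this node


def build_trie(knowledge_base, attribute_order):
    """Phase 1: encode each object's properties as a labeled path in a prefix tree."""
    root = TrieNode()
    for name, props in knowledge_base.items():
        node = root
        node.objects.add(name)
        for attr in attribute_order:
            if attr not in props:
                continue  # hypothetical choice: skip attributes the object lacks
            label = (attr, props[attr])
            node = node.children.setdefault(label, TrieNode())
            node.objects.add(name)
    return root


def describe(root, knowledge_base, attribute_order, target):
    """Phase 2: follow the target's path until only the target remains below the node."""
    node = root
    description = []
    props = knowledge_base[target]
    for attr in attribute_order:
        if node.objects == {target}:
            break  # the labels collected so far already single out the target
        if attr not in props:
            continue
        label = (attr, props[attr])
        node = node.children[label]
        description.append(label)
    # None signals that no distinguishing description exists in this simple model.
    return description if node.objects == {target} else None


# Toy knowledge base (invented data, for illustration only).
kb = {
    "d1": {"type": "dog", "colour": "black", "size": "small"},
    "d2": {"type": "dog", "colour": "white", "size": "small"},
    "c1": {"type": "cat", "colour": "black", "size": "large"},
}
order = ["type", "colour", "size"]
trie = build_trie(kb, order)
print(describe(trie, kb, order, "d1"))
print(describe(trie, kb, order, "c1"))
```

Each object's path is walked once per query, so a lookup costs time linear in the number of attributes, which is in the spirit of the linear-time claim for non-overlapping properties. Unlike the Incremental Algorithm, this sketch keeps every label on the path, even one that rules out no distractor.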
26 October; 14:00-15:00, Room A006
Sivaji Bandyopadhyay, Jadavpur University Kolkata (India), will talk about Named Entity Recognition, Transliteration and Use in MT (Machine Translation), TDT (Topic Detection and Tracking) and MDS (Multi-Document Summarization)
Abstract: The current trend in NER is to use the machine-learning approach, which is attractive because it is trainable and adaptable, and because maintaining a machine-learning system is much cheaper than maintaining a rule-based one. We have developed Named Entity Recognition (NER) systems for Bengali using various techniques: a pattern-directed shallow parsing approach with and without linguistic knowledge, a statistical Hidden Markov Model (HMM), a Maximum Entropy (ME) model, Conditional Random Fields (CRF) and Support Vector Machines (SVM). Named Entity Recognition in Indian languages (ILs), particularly in Bengali, is difficult and challenging because ILs, unlike English, have no concept of capitalization. A web-based tagged Bengali news corpus of approximately 34 million wordforms in UTF-8 has been developed from the web archive of a leading Bengali newspaper, and part of this corpus has been used in the NER tasks. All the systems have been evaluated, and the SVM-based model has outperformed the others with an overall F-score of 91.8%. We have used a modified joint source-channel model for named entity transliteration, and this has been used for transliteration between English and Bengali. We are using the named entity tags for English named entities in an English-Bengali machine translation system. We have recently started work on Story Link Detection, in which each news story is represented as a collection of four vectors: locations, proper names, temporal expressions and general terms. The 4-vector representation of each news document will be used to measure the similarity between two news documents. The 4-vector representation of news stories and the similarity measure between them can be used further towards multi-document summarization of news stories.
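The 4-vector story representation mentioned in this abstract can be sketched as follows. The abstract does not say which similarity measure is used, so the cosine similarity per vector and the equal default weights below are assumptions, as are the field names and the function names `cosine` and `story_similarity`.

```python
import math
from collections import Counter

# The four vectors named in the abstract; the identifiers themselves are hypothetical.
FIELDS = ("locations", "proper_names", "temporal", "general")


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def story_similarity(story1, story2, weights=None):
    """Weighted average of the four per-vector cosines (equal weights are an assumption)."""
    weights = weights or {f: 1.0 for f in FIELDS}
    total = sum(weights.values())
    score = sum(
        weights[f] * cosine(Counter(story1.get(f, ())), Counter(story2.get(f, ())))
        for f in FIELDS
    )
    return score / total


# Invented stories, each given as lists of already-extracted terms per field.
s1 = {"locations": ["kolkata"], "proper_names": ["sen"],
      "temporal": ["monday"], "general": ["flood", "relief"]}
s2 = {"locations": ["kolkata"], "proper_names": ["roy"],
      "temporal": ["monday"], "general": ["flood", "damage"]}
print(story_similarity(s1, s2))
```

Keeping the four vectors separate makes it possible to weight, for example, shared locations and proper names more heavily than shared general terms when linking stories.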
8 November; 14:00-15:00, Room A006
Didier Bourigault, CNRS/ERSS, Toulouse, will talk about An operational syntactic parser: SYNTEX
Abstract: We present research in language engineering centred on the construction, evaluation and use of the SYNTEX software, an automatic syntactic parser for French. We present the key concepts that guided the design of the SYNTEX parser. Automatic syntactic parsing is presented as a pattern recognition problem, where the patterns are represented by syntactic dependency structures. SYNTEX is a procedural cascaded parser. From an epistemological point of view, it can be characterized as a technical object, in the sense of G. Simondon's philosophy of technology.
15 November; 14:00-15:00, Room A006
Lukasz Debowski, Warsaw, Poland, will talk about New methods for verb valence extraction from Polish texts
Abstract: We will present a new method for extracting verb valence information from raw texts. The method has been designed for a language for which no verified treebank exists but for which a large phrase grammar and several valence dictionaries have been compiled by linguists. The extraction proceeds in two steps. First, a deterministic grammar parser and a new instance of the expectation-maximization algorithm are used to obtain an imperfect reduced treebank from unannotated texts (with trees reduced to valence frames). Next, partially supervised learning is applied to obtain verbal subcategorization frames from the treebank. The resulting dictionary features higher precision thanks to adjusting co-occurrence matrices for verb arguments, which is a novel idea.
22 November; 14:00-15:00, Room A006
Thierry Declerck, DFKI, Saarbruecken, will talk about Language Technology for the Semantic Annotation of Multimedia
Abstract: This lecture will address the topic of the so-called "semantic gap" in the analysis and generation of multimedia (MM) content. NLP and Semantic Web (SW) technologies can help with this by providing ontology-based semantic annotation of textual documents (including speech transcripts) that are associated with video and images. We will describe some work dedicated to the description of an ontological framework that combines annotations provided by the combination of NLP and SW on the one side, and the so-called low-level annotation features provided by MM analysis on the other side. The lecture will rely on the most recent advances in this field, as proposed by the EU Network of Excellence "K-Space" (Knowledge Space of semantic inference for automatic annotation and retrieval of multimedia content).
13 December; 14:00-15:00, Room A006
Rokia Bendaoud, Orpailleur/LORIA, Nancy, will talk about Moving from a thesaurus to a formal ontology using a text corpus
Abstract: Enriching a thesaurus into a formal knowledge base (an ontology) that supports reasoning is one of the challenges of the Semantic Web. We present the PACTOLE methodology. PACTOLE consists in enriching and formalizing a thesaurus in order to build an ontology represented in description logic. The enrichment is carried out by merging expert knowledge (the thesaurus) with knowledge extracted from the text corpus by mining methods. PACTOLE is applied to the domain of microbiology.
20 December; 14:00-15:00, Room A006
Alexander Koller, U. of Edinburgh, will talk about Generation as planning
Abstract: The problem of natural language generation is intimately related to AI planning on many levels. In both problems, the computer has to search for a sequence of actions that combine in appropriate ways to achieve a given goal; in the case of generation, these actions may correspond to uttering speech acts, sentences, or individual words. This has been recognized in the literature for several decades, but we are currently experiencing a revival of interest in exploring this connection, which has been sparked especially by the recent advances in making planners efficient.
In my talk, I will first show how sentence generation can be translated into a planning problem. This has the advantage that the (somewhat artificial) separation of sentence generation into microplanning and surface realization can be overcome, in a way that's similar to Matthew Stone's SPUD system. Furthermore, each plan action captures the complete grammatical, semantic, and pragmatic preconditions and effects of uttering a single word. I will then present a recent proposal for a shared task for the generation community, in which the system must generate instructions in a virtual environment, discuss some problems that arise in this application, and propose some ideas on how these can be tackled using a planning approach.
10 January; 14:00-15:00, Room A006
Nicholas Asher, CNRS/IRIT (Toulouse), will talk about Grounding and Correcting Commitments
Abstract: In this talk I will argue that current approaches to grounding fail to make the right predictions in cases where grounding is implicit. I will then argue for an approach in which the individual commitments of speakers are explicitly represented in logical form and what is grounded is a logical consequence of all the individual commitments.
17 January; 14:00-15:00, Room A006
Sandrine Ollinger, CNRS/ATILF, Nancy, will talk about POMPAMO: extracting neologisms
Abstract: POMPAMO is one component of a lexical monitoring platform currently under development. The general method we implement aims to identify neologisms by comparing corpora, regarded as archives of given linguistic practices, with lexicons, regarded as simulations of attested lexical usages corresponding to those linguistic practices. I will begin my talk by presenting the different modules of the lexical monitoring platform planned at ATILF; then, after a brief theoretical reminder about neology, I will describe in detail how the POMPAMO tool works.
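The general idea of comparing a corpus with a lexicon can be sketched in a few lines. This is not the POMPAMO implementation, which the abstract does not describe in detail: the function name `candidate_neologisms`, the naive letter-based tokenization and the `min_count` threshold are assumptions made for illustration.

```python
import re
from collections import Counter


def candidate_neologisms(corpus_text, lexicon, min_count=1):
    """Return word forms found in the corpus but absent from the reference lexicon.

    The result maps each candidate form to its corpus frequency. Candidates
    still need manual review: typos and proper nouns also end up here.
    """
    # Naive tokenization (an assumption): maximal runs of Unicode letters, lowercased.
    tokens = re.findall(r"[^\W\d_]+", corpus_text.lower())
    counts = Counter(t for t in tokens if t not in lexicon)
    return {form: n for form, n in counts.items() if n >= min_count}


# Invented toy corpus and lexicon, for illustration only.
corpus = "Le blogueur publie un billet. Le blogueur twitte."
lexicon = {"le", "publie", "un", "billet"}
print(candidate_neologisms(corpus, lexicon))
```

Raising `min_count` filters out one-off forms, at the cost of missing very recent coinages.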
7 February; 14:00-15:00, Room A006
Fabienne Venant, U. Nancy 2, will talk about Lexical semantics and disambiguation
Abstract: Automatic disambiguation is a central problem in Natural Language Processing (NLP). Many applications, notably in Information Retrieval, require access to the semantic content of documents. These applications run up against the pervasiveness of polysemy. Any attempt to formalize this phenomenon must address two questions: that of content (how to define lexical meaning and how to represent it) and that of organization (how to take into account the semantic relations between lexical units). I will present a geometric answer to these questions, based on the analysis of a synonymy graph (drawn from the DES, Crisco, Caen). Because synonymy is almost always only partial, it makes it possible to represent both how the meanings of synonymous words overlap and how they differ. The aim here is to build semantic spaces at different scales, which can account for the semantics of a given unit or give access to the structure of the lexical paradigm as a whole. The semantic spaces built in this way serve as the basis for a dynamic method of computing meaning, tested on adjectival disambiguation tasks. This method relies on a pre-disambiguation of the noun governing the adjective, by means of distributional selection classes. I will present the first results obtained on fine-grained disambiguation tasks (adjectival polysemy: the influence of the governing noun and/or of the position of the adjective; nominal polysemy: the "semantic facets" of a noun such as livre).

