Go to one of these dates:
9 October 2008, Anna Kupsc, Université Michel de Montaigne, Pessac
20 October 2008, Alain Polguère, Université de Montréal (Canada)
13 November 2008, Holger Schwenk, Université du Maine
27 November 2008, Günter Neumann, DFKI Saarbrücken (Germany); postponed to 26 March
11 December 2008, Paola Merlo, U. Geneva (Switzerland)
9 October 2008; 14:00-15:00, Room A006
Anna Kupsc, Université Michel de Montaigne, Pessac, will speak on: Adjectives in TreeLex
Abstract: I will present TreeLex, a syntactic lexicon for French which can be used in various NLP applications. The lexicon has been automatically extracted from a treebank (a syntactically annotated corpus). The talk will focus on obtaining the valence of adjectives. The adopted method exploits corpus annotations (constituent structure) and linguistic knowledge, as the size of the corpus (about 1 million words) is not sufficient for applying statistical techniques. In particular, I will concentrate on automatically identifying different types of adjectival constructions (such as comparatives or extraposition) in the corpus and distinguishing them from the "canonical" adjectival valence. I will present the current results and discuss problematic issues.
20 October 2008; 14:00-15:00, Room A006
Alain Polguère, OLST, Université de Montréal (Canada), will speak on: Semantic information in lexical databases
Abstract: This talk will be threefold. First, I will give an overview of the various types of semantic information that structure natural language lexicons. I will then show to what extent the choice of a given theoretical perspective on such information has drastic implications for the structuring of formal lexical databases that target NLP applications. Finally, I will present in some detail how semantic information is modeled within the DiCo database of French semantic derivations and collocations.
13 November 2008; 14:00-15:00, Amphitheatre
Holger Schwenk will speak on: Apprentissage non-supervisé en traduction automatique (Unsupervised learning in machine translation)
Abstract: Sentence-aligned bilingual texts are a crucial resource for building statistical machine translation (SMT) systems. In this talk I will propose applying lightly-supervised training to produce additional parallel data. The idea is to translate large amounts of monolingual data (up to 275M words) with an SMT system, and to use the resulting translations as additional training data. Results are reported for translation between French and English. Two setups are considered: in the first, the initial SMT system is trained with only a very limited amount of human-produced translations; in the second, more than 100 million words are available. In both conditions, lightly-supervised training achieves significant improvements in the BLEU score.
27 November 2008; 14:00-15:00, Room B013
Günter Neumann, University of Saarbruecken (Germany), will speak on: Data-oriented Parsing with Lexicalized Tree Insertion Grammars
Abstract: I will present a number of strategies for the creation and parsing of Lexicalized Tree Insertion Grammars (LTIG). The grammars are automatically extracted from different sorts of treebanks, e.g., treebanks generated from HPSG parses or dependency-based treebanks. In the case of HPSG, I describe a method for the automatic extraction of a Stochastic LTIG from a linguistically rich German HPSG treebank. The extraction method is strongly guided by HPSG-based head and argument decomposition rules. The tree anchors correspond to lexical labels encoding fine-grained information. In the case of dependency-based treebanks, I present a fully automatic method for transforming dependency trees encoded in the CoNLL format into a constituent-style tree format. Parsing is performed by an efficient two-level Earley-based parser which, among other things, has a high degree of language independence and can handle multiword lexical anchors efficiently.
11 December 2008; 15:00-16:00, Room A006
Paola Merlo, University of Geneva (Switzerland), will speak on: Lexical and Structural Constraints in Semantic Role Labelling
Abstract: One of the currently debated topics in Natural Language Processing is the problem of semantic role labelling. Given a sentence like "I want to reserve a flight from Geneva to New York", how do we determine automatically that "from Geneva" is the source of the flight and "to New York" is the destination? The solution to this problem lies at the heart of all applications that require language understanding, such as dialogue systems, question answering, or machine translation.
Recent successes of machine learning methods in statistical parsing and lexical acquisition pave the way to a learning approach for this problem too. In this presentation, I will motivate a probabilistic model of joint learning of syntactic trees and semantic role labels, and will illustrate its computational and linguistic properties. This method achieves competitive results. The good performance of this model can be interpreted as a confirmation of the linguistic hypothesis on the relationship between syntax and lexical semantics called "linking theory".