Maël Primet

Date: Wednesday, 20th June 2018 at 2pm
Place: LORIA, room A008
Speaker: Maël Primet (Snips)

Title: TBA



Neil Zeghidour

Date: Wednesday, 30th May 2018 at 2pm
Place: LORIA, room C005
Speaker: Neil Zeghidour (Facebook AI Research & Ecole Normale Supérieure)

Title: End-to-end speech recognition from the raw waveform

State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. We study end-to-end systems trained directly from the raw waveform, introducing a trainable replacement for mel-filterbanks that uses a convolutional architecture based on the scattering transform. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks, and then fine-tuned jointly with the remaining convolutional architecture. We perform phone recognition experiments on TIMIT and show that models trained on TD-filterbanks consistently outperform their counterparts trained on comparable mel-filterbanks. We then improve this model and another, previously proposed frontend based on gammatones. We perform open vocabulary experiments on Wall Street Journal and show a consistent and significant improvement in Word Error Rate of our trainable frontends over mel-filterbanks, even with random initialization.


Hervé Bredin

Date: Wednesday, 25th April 2018 at 2pm
Place: LORIA, room A008
Speaker: Hervé Bredin (LIMSI)

Title: Neural building blocks for speaker diarization

Speaker diarization is the task of determining “who speaks when” in an audio stream. Most diarization systems rely on statistical models to address four sub-tasks: speech activity detection (SAD), speaker change detection (SCD), speech turn clustering, and re-segmentation. First, following the recent success of recurrent neural networks (RNN) for SAD and SCD, we propose to address re-segmentation with Long Short-Term Memory (LSTM) networks. Then, we propose to use affinity propagation on top of neural speaker embeddings for speech turn clustering, outperforming regular Hierarchical Agglomerative Clustering (HAC). Finally, all these modules are combined and jointly optimized to form a speaker diarization pipeline in which all but the clustering step are based on RNNs. We provide experimental results on the French broadcast dataset ETAPE, where we reach state-of-the-art performance.


Angela Fan

Date: Tuesday, 17th April 2018 at 11am
Place: LORIA, room A008
Speaker: Angela Fan (Facebook Research)

Title: Sequence to Sequence Learning for User-Controllable Abstractive Summarization

Abstract: The design of neural architectures for sequence to sequence tasks such as summarization is an active research field. I will first briefly discuss the architectural changes that enable our convolutional sequence to sequence model, such as replacing non-linearities with novel gated linear units and multi-hop attention. The second part of the talk will discuss ways to train models for large-scale summarization tasks while respecting user preferences. Our model enables users to specify high-level attributes to control the shape of final summaries to suit their needs. For example, users may want to specify the length or portion of the document to summarize. With this input, the system can produce summaries that respect user preferences. Without user input, the control variables can be automatically set to outperform comparable state-of-the-art summarization models.


Jonathan Berant

Date: Wednesday, 13th December 2017 at 2pm
Place: LORIA, room A008
Speaker: Jonathan Berant, Tel Aviv University (Israel)

Title: Talking to your Virtual Assistant about anything

Abstract: Conversational interfaces and virtual assistants are now part of our lives due to services such as Amazon Alexa, Google Voice, Microsoft Cortana, etc. Thus, translating natural language queries and commands into an executable form, also known as semantic parsing, is one of the prime challenges nowadays in natural language understanding. In this talk I would like to highlight the main challenges and limitations in the field of semantic parsing, and to describe ongoing work that addresses those challenges. First, semantic parsers require information to be stored in a knowledge base (KB), which substantially limits their coverage and applicability. Conversely, the web has huge coverage, but search engines that access the web do not handle language compositionality well. We propose to treat the web as a KB and compute answers to complex questions in broad domains by decomposing the question into a sequence of simple questions, extracting answers with a search engine, and recomposing the answers to obtain a final result. Second, deploying virtual assistants in many domains (cars, homes, calendar, etc.) requires the ability to quickly develop semantic parsers. However, most past work trains semantic parsers from scratch for each domain, while disregarding training data from other domains. We propose a zero-shot approach for semantic parsing, where we decouple the structure of language from the contents of the domain and learn a domain-independent semantic parser. Last, one of the most popular setups for training semantic parsers is to use denotations as supervision. However, training from denotations results in a difficult search problem as well as a spuriousness issue, where incorrect programs evaluate to the correct denotation. I will describe some recent work in which we try to address the challenges of search and spuriousness.


Yannick Parmentier

Date: Wednesday, 6th December 2017 at 2pm
Place: LORIA, room A008
Speaker: Yannick Parmentier (LORIA, Synalp team)

Title: From Description Languages to the Description of Description Languages


Having a fine-grained description of natural language is a prerequisite for many NLP applications such as dialogue systems or machine translation. Such descriptions were originally hand-crafted, and thus costly to build and extend. In the late 1990s, some attempts were made to semi-automatically generate these linguistic descriptions. Among these attempts, one may cite the metagrammar approach, which consists of describing a linguistic description using a formal language. The XMG language and related compiler were designed in this context. XMG offers means to abstract over redundant lexicalised linguistic descriptions. While limited to tree-based linguistic descriptions, it opened up new perspectives in linguistic resource engineering.
In this talk, we introduce XMG2, a modular and extensible tool for various linguistic description tasks. Based on the notion of meta-compilation (that is, compilation of compilers), XMG2 reuses the main concepts underlying XMG, namely logic programming and constraint satisfaction, to generate on-demand XMG-like compilers by assembling elementary units called language bricks. This brick-based definition of compilers permits users to design description languages in a highly flexible way. In particular, it makes it possible to support several levels of linguistic description (e.g. syntax, morphology) within a single description language.


Yves Lepage

Date: Friday, 1st December 2017 at 10am
Place: LORIA, room B013
Speaker: Yves Lepage (Waseda University, Japan)

Title: Analogy for natural language processing (NLP) and machine translation (MT)

Abstract: In this talk, we introduce the notion of analogy and show its application to language data or to various NLP tasks.

We start from an algebraic definition of analogy between vector representations and apply it to pixel images representing Chinese characters. We illustrate a fast algorithm that structures data into analogical clusters.

By adding the notion of edit distance, we show how to capture analogies between strings of symbols and generalise analogical clusters to analogical grids, a structure similar to paradigm tables in morphology. Such a structure can be used to predict or explain word forms. In particular, we report work on explaining unseen words in Indonesian.

As solving analogical equations between strings of symbols is a key problem in addressing tasks such as machine translation, we report results on this topic obtained by using standard techniques or neural networks.

We finish by presenting a machine translation system by analogy and sketch future research on it.


Victor Bisot

Date: Wednesday, 8th November 2017 at 2pm
Place: LORIA, room A008
Speaker: Victor Bisot (Télécom Paristech)

Title: Learning nonnegative representations for environmental sound analysis

Abstract: The growing interest in environmental sound analysis tasks has been accompanied by a marked increase in deep learning-based approaches. As most focus on finding suitable neural network architectures or algorithms, few works question the choice of the input representation. In this talk, we will discuss how nonnegative matrix factorization (NMF)-based feature learning techniques are suited for such tasks, especially when used as input to deep neural networks. We start by highlighting the usefulness of unsupervised NMF variants for feature learning in multi-source environments. Next, we introduce a supervised variant of NMF known as TNMF (Task-driven NMF). The TNMF model aims at improving the quality of the decomposition by learning the nonnegative dictionaries that minimize the target classification loss. In the second part, we will exhibit similarities between NMF, TNMF and standard NN layers, justifying the potential of NMF-based features as input to various DNN models. The proposed models are evaluated on acoustic scene classification and overlapping sound event detection datasets and show improvements over standard CNN approaches for such tasks. Finally, we will discuss how NMF dictionaries and neural network parameters can be trained jointly in a common optimization problem by relying on the TNMF framework.


Aurélien Bellet

Date: Wednesday, 25th October 2017 at 2pm
Place: LORIA, room C005
Speaker: Aurélien Bellet (Inria Lille – Nord Europe)

Title: Private algorithms for decentralized collaborative machine learning

Abstract: With the advent of connected devices with computation and storage capabilities, it has become possible to run machine learning on-device to provide personalized services. However, the currently dominant approach is to centralize data from all users on an external server for batch processing, sometimes without explicit consent from users and with little oversight. This centralization raises serious privacy issues in applications involving sensitive data such as speech, medical records or geolocation logs.

In this talk, I will discuss an alternative setting where many agents with local datasets collaborate to learn models by engaging in a fully decentralized peer-to-peer network. We introduce and analyze asynchronous algorithms in the gossip and broadcast communication model that allow agents to improve upon their locally trained model by exchanging information with other agents that have similar objectives. Our first approach aims to smooth pre-trained local models over the network while accounting for the confidence that each agent has in its initial model. In our second approach, agents jointly learn and propagate their model by making iterative updates based on both their local dataset and the behavior of their neighbors. I will also describe how to make such algorithms differentially private to avoid leaking information about the local datasets, and analyze the resulting privacy-utility trade-off.
