Past seminars in 2018

Angela Fan on Tuesday, April 17

Date: Tuesday, 17th April 2018 at 11am
Place: LORIA, room A008
Speaker: Angela Fan (Facebook Research)

Title: Sequence to Sequence Learning for User-Controllable Abstractive Summarization

Abstract: The design of neural architectures for sequence to sequence tasks such as summarization is an active research field. I will first briefly discuss the architectural changes that enable our convolutional sequence to sequence model, such as replacing non-linearities with novel gated linear units and multi-hop attention. The second part of the talk will discuss ways to train models for large-scale summarization tasks and respect user preferences. Our model enables users to specify high-level attributes to control the shape of final summaries to suit their needs. For example, users may want to specify the length or portion of the document to summarize. With this input, the system can produce summaries that respect user preference. Without user input, the control variables can be automatically set to outperform comparable state-of-the-art summarization models.
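
For readers unfamiliar with the gated linear units mentioned in the abstract, the following is a minimal sketch of the commonly published form of the operation, not the speaker's implementation. It assumes the input vector is split into two halves, a and b, and that the output is a multiplied elementwise by sigmoid(b). The function names `sigmoid` and `glu` are illustrative.

```python
import math


def sigmoid(x):
    """Logistic function, used here as the gate."""
    return 1.0 / (1.0 + math.exp(-x))


def glu(values):
    """Gated linear unit on a flat list of floats.

    Assumption: the first half of `values` carries the content (a)
    and the second half carries the gate pre-activations (b).
    Returns a * sigmoid(b), elementwise.
    """
    if len(values) % 2 != 0:
        raise ValueError("input length must be even")
    half = len(values) // 2
    content, gate = values[:half], values[half:]
    return [a * sigmoid(b) for a, b in zip(content, gate)]


# A gate pre-activation of 0 lets half of the content through.
half_open = glu([1.0, 2.0, 0.0, 0.0])
```

Because the content path stays linear, the gate alone decides how much of each value is passed on.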

Hervé Bredin on Wednesday, April 25

Date: Wednesday, 25th April 2018 at 2pm
Place: LORIA, room A008
Speaker: Hervé Bredin (LIMSI)

Title: Neural building blocks for speaker diarization

Abstract:
Speaker diarization is the task of determining “who speaks when” in an audio stream. Most diarization systems rely on statistical models to address four sub-tasks: speech activity detection (SAD), speaker change detection (SCD), speech turn clustering, and re-segmentation. First, following the recent success of recurrent neural networks (RNN) for SAD and SCD, we propose to address re-segmentation with Long Short-Term Memory (LSTM) networks. Then, we propose to use affinity propagation on top of neural speaker embeddings for speech turn clustering, outperforming regular Hierarchical Agglomerative Clustering (HAC). Finally, all these modules are combined and jointly optimized to form a speaker diarization pipeline in which all but the clustering step are based on RNNs. We provide experimental results on the French Broadcast dataset ETAPE where we reach state-of-the-art performance.

Neil Zeghidour on Wednesday, May 30

Date: Wednesday, 30th May 2018 at 2pm
Place: LORIA, room C005
Speaker: Neil Zeghidour (Facebook AI Research & Ecole Normale Supérieure)

Title: End-to-end speech recognition from the raw waveform

Abstract:
State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. We study end-to-end systems trained directly from the raw waveform, introducing a trainable replacement of mel-filterbanks that uses a convolutional architecture, based on the scattering transform. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks, and then fine-tuned jointly with the remaining convolutional architecture. We perform phone recognition experiments on TIMIT and show that models trained on TD-filterbanks consistently outperform their counterparts trained on comparable mel-filterbanks. We then improve this model and another, previously proposed frontend based on gammatones. We perform open vocabulary experiments on Wall Street Journal and show a consistent and significant improvement in Word Error Rate of our trainable frontends over mel-filterbanks, even with random initialization.

Maël Primet on Wednesday, June 20

Date: Wednesday, 20th June 2018 at 2pm
Place: LORIA, room A008
Speaker: Maël Primet (Snips)

Title: Presenting Snips: a Platform for Natural Language Understanding and Speech Recognition on the edge

Abstract: Snips is building a Voice AI platform running 100% on embedded devices, comprising a wake-word detector, an automated speech recognizer, and natural language understanding. As microprocessors become cheaper and more powerful, moving computations to the edge provides many advantages: privacy-by-design, 100% availability, lower costs, higher reactivity. But this also requires tradeoffs in terms of accuracy, as larger models are not able to run on commodity chips. This talk will present some aspects of the Snips platform and tradeoffs which have been made to guarantee high accuracy while reducing model sizes.

Edward L. Keenan on Tuesday, June 26

Date: Tuesday, 26th June 2018 at 10am
Place: LORIA, room A008
Speaker: Edward L. Keenan (Department of Linguistics, UCLA)

Title: Individuals explained away

Abstract: As a linguist wading into philosophical waters I begin with two semantic observations concerning some intensional common nouns and their modifiers. I provide them with a minimal semantic analysis whose justification is twofold. One, it is linguistically enlightening: it points out some semantic generalizations about English and provides a formally explicit analysis of the relevant entailment patterns. Two, it generalizes standard extensional model theory without adding novel entities such as possible worlds or propositions. Such entities may facilitate the semantic analysis of modal adverbs and propositional attitude verbs, but are not needed for the intensional expressions we study here. Our analysis is explanatory in that it characterizes what we are trying to understand only using notions we already understand, not novel ones we don’t. Our analysis may also have some consequences for Direct Reference Theory (see Almog 2012, Bianchi 2012, Kaplan 1989, Napoli 1995, Wettstein 2004). Among them is that it eliminates from our naive ontology a universe of objects we think of singular terms as denoting and unbound pronouns and individual variables in logic as ranging over. It also establishes some boundary points limiting the purview of Direct Reference Theory. And, if nothing else, it may serve a “bad example” function – “Just look at what happens if you do not adopt a direct reference stance”. As well, our use of judgments of entailment is more at home in a Fregean setting than a Direct Reference one (Capuano 2012).

Paul Magron on Wednesday, October 17

Date: Wednesday, 17th October 2018 at 2pm
Place: LORIA, room A008
Speaker: Paul Magron (Tampere University of Technology)

Title: Probabilistic modeling of the phase for audio source separation

Abstract:
Many audio source separation techniques act on a time-frequency representation of the data, such as the short-time Fourier transform (STFT), since it reveals the underlying structure of sounds. These methods usually discard the phase information and process spectrogram-like quantities only. The sources are finally retrieved by means of a Wiener-like filter, which assigns the phase of the original mixture to each isolated source. However, this introduces interference and artifacts in the estimates, which highlights the need for more sophisticated phase recovery techniques. In this talk, we will present our recent work on phase-aware probabilistic models for audio source separation. Firstly, we will model the phase as a non-uniform random variable based on the von Mises distribution. This allows us to incorporate some prior knowledge about the phase, e.g., knowledge that arises from a signal model (sums of sinusoids). In particular, we will show that the traditional uniform model and the von Mises model are not contradictory, but rather rely on different assumptions about the phase. Secondly, we will present mixture models based on the anisotropic Gaussian distribution, from which we can derive phase-aware estimators of the sources in the STFT domain. This results in an anisotropic Wiener filter, which preserves some of the interesting statistical properties of the Wiener filter, while enabling one to account for a phase model. Finally, we will propose techniques for jointly inferring the magnitude and the phase based on this framework. Indeed, by structuring the variance parameters of these models through, e.g., nonnegative matrix factorization or deep neural networks, we can derive complete and phase-aware source separation systems.
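
The statement that a Wiener-like filter assigns the mixture phase to every source can be illustrated on a single STFT bin. The sketch below is a textbook-style illustration under stated assumptions, not the speaker's code: each source is described only by a nonnegative power estimate, and the function name `wiener_like_filter` is hypothetical.

```python
import cmath


def wiener_like_filter(mixture_bin, source_powers):
    """Split one complex STFT bin among sources.

    Assumption: `source_powers` holds a nonnegative power estimate
    per source for this bin. Each source receives the real gain
    p_j / sum(p), so its estimate keeps the phase of the mixture.
    """
    total = sum(source_powers)
    if total == 0:
        return [0j for _ in source_powers]
    return [(p / total) * mixture_bin for p in source_powers]


# One mixture bin with magnitude 2.0 and phase 0.7 rad, two sources.
mixture = cmath.rect(2.0, 0.7)
estimates = wiener_like_filter(mixture, [3.0, 1.0])
phases = [cmath.phase(e) for e in estimates]
```

Since the gains are real and nonnegative, the estimates sum back to the mixture and differ from it only in magnitude, which is the limitation that motivates the phase-aware models described in the abstract.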

Antoine Deleforge on Wednesday, November 14

Date: Wednesday, 14th November 2018 at 2pm
Place: LORIA, room C005
Speaker: Antoine Deleforge (Inria Nancy – Grand Est)

Title: Audio signal processing with a little help from echoes

Abstract:
When a sound wave propagates from a point source through a medium and is reflected on surfaces before reaching microphones, the measured signals consist of mixtures of the direct path signal with delayed and attenuated copies of itself. This acoustical phenomenon is referred to as echoes, or reverberation, and is generally considered a nuisance in audio signal processing. After introducing some basic signal processing and acoustic background, this seminar will present recent works showing how acoustic echoes can be blindly estimated from audio recordings, and how the knowledge of such echoes can actually help some audio signal processing tasks such as beamforming, source separation or sound source localization.
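
The signal model described in the abstract, a direct-path signal mixed with delayed and attenuated copies of itself, can be sketched in a few lines. This is a simplified illustration that assumes integer sample delays and one constant attenuation per echo; the function name `add_echoes` and the example values are hypothetical.

```python
def add_echoes(signal, echoes):
    """Return the direct-path signal plus delayed, attenuated copies.

    `echoes` is a list of (delay_in_samples, attenuation) pairs.
    Assumption: delays are nonnegative integers and each echo is a
    scaled copy of the whole signal.
    """
    if any(delay < 0 for delay, _ in echoes):
        raise ValueError("delays must be nonnegative")
    max_delay = max((delay for delay, _ in echoes), default=0)
    # Start from the direct path, padded so the last echo fits.
    out = list(signal) + [0.0] * max_delay
    for delay, gain in echoes:
        for n, sample in enumerate(signal):
            out[n + delay] += gain * sample
    return out


# A unit impulse with one echo arriving 2 samples later at half amplitude.
measured = add_echoes([1.0, 0.0, 0.0], [(2, 0.5)])
```

Estimating the (delay, attenuation) pairs from the measured signal alone is the blind estimation problem the talk refers to.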

Emmanuel Dupoux on Wednesday, November 21

Date: Wednesday, 21st November 2018 at 2pm
Place: LORIA, room A008
Speaker: Emmanuel Dupoux (EHESS, Laboratoire de Sciences Cognitives et Psycholinguistique)

Title: Towards developmental AI

Abstract:
Even though current machine learning techniques yield systems that achieve parity with humans on several high-level tasks, the learning algorithms themselves are orders of magnitude less data-efficient than those used by humans, as evidenced by the speed and resilience with which infants learn language and common sense. I review some of our recent attempts to reverse engineer such abilities in the area of unsupervised or weakly supervised learning of speech representations and speech terms, and the learning of the laws of intuitive physics by observation of videos. I argue that a triple effort in data collection, algorithm development and fine-grained human/machine comparisons is needed to uncover these developmental algorithms.

Chloé Braud on Wednesday, December 5

Date: Wednesday, 5th December 2018 at 2pm
Place: LORIA, room A008
Speaker: Chloé Braud (CNRS – LORIA)

Title: Transfer learning for discourse parsing

Abstract:
Discourse structures describe the organization of documents in terms of discourse or rhetorical relations (such as « Explanation » or « Contrast ») linking clauses and sentences. Discourse analysis could be useful for various downstream applications, such as automatic summarization, question-answering or sentiment analysis. However, the range of applications and the performance are still limited by the low scores of the existing discourse parsers and their focus on English. Discourse parsing is known to be a hard task: it involves several complex and interacting factors, touching upon all layers of linguistic analysis, from syntax and semantics up to pragmatics. Consequently, annotation is also complex and time-consuming, and hence available annotated corpora are sparse and limited in size. In this presentation, I will present attempts to tackle these issues using transfer learning strategies. First, I will describe experiments on identifying implicit discourse relations (i.e. lacking a discourse connective such as « but » or « because ») by transferring knowledge from the explicit examples to the implicit ones, either by augmenting the size of the training set, or by building a task-tailored representation of the words. I will then present two full discourse parsers. The first one involves a combination of several corpora annotated for different languages, leading to improvements on English and to the first systems for Basque and Dutch developed without any training data. The second parser relies on multi-task learning to transfer information among several discourse-related tasks.

Philippe Muller on Friday, December 7

Date: Friday, 7th December 2018 at 10am
Place: LORIA, room A008
Speaker: Philippe Muller (IRIT, Toulouse)

Title: Sentential distributional semantics: Learning semantic sentence representations and their compositions (Joint work with Damien Sileo and Tim van de Cruys)

Abstract:
Distributional semantics aims at automatic representation of textual semantic content based on the observation of a large representative corpus. There is a large body of work on lexical distributional semantics, based on the assumption that words appearing in similar contexts should have similar semantic representations. This popularized the representation of words as vectors in a semantic space. More recently, a lot of effort in the NLP field has been devoted to building similar representations for sentences, or even larger textual elements. This raises several questions: how to build sentence representations from word representations in vector spaces, preferably in a compositional manner, and how to guide the representations so that they capture important semantic aspects at the sentence level? Arguably, sequential compositional models such as recurrent neural networks offer a simple composition at the lexical level that can be used in supervised settings to make accurate predictions in textual classification, while building a representation of the sentential context in their internal state. This is however specific to each task, and researchers have tried to find ways of building so-called « universal » sentence representations, or more exactly transferable representations. In this perspective several settings have been proposed that evoke supervised distributional approaches at the word level, with auxiliary tasks that could induce semantically relevant representations at the sentence level: for instance trying to predict if two sentences follow each other in a text, or if one is a consequence of the other. These in turn must compose the two sentences in a way that allows for the learning of their relationships. Composition of representations is also important in all tasks that involve predicting a relation between a pair of textual elements: sentence similarity, entailment, discourse relations.
The compositions considered in NLP are often quite superficial, and we will show more expressive compositions by taking inspiration from Statistical Relational Learning. Moreover we propose an unsupervised training task to induce sentence representations, based on the prediction of discourse connections between sentences in a large corpus.