Past seminars in 2018

Angela Fan on Tuesday, April 17

Date: Tuesday, 17th April 2018 at 11am
Place: LORIA, room A008
Speaker: Angela Fan (Facebook Research)
Title: Sequence to Sequence Learning for User-Controllable Abstractive Summarization
Abstract: The design of neural architectures for sequence to sequence tasks such as summarization is an active research field. I will first briefly discuss the architectural changes behind our convolutional sequence to sequence model, such as gated linear units in place of standard non-linearities and multi-hop attention. The second part of the talk will discuss ways to train models for large-scale summarization tasks while respecting user preferences. Our model enables users to specify high-level attributes to control the shape of final summaries to suit their needs; for example, users may want to specify the length or the portion of the document to summarize. With this input, the system produces summaries that respect these preferences. Without user input, the control variables can be set automatically to outperform comparable state-of-the-art summarization models.
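A common way to implement this kind of length control is to prepend a special bucket token to the source sequence before encoding, so the decoder can condition on it. The sketch below is illustrative only; the bucket boundaries and token names are invented, not the talk's actual scheme.

```python
# Hypothetical sketch of length control via a special token prepended to the
# input; bucket boundaries and token names are made up for illustration.
def add_length_control(source_tokens, desired_length, buckets=(10, 20, 30, 40)):
    """Prepend a length-bucket marker so a seq2seq model can condition on it."""
    bucket = sum(desired_length > b for b in buckets)  # index of the length bucket
    return [f"<len_{bucket}>"] + source_tokens

tokens = add_length_control(["the", "cat", "sat"], desired_length=25)
# desired_length=25 exceeds buckets 10 and 20, so the marker is "<len_2>"
```

At training time the marker is computed from the reference summary's true length; at inference time the user (or a default) supplies it.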

Hervé Bredin on Wednesday, April 25

Date: Wednesday, 25th April 2018 at 2pm
Place: LORIA, room A008
Speaker: Hervé Bredin (LIMSI)

Title: Neural building blocks for speaker diarization

Speaker diarization is the task of determining “who speaks when” in an audio stream. Most diarization systems rely on statistical models to address four sub-tasks: speech activity detection (SAD), speaker change detection (SCD), speech turn clustering, and re-segmentation. First, following the recent success of recurrent neural networks (RNN) for SAD and SCD, we propose to address re-segmentation with Long Short-Term Memory (LSTM) networks. Then, we propose to use affinity propagation on top of neural speaker embeddings for speech turn clustering, outperforming regular Hierarchical Agglomerative Clustering (HAC). Finally, all these modules are combined and jointly optimized to form a speaker diarization pipeline in which all but the clustering step are based on RNNs. We provide experimental results on the French broadcast dataset ETAPE, where we reach state-of-the-art performance.

Neil Zeghidour on Wednesday, May 30

Date: Wednesday, 30th May 2018 at 2pm
Place: LORIA, room C005
Speaker: Neil Zeghidour (Facebook AI Research & Ecole Normale Supérieure)

Title: End-to-end speech recognition from the raw waveform

State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. We study end-to-end systems trained directly from the raw waveform, introducing a trainable replacement of mel-filterbanks that uses a convolutional architecture based on the scattering transform. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks, and then fine-tuned jointly with the rest of the convolutional architecture. We perform phone recognition experiments on TIMIT and show that models trained on TD-filterbanks consistently outperform their counterparts trained on comparable mel-filterbanks. We then improve both this model and a previously proposed frontend based on gammatones. We perform open vocabulary experiments on Wall Street Journal and show a consistent and significant improvement in Word Error Rate with our trainable frontends over mel-filterbanks, even with random initialization.
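The frontend idea can be sketched in numpy: a bank of complex bandpass filters convolved with the raw waveform, followed by squared modulus and lowpass averaging. In the actual model these filter weights are initialized near mel-filterbanks and then learned; the filter shapes and center frequencies below are simplified stand-ins.

```python
import numpy as np

# Rough, assumed simplification of a time-domain filterbank: Gabor-like
# complex filters, squared modulus, 10 ms averaging. In the real TD-filterbank
# these weights are learnable and mel-initialized.
def td_filterbank(wave, n_filters=8, filt_len=64, sr=16000):
    t = np.arange(filt_len) - filt_len // 2
    window = np.hanning(filt_len)
    feats = []
    for k in range(n_filters):
        fc = 100.0 * (2 ** (k / 2))                # toy center frequencies (Hz)
        filt = window * np.exp(2j * np.pi * fc * t / sr)
        analytic = np.convolve(wave, filt, mode="same")
        energy = np.abs(analytic) ** 2             # squared modulus
        hop = sr // 100                            # average over 10 ms frames
        frames = energy[: len(energy) // hop * hop].reshape(-1, hop)
        feats.append(frames.mean(axis=1))
    return np.log1p(np.array(feats))               # (n_filters, n_frames)

wave = np.sin(2 * np.pi * 400 * np.arange(16000) / 16000)  # 400 Hz tone, 1 s
F = td_filterbank(wave)   # the band centered at 400 Hz responds most strongly
```

Replacing the fixed filters with learnable convolution weights (and the averaging with a learnable lowpass) turns this fixed pipeline into the trainable frontend discussed in the talk.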

Maël Primet on Wednesday, June 20

Date: Wednesday, 20th June 2018 at 2pm
Place: LORIA, room A008
Speaker: Maël Primet (Snips)

Title: Presenting Snips: a Platform for Natural Language Understanding and Speech Recognition on the edge

Abstract: Snips is building a Voice AI platform running 100% on embedded devices, comprising a wake-word detector, an automated speech recognizer, and natural language understanding. As microprocessors become cheaper and more powerful, moving computation to the edge provides many advantages: privacy-by-design, 100% availability, lower costs, higher reactivity. But it also requires tradeoffs in accuracy, as larger models cannot run on commodity chips. This talk will present some aspects of the Snips platform and the tradeoffs made to guarantee high accuracy while reducing model sizes.
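One standard technique for the size/accuracy tradeoff mentioned above is weight quantization. The sketch below is a generic illustration, not Snips' actual method: linear 8-bit quantization shrinks 32-bit float weights roughly 4x, at the cost of a small, bounded rounding error.

```python
import numpy as np

# Generic illustration (assumed, not Snips' implementation): symmetric 8-bit
# linear quantization of a weight matrix, a common way to fit models on
# commodity edge chips.
def quantize_int8(w):
    scale = np.abs(w).max() / 127.0          # map the largest weight to +/-127
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=(256, 256)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()   # bounded by half a quantization step (s / 2)
```

Storing `q` plus one scale per tensor takes about a quarter of the memory of the float weights, which is the kind of reduction that makes on-device inference feasible.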

Edward L. Keenan on Tuesday, June 26

Date: Tuesday, 26th June 2018 at 10am
Place: LORIA, room A008
Speaker: Edward L. Keenan (Department of Linguistics, UCLA)

Title: Individuals explained away

Abstract: As a linguist wading into philosophical waters I begin with two semantic observations concerning some intensional common nouns and their modifiers. I provide them with a minimal semantic analysis whose justification is twofold. One, it is linguistically enlightening: it points out some semantic generalizations about English and provides a formally explicit analysis of the relevant entailment patterns. Two, it generalizes standard extensional model theory without adding novel entities such as possible worlds or propositions. Such entities may facilitate the semantic analysis of modal adverbs and propositional attitude verbs, but are not needed for the intensional expressions we study here. Our analysis is explanatory in that it characterizes what we are trying to understand only using notions we already understand, not novel ones we don’t. Our analysis may also have some consequences for Direct Reference Theory (see Almog 2012, Bianchi 2012, Kaplan 1989, Napoli 1995, Wettstein 2004). Among them is that it eliminates from our naive ontology a universe of objects we think of singular terms as denoting and unbound pronouns and individual variables in logic as ranging over. It also establishes some boundary points limiting the purview of Direct Reference Theory. And, if nothing else, it may serve a “bad example” function – “Just look at what happens if you do not adopt a direct reference stance”. As well, our use of judgments of entailment is more at home in a Fregean setting than a Direct Reference one (Capuano 2012).

Paul Magron on Wednesday, October 17

Date: Wednesday, 17th October 2018 at 2pm
Place: LORIA, room A008
Speaker: Paul Magron (Tampere University of Technology)
Title: Probabilistic modeling of the phase for audio source separation
Many audio source separation techniques act on a time-frequency representation of the data, such as the short-time Fourier transform (STFT), since it reveals the underlying structure of sounds. These methods usually discard the phase information and process spectrogram-like quantities only. The sources are finally retrieved by means of a Wiener-like filter, which assigns the phase of the original mixture to each isolated source. However, this introduces interference and artifacts in the estimates, which highlights the need for more sophisticated phase recovery techniques. In this talk, we will present our recent work on phase-aware probabilistic models for audio source separation. Firstly, we will model the phase as a non-uniform random variable based on the von Mises distribution. This allows us to incorporate prior knowledge about the phase, e.g., constraints arising from a signal model (sums of sinusoids). In particular, we will show that the traditional uniform model and the von Mises model are not contradictory, but rather rely on different assumptions about the phase. Secondly, we will present mixture models based on the anisotropic Gaussian distribution, from which we can derive phase-aware estimators of the sources in the STFT domain. This results in an anisotropic Wiener filter, which preserves some of the interesting statistical properties of the Wiener filter while enabling one to account for a phase model. Finally, we will propose techniques for jointly inferring the magnitude and the phase based on this framework. Indeed, by structuring the variance parameters of these models through, e.g., nonnegative matrix factorization or deep neural networks, we can derive complete phase-aware source separation systems.
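The sinusoidal phase prior can be sketched in a few lines: for a sinusoid at frequency f, the STFT phase in a bin advances deterministically by 2πf·hop/sr between frames, and the von Mises distribution models the deviation around that advance. The frequency, hop, and concentration values below are arbitrary illustration choices.

```python
import numpy as np

# Toy sketch of a von Mises phase model (assumptions: a single sinusoid in the
# bin; parameter values chosen arbitrarily for illustration).
sr, hop, f = 16000, 256, 440.0      # sample rate, STFT hop size, sinusoid freq (Hz)
kappa = 50.0                        # concentration: large kappa -> near-deterministic
rng = np.random.default_rng(0)

phase = np.zeros(20)
for t in range(1, 20):
    mean = phase[t - 1] + 2 * np.pi * f * hop / sr   # expected phase advance
    phase[t] = mean + rng.vonmises(0.0, kappa)       # von Mises deviation
phase = np.angle(np.exp(1j * phase))                 # wrap to (-pi, pi]
```

As kappa → 0 the von Mises distribution tends to the uniform distribution on the circle, which is one way to see that the uniform and von Mises models differ only in their assumptions about the phase, not in kind.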

Antoine Deleforge on Wednesday, November 14

Date: Wednesday, 14th November 2018 at 2pm
Place: LORIA, room C005
Speaker: Antoine Deleforge (Inria Nancy – Grand Est)

Title: Audio signal processing with a little help from echoes

When a sound wave propagates from a point source through a medium and is reflected on surfaces before reaching microphones, the measured signals consist of mixtures of the direct path signal with delayed and attenuated copies of itself. This acoustical phenomenon is referred to as echoes, or reverberation, and is generally considered as a nuisance in audio signal processing. After introducing some basic signal processing and acoustic background, this seminar will present recent works showing how acoustic echoes can be blindly estimated from audio recordings, and how the knowledge of such echoes can actually help some audio signal processing tasks such as beamforming, source separation or sound source localization.
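The echo model described in the first sentence can be written directly as a convolution with a sparse impulse response: a unit tap for the direct path plus a few delayed, attenuated taps. The delays and gains below are made up for illustration.

```python
import numpy as np

# Minimal sketch of the echo model above: microphone signal = source convolved
# with a sparse impulse response (direct path + delayed, attenuated copies).
# The delays (in samples) and gains are invented for illustration.
sr = 16000
rng = np.random.default_rng(0)
src = rng.normal(size=sr)               # 1 s of source signal

h = np.zeros(2000)                      # room impulse response
h[0] = 1.0                              # direct path
for delay, gain in [(320, 0.6), (750, 0.35), (1400, 0.2)]:
    h[delay] = gain                     # early echoes

mic = np.convolve(src, h)               # what the microphone records
```

Blind echo estimation, as discussed in the talk, is the inverse problem: recover the sparse delays and gains of `h` from `mic` alone, after which tasks like beamforming or source localization can exploit them.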