Date: Tuesday, 17th April 2018 at 11am Place: LORIA, room A008 Speaker: Angela Fan (Facebook Research)
Title: Sequence to Sequence Learning for User-Controllable Abstractive Summarization
Abstract: The design of neural architectures for sequence-to-sequence tasks such as summarization is an active research field. I will first briefly discuss the architectural changes that enable our convolutional sequence-to-sequence model, such as replacing non-linearities with novel gated linear units and multi-hop attention. The second part of the talk will discuss ways to train models for large-scale summarization tasks while respecting user preferences. Our model enables users to specify high-level attributes that control the shape of the final summaries to suit their needs. For example, users may want to specify the length or the portion of the document to summarize. Given this input, the system produces summaries that respect the user's preferences. Without user input, the control variables can be set automatically to outperform comparable state-of-the-art summarization models.
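The length-control idea described in the abstract can be sketched as follows: the desired summary length is discretized into buckets, and a bucket marker token is prepended to the source sequence so the model learns to condition on it. The bucket boundaries and the `<len_k>` marker names below are illustrative assumptions, not the vocabulary actually used by the authors.

```python
# Minimal sketch of length control via special tokens: the desired output
# length is mapped to a discrete bucket, and a bucket marker is prepended
# to the source sequence. Boundaries and marker names are illustrative.

LENGTH_BUCKETS = [10, 20, 40, 80]  # illustrative word-count boundaries

def length_bucket(n_words: int) -> int:
    """Map a desired summary length to a discrete bucket index."""
    for i, bound in enumerate(LENGTH_BUCKETS):
        if n_words <= bound:
            return i
    return len(LENGTH_BUCKETS)

def add_length_control(source_tokens, desired_length):
    """Prepend a length-control marker to the source sequence."""
    marker = f"<len_{length_bucket(desired_length)}>"
    return [marker] + list(source_tokens)

# At training time the marker is computed from the reference summary length;
# at inference time the user chooses it to control the output length.
print(add_length_control(["the", "cat", "sat"], 15))
# → ['<len_1>', 'the', 'cat', 'sat']
```

The same mechanism extends to other control variables mentioned in the talk, such as which portion of the document to summarize, by adding further marker tokens.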
Hervé Bredin on Wednesday, April 25
Date: Wednesday, 25th April 2018 at 2pm Place: LORIA, room A008 Speaker: Hervé Bredin (LIMSI)
Title: Neural building blocks for speaker diarization
Abstract: Speaker diarization is the task of determining “who speaks when” in an audio stream. Most diarization systems rely on statistical models to address four sub-tasks: speech activity detection (SAD), speaker change detection (SCD), speech turn clustering, and re-segmentation. First, following the recent success of recurrent neural networks (RNN) for SAD and SCD, we propose to address re-segmentation with Long Short-Term Memory (LSTM) networks. Then, we propose to use affinity propagation on top of neural speaker embeddings for speech turn clustering, outperforming regular Hierarchical Agglomerative Clustering (HAC). Finally, all these modules are combined and jointly optimized to form a speaker diarization pipeline in which all but the clustering step are based on RNNs. We provide experimental results on the French broadcast dataset ETAPE, where we reach state-of-the-art performance.
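The clustering step can be sketched as follows: affinity propagation is applied to speaker embeddings of speech turns. The pipeline in the talk uses embeddings learned by RNNs; the toy 2-D points below merely stand in for them, and the scikit-learn implementation is used as a generic stand-in for whatever implementation the authors employed.

```python
# Minimal sketch of affinity-propagation clustering of speaker embeddings.
# The random 2-D points below are purely illustrative stand-ins for neural
# speech-turn embeddings.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
# Two fake "speakers": embeddings drawn around two well-separated centroids.
speaker_a = rng.normal(loc=0.0, scale=0.1, size=(10, 2))
speaker_b = rng.normal(loc=5.0, scale=0.1, size=(10, 2))
embeddings = np.vstack([speaker_a, speaker_b])

# Unlike HAC with a fixed stopping criterion, affinity propagation does not
# need the number of speakers in advance: exemplars emerge from message
# passing between samples.
clustering = AffinityPropagation(random_state=0).fit(embeddings)
n_speakers = len(clustering.cluster_centers_indices_)
print(f"estimated number of speakers: {n_speakers}")
```

Each resulting cluster label would then be mapped back to a speech turn, and the re-segmentation step would refine the boundaries.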
Neil Zeghidour on Wednesday, May 30
Date: Wednesday, 30th May 2018 at 2pm Place: LORIA, room C005 Speaker: Neil Zeghidour (Facebook AI Research & Ecole Normale Supérieure)
Title: End-to-end speech recognition from the raw waveform
Abstract: State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. We study end-to-end systems trained directly on the raw waveform, introducing a trainable replacement for mel-filterbanks that uses a convolutional architecture based on the scattering transform. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks, and then fine-tuned jointly with the remaining convolutional architecture. We perform phone recognition experiments on TIMIT and show that models trained on TD-filterbanks consistently outperform their counterparts trained on comparable mel-filterbanks. We then improve this model, as well as a previously proposed frontend based on gammatones. We perform open-vocabulary experiments on the Wall Street Journal corpus and show a consistent and significant improvement in Word Error Rate of our trainable frontends over mel-filterbanks, even with random initialization.
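The initialization idea can be sketched as follows: build a standard mel-spaced triangular filterbank and use its weights to initialize a convolutional layer that is then fine-tuned with the rest of the network. The actual model parametrizes the filters in the time domain via the scattering transform; this frequency-domain construction is only an illustration of the "initialize as mel, then learn" principle, and all parameter values below are generic defaults rather than the authors' settings.

```python
# Illustrative construction of a mel-spaced triangular filterbank, of the
# kind that would serve as the *initialization* of a trainable convolutional
# frontend before joint fine-tuning. Parameter values are generic defaults.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=40, n_fft=512, sample_rate=16000):
    """Triangular mel filters over FFT bins: shape (n_filters, n_fft//2 + 1)."""
    # Filter edges are equally spaced on the mel scale, then mapped to Hz.
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising slope of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

weights = mel_filterbank()
print(weights.shape)  # (40, 257)
```

In the trainable setting, these weights would be copied into a convolutional layer and then updated by backpropagation along with the rest of the acoustic model, instead of being kept fixed as in a standard mel-filterbank frontend.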
Maël Primet on Wednesday, June 20
Date: Wednesday, 20th June 2018 at 2pm Place: LORIA, room A008 Speaker: Maël Primet (Snips)
Title: Presenting Snips: a Platform for Natural Language Understanding and Speech Recognition on the edge
Abstract: Snips is building a Voice AI platform running 100% on embedded devices, comprising a wake-word detector, an automated speech recognizer, and a natural language understanding component. As microprocessors become cheaper and more powerful, moving computation to the edge provides many advantages: privacy by design, 100% availability, lower costs, and higher responsiveness. But it also requires tradeoffs in accuracy, as larger models cannot run on commodity chips. This talk will present some aspects of the Snips platform and the tradeoffs made to guarantee high accuracy while reducing model sizes.
Abstract: As a linguist wading into philosophical waters I begin with two semantic observations concerning some intensional common nouns and their modifiers. I provide them with a minimal semantic analysis whose justification is twofold. One, it is linguistically enlightening: it points out some semantic generalizations about English and provides a formally explicit analysis of the relevant entailment patterns. Two, it generalizes standard extensional model theory without adding novel entities such as possible worlds or propositions. Such entities may facilitate the semantic analysis of modal adverbs and propositional attitude verbs, but are not needed for the intensional expressions we study here. Our analysis is explanatory in that it characterizes what we are trying to understand only using notions we already understand, not novel ones we don’t. Our analysis may also have some consequences for Direct Reference Theory (see Almog 2012, Bianchi 2012, Kaplan 1989, Napoli 1995, Wettstein 2004). Among them is that it eliminates from our naive ontology a universe of objects we think of singular terms as denoting and unbound pronouns and individual variables in logic as ranging over. It also establishes some boundary points limiting the purview of Direct Reference Theory. And, if nothing else, it may serve a “bad example” function – “Just look at what happens if you do not adopt a direct reference stance”. As well, our use of judgments of entailment is more at home in a Fregean setting than a Direct Reference one (Capuano 2012).