Victor Bisot

Date: Wednesday, 8th November 2017 at 2pm
Place: LORIA, room A008
Speaker: Victor Bisot (Télécom ParisTech)

Title: Learning nonnegative representations for environmental sound analysis

Abstract: The growing interest in environmental sound analysis tasks has been accompanied by a significant increase in deep learning-based approaches. While most work focuses on finding suitable neural network architectures or algorithms, few works question the choice of the input representation. In this talk, we will discuss how nonnegative matrix factorization (NMF)-based feature learning techniques are suited to such tasks, especially when used as input to deep neural networks (DNNs). We start by highlighting the usefulness of unsupervised NMF variants for feature learning in multi-source environments. Next, we introduce a supervised variant of NMF known as TNMF (Task-driven NMF). The TNMF model aims to improve the quality of the decomposition by learning the nonnegative dictionaries that minimize the target classification loss. In the second part, we will show similarities between NMF, TNMF and standard neural network layers, justifying the potential of NMF-based features as input to various DNN models. The proposed models are evaluated on acoustic scene classification and overlapping sound event detection datasets and show improvements over standard CNN approaches for such tasks. Finally, we will discuss how NMF dictionaries and neural network parameters can be trained jointly in a common optimization problem by relying on the TNMF framework.