Date: Wednesday, 17th October 2018 at 2pm
Place: LORIA, room A008
Speaker: Paul Magron (Tampere University of Technology)
Title: Probabilistic modeling of the phase for audio source separation
Many audio source separation techniques act on a time-frequency representation of the data, such as the short-time Fourier transform (STFT), since it reveals the underlying structure of sounds. These methods usually discard the phase information and process spectrogram-like quantities only. The sources are finally retrieved by means of a Wiener-like filter, which assigns the phase of the original mixture to each isolated source. However, this introduces interference and artifacts in the estimates, which highlights the need for more sophisticated phase recovery techniques. In this talk, we will present our recent work on phase-aware probabilistic models for audio source separation. Firstly, we will model the phase as a non-uniform random variable based on the von Mises distribution. This allows us to incorporate some prior knowledge about the phase, e.g., knowledge that arises from a signal model (sums of sinusoids). In particular, we will show that the traditional uniform model and the von Mises model are not contradictory, but rather rely on different assumptions about the phase. Secondly, we will present mixture models based on the anisotropic Gaussian distribution, from which we can derive phase-aware estimators of the sources in the STFT domain. This results in an anisotropic Wiener filter, which preserves some of the interesting statistical properties of the Wiener filter while making it possible to account for a phase model. Finally, we will propose techniques for jointly inferring the magnitude and the phase based on this framework. Indeed, by structuring the variance parameters of these models through, e.g., nonnegative matrix factorization or deep neural networks, we can derive complete and phase-aware source separation systems.
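As a rough illustration of the baseline the abstract criticizes, the following minimal sketch applies a Wiener-like filter to a toy mixture STFT. It is not the speaker's code: the function name `wiener_like_filter`, the array shapes, and the random test data are assumptions made for this example, and the source variances are taken as given rather than estimated. Because each mask is real and nonnegative, every estimated source inherits the phase of the mixture.

```python
import numpy as np

def wiener_like_filter(mixture_stft, source_variances, eps=1e-12):
    """Hypothetical helper: soft-mask each source out of the mixture.

    mixture_stft:     complex array of shape (F, T)
    source_variances: nonnegative array of shape (J, F, T), one per source
    Returns complex source estimates of shape (J, F, T).
    """
    total = source_variances.sum(axis=0) + eps   # (F, T), eps avoids 0/0
    masks = source_variances / total             # real-valued, in [0, 1]
    return masks * mixture_stft[None, :, :]      # the mask scales magnitude only

# Toy data (assumed for illustration): 2 sources, 4 frequency bins, 5 frames.
rng = np.random.default_rng(0)
F, T, J = 4, 5, 2
X = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
V = rng.uniform(0.1, 1.0, size=(J, F, T))

S_hat = wiener_like_filter(X, V)

# Phase difference between each estimate and the mixture (zero up to rounding).
phase_diff = np.angle(S_hat * np.conj(X)[None, :, :])
print(np.max(np.abs(phase_diff)))
```

The estimates also sum back to the mixture, which is one of the properties the abstract says the anisotropic Wiener filter aims to preserve while replacing the mixture phase with a model-based one.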