Deep Clustering for Single-Channel Ego-Noise Suppression
In the context of audio signal processing for microphone-equipped robots, the noise created by the robot's own movements, so-called ego-noise, is a crucial problem. It severely corrupts the microphone signal and degrades the robot's ability to interact intuitively with its environment. Ego-noise suppression is therefore a key processing step in robot audition, which is commonly addressed using learning-based dictionary or template approaches.

In this contribution, we introduce a deep-learning framework called Deep Clustering (DC) for ego-noise suppression in a single microphone channel; DC was originally introduced by Hershey et al. for the task of speech separation. In DC, a bi-directional recurrent neural network is trained to embed each time-frequency bin of a mixture of ego-noise and speech into a higher-dimensional space, under the constraint that embeddings of bins dominated by ego-noise have maximal distance to those dominated by speech. At test time, clustering is performed in the embedding space to assign each time-frequency bin uniquely to one of the two signal components, thereby enabling the estimation of both signals.

We demonstrate that DC achieves a significant reduction of ego-noise in the reconstructed signal. Additionally, we investigate the influence of the embedding size and of the amount of training data on the suppression performance.