Deep Learning Applied to Dereverberation and Sound Event Classification in Reverberant Environments
This paper investigates dereverberation and sound event classification techniques with the aid of deep learning. The system consists of two units: a microphone array front-end and a deep learning back-end. The system is examined in the context of two important acoustic signal processing tasks: dereverberation and sound event classification (SEC). For the dereverberation problem, a neural network-based approach is compared with other state-of-the-art methods, including beamforming, multichannel inverse filtering (MINT), the multichannel Wiener filter (MWF), and variance-normalized delayed linear prediction (NDLP). For the SEC problem, two approaches are also compared. Approach 1 uses the signals enhanced by the microphone array front-end as the input to the back-end deep neural network (DNN) classifier, which is implemented with the VGGNet architecture. Approach 2 is a direct approach that trains the classifier on reverberant data without front-end enhancement. Mel-frequency spectral coefficient (MFSC) features are extracted from Google's AudioSet. A room response simulator based on the image source method is employed to generate reverberant signals for numerous RT60 conditions in the training phase. Perceptual evaluation of speech quality (PESQ) and the F1 score are adopted to assess audio quality and classification performance, respectively.
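The image-source simulator mentioned above mirrors the sound source across the room walls and sums the attenuated, delayed contributions of all image sources at the microphone. As an illustration only (not the authors' simulator, which handles full 3-D shoebox rooms), the following sketch implements the idea for a toy 1-D "room" with two parallel walls; `beta` (wall reflection coefficient), the geometry, and `max_order` are all assumed parameters:

```python
def image_source_rir_1d(room_len, src, mic, beta, fs, c=343.0, max_order=20):
    """Toy 1-D image source method: two walls at x=0 and x=room_len.

    Returns a sampled room impulse response as a list of floats.
    Each image source contributes an impulse delayed by distance/c and
    attenuated by beta**reflections (wall losses) and 1/distance (spreading).
    """
    taps = []
    for n in range(-max_order, max_order + 1):
        # Image positions for a 1-D room: 2nL + src needs 2|n| reflections,
        # 2nL - src (mirrored source) needs |2n - 1| reflections.
        for pos, refl in ((2 * n * room_len + src, 2 * abs(n)),
                          (2 * n * room_len - src, abs(2 * n - 1))):
            dist = abs(pos - mic)
            if dist == 0:
                continue
            delay = dist / c                 # propagation delay in seconds
            amp = (beta ** refl) / dist      # wall losses + 1/r spreading
            taps.append((delay, amp))
    # Accumulate the impulses into a sampled response buffer.
    h = [0.0] * (int(fs * max(d for d, _ in taps)) + 1)
    for delay, amp in taps:
        h[int(round(delay * fs))] += amp
    return h
```

Raising `beta` lengthens the reverberant tail, which is how a simulator of this kind sweeps over different RT60 conditions when generating training data.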
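The F1 score used for the classification evaluation is the harmonic mean of precision and recall. A minimal sketch for the binary per-class case (the paper does not specify its averaging scheme, so this is an assumption):

```python
def f1_score(y_true, y_pred):
    """Binary F1 score: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For multi-label sound event classification, per-class F1 values are typically combined by micro- or macro-averaging across the event classes.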