Audiovisual active speaker localization and enhancement for multirotor micro aerial vehicles
We address the problem of localizing a speaker and enhancing the speaker's voice using audio-visual sensors installed on a multirotor micro aerial vehicle (MAV). Acoustic-only localization and signal enhancement through beamforming techniques are especially challenging in these conditions, due to the nature and intensity of the disturbances generated by the electric motors and the propellers. We propose a solution in which an efficient beamforming-based algorithm for both localization and enhancement of the source is paired with video-based human detection. The video processing front-end detects human silhouettes and provides estimates of the corresponding directions of arrival (DOAs) at the array. When the acoustic localization front-end detects speech activity originating from one of the candidate directions estimated by the visual component, the acoustic source localization is refined and the recorded signal is enhanced through acoustic beamforming. The proposed algorithm was tested on a MAV equipped with a compact uniform linear array (ULA) of four microphones. A set of scenes featuring two human subjects lying in the field of view and speaking one at a time is analyzed with this method. The results of experiments conducted in stable hovering conditions are illustrated, and the localization and signal enhancement performance is analyzed.
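The pairing of visual DOA candidates with acoustic beamforming can be illustrated with a minimal sketch. The abstract does not specify the beamformer, the microphone spacing, or the sampling rate, so everything below is an assumption made for illustration: a plain frequency-domain delay-and-sum beamformer stands in for the proposed algorithm, the four-microphone ULA is given a hypothetical 5 cm spacing, and the function names (`steering_delays`, `delay_and_sum`, `pick_active_doa`) are invented here. The sketch steers the array only toward the directions supplied by the video front-end and selects the one with the highest output power.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(doa_deg, n_mics=4, spacing=0.05):
    """Far-field arrival delays (s) at each ULA microphone.

    doa_deg is measured from broadside; spacing (m) is a hypothetical value,
    not taken from the abstract.
    """
    positions = np.arange(n_mics) * spacing
    return positions * np.sin(np.deg2rad(doa_deg)) / SPEED_OF_SOUND


def delay_and_sum(frames, doa_deg, fs, spacing=0.05):
    """Steer the array toward doa_deg and return the enhanced signal.

    frames has shape (n_mics, n_samples). Each channel is advanced by its
    steering delay with a phase shift in the frequency domain, then the
    channels are averaged.
    """
    n_mics, n_samples = frames.shape
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(frames, axis=1)
    delays = steering_delays(doa_deg, n_mics, spacing)
    phase = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    aligned = spectra * phase
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)


def pick_active_doa(frames, candidate_doas, fs, spacing=0.05):
    """Return the candidate DOA with the highest beamformer output power.

    candidate_doas plays the role of the directions proposed by the
    video-based human detector.
    """
    powers = [
        float(np.mean(delay_and_sum(frames, doa, fs, spacing) ** 2))
        for doa in candidate_doas
    ]
    best = int(np.argmax(powers))
    return candidate_doas[best], powers
```

In this sketch the visual front-end reduces the acoustic search to a handful of candidate directions, so the beamformer is evaluated only where a person was detected rather than over a full angular grid. A real system would also need a speech activity decision and a beamformer better suited to strong rotor noise, neither of which is modeled here.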