Article
Version 1
Preserved in Portico. This version is not peer-reviewed.
Iteratively Refined Multi-Channel Speech Separation
Received: 22 May 2024 / Approved: 22 May 2024 / Online: 22 May 2024 (12:44:36 CEST)
A peer-reviewed article of this Preprint also exists.
Zhang, X.; Bao, C.; Yang, X.; Zhou, J. Iteratively Refined Multi-Channel Speech Separation. Appl. Sci. 2024, 14, 6375.
Abstract
The combination of neural networks and beamforming has proven very effective in multi-channel speech separation, but its performance remains challenged in complex environments. To address this challenge, this paper proposes an iteratively refined multi-channel speech separation method consisting of an initial separation stage and an iterative separation stage. In initial separation, a time-frequency domain dual-path recurrent neural network (TFDPRNN), a minimum variance distortionless response (MVDR) beamformer, and a post-separation network (also a TFDPRNN) are cascaded to obtain the first additional input for iterative separation. In iterative separation, the MVDR beamformer and the post-separation network are applied iteratively: the output of the MVDR beamformer serves as an additional input to the post-separation network, and the final output is taken from the post-separation module. This iteration between the beamformer and post-separation promotes the optimization of each component, which ultimately improves overall speech separation performance in multi-speaker scenarios. Experiments on the spatialized version of the WSJ0-2mix corpus show that the proposed method performs significantly better than current popular methods. In addition, the method also performs well on the dereverberation task.
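The control flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The callables `initial_separator` and `post_separator` are hypothetical stand-ins for the TFDPRNN networks, and their input and output shapes are assumptions. The sketch also assumes that spatial covariance matrices are estimated from the current source estimates and that the MVDR weights follow the common reference-channel formulation with diagonal loading; the abstract specifies neither.

```python
import numpy as np


def covariance(x):
    """Time-averaged spatial covariance. x: (C, F, T) complex STFT -> (F, C, C)."""
    return np.einsum("cft,dft->fcd", x, x.conj()) / x.shape[-1]


def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-6):
    """Reference-channel MVDR weights per frequency (assumed formulation).

    phi_s, phi_n: (F, C, C) target and interference covariances -> (F, C) weights.
    """
    n_ch = phi_s.shape[-1]
    phi_n = phi_n + eps * np.eye(n_ch)            # diagonal loading for stability
    num = np.linalg.solve(phi_n, phi_s)           # phi_n^{-1} phi_s, (F, C, C)
    trace = np.trace(num, axis1=-2, axis2=-1)[:, None, None]
    return (num / (trace + eps))[..., ref]        # column of the reference channel


def apply_beamformer(w, mix):
    """y[f, t] = w[f]^H x[f, t]. w: (F, C), mix: (C, F, T) -> (F, T)."""
    return np.einsum("fc,cft->ft", w.conj(), mix)


def beamform_all(mix, est, ref=0):
    """Beamform every source. est: (S, C, F, T) source estimates -> (S, F, T)."""
    out = []
    for src in est:
        phi_s = covariance(src)                   # target statistics
        phi_n = covariance(mix - src)             # everything else is interference
        out.append(apply_beamformer(mvdr_weights(phi_s, phi_n, ref), mix))
    return np.stack(out)


def iterative_separation(mix, initial_separator, post_separator, n_iters=2, ref=0):
    """Initial separation followed by n_iters beamformer/post-separation rounds.

    initial_separator(mix) -> (S, C, F, T)              (hypothetical interface)
    post_separator(mix, beamformed) -> (S, C, F, T)     (hypothetical interface)
    """
    est = initial_separator(mix)
    # One beamformer + post-separation pass closes the initial stage;
    # the remaining passes are the iterative refinement.
    for _ in range(1 + n_iters):
        beamformed = beamform_all(mix, est, ref)  # additional input for the network
        est = post_separator(mix, beamformed)
    return est[:, ref]                            # final output from post-separation
```

In this sketch the beamformer output is passed to the post-separation callable together with the mixture on every pass, and the returned signals always come from the post-separation step, mirroring the description above. The number of refinement rounds `n_iters` is an illustrative parameter, not a value taken from the paper.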
Keywords
speech separation; microphone array; minimum variance distortionless response (MVDR); beamforming; iterative separation
Subject
Engineering, Electrical and Electronic Engineering
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.