Felipe Farias, William Alberto Cruz Castañeda, Wilmer Lobato, Marcellus Amadeus. Bilingual ASR model with language identification for Brazilian Portuguese and South-American Spanish. DOI: 10.14209/sbrt.2023.1570916069.
Abstract
This paper documents the development of a special case of a multilingual Automatic Speech Recognition model, tailored to the two languages spoken by the majority of Latin America: Portuguese and Spanish. The bilingual model combines Language Identification and Speech Recognition, is built on the Wav2Vec2.0 architecture, and is trained on several open and private speech datasets. In this model, the feature encoder is trained jointly for all tasks, while a separate context encoder is trained for each task. The model is evaluated separately on two tasks: language identification and speech recognition. The results indicate that the model achieves good performance on speech recognition and average performance on language identification while being trained on a small amount of speech material. The average accuracy of the language identification module on the MLS dataset is 66.75%. The average Word Error Rate in the same scenario is 13.89%, which is better than the average of 22.58% achieved by the commercial speech recognizer developed by Google.
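For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how Word Error Rate and classification accuracy are conventionally computed. It assumes the standard definitions (word-level edit distance divided by the number of reference words, and the fraction of correctly predicted labels); the abstract does not state the authors' scoring tool or text normalization, so the function names `word_error_rate` and `accuracy` and the example sentences are purely illustrative.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and no text
    normalization (casing, punctuation) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def accuracy(true_labels: Sequence[str], predicted_labels: Sequence[str]) -> float:
    """Fraction of utterances whose predicted language matches the true one."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

Under these definitions, a hypothesis that drops one word from a five-word reference scores a WER of 0.2, and a language identifier that labels three of four utterances correctly scores an accuracy of 0.75.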
Keywords
Speech Recognition; Automatic Speech Recognition; Language Identification; Wav2Vec2; Multilingual
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.