Version 1
Received: 22 April 2021 / Approved: 23 April 2021 / Online: 23 April 2021 (10:35:20 CEST)
How to cite:
Kubal, D.; Palivela, H. Unified Model for Paraphrase Generation and Paraphrase Identification. Preprints 2021, 2021040630. https://doi.org/10.20944/preprints202104.0630.v1
APA Style
Kubal, D., & Palivela, H. (2021). Unified Model for Paraphrase Generation and Paraphrase Identification. Preprints. https://doi.org/10.20944/preprints202104.0630.v1
Chicago/Turabian Style
Kubal, D., and Hemant Palivela. 2021. "Unified Model for Paraphrase Generation and Paraphrase Identification." Preprints. https://doi.org/10.20944/preprints202104.0630.v1
Abstract
Paraphrase Generation is one of the most important and challenging tasks in the field of Natural Language Generation. Paraphrasing techniques help to identify, extract, or generate phrases and sentences that convey a similar meaning. The paraphrasing task can be divided into two sub-tasks, namely Paraphrase Identification (PI) and Paraphrase Generation (PG). Most existing state-of-the-art systems can solve only one of these problems at a time. This paper proposes a light-weight unified model that can both classify whether a given pair of sentences are paraphrases of each other and generate multiple paraphrases for an input sentence. The Paraphrase Generation module aims to generate fluent and semantically similar paraphrases, and the Paraphrase Identification system aims to classify whether a sentence pair consists of paraphrases of each other or not. The proposed approach combines data sampling (data variety), in the form of carefully selected data points, with a granular fine-tuned Text-To-Text Transfer Transformer (T5) model. The highlight of this study is that the same light-weight model, trained with the objective of Paraphrase Generation, can also be used to solve the Paraphrase Identification task. Hence, the proposed system is light-weight in terms of both the model's size and the data used to train it, which facilitates quick learning without compromising the results.
The proposed system is then evaluated with popular metrics: BLEU (Bilingual Evaluation Understudy), ROUGE (Recall-Oriented Understudy for Gisting Evaluation), METEOR, WER (Word Error Rate), and GLEU (Google-BLEU) for Paraphrase Generation, and the classification metrics accuracy, precision, recall, and F1-score for Paraphrase Identification. The proposed model achieves state-of-the-art results on both tasks, Paraphrase Identification and Paraphrase Generation.
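To make two of the metrics named above concrete, the following is a minimal sketch in plain Python of WER (word-level edit distance divided by reference length) and of accuracy, precision, recall, and F1-score for a binary paraphrase/non-paraphrase decision. The function names, the label convention (1 = paraphrase), and the example sentences are illustrative assumptions and are not taken from the paper; the authors' actual evaluation code and tokenization may differ.

```python
from typing import Dict, Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: whitespace tokenization; the paper's tokenizer may differ.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def binary_classification_metrics(gold: Sequence[int],
                                  pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, assuming label 1 = 'paraphrase'."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up sentences and labels, for illustration only.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
    print(binary_classification_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

BLEU, ROUGE, METEOR, and GLEU depend on n-gram matching, stemming, or synonym resources and are usually computed with an established library rather than reimplemented, so they are left out of this sketch.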
Keywords
Paraphrase Identification; Paraphrase Generation; Natural Language Generation; Language Model; Encoder Decoder; Transformer
Subject
Computer Science and Mathematics, Algebra and Number Theory
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.