Version 1: Received: 4 December 2018 / Approved: 11 December 2018 / Online: 11 December 2018 (07:24:04 CET)
How to cite:
Choi, C.; Yoon, Y.; Lee, J.; Kim, J. Simultaneous Recognition of Horizontal and Vertical Text in Natural Images. Preprints 2018, 2018120114. https://doi.org/10.20944/preprints201812.0114.v1
APA Style
Choi, C., Yoon, Y., Lee, J., & Kim, J. (2018). Simultaneous Recognition of Horizontal and Vertical Text in Natural Images. Preprints. https://doi.org/10.20944/preprints201812.0114.v1
Chicago/Turabian Style
Choi, C., Y. Yoon, Junsu Lee, and Junseok Kim. 2018. "Simultaneous Recognition of Horizontal and Vertical Text in Natural Images." Preprints. https://doi.org/10.20944/preprints201812.0114.v1
Abstract
Recent state-of-the-art scene text recognition methods have primarily focused on horizontal text in images. However, in several Asian countries, including China, large amounts of text in signs, books, and TV commercials are vertically directed. Because horizontal and vertical text exhibit different characteristics, an algorithm that can simultaneously recognize both types of text in real environments is necessary. To address this problem, we adopted the direction encoding mask (DEM) and selective attention network (SAN) methods based on supervised learning. The DEM contains directional information to compensate for cases in which the text direction is unknown; our network is trained on this information to handle vertical text. The SAN is designed to attend to each type of text individually. To train the network to recognize both types of text and to evaluate the effectiveness of the designed model, we prepared a new synthetic vertical text dataset and collected an actual vertical text dataset (VTD142) from the Web. Using these datasets, we show that our proposed model can accurately recognize both vertical and horizontal text and achieves state-of-the-art results on benchmark datasets, including Street View Text (SVT), IIIT-5K, and ICDAR. Although our model is relatively simple compared to its predecessors, it maintains accuracy and is trained in an end-to-end manner.
Keywords
directional encoding mask; selective attention network; supervised learning; horizontal and vertical text recognition
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.