
A Survey on Breaking Technique of Text-Based CAPTCHA

Authors: Jun Chen, Xiangyang Luo, Yanqing Guo, Yi Zhang, Daofu Gong
Academic Editor: Zhenxing Qian
Security and Communication Networks, Volume 2017
https://doi.org/10.1155/2017/6898617
Published: 01 January 2017

Abstract

The CAPTCHA has become an important issue in multimedia security. Focusing on the commonly used text-based CAPTCHA, this paper outlines typical breaking methods and summarizes the technological progress in text-based CAPTCHA breaking. First, it presents a comprehensive review of recent developments in the field. Second, it proposes a framework for text-based CAPTCHA breaking, which consists mainly of preprocessing, segmentation, combination, recognition, postprocessing, and other modules. Third, it introduces the research progress on the techniques involved in each module and compares and analyzes some typical segmentation and recognition methods. Lastly, it discusses some problems that merit further research.
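
To make the modular framework concrete, the following is a minimal sketch of three of the named modules (preprocessing, segmentation, recognition) chained into one pipeline. It is not the authors' implementation: the fixed threshold, the vertical-projection split, the template matcher, and every name in it (`binarize`, `segment`, `recognize`, `break_captcha`, `render`, `TEMPLATES`) are illustrative assumptions, and the input is a synthetic bitmap rather than a real CAPTCHA. The combination and postprocessing modules are left out.

```python
from typing import Dict, List

Image = List[List[int]]  # rows of grayscale pixels: 0 = black ink, 255 = white

# Hypothetical 3-row glyph templates (1 = ink, 0 = background).
TEMPLATES: Dict[str, Image] = {
    "I": [[1], [1], [1]],
    "L": [[1, 0], [1, 0], [1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Preprocessing: map dark pixels to 1 (ink) and light pixels to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment(binary: Image) -> List[Image]:
    """Segmentation: cut the image at columns that contain no ink."""
    width = len(binary[0])
    column_ink = [sum(row[x] for row in binary) for x in range(width)]
    pieces: List[Image] = []
    start = None
    for x, ink in enumerate(column_ink + [0]):  # trailing 0 closes the last run
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            pieces.append([row[start:x] for row in binary])
            start = None
    return pieces


def recognize(piece: Image, templates: Dict[str, Image]) -> str:
    """Recognition: pick the same-sized template with the fewest mismatches."""
    def distance(a: Image, b: Image) -> float:
        if len(a) != len(b) or len(a[0]) != len(b[0]):
            return float("inf")
        return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return min(templates, key=lambda label: distance(piece, templates[label]))


def break_captcha(gray: Image, templates: Dict[str, Image]) -> str:
    """Chain the three modules: preprocess, segment, recognize."""
    return "".join(recognize(p, templates) for p in segment(binarize(gray)))


def render(text: str, templates: Dict[str, Image]) -> Image:
    """Demo helper: draw glyphs side by side with one blank column between."""
    rows = len(next(iter(templates.values())))
    out: Image = [[255] for _ in range(rows)]  # left margin column
    for ch in text:
        for r in range(rows):
            out[r].extend(0 if bit else 255 for bit in templates[ch][r])
            out[r].append(255)  # blank separator column
    return out


print(break_captcha(render("LIT", TEMPLATES), TEMPLATES))  # LIT
```

The sketch only works because the synthetic glyphs are separated by blank columns and match a template exactly. Connected or warped characters defeat a plain column split, which is why the abstract treats segmentation and recognition as the modules to compare in detail.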


Published In

Security and Communication Networks, Volume 2017, Issue 2017
1833 pages
ISSN: 1939-0114
EISSN: 1939-0122
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

John Wiley & Sons, Inc., United States

Qualifiers

  • Review-article

Cited By

  • (2023) Breaking Captcha System with Minimal Exertion through Deep Learning: Real-time Risk Assessment on Indian Government Websites. Digital Threats: Research and Practice 4:2 (1-24). DOI: 10.1145/3584974. Online publication date: 10-Aug-2023
  • (2023) An Experimental Investigation of Text-based CAPTCHA Attacks and Their Robustness. ACM Computing Surveys 55:9 (1-38). DOI: 10.1145/3559754. Online publication date: 16-Jan-2023
  • (2022) Using Adversarial Defences Against Image Classification CAPTCHA. Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy (355-357). DOI: 10.1145/3508398.3519367. Online publication date: 14-Apr-2022
  • (2022) Counteracting Dark Web Text-Based CAPTCHA with Generative Adversarial Learning for Proactive Cyber Threat Intelligence. ACM Transactions on Management Information Systems 13:2 (1-21). DOI: 10.1145/3505226. Online publication date: 10-Mar-2022
  • (2022) Breaking CAPTCHA with Capsule Networks. Neural Networks 154:C (246-254). DOI: 10.1016/j.neunet.2022.06.041. Online publication date: 1-Oct-2022
  • (2021) The Role of Visual Features in Text-Based CAPTCHAs. Computational Intelligence and Neuroscience 2021. DOI: 10.1155/2021/8842420. Online publication date: 1-Jan-2021
  • (2021) Benchmarks for Designing a Secure Devanagari CAPTCHA. SN Computer Science 2:1. DOI: 10.1007/s42979-020-00445-z. Online publication date: 19-Jan-2021
  • (2021) Development of a character CAPTCHA recognition system for the visually impaired community using deep learning. Machine Vision and Applications 32:1. DOI: 10.1007/s00138-020-01160-8. Online publication date: 1-Jan-2021
  • (2020) A Generative Adversarial Learning Framework for Breaking Text-Based CAPTCHA in the Dark Web. 2020 IEEE International Conference on Intelligence and Security Informatics (ISI) (1-6). DOI: 10.1109/ISI49825.2020.9280537. Online publication date: 9-Nov-2020
  • (2019) Breaking Text-Based CAPTCHA with Sparse Convolutional Neural Networks. Pattern Recognition and Image Analysis (404-415). DOI: 10.1007/978-3-030-31321-0_35. Online publication date: 1-Jul-2019
