
Enhancing Convolutional Recurrent Neural Network Model for Improved Text Extraction From Videos and Images

Research output: Contribution to journal › Article › peer-review

Abstract

Text recognition in images and videos is crucial for applications such as document automation and accessibility, but irregular text and limited recognition accuracy remain open challenges. This study enhances the convolutional recurrent neural network (CRNN) by addressing its limitations in accuracy and context processing. We introduce label smoothing (LS) to improve generalization and a bidirectional cloze network (BCN) to capture textual context more effectively, which improves recognition of complex text patterns. Evaluated on six datasets (IC13, SVT, IIIT5K, IC15, SVTP, CUTE), our CRNN+BCN+LS model outperforms the baseline CRNN and is competitive with advanced methods such as robust arbitrary text extraction and vision transformer of scene text recognition. The model achieves up to 94.45% accuracy on IC13, a 4.05% improvement over the baseline, demonstrating its potential for practical applications in business and accessibility.
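The abstract does not say how label smoothing is wired into the CRNN loss, so the following is only a minimal sketch of the standard uniform label-smoothing technique, assuming a per-character cross-entropy target. The function names, the `epsilon` value, and the four-class toy alphabet are illustrative assumptions, not details from the paper.

```python
import math


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Build a smoothed target distribution for one character.

    Standard uniform label smoothing: epsilon of the probability mass is
    spread evenly over all classes, and the remaining (1 - epsilon) is
    added to the true class. epsilon=0.1 is an assumed, common default.
    """
    off_value = epsilon / num_classes
    dist = [off_value] * num_classes
    dist[target_index] += 1.0 - epsilon
    return dist


def cross_entropy(probs, target_dist):
    """Cross-entropy between predicted probabilities and a target distribution."""
    return -sum(t * math.log(p) for p, t in zip(probs, target_dist) if t > 0)


# Toy example: a 4-symbol alphabet where the true character is class 2.
predicted = [0.1, 0.1, 0.7, 0.1]
hard_target = smooth_labels(2, 4, epsilon=0.0)   # ordinary one-hot target
soft_target = smooth_labels(2, 4, epsilon=0.1)   # smoothed target

hard_loss = cross_entropy(predicted, hard_target)
soft_loss = cross_entropy(predicted, soft_target)
```

With a smoothed target, the loss also penalizes very low probabilities on the non-target classes, which discourages over-confident predictions; this is the usual argument for why label smoothing can help generalization.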

Original language: English
Pages (from-to): 83-88
Number of pages: 6
Journal: IT Professional
Volume: 28
Issue number: 2
State: Published - 1 Mar 2026
