ROHIT SALUJA

Robust multilingual OCR:
from Ancient Indic Texts to Modern Indian Street Signs

IndicOCR:

Optical Character Recognition (OCR) is the process of converting document images into an editable electronic format. This offers many advantages, such as data compression, enabling search and editing of the images/text, and building databases for downstream applications like Machine Translation, Speech Recognition, and the enhancement of dictionaries and language models. OCR in Indian languages is quite challenging due to their rich inflectional morphology.

Using open-source and commercial OCR systems, we have observed Word Error Rates (WER) of around 20-50% on printed documents in four different Indic languages. Moreover, even a highly accurate OCR system, with accuracy as high as 90%, is of limited use unless aided by a mechanism to identify errors. We therefore started with the problem of developing "OpenOCRCorrect", an end-to-end framework for error detection and correction in Indic OCR. Our models outperform state-of-the-art results in error detection in Indic OCR for six Indic languages with varied inflections, and we solved the Out-of-Vocabulary problem for error correction in Indic OCR in our ICDAR 2017 conference paper. We further improved these results with the help of sub-word embeddings in our ICDAR 2019 conference paper.
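The sub-word idea behind the error-correction work can be illustrated with a FastText-style character n-gram embedding: an OCR-garbled word still shares most of its n-grams with the correct form, so it lands close to that form in embedding space even when the garbled word is out of vocabulary. Below is a minimal sketch of this effect; the hashing trick, dimensions, and random vectors are illustrative assumptions, not our trained model:

```python
import zlib
import numpy as np

def char_ngrams(word, n_min=2, n_max=4):
    """Character n-grams of a word, with boundary markers (FastText-style)."""
    w = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(w[i:i + n] for i in range(len(w) - n + 1))
    return grams

class SubwordEmbedder:
    """Embed any word, even an OOV one, by averaging hashed n-gram vectors."""
    def __init__(self, dim=64, buckets=1000, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.standard_normal((buckets, dim))
        self.buckets = buckets

    def embed(self, word):
        # crc32 gives a stable hash across runs (unlike Python's str hash)
        idx = [zlib.crc32(g.encode()) % self.buckets
               for g in char_ngrams(word)]
        return self.table[idx].mean(axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Because "recognition" and the OCR error "recognitlon" share most of their n-grams, their embeddings have a much higher cosine similarity than two unrelated words, which is what lets a corrector retrieve candidates for words it has never seen.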

1. ICDAR 2019 Post-OCR Competition:

a. Our team "CLAM" secured 2nd position in the Multilingual Post-OCR Competition at ICDAR'19. Our model achieved the highest correction rate of 44% on Finnish, significantly higher than the overall winner's 8% on Finnish. The final report and poster are available.


2. ICDAR 2019:

a. You can read the paper here


3. ICDAR 2017:

a. You can read the paper here

b. The dataset can be requested via email at rohitsaluja@cse.iitb.ac.in


4. ICDAR-OST 2017: OpenOCRCorrect

a. You can read the paper here

b. A demo video for our framework is here

c. Source code for our framework is available here

OCR On-the-go:

We work on the problem of automatically recognizing license plates and street signs, particularly in challenging conditions such as chaotic traffic. We leverage state-of-the-art text spotters to generate a large amount of noisily labeled training data, which is subsequently filtered using patterns derived from domain knowledge. We augment the training and testing data with interpolated boxes and annotations, which makes our training and evaluation robust, and we further use synthetic data during training to increase its coverage.

We trained two different models for recognition. Our baseline is a conventional Convolutional Neural Network (CNN) encoder followed by a Recurrent Neural Network (RNN) decoder. As our first contribution, we bypass the detection phase by augmenting the baseline with an attention mechanism in the RNN decoder. Next, we make the model trainable end-to-end on scenes containing license plates by incorporating an Inception-based CNN encoder, which makes the model robust to multiple scales. We achieve improvements of up to 3.75% at the sequence level over the baseline model.

We present the first results of using multi-headed attention models for text recognition in images and illustrate the advantages of multiple heads over a single head, observing further gains of up to 7.18%. We also experiment with multi-headed attention models on the French Street Name Signs (FSNS) dataset and a new Indian Street dataset that we release for experiments. We observe that models with multiple attention masks perform better than the single-headed attention model on three datasets of varying complexity. Our models also outperform state-of-the-art results on the FSNS dataset and the IIIT-ILST Devanagari dataset.
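One decoding step of the attention mechanism described above can be sketched as follows: each head projects the decoder state and the flattened CNN feature map, computes a softmax mask over spatial positions, and returns a weighted context vector; the per-head contexts are concatenated. This is a simplified numpy illustration, where the random projection matrices, head count, and shapes are illustrative assumptions standing in for learned weights, not our exact architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query, features, n_heads, rng):
    """One decoding step of multi-headed attention.

    query:    (d,)   current decoder state
    features: (T, d) flattened CNN feature map (T spatial positions)
    Returns the concatenated context vector (d,) and the
    per-head attention masks (n_heads, T).
    """
    T, d = features.shape
    assert d % n_heads == 0
    dh = d // n_heads
    # Random projections stand in for learned weight matrices.
    Wq = rng.standard_normal((n_heads, d, dh)) / np.sqrt(d)
    Wk = rng.standard_normal((n_heads, d, dh)) / np.sqrt(d)
    Wv = rng.standard_normal((n_heads, d, dh)) / np.sqrt(d)
    contexts, masks = [], []
    for h in range(n_heads):
        q = query @ Wq[h]                    # (dh,)
        K = features @ Wk[h]                 # (T, dh)
        V = features @ Wv[h]                 # (T, dh)
        a = softmax(K @ q / np.sqrt(dh))     # (T,) mask over positions
        contexts.append(a @ V)               # (dh,) per-head context
        masks.append(a)
    return np.concatenate(contexts), np.stack(masks)
```

Each of the `n_heads` masks sums to 1 over the spatial positions, so different heads can focus on different regions of the feature map, which is the advantage of multiple heads over a single one noted above.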

1. ICDAR 2019:

a. You can read the paper here

b. Source code for the paper is here

c. The dataset can be requested via email at rohitsaluja@cse.iitb.ac.in

d. A demo video for our ALPR model is here


2. ICDAR-OST 2019: StreetOCRCorrect

a. You can read the paper here

b. A demo video for our framework is here

c. Source code for our framework is available here