Robust OCR Pipeline for Automated Digitization of Mother and Child
Protection Cards in India
DEVESH PANT, IIT Delhi, India
DIBYENDU TALUKDER, Gram Vaani, India
AADITESHWAR SETH, IIT Delhi & Gram Vaani, India
DINESH PANT, Raah Foundation, India
ROHIT SINGH, Gram Vaani, India
BREJESH DUA, Gram Vaani, India
RACHIT PANDEY, Gram Vaani, India
SRIRAMA MARUTHI, Gram Vaani, India
MIRA JOHRI, Université de Montréal, Canada
CHETAN ARORA, IIT Delhi, India
The Universal Immunization Programme (UIP) in India has a mandate to fully vaccinate all of India’s 27 million children born
annually. The vaccination doses are recorded by frontline health workers on standardized paper-based Mother and Child
Protection (MCP) cards, which are manually digitized by data entry operators, resulting in poor data quality, delays, and
significant time and resource costs. In this paper, we focus on Optical Character Recognition (OCR) based automated digitization
of MCP card images captured through a smartphone application developed by us. By utilizing a standardized template for
the MCP cards, which is available a priori, we register the card images and perform OCR on the extracted regions of interest
(ROIs). Since cards with curvature or torn edges yielded poor ROIs, we built a global-local alignment technique which first
approximates the ROI using a global homography and then refines it using a local homography, resulting in improved accuracy.
Our pipeline gives a character level accuracy of 98.73% on our dataset, against 75.02% by Google Cloud Vision and 79.26% by
Azure OCR. We also describe our field testing experience, where the digitized MCP card images were used to provide useful
features on the smartphone application for health workers to conduct vaccination sessions.
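The two-stage alignment described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that both homographies are already known 3x3 matrices, that the global one maps template coordinates to image coordinates, and that the local one is a small correction applied on top. The function names (`apply_homography`, `refine_roi`) and the example matrices are hypothetical, and estimating the homographies from feature matches is outside the scope of the sketch.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices (used to compose homographies)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_homography(h: Matrix, pts: Sequence[Point]) -> List[Point]:
    """Project 2D points through a 3x3 homography using homogeneous coordinates."""
    out = []
    for x, y in pts:
        u = h[0][0] * x + h[0][1] * y + h[0][2]
        v = h[1][0] * x + h[1][1] * y + h[1][2]
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        out.append((u / w, v / w))  # divide by w to return to 2D
    return out


def refine_roi(template_roi: Sequence[Point], h_global: Matrix,
               h_local: Matrix) -> Tuple[List[Point], List[Point]]:
    """Return (coarse, refined) ROI corners.

    Assumption: h_global gives a coarse placement of the template ROI in the
    photographed card, and h_local corrects the residual misalignment.
    """
    coarse = apply_homography(h_global, template_roi)
    refined = apply_homography(h_local, coarse)
    return coarse, refined


# Hypothetical example values, chosen only to make the arithmetic easy to follow.
template_roi = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
h_global = [[2.0, 0.0, 10.0], [0.0, 2.0, 20.0], [0.0, 0.0, 1.0]]  # scale + shift
h_local = [[1.0, 0.0, 3.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]    # small shift

coarse, refined = refine_roi(template_roi, h_global, h_local)
```

Under these assumptions, applying the two homographies in sequence is equivalent to applying the single composed matrix `matmul3(h_local, h_global)` to the template ROI.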
CCS Concepts: • Computing methodologies → Computer vision; • Applied computing → Health care information
systems.
Additional Key Words and Phrases: Optical character recognition, homography, handwritten digits, image refinement, template
matching
Authors’ addresses: Devesh Pant, devesh98.iitd@gmail.com, IIT Delhi, New Delhi, Delhi, India, 110016; Dibyendu Talukder, Gram Vaani,
India, dibyendu.t@oniondev.com; Aaditeshwar Seth, IIT Delhi & Gram Vaani, New Delhi, India, aseth@cse.iitd.ac.in; Dinesh Pant, Raah
Foundation, India, dineshpant84@gmail.com; Rohit Singh, Gram Vaani, India, rohit.singh@oniondev.com; Brejesh Dua, Gram Vaani, India,
brejesh.dua@oniondev.com; Rachit Pandey, Gram Vaani, India, rachit.pandey@oniondev.com; Srirama Maruthi, Gram Vaani, India,
srirama.maruthi@oniondev.com; Mira Johri, Université de Montréal, Canada, mira.johri@umontreal.ca; Chetan Arora, IIT Delhi, New Delhi, India,
chetan@cse.iitd.ac.in.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy
otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from
permissions@acm.org.
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2834-5533/2023/8-ART $15.00
https://doi.org/10.1145/3608114
ACM J. Comput. Sustain. Soc.