Robust OCR Pipeline for Automated Digitization of Mother and Child Protection Cards in India

DEVESH PANT, IIT Delhi, India
DIBYENDU TALUKDER, Gram Vaani, India
AADITESHWAR SETH, IIT Delhi & Gram Vaani, India
DINESH PANT, Raah Foundation, India
ROHIT SINGH, Gram Vaani, India
BREJESH DUA, Gram Vaani, India
RACHIT PANDEY, Gram Vaani, India
SRIRAMA MARUTHI, Gram Vaani, India
MIRA JOHRI, Université de Montréal, Canada
CHETAN ARORA, IIT Delhi, India

The Universal Immunization Programme (UIP) in India has a mandate to fully vaccinate all of India’s 27 million children born annually. The vaccination doses are recorded by frontline health workers on standardized paper-based Mother and Child Protection (MCP) cards, which are then manually digitized by data entry operators. This manual process results in poor data quality and delays, and consumes significant time and resources. In this paper, we focus on Optical Character Recognition (OCR) based automated digitization of MCP card images captured through a smartphone application developed by us. By utilizing a standardized template for the MCP cards, which is available a priori, we register the card images and perform OCR on the extracted regions of interest (ROIs). Since cards with curvature or torn edges yielded poorly extracted ROIs, we built a global-local alignment technique which first approximates each ROI using a global homography and then refines it using a local homography, resulting in improved accuracy. Our pipeline gives a character-level accuracy of 98.73% on our dataset, against 75.02% by Google Cloud Vision and 79.26% by Azure OCR. We also describe our field testing experience, where the digitized MCP card images were used to provide useful features on the smartphone application for health workers to conduct vaccination sessions.

CCS Concepts: • Computing methodologies → Computer vision; • Applied computing → Health care information systems.
Additional Key Words and Phrases: Optical character recognition, homography, handwritten digits, image refinement, template matching

Authors’ addresses: Devesh Pant, devesh98.iitd@gmail.com, IIT Delhi, New Delhi, Delhi, India, 110016; Dibyendu Talukder, Gram Vaani, India, dibyendu.t@oniondev.com; Aaditeshwar Seth, IIT Delhi & Gram Vaani, New Delhi, India, aseth@cse.iitd.ac.in; Dinesh Pant, Raah Foundation, India, dineshpant84@gmail.com; Rohit Singh, Gram Vaani, India, rohit.singh@oniondev.com; Brejesh Dua, Gram Vaani, India, brejesh.dua@oniondev.com; Rachit Pandey, Gram Vaani, India, rachit.pandey@oniondev.com; Srirama Maruthi, Gram Vaani, India, srirama.maruthi@oniondev.com; Mira Johri, Université de Montréal, Canada, mira.johri@umontreal.ca; Chetan Arora, IIT Delhi, New Delhi, India, chetan@cse.iitd.ac.in.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2834-5533/2023/8-ART $15.00
https://doi.org/10.1145/3608114
ACM J. Comput. Sustain. Soc.
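To make the global-local alignment idea from the abstract concrete, the following is a minimal illustrative sketch, not the implementation used in this work. It assumes that point correspondences between the template and the photographed card are already available (for example from keypoint matching), that exactly four correspondences are used per homography, and that the local homography is estimated as a residual correction on top of the global one. The function names (`homography_from_4`, `project`, `locate_roi`) and the sample coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Homography (normalised so h33 = 1) mapping four src points onto dst."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four correspondences are required")
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), and similarly for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h: Matrix, p: Point) -> Point:
    """Apply homography h to a 2D point."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def locate_roi(roi_template: Sequence[Point],
               global_pairs: Sequence[Tuple[Point, Point]],
               local_pairs: Sequence[Tuple[Point, Point]]) -> List[Point]:
    """Map template ROI corners into the image: global estimate, then local fix.

    Each pair is (template point, observed image point). global_pairs span the
    whole card; local_pairs lie close to the ROI being extracted.
    """
    g_src, g_dst = zip(*global_pairs)
    h_global = homography_from_4(g_src, g_dst)

    # Where the global fit *predicts* the local landmarks versus where they
    # were actually observed; the gap is what the local homography corrects.
    predicted = [project(h_global, t) for t, _ in local_pairs]
    observed = [obs for _, obs in local_pairs]
    h_local = homography_from_4(predicted, observed)

    return [project(h_local, project(h_global, c)) for c in roi_template]


# Hypothetical example: the card is shifted by (10, 20) overall, but the
# region around the ROI is displaced a further (3, -2), e.g. by curvature.
card = [((0.0, 0.0), (10.0, 20.0)), ((100.0, 0.0), (110.0, 20.0)),
        ((100.0, 100.0), (110.0, 120.0)), ((0.0, 100.0), (10.0, 120.0))]
near_roi = [((30.0, 30.0), (43.0, 48.0)), ((70.0, 30.0), (83.0, 48.0)),
            ((70.0, 70.0), (83.0, 88.0)), ((30.0, 70.0), (43.0, 88.0))]
roi = [(40.0, 40.0), (60.0, 40.0), (60.0, 60.0), (40.0, 60.0)]
refined_corners = locate_roi(roi, card, near_roi)
```

In this toy setup the global homography alone would place the ROI using only the card-level shift, while the composed mapping also absorbs the local displacement. A practical pipeline would likely use many more correspondences with a robust estimator rather than exactly four points.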