Optical Character Recognition Engine to Extract Food-Items and Prices from Grocery Receipt Images via Templating and Dictionary-Traversal Technique
Keywords: Accurate image-to-text converter, Receipt parsing using template matching, OCR using receipt templates, Text retrieval from receipt images
This paper proposes a mix of established and a few novel techniques to address the fundamental problem of recognizing Food-Items and Prices on Grocery Receipts and extracting them. In our research we did not find any existing OCR engine that meets this standard, let alone one specialized for this purpose. Since the goal was to create a specialized OCR system, we began with the idea of building wrappers around a basic OCR system to give it the context of a Grocery Receipt. To this end, we built pre-function and post-function wrappers around an existing system, Tesseract-OCR. Our system follows a specific workflow to enhance the basic OCR output. First, it passes the provided image through image filters to make it suitable for section-level extraction. It then splits the image into sections (for example, Price, Item-Names and Quantity are handled separately from one another) according to the given template layouts. These sections of the image are then forwarded to the Tesseract engine for basic OCR. The extracted text is then forwarded to a contextual pattern matcher, which interprets it in the context of a grocery receipt. After testing the system on receipts from particular grocery stores, we conclude that our techniques significantly improve both the accuracy of overall context-based text recognition and close-match detection when compared with unassisted (vanilla) Tesseract OCR. The proposed system is intended to support Food-Kitchen Assistance Mobile Apps in the market.
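The workflow described above can be illustrated with a minimal sketch. It is not the implementation used in this work. The template coordinates, the food dictionary, and the function names `section_boxes`, `match_item` and `parse_price` are all hypothetical. The standard-library `difflib` close-match search is used here only as a stand-in for the dictionary-traversal technique. The image filtering and the Tesseract call itself are omitted, so the sketch covers only the template-to-section step and the post-OCR contextual matching step.

```python
import re
from difflib import get_close_matches

# Hypothetical template: each section is a fractional
# (left, top, right, bottom) region of the receipt image.
TEMPLATE = {
    "item_names": (0.00, 0.20, 0.60, 0.90),
    "quantity": (0.60, 0.20, 0.75, 0.90),
    "price": (0.75, 0.20, 1.00, 0.90),
}

# Hypothetical food dictionary used for close-match detection.
FOOD_DICTIONARY = ["banana", "whole milk", "cheddar cheese", "tomato", "brown bread"]

# A price is assumed to be digits, a decimal separator, and two digits.
PRICE_PATTERN = re.compile(r"\d+[.,]\d{2}")


def section_boxes(width, height, template=TEMPLATE):
    """Convert fractional template regions into pixel crop boxes."""
    return {
        name: (round(l * width), round(t * height),
               round(r * width), round(b * height))
        for name, (l, t, r, b) in template.items()
    }


def match_item(raw_text, dictionary=FOOD_DICTIONARY, cutoff=0.6):
    """Map noisy OCR text to the closest dictionary entry, or None."""
    # Keep only lowercase letters and spaces before comparing.
    cleaned = re.sub(r"[^a-z ]", "", raw_text.lower()).strip()
    matches = get_close_matches(cleaned, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def parse_price(raw_text):
    """Extract the first price-like token from OCR text, or None."""
    found = PRICE_PATTERN.search(raw_text)
    return float(found.group().replace(",", ".")) if found else None
```

In a full pipeline, each box returned by `section_boxes` would be cropped from the filtered image and passed to the OCR engine. The text from the item-name section would then go through `match_item`, and the text from the price section through `parse_price`. Handling each section with its own matcher is what lets the post-processing use receipt context that a general-purpose OCR pass does not have.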