Extracting Credit Card Number and Name Information using OpenCV and Pytesseract

Link to Github repo

Introduction

Accurately identifying a credit card’s 16-digit number and cardholder name from an image in an automated manner is a critical processing function for financial institutions and businesses building large-scale credit card database repositories, which in turn can enable improved fraud detection and transaction capabilities. The goal of this project is therefore to build a system capable of extracting a credit card image’s 16-digit number and cardholder name using the OpenCV framework, under a self-imposed time constraint of 0.5 s per image, so that the approach could in theory scale to large credit card data mapping workloads.

As few large-scale, open-source credit card image datasets recording both card number and cardholder name appear to exist, this project is tested on a 23-image dataset of sample credit card images collected through Google Images and manually annotated for card number and cardholder name. On this dataset, our image transformation pipeline achieves 48% recall in correctly identifying the credit card number and 65% recall in correctly identifying the cardholder text. The full dataset used in these explorations can be found in the Github repo for this project.


Methodology

As we wanted to test both OpenCV’s and Google Pytesseract’s out-of-the-box Optical Character Recognition (OCR) capabilities, our approach leverages OpenCV’s template matching function to recognize the credit card number digits, while the cardholder text is recognized through Google’s Pytesseract library.

i) Digit recognition

Template matching works by creating a template of an image pattern to be recognized, such as the digit “0”, and exhaustively assigning each pixel in the base credit card image a score capturing how closely the pixel neighborhood around that pixel matches the template, in much the same way a 2D convolutional neural network exhaustively applies a fixed-size kernel across an image to produce confidence scores for detection proposals.
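
As a minimal, self-contained illustration of this scoring mechanism (the toy arrays below are synthetic and not part of the project’s pipeline):

import cv2
import numpy as np

# toy example: a 20x60 "strip" containing one bright 10x10 patch, and a
# template cut from that same patch
strip = np.zeros((20, 60), dtype=np.uint8)
strip[5:15, 30:40] = 255
template = strip[5:15, 30:40].copy()

# matchTemplate slides the template over every valid position and scores
# how well the underlying neighborhood matches it; minMaxLoc then returns
# the best score and its location
result = cv2.matchTemplate(strip, template, cv2.TM_CCOEFF)
(_, max_score, _, max_loc) = cv2.minMaxLoc(result)
print(max_loc)  # (30, 5) -> top-left corner of the best match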

For visualization purposes, we use the credit card image displayed in Figure 2 to walk through our data preprocessing pipeline’s various image transformations, ending with the final prediction results for this sample image.

Figure 2. Sample base credit card image

Our first step in this processing pipeline is to create ten distinct templates, one for each possible digit, by applying the read_ocr() and get_digits() functions shown below to the OCR reference digit image in Figure 3. As documented in the code, these functions compute rectangular bounding boxes for the contours of each reference digit and store them in a dictionary mapping each digit to its image region. These regions later serve as the base templates in our template matching methodology.

Figure 3. OCR reference image serving as the base template for matching our credit card digits

To find digit regions of interest in our base credit card image to compare against these extracted templates, we next apply a tophat transformation to the base image, which makes the lighter regions (i.e. the white or grey credit card numbers) more salient against the card’s darker background, as shown in Figure 4. We then apply a Sobel operator to further emphasize the edges of these lighter regions.

Figure 4. Result of applying tophat transformation to base image

Our final preprocessing step is to apply an automatic Otsu thresholding transformation to convert the Sobel output into a binary black-and-white image, with the threshold chosen automatically to minimize the intra-class intensity variance of the pixel distribution. The resulting image is shown in Figure 5.

Figure 5. Result of applying Otsu thresholding to the Sobel-filtered Figure 4 image

import cv2
import imutils
import argparse
import numpy as np
from imutils import contours
    
    
def read_ocr(ocr_path):
    # load the reference OCR-A image from disk, convert it to grayscale,
    # and threshold/invert it so that the digits appear as *white* on a
    # *black* background
    ref = cv2.imread(ocr_path)
    ref = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)
    ref = cv2.threshold(ref, 10, 255, cv2.THRESH_BINARY_INV)[1]
    
    # find contours in the OCR-A image (i.e., the outlines of the digits),
    # sort them from left to right, and initialize a dictionary to map
    # digit name to the ROI
    refCnts = cv2.findContours(ref.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    refCnts = imutils.grab_contours(refCnts)
    refCnts = contours.sort_contours(refCnts, method="left-to-right")[0]
    return ref, refCnts

def get_digits(ref, refCnts):
    digits = {}
    # loop over the OCR-A reference contours
    for (i, c) in enumerate(refCnts):
        # Compute the bounding box for the digit, 
        # extract it, and resize it to a fixed size. 
        (x, y, w, h) = cv2.boundingRect(c)
        roi = ref[y:y + h, x:x + w]
        roi = cv2.resize(roi, (57, 88))

        # update the digits dictionary, mapping the digit name to the ROI
        digits[i] = roi
    return digits



ocr_path = 'ocr_a_reference.png'
ref, refCnts = read_ocr(ocr_path)
digits = get_digits(ref, refCnts)
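
The tophat, Sobel, and Otsu transformations behind Figures 4 and 5 are not reproduced in the listing above; the sketch below shows one way they might be implemented with standard OpenCV calls (the kernel size and the x-direction gradient are illustrative assumptions rather than the project’s exact settings).

def preprocess_card(image_path):
    # load the card image and convert it to grayscale
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # tophat transform: emphasizes light regions (the card digits)
    # against the darker card background -- see Figure 4
    rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 3))
    tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, rect_kernel)

    # Sobel gradient along x to emphasize the digit edges, rescaled
    # back into the 0-255 range
    grad = cv2.Sobel(tophat, ddepth=cv2.CV_32F, dx=1, dy=0, ksize=-1)
    grad = np.absolute(grad)
    grad = (255 * (grad - grad.min()) / (grad.max() - grad.min())).astype("uint8")

    # Otsu thresholding: the binary threshold is chosen automatically to
    # minimize intra-class intensity variance -- see Figure 5
    thresh = cv2.threshold(grad, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    return gray, thresh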

Once this Otsu binarization has been applied to our image, the next step is to extract the contours of the light regions using OpenCV’s findContours() and imutils’ grab_contours() functions. We then compute a rectangular bounding box around each contour using OpenCV’s boundingRect() and check the y-coordinate of each box, excluding any box not falling in the 85-145 pixel height range, as this range is empirically observed to contain all 16 credit card digits in our images given a standardized maximum card image height of 190 pixels. The remaining boxes are further filtered to exclude any with aspect ratios (width / height) outside the [2.5, 4.0] range, since each of the four 4-digit groupings making up the 16-digit number is observed to fall in this aspect ratio range. We finally filter this set to keep only boxes whose widths and heights fall within the [40, 55] and [10, 20] pixel ranges respectively, as these ranges are similarly observed to contain all 4-digit bounding boxes in our dataset. A final view of the four extracted bounding boxes for our image is shown in Figure 6.
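
A condensed sketch of this filtering step, assuming thresh is the Otsu-binarized card image from the preprocessing sketch above and with the empirically observed ranges hard-coded, might look like the following.

def find_digit_groups(thresh):
    # extract contours of the light regions in the binarized card image
    cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)

    locs = []
    for c in cnts:
        (x, y, w, h) = cv2.boundingRect(c)
        aspect_ratio = w / float(h)
        # keep only boxes in the 85-145 px digit band with the aspect
        # ratio, width, and height ranges observed for 4-digit groups
        if (85 <= y <= 145 and 2.5 <= aspect_ratio <= 4.0
                and 40 <= w <= 55 and 10 <= h <= 20):
            locs.append((x, y, w, h))

    # sort the surviving 4-digit groups from left to right
    return sorted(locs, key=lambda box: box[0])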

Once these 4-digit bounding box regions are extracted, we iteratively score how closely each pixel neighborhood in the extracted 4-digit sub-image matches each of our 0-9 digit templates from Figure 3 using OpenCV’s matchTemplate() function. The maximum score and the location of the corresponding pixel are then obtained using OpenCV’s minMaxLoc() function, allowing us to determine which digit best matches each region of the card image.

        # loop over the digit contours
        for c in digitCnts:
            # compute the bounding box of the individual digit, extract
            # the digit, and resize it to have the same fixed size as
            # the reference OCR-A images
            (x, y, w, h) = cv2.boundingRect(c)
            roi = group[y:y + h, x:x + w]
            roi = cv2.resize(roi, (57, 88))
            # initialize a list of template matching scores
            scores = []
            # loop over the reference digit name and digit ROI
            for (digit, digitROI) in digits.items():
                # apply correlation-based template matching, take the
                # score, and update the scores list
                result = cv2.matchTemplate(roi, digitROI, cv2.TM_CCOEFF)
                (_, score, _, _) = cv2.minMaxLoc(result)
                scores.append(score)

            # the reference digit with the highest matching score is taken
            # as the prediction for this ROI (groupOutput is the per-group
            # list of predicted digits built up in the full script)
            groupOutput.append(str(np.argmax(scores)))

The final result of our digit template matching can be seen below, with each of our sixteen card number digits correctly identified by OpenCV’s template matching functionality:

Figure 7. Final digit recognition output for sample image

ii) Cardholder Character Text recognition

In order to detect the cardholder text in our image, we similarly select the portion of each image falling between 150 and 190 pixels in height and 15 and 200 pixels in width, capturing the bottom left of the credit card, and apply the same binary Otsu thresholding transformation to this grayscale sub-image to make the white cardholder characters more salient against the darker credit card background. The resulting transformed sub-image is shown below in Figure 8.

Figure 8. Transformed cardholder image
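
A minimal sketch of this cropping and thresholding step, assuming gray is the grayscale card image from the earlier preprocessing sketch and using the approximate pixel bounds described above:

# crop the bottom-left cardholder region and binarize it with Otsu
name_roi = gray[150:190, 15:200]
name_roi = cv2.threshold(name_roi, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]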

Each selected sub-image is then passed through the get_chars() function below, which outputs the detected text using the Google Pytesseract library’s image_to_string() function. Pytesseract’s underlying OCR engine passes the image through a pre-trained neural net-based recognizer and returns the detected text, which in our case is the string ‘CARDHOLDER,’ for this sub-image, as shown in Figure 9 below.

import pytesseract as tess
import matplotlib.pyplot as plt
from PIL import Image


def get_chars(img, show_image = False, **kwargs):
    # accept either an image path or an already-loaded image array
    if isinstance(img, str):
        img = cv2.imread(img)
    if show_image:
        plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        plt.title('Test Image'); plt.show()
    # convert the array to a PIL image and run Pytesseract's OCR on it
    image = Image.fromarray(img)
    text = tess.image_to_string(image, **kwargs)
    print("PyTesseract Detected the following text: ", text)
    return text
Figure 9. Detected cardholder text output
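
A typical call on the cropped cardholder region might look like the following; the ‘--psm 7’ single-line page-segmentation flag passed through **kwargs is an assumption for illustration, not a setting documented above.

raw_text = get_chars(name_roi, config='--psm 7')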

Lastly, in order to remove any extraneous punctuation that Pytesseract may erroneously detect, such as the trailing comma above, this text is passed through the function below, which returns the first line of the uppercased string with all punctuation stripped using Python’s re module. Our final output of ‘CARDHOLDER’ therefore matches the ground-truth cardholder text, giving a correct text prediction.

import re


def process_str(string):
    # uppercase the string and strip all punctuation (anything that is
    # not a word character or whitespace)
    string = string.upper()
    string = re.sub(r'[^\w\s]', '', string)
    # keep only the first line of the cleaned-up text
    string = string.splitlines()
    if len(string) == 0:
        return ''
    return string[0]
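
Applied to the text detected above, this cleanup step yields the final prediction:

print(process_str('CARDHOLDER,'))  # -> CARDHOLDER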

Full Dataset Results

Applying the above pipeline across our full 23-image dataset produces 48% recall in correctly identifying the card digits and 65% recall in correctly identifying the cardholder text. For text recognition, the minimum pixel height used to select the cardholder sub-image has a relatively significant impact on performance, given the non-trivial variance in where cardholder names appear across our images. Additional experiments exhaustively searching over the [140, 160] pixel range revealed 150 pixels to be the optimal minimum height for locating the cardholder text.
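
A sketch of this parameter sweep is shown below; run_text_pipeline and the samples list of (image, ground-truth name) pairs are hypothetical stand-ins for the project’s evaluation code.

def sweep_min_height(samples, candidate_heights=range(140, 161)):
    best_height, best_recall = None, -1.0
    for min_h in candidate_heights:
        correct = 0
        for image, true_name in samples:
            # run_text_pipeline is a hypothetical helper that crops the
            # cardholder region starting at min_h, applies Otsu, and
            # returns the cleaned Pytesseract prediction
            predicted = run_text_pipeline(image, min_height=min_h)
            correct += int(predicted == true_name)
        recall = correct / float(len(samples))
        if recall > best_recall:
            best_height, best_recall = min_h, recall
    return best_height, best_recall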

A further area of investigation to improve the performance of our credit card character recognition system would be to train our own Optical Character Recognition deep learning models on publicly available datasets such as MNIST (for digits) and DDI-100 (for text), in order to produce models capable of improving on this 48-65% recall benchmark.

Thanks for reading!