document image degradation

Last update: Nov 18, 2022

Related tags

Overview

ocrodeg

The ocrodeg package is a small Python library implementing document image degradation for data augmentation for handwriting recognition and OCR applications.

The following illustrates the kinds of degradations available from ocrodeg.

%pylab inline

Populating the interactive namespace from numpy and matplotlib

rc("image", cmap="gray", interpolation="bicubic")
figsize(10, 10)
import scipy.ndimage as ndi
import ocrodeg

image = imread("testdata/W1P0.png")
imshow(image)

<matplotlib.image.AxesImage at 0x7fabcc7ab390>

PAGE ROTATION

This is just for illustration; for large page rotations, you can just use ndimage.

for i, angle in enumerate([0, 90, 180, 270]):
    subplot(2, 2, i+1)
    imshow(ndi.rotate(image, angle))

RANDOM GEOMETRIC TRANSFORMATIONS

random_transform generates random transformation parameters that work reasonably well for document image degradation. You can override the ranges used by each of these parameters by keyword arguments.

ocrodeg.random_transform()

{'angle': -0.016783842893063807,
 'aniso': 0.805280370671964,
 'scale': 0.9709145529604223,
 'translation': (0.014319657859164045, 0.03676897986267606)}

Here are four samples generated by random transforms.

for i in xrange(4):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, **ocrodeg.random_transform()))

You can use transform_image directly with the different parameters to get a feel for the ranges and effects of these parameters.

for i, angle in enumerate([-2, -1, 0, 1]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, angle=angle*pi/180))

for i, angle in enumerate([-2, -1, 0, 1]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, angle=angle*pi/180)[1000:1500, 750:1250])

for i, aniso in enumerate([0.5, 1.0, 1.5, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, aniso=aniso))

for i, aniso in enumerate([0.5, 1.0, 1.5, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, aniso=aniso)[1000:1500, 750:1250])

for i, scale in enumerate([0.5, 0.9, 1.0, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, scale=scale))

for i, scale in enumerate([0.5, 0.9, 1.0, 2.0]):
    subplot(2, 2, i+1)
    h, w = image.shape
    imshow(ocrodeg.transform_image(image, scale=scale)[h//2-200:h//2+200, w//3-200:w//3+200])

RANDOM DISTORTIONS

Pages often also have a small degree of warping. This can be modeled by random distortions. Very small and noisy random distortions also model ink spread, while large 1D random distortions model paper curl.

for i, sigma in enumerate([1.0, 2.0, 5.0, 20.0]):
    subplot(2, 2, i+1)
    noise = ocrodeg.bounded_gaussian_noise(image.shape, sigma, 5.0)
    distorted = ocrodeg.distort_with_noise(image, noise)
    h, w = image.shape
    imshow(distorted[h//2-200:h//2+200, w//3-200:w//3+200])

RULED SURFACE DISTORTIONS

for i, mag in enumerate([5.0, 20.0, 100.0, 200.0]):
    subplot(2, 2, i+1)
    noise = ocrodeg.noise_distort1d(image.shape, magnitude=mag)
    distorted = ocrodeg.distort_with_noise(image, noise)
    h, w = image.shape
    imshow(distorted[:1500])

BLUR, THRESHOLDING, NOISE

There are a range of utilities for modeling imaging artifacts: blurring, noise, inkspread.

patch = image[1900:2156, 1000:1256]
imshow(patch)

<matplotlib.image.AxesImage at 0x7fabc88c7e10>

for i, s in enumerate([0, 1, 2, 4]):
    subplot(2, 2, i+1)
    blurred = ndi.gaussian_filter(patch, s)
    imshow(blurred)

for i, s in enumerate([0, 1, 2, 4]):
    subplot(2, 2, i+1)
    blurred = ndi.gaussian_filter(patch, s)
    thresholded = 1.0*(blurred>0.5)
    imshow(thresholded)

reload(ocrodeg)
for i, s in enumerate([0.0, 1.0, 2.0, 4.0]):
    subplot(2, 2, i+1)
    blurred = ocrodeg.binary_blur(patch, s)
    imshow(blurred)

for i, s in enumerate([0.0, 0.1, 0.2, 0.3]):
    subplot(2, 2, i+1)
    blurred = ocrodeg.binary_blur(patch, 2.0, noise=s)
    imshow(blurred)

MULTISCALE NOISE

reload(ocrodeg)
for i in range(4):
    noisy = ocrodeg.make_multiscale_noise_uniform((512, 512))
    subplot(2, 2, i+1); imshow(noisy, vmin=0, vmax=1)

RANDOM BLOBS

for i, s in enumerate([2, 5, 10, 20]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.random_blobs(patch.shape, 3e-4, s))

reload(ocrodeg)
blotched = ocrodeg.random_blotches(patch, 3e-4, 1e-4)
#blotched = minimum(maximum(patch, ocrodeg.random_blobs(patch.shape, 30, 10)), 1-ocrodeg.random_blobs(patch.shape, 15, 8))
subplot(121); imshow(patch); subplot(122); imshow(blotched)

<matplotlib.image.AxesImage at 0x7fabc8a35490>

FIBROUS NOISE

imshow(ocrodeg.make_fibrous_image((256, 256), 700, 300, 0.01))

<matplotlib.image.AxesImage at 0x7fabc8852450>

FOREGROUND / BACKGROUND SELECTION

subplot(121); imshow(patch); subplot(122); imshow(ocrodeg.printlike_multiscale(patch))

<matplotlib.image.AxesImage at 0x7fabc8676d90>

subplot(121); imshow(patch); subplot(122); imshow(ocrodeg.printlike_fibrous(patch))

<matplotlib.image.AxesImage at 0x7fabc8d1b250>

document image degradation

Related tags

Overview

ocrodeg

PAGE ROTATION

RANDOM GEOMETRIC TRANSFORMATIONS

RANDOM DISTORTIONS

RULED SURFACE DISTORTIONS

BLUR, THRESHOLDING, NOISE

MULTISCALE NOISE

RANDOM BLOBS

FIBROUS NOISE

FOREGROUND / BACKGROUND SELECTION

Owner

NVIDIA Research Projects

Camelot: PDF Table Extraction for Humans

Automatically fishes for you while you are afk :)

Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

Python tool that takes the OCR.space JSON output as input and draws a text overlay on top of the image.

⛓ marc is a small, but flexible Markov chain generator

Single Shot Text Detector with Regional Attention

Face_mosaic - Mosaic blur processing is applied to multiple faces appearing in the video

Introduction to image processing, most used and popular functions of OpenCV

Web interface for browsing arXiv papers

Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.

OpenCV-Erlang/Elixir bindings

An interactive interface for using OpenCV's GrabCut algorithm for image segmentation.

Deep learning based page layout analysis

Scan the MRZ code of a passport and extract the firstname, lastname, passport number, nationality, date of birth, expiration date and personal numer.

An easy to use an (hopefully useful) captcha solution for pyTelegramBotAPI

CRAFT-Pyotorch：Character Region Awareness for Text Detection Reimplementation for Pytorch

An advanced 2D image manipulation with features such as edge detection and image segmentation built using OpenCV

Some bits of javascript to transcribe scanned pages using PageXML

Textboxes : Image Text Detection Model : python package (tensorflow)

Open Source Computer Vision Library