Complete portable pipeline for masking of Aadhaar Number adhering to Govt. Privacy Guidelines.

Overview

Aadhaar Number Masking Pipeline

Implementation of a complete pipeline that masks the Aadhaar Number in given images to adhere to Govt. of India's Privacy Guidelines for storage of Aadhaar Card images in digital repository. The following project was carried out as an internship for Muthoot Finance. We make use of the open source packages CRAFT text detector | Paper | Pretrained Model | Github Repo provided by Clova AI Research for OSD and combine a heurestic model with pytesseract OCR for masking.

Rohit Ranjan, Ram Sundaram.

Sample Results

teaser

Versions

The search for the best masking pipeline led us to experiment with several different approaches. We have documented our experiments in other branches.

Branch(->model) Speed/Performance Pipeline
main Best performing CRAFT + pytesseract + dimensional heuristics
CNN_OCR->cnn_model Fastest masking CRAFT + LeNet trained by us
CNN_OCR->cnn_model_2 Fastest masking CRAFT + LeNet trained by us
UNET_OCDR Theoretically Fastest but trained model unavailable** UNet

**We proposed and implemented a pipeline which uses a single UNet model for achieving a desirable mask. A single model would have made the inference very fast and real time use capable on mobile devices. Training meant creating a dataset since the company could not legally provide us the needed data. After several trials, we halted work on this model because with barely 150 unique datapoints available, a data hungry UNet Model is simply unsatiable for now.

Datasets

Lenet was trained on our self-created labelled dataset | labels.

Getting started

Install dependencies

Requirements

  • torch
  • opencv-python
  • tesseract-ocr
  • check requirements.txt
pip install -r requirements.txt

Test instructions

  • Clone this repository
git clone https://github.com/thefurorjuror/Aadhaar_Masker.git
  • Run on an image folder
python [folder path to the cloned repo]/masker.py --test_folder=[folder path to test images] --output_folder=[folder path to output images] --cuda=[True/False]
#Example- When one is inside the cloned repo
python masker.py --test_folder=./images/ --output_folder=./output/ --cuda=True

cuda is set to False by default.

Citation

@inproceedings{baek2019character,
  title={Character Region Awareness for Text Detection},
  author={Baek, Youngmin and Lee, Bado and Han, Dongyoon and Yun, Sangdoo and Lee, Hwalsuk},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={9365--9374},
  year={2019}
}

License

Copyright (c) 2021-present Rohit Ranjan & Ram Sundaram.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
A Pythonic wrapper for the Wikipedia API

Wikipedia Wikipedia is a Python library that makes it easy to access and parse data from Wikipedia. Search Wikipedia, get article summaries, get data

Jonathan Goldsmith 2.5k Dec 28, 2022
ShadowMusic - A Telegram Music Bot with proper functions written in Python with Pyrogram and Py-Tgcalls.

⭐️ Shadow Music ⭐️ A Telegram Music Bot written in Python using Pyrogram and Py-Tgcalls Ready to use method A Support Group, Updates Channel and ready

TeamShadow 8 Aug 17, 2022
Palo Alto Networks PAN-OS SDK for Python

Palo Alto Networks PAN-OS SDK for Python The PAN-OS SDK for Python (pan-os-python) is a package to help interact with Palo Alto Networks devices (incl

Palo Alto Networks 281 Dec 09, 2022
A PowerFull Telegram Mirror Bot.......

- [ DEAD REPO AND NO MORE UPDATE ] Slam Mirror Bot Slam Mirror Bot is a multipurpose Telegram Bot written in Python for mirroring files on the Interne

αвιנтн 2 Nov 09, 2021
A httpx token generator for discord [ hcaptcha bypass ]

Discord-Token-Generator-Yazato A httpx token generator for discord This generator was developed by Aced#0001, Dreamy Tos Follower#0001, Scripted#0131

23 Oct 26, 2021
Ethone-Selfbot - Open Source Discord Self-Bot, written in discord.py

Ethone SB Table of contents Newest open-source Discord SelfBot with useful commands and easy documentation on how to add your own and change the exist

Ethone 3 Jan 08, 2022
A discord token nuker With loads of options that will screw an account up real bad, also has inbuilt massreport, GroupChat Spammer and Token/Password/Creditcard grabber and so much more!

Installation | Important | Changelogs | Discord NOTE: Hazard is not finished! You can expect bugs, crashes, and non-working functions. Please make an

Rdimo 470 Aug 09, 2022
A simple Telegram bot that can add caption to any media on your channel

Channel Auto Caption This bot can add a caption for any media/document sent to a channel. Just deploy bot and add bot as admin to a channel. Deploy to

22 Nov 14, 2022
Código que Utiliza Programação Dinâmica para resolver o problema da Moeda

Programação Dinâmica: Modelo baseado em recursão Utiliza a técnica de Memorização Não pode ser aplicada quando existe dependência entre as respostas G

Hemili Beatriz 1 Jan 08, 2022
A zero-dependency Python library for getting the Kubernetes token of a AWS EKS cluster

tokeks A zero-dependency Python library for getting the Kubernetes token of a AWS EKS cluster. No AWS CLI, third-party client or library (boto3, botoc

Chris Karageorgiou Kaneen 6 Nov 04, 2022
Create Fast and easy image datasets using reddit

Reddit-Image-Scraper Reddit Reddit is an American Social news aggregation, web content rating, and discussion website. Reddit has been devided by topi

Wasin Silakong 4 Apr 27, 2022
A simpler way to make forms, surveys, and reaction input using discord.py.

discord-ext-forms An easier way to make forms and surveys in discord.py. This module is a very simple way to ask questions and create complete forms i

thrizzle 16 Nov 06, 2022
Gdrive-python: A wrapping module in python of gdrive

gdrive-python gdrive-python is a wrapping module in python of gdrive made by @pr

Vittorio Pippi 3 Feb 19, 2022
Simple script to extract useful informations from the combo BloodHound + Neo4j

bloodhound-quickwin Simple script to extract useful informations from the combo BloodHound + Neo4j. Can help to choose a target. Prerequisites python3

140 Dec 21, 2022
Mandatory join to channel using pyTelegramBotAPI

Running set your bot token to config.py set channel username to config.py set channel url to config.py $ python join.py Attention Bot must be administ

Abdulatif 6 Oct 08, 2022
Personal Discord Python Bot based on Discord.py

Personal Discord bot using the discord.py library by Rapptz

2 Dec 14, 2022
A surviv.io bot that helps you manage you clan in surviv.io!

Scooter-Surviv.io-Clan-Bot A Surviv.io Discord Bot This is a bot that helps manage your surviv.io clan! Read below for more!!. Features Lets you creat

cosmic|duck 1 Jan 03, 2022
Youtube Music Playlist Organizer

Youtube Music Playlist Organizer, a simple Python application that uses ytmusicapi to help user edit their playlists and organize in other playlists.

Bedir Tapkan 1 Oct 24, 2021
A simple Telegram bot which handles images in whole different way

zeroimagebot thezeroimagebot 🌟 I Can Edit Dimension Of An image which is required by @stickers 🌟 I Can Extract Text From An Image 🌟 !!! New Updates

RAVEEN KUMAR 4 Jul 01, 2021
A simple script that will watch a stream for you and earn the channel points.

Credits Main idea: https://github.com/gottagofaster236/Twitch-Channel-Points-Miner Bet system (Selenium): https://github.com/ClementRoyer/TwitchAutoCo

Alessandro Maggio 1.1k Jan 08, 2023