The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

Last update: Dec 26, 2022


Overview

Good news! Our new work achieves state-of-the-art performance on the DocUNet benchmark dataset: DocScanner: Robust Document Image Rectification with Progressive Learning

DocTr

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction
ACM MM 2021 Oral

Questions and discussion are welcome!

Training

For geometric unwarping, we train the GeoTr network using the Doc3D dataset.
For illumination correction, we train the IllTr network based on the DRIC dataset.

Inference

Download the pretrained models here and place them in $ROOT/model_pretrained/.
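Before running inference, it can help to confirm that the downloaded weights sit where the script expects them. The following is a minimal sketch and not part of the repository: the helper name `list_pretrained_files` is hypothetical, and because the checkpoint file names are not listed here, it simply treats every regular file in the folder as a model file.

```python
from pathlib import Path
from typing import List


def list_pretrained_files(root: str) -> List[str]:
    """Return the sorted names of regular files in <root>/model_pretrained/.

    Hypothetical helper: raises FileNotFoundError if the folder is
    missing or contains no files.
    """
    model_dir = Path(root) / "model_pretrained"
    if not model_dir.is_dir():
        raise FileNotFoundError(f"missing directory: {model_dir}")
    files = sorted(p.name for p in model_dir.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"no model files found in: {model_dir}")
    return files
```

Calling `list_pretrained_files(".")` from the repository root (standing in for $ROOT) either returns the file names found or fails early with a readable error.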
Geometric unwarping:
```
python inference.py
```
Geometric unwarping and illumination rectification:
```
python inference.py --ill_rec True
```
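For batch jobs, the two invocations above can be driven from Python. This is a sketch under stated assumptions: `build_inference_command` and `run_inference` are hypothetical wrapper names, and `sys.executable` is used in place of the bare `python` command so that the same interpreter runs the script. Only the `--ill_rec True` flag shown above is used; no other options are assumed.

```python
import subprocess
import sys
from typing import List


def build_inference_command(ill_rec: bool = False) -> List[str]:
    """Build the argument list for one of the two documented invocations."""
    cmd = [sys.executable, "inference.py"]
    if ill_rec:
        # Same flag and value as the documented command line.
        cmd += ["--ill_rec", "True"]
    return cmd


def run_inference(repo_root: str, ill_rec: bool = False) -> int:
    """Run inference.py with repo_root as the working directory.

    Returns the exit code of the child process.
    """
    completed = subprocess.run(build_inference_command(ill_rec), cwd=repo_root)
    return completed.returncode
```

For example, `run_inference(".", ill_rec=True)` corresponds to the second command when called from the repository root.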

Evaluation

We use the same evaluation code as the DocUNet benchmark dataset, run on MATLAB 2019a.
Please compare scores according to your MATLAB version.
Use the images available here to reproduce the quantitative performance reported in the paper and for further comparison.

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@inproceedings{feng2021doctr,
  title={DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction},
  author={Feng, Hao and Wang, Yuechen and Zhou, Wengang and Deng, Jiajun and Li, Houqiang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={273--281},
  year={2021}
}

@article{feng2021docscanner,
  title={DocScanner: Robust Document Image Rectification with Progressive Learning},
  author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Tian, Qi and Li, Houqiang},
  journal={arXiv preprint arXiv:2110.14968},
  year={2021}
}


Owner

Hao Feng
