MAT: Mask-Aware Transformer for Large Hole Image Inpainting

Last update: Dec 29, 2022

Overview

MAT: Mask-Aware Transformer for Large Hole Image Inpainting (CVPR2022, Oral)

Wenbo Li, Zhe Lin, Kun Zhou, Lu Qi, Yi Wang, Jiaya Jia

[Paper]

News

This is the official implementation of MAT. The training and testing code is released. We also provide our masks for CelebA-HQ-val and Places-val here.

Visualization

We present a transformer-based model (MAT) for large hole inpainting with high fidelity and diversity.

Compared to other methods, the proposed MAT restores more photo-realistic images with fewer artifacts.

Usage

Clone the repository.

```
git clone https://github.com/fenglinglwb/MAT.git
```

Install the dependencies.
- Python 3.7
- PyTorch 1.7.1
- CUDA 11.0
- Other packages
```
pip install -r requirements.txt
```

Quick Test

We provide models trained on CelebA-HQ and Places365-Standard at 512x512 resolution. Download the models from OneDrive and put them into the 'pretrained' directory. The released models were retrained, so the visualization results may differ slightly from those in the paper.
Obtain inpainted results by running
```
python generate_image.py --network model_path --dpath data_path --outdir out_path [--mpath mask_path]
```
where the mask path (--mpath) is optional. If it is not given, random 512x512 masks are generated. Note that in a mask, 0 marks masked pixels (the holes to be filled) and 1 marks retained pixels.
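To make the mask convention concrete, here is a minimal pure-Python sketch. The helper names (make_box_mask, hole_ratio, composite) are hypothetical and not part of generate_image.py. The composite step, which keeps known pixels from the input and takes hole pixels from the generator output, is a common inpainting convention assumed here, not something this README states.

```python
# Hypothetical helpers (not part of the MAT repo) illustrating the mask
# convention: 0 = masked pixel (hole to fill), 1 = retained pixel.
from typing import List, Tuple

Mask = List[List[int]]
Image = List[List[float]]


def make_box_mask(height: int, width: int, box: Tuple[int, int, int, int]) -> Mask:
    """Return a height x width mask with one rectangular hole.

    box is (top, left, bottom, right) with exclusive bottom/right bounds.
    Pixels inside the box are 0 (masked); all others are 1 (retained).
    """
    top, left, bottom, right = box
    return [
        [0 if top <= y < bottom and left <= x < right else 1 for x in range(width)]
        for y in range(height)
    ]


def hole_ratio(mask: Mask) -> float:
    """Fraction of pixels marked as masked (value 0)."""
    total = sum(len(row) for row in mask)
    holes = sum(row.count(0) for row in mask)
    return holes / total


def composite(image: Image, generated: Image, mask: Mask) -> Image:
    """Assumed blending step: keep pixels where mask == 1, fill holes from generated."""
    return [
        [m * i + (1 - m) * g for i, g, m in zip(irow, grow, mrow)]
        for irow, grow, mrow in zip(image, generated, mask)
    ]


# A centered 256x256 hole in a 512x512 mask covers a quarter of the image.
mask = make_box_mask(512, 512, (128, 128, 384, 384))
print(hole_ratio(mask))  # 0.25
```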

For example, run
```
python generate_image.py --network pretrained/CelebA-HQ.pkl --dpath test_sets/CelebA-HQ/images --mpath test_sets/CelebA-HQ/masks --outdir samples
```
Note: our implementation only supports generating images whose height and width are multiples of 512. Pad or resize the image so that both dimensions are multiples of 512, and pad the mask with 0 values.
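The padding rule can be sketched as follows. This is a hypothetical illustration, not code from the repo, and it assumes single-channel 2-D grids stand in for image and mask arrays, padding is added on the bottom and right only, and the result is cropped back to the original size afterwards. Because the padded mask region is 0 (a hole), the image fill value there does not matter.

```python
# Hypothetical helpers (not part of the MAT repo) for the 512-multiple rule.
# Assumptions: 2-D grids stand in for arrays, padding goes bottom/right only,
# and the mask is padded with 0 so the padded border is treated as a hole.
from typing import List, Tuple

Grid = List[List[float]]


def padded_size(height: int, width: int, multiple: int = 512) -> Tuple[int, int]:
    """Round both dimensions up to the nearest multiple of `multiple`."""
    def round_up(n: int) -> int:
        return -(-n // multiple) * multiple  # ceiling division for positive n
    return round_up(height), round_up(width)


def pad_grid(grid: Grid, target_h: int, target_w: int, fill: float) -> Grid:
    """Pad a 2-D grid on the bottom/right with `fill` up to target_h x target_w."""
    h = len(grid)
    w = len(grid[0]) if h else 0
    padded = [list(row) + [fill] * (target_w - w) for row in grid]
    padded += [[fill] * target_w for _ in range(target_h - h)]
    return padded


def crop_grid(grid: Grid, height: int, width: int) -> Grid:
    """Crop the generated result back to the original height x width."""
    return [row[:width] for row in grid[:height]]


print(padded_size(600, 1000))  # (1024, 1024)
```

With NumPy arrays, numpy.pad(mask, ((0, H - h), (0, W - w)), constant_values=0) performs the same bottom/right zero padding for a 2-D mask.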

Train

For example, to train a model on Places, run a bash script containing:

```
python train.py \
    --outdir=output_path \
    --gpus=8 \
    --batch=32 \
    --metrics=fid36k5_full \
    --data=training_data_path \
    --data_val=val_data_path \
    --dataloader=datasets.dataset_512.ImageFolderMaskDataset \
    --mirror=True \
    --cond=False \
    --cfg=places512 \
    --aug=noaug \
    --generator=networks.mat.Generator \
    --discriminator=networks.mat.Discriminator \
    --loss=losses.loss.TwoStageLoss \
    --pr=0.1 \
    --pl=False \
    --truncation=0.5 \
    --style_mix=0.5 \
    --ema=10 \
    --lr=0.001
```
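Some of the numbers in this command interact. The sketch below shows how --batch and --ema are interpreted in StyleGAN2-ADA, which this code is built upon. It assumes MAT keeps those conventions: the total batch is split evenly across GPUs, and --ema is the half-life of the generator-weight moving average in thousands of images. The function names are hypothetical.

```python
# Hypothetical helpers illustrating StyleGAN2-ADA conventions for --batch and
# --ema. Assumption: MAT inherits these unchanged; confirm in the repo.
from typing import Optional


def per_gpu_batch(batch: int, gpus: int) -> int:
    """--batch counts images across all GPUs; each GPU gets an equal share."""
    if batch < 1 or batch % gpus != 0:
        raise ValueError("--batch must be at least 1 and divisible by --gpus")
    return batch // gpus


def ema_beta(batch: int, ema_kimg: float, cur_nimg: int = 0,
             ema_rampup: Optional[float] = None) -> float:
    """Per-step EMA decay such that old weights halve after ema_kimg * 1000 images."""
    ema_nimg = ema_kimg * 1000
    if ema_rampup is not None:
        # Optional ramp-up: use a shorter half-life early in training.
        ema_nimg = min(ema_nimg, cur_nimg * ema_rampup)
    return 0.5 ** (batch / max(ema_nimg, 1e-8))


def ema_update(ema_param: float, param: float, beta: float) -> float:
    """Blend the current weight into its EMA copy: param + beta * (ema_param - param)."""
    return param + beta * (ema_param - param)


# Values from the Places command above.
print(per_gpu_batch(32, 8))        # 4
print(round(ema_beta(32, 10), 4))  # 0.9978
```

With --gpus=8 and --batch=32, each GPU processes 4 images per step, and after 10,000 images (312.5 steps) the contribution of older weights to the EMA copy has halved.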

Description of arguments:

outdir: output path for saving logs and models
gpus: number of GPUs to use
batch: total number of images per batch across all GPUs
metrics: evaluation metrics; find more in 'metrics/metric_main.py'
data: training data path
data_val: validation data path
dataloader: dataloader class; you can define your own
mirror: whether to use flip augmentation
cond: whether to use class information, default: false
cfg: configuration; find more details in 'train.py'
aug: whether to use the StyleGAN2-ADA augmentation, default: false
generator: generator class; you can define your own
discriminator: discriminator class; you can define your own
loss: loss class; you can define your own
pr: ratio (weight) of the perceptual loss
pl: whether to use path length regularization, default: false
truncation: truncation ratio, as proposed in StyleGAN
style_mix: style mixing ratio, as proposed in StyleGAN
ema: half-life of the exponential moving average of generator weights, in thousands of images (~K samples)
lr: learning rate
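The --truncation and --style_mix options refer to two StyleGAN techniques. The sketch below shows their standard formulations on plain lists: the truncation trick w' = w_avg + psi * (w - w_avg), and style mixing, where with the given probability the per-layer latents from a random cutoff onward are taken from a second latent. It assumes MAT follows the StyleGAN2-ADA versions; the function names are illustrative, and where these steps are applied inside MAT's training loop is not shown here.

```python
# Minimal sketch (plain lists standing in for tensors) of the StyleGAN
# truncation trick and style mixing. Assumption: MAT uses the StyleGAN2-ADA
# formulations; names are illustrative, not the repo's API.
import random
from typing import List

Vector = List[float]


def truncate(w: Vector, w_avg: Vector, psi: float) -> Vector:
    """Truncation trick: pull w toward the average latent, w_avg + psi * (w - w_avg)."""
    return [a + psi * (x - a) for x, a in zip(w, w_avg)]


def style_mix(ws1: List[Vector], ws2: List[Vector], prob: float,
              rng: random.Random) -> List[Vector]:
    """With probability prob, use ws2 for all layers from a random cutoff onward."""
    num_ws = len(ws1)
    if num_ws < 2 or rng.random() >= prob:
        return [list(w) for w in ws1]
    cutoff = rng.randrange(1, num_ws)  # first layer taken from ws2, in [1, num_ws - 1]
    return [list(w) for w in ws1[:cutoff]] + [list(w) for w in ws2[cutoff:]]


print(truncate([2.0, -2.0], [0.0, 0.0], 0.5))  # [1.0, -1.0]
```

A psi of 1.0 leaves the latent unchanged and 0.0 collapses it to the average, so --truncation=0.5 halves each latent's distance from the mean.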

Evaluation

We provide evaluation scripts for the FID, U-IDS, P-IDS, LPIPS, PSNR, SSIM, and L1 metrics in the 'evaluation' directory. You only need to provide the paths to your results and the ground-truth (GT) images.
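For reference, two of the simpler metrics have short textbook definitions, sketched below on flattened pixel lists. This is not the repo's evaluation code: its value range (for example [0, 1] versus [0, 255]) and averaging scheme (per image or per dataset) may differ, so reported numbers should come from the scripts in 'evaluation'. The default max_val of 255 is an assumption for 8-bit images.

```python
# Textbook PSNR and L1 on flattened pixel values (illustrative only; the
# repo's evaluation scripts may normalize or average differently).
import math
from typing import Sequence


def _check(result: Sequence[float], gt: Sequence[float]) -> None:
    if len(result) != len(gt) or not result:
        raise ValueError("result and gt must be non-empty and the same length")


def l1_error(result: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute error between result and ground-truth pixel values."""
    _check(result, gt)
    return sum(abs(r - g) for r, g in zip(result, gt)) / len(gt)


def psnr(result: Sequence[float], gt: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
    _check(result, gt)
    mse = sum((r - g) ** 2 for r, g in zip(result, gt)) / len(gt)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_val ** 2 / mse)


print(l1_error([0, 10, 20], [0, 10, 26]))  # 2.0
```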

Citation

```
@inproceedings{li2022mat,
    title={MAT: Mask-Aware Transformer for Large Hole Image Inpainting},
    author={Li, Wenbo and Lin, Zhe and Zhou, Kun and Qi, Lu and Wang, Yi and Jia, Jiaya},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2022}
}
```

License and Acknowledgement

The code and models in this repo are for research purposes only. Our code is built upon StyleGAN2-ADA.
