Multi-modal Content Creation Model Training Infrastructure including the FACT model (AI Choreographer) implementation.

Last update: Dec 30, 2022

Related tags

Deep Learning mint

Overview

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++ [ICCV-2021].

Overview

This package contains the model implementation and training infrastructure of our AI Choreographer.

Get started

Pull the code

git clone https://github.com/liruilong940607/mint --recursive

Note here --recursive is important as it will automatically clone the submodule (orbit) as well.

Install dependencies

conda create -n mint python=3.7
conda activate mint
conda install protobuf numpy
pip install tensorflow absl-py tensorflow-datasets librosa

sudo apt-get install libopenexr-dev
pip install --upgrade OpenEXR
pip install tensorflow-graphics tensorflow-graphics-gpu

git clone https://github.com/arogozhnikov/einops /tmp/einops
cd /tmp/einops/ && pip install . -U

git clone https://github.com/google/aistplusplus_api /tmp/aistplusplus_api
cd /tmp/aistplusplus_api && pip install -r requirements.txt && pip install . -U

Note if you meet environment conflicts about numpy, you can try with pip install numpy==1.20.

Get the data

See the website

Get the checkpoint

Download from google drive here, and put them to the folder ./checkpoints/

Run the code

complie protocols

protoc ./mint/protos/*.proto

preprocess dataset into tfrecord

python tools/preprocessing.py \
    --anno_dir="/mnt/data/aist_plusplus_final/" \
    --audio_dir="/mnt/data/AIST/music/" \
    --split=train
python tools/preprocessing.py \
    --anno_dir="/mnt/data/aist_plusplus_final/" \
    --audio_dir="/mnt/data/AIST/music/" \
    --split=testval

run training

python trainer.py --config_path ./configs/fact_v5_deeper_t10_cm12.config --model_dir ./checkpoints

Note you might want to change the batch_size in the config file if you meet OUT-OF-MEMORY issue.

run testing and evaluation

# caching the generated motions (seed included) to `./outputs`
python evaluator.py --config_path ./configs/fact_v5_deeper_t10_cm12.config --model_dir ./checkpoints
# calculate FIDs
python tools/calculate_scores.py

Citation

@inproceedings{li2021dance,
  title={AI Choreographer: Music Conditioned 3D Dance Generation with AIST++},
  author={Ruilong Li and Shan Yang and David A. Ross and Angjoo Kanazawa},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  year = {2021}
}

Multi-modal Content Creation Model Training Infrastructure including the FACT model (AI Choreographer) implementation.

Related tags

Overview

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++ [ICCV-2021].

Overview

Get started

Pull the code

Install dependencies

Get the data

Get the checkpoint

Run the code

Citation

Owner

Google Research

Flexible time series feature extraction & processing

PAMI stands for PAttern MIning. It constitutes several pattern mining algorithms to discover interesting patterns in transactional/temporal/spatiotemporal databases

FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset (CVPR2022)

Balancing Principle for Unsupervised Domain Adaptation

Second-order Attention Network for Single Image Super-resolution (CVPR-2019)

Python library to receive live stream events like comments and gifts in realtime from TikTok LIVE.

Sample code from the Neural Networks from Scratch book.

Official implementation of "CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding" (CVPR, 2022)

[CVPR 2021] "Multimodal Motion Prediction with Stacked Transformers": official code implementation and project page.

Toward Multimodal Image-to-Image Translation

Implementation of the Paper: "Parameterized Hypercomplex Graph Neural Networks for Graph Classification" by Tuan Le, Marco Bertolini, Frank Noé and Djork-Arné Clevert

Scripts and outputs related to the paper Prediction of Adverse Biological Effects of Chemicals Using Knowledge Graph Embeddings.

NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.

A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

Implementation of EMNLP 2017 Paper "Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog" using PyTorch and ParlAI

This repository contains the code for: RerrFact model for SciVer shared task

JAXDL: JAX (Flax) Deep Learning Library

This codebase is the official implementation of Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization (NeurIPS2021, Spotlight)

Hepsiburada - Hepsiburada Urun Bilgisi Cekme