Real-time VIBE: Frame by Frame Inference of VIBE (Video Inference for Human Body Pose and Shape Estimation)

Related tags

Deep LearningRT-VIBE
Overview

Real-time VIBE

Inference VIBE frame-by-frame.

Overview

This is a frame-by-frame inference fork of VIBE at [https://github.com/mkocabas/VIBE].

Usage:

import cv2
from vibe.rt.rt_vibe import RtVibe

rt_vibe = RtVibe()
cap = cv2.VideoCapture('sample_video.mp4')
while cap.isOpened():
    ret, frame = cap.read()
    rt_vibe(frame)  # This will open a cv2 window

SMPL Render takes most of the time, which can be closed with vibe_live.render = False

Getting Started

Installation:

# conda must be installed first
wget https://github.com/zc402/RT-VIBE/releases/download/v1.0.0/RT-VIBE.tar.gz
tar zxf RT-VIBE.tar.gz
cd RT-VIBE
# This will create a new conda env called vibe_env
source scripts/install_conda.sh
pip install .  # Install rt-vibe

Run on sample video:

python rt_demo.py  # (This runs sample_video.mp4)
# or
python rt_demo.py --vid_file=multiperson.mp4

Run on camera:

python rt_demo.py --camera

Try with google colab

This notebook provides video and camera inference example.

(there are some dependency errors during pip install, which is safe to ignore. Remember to restart environment after installing pytorch.)

https://colab.research.google.com/drive/1VKXGTfwIYT-ltbbEjhCpEczGpksb8I7o?usp=sharing

Features

  • Make VIBE an installable package
  • Fix GRU hidden states lost between batches in demo.py
  • Add realtime interface which processes the video stream frame-by-frame
  • Decrease GPU memory usage

Explain

  1. Pip installable.

  • This repo renames "lib" to "vibe" ("lib" is not a feasible package name), corrects corresponding imports, adds __init__.py files. It can be installed with:
pip install git+https://github.com/zc402/RT-VIBE
  1. GRU hidden state lost:

  • The original vibe.py reset GRU memory for each batch, which causes discontinuous predictions.

  • The GRU hidden state is reset at:

# .../models/vibe.py
# class TemporalEncoder
# def forward()
y, _ = self.gru(x)

# The "_" is the final hidden state and should be preserved
# https://pytorch.org/docs/stable/generated/torch.nn.GRU.html
  • This repo preserve GRU hidden state within the lifecycle of the model, instead of one batch.
# Fix:

# __init__()
self.gru_final_hidden = None

# forward()
y, self.gru_final_hidden = self.gru(x, self.gru_final_hidden)
  1. Real-time interface

  • This feature makes VIBE run on webcam.

  • Processing steps of the original VIBE :

    • use ffmpeg to split video into images, save to /tmp
    • process the human tracking for whole video, keep results in memory
    • predict smpl params with VIBE for whole video, 1 person at a time.
    • (optional) render and show (frame by frame)
    • save rendered result
  • Processing steps of realtime interface

    • create VIBE model.
    • read a frame with cv2
    • run tracking for 1 frame
    • predict smpl params for each person, keep the hidden states separately.
    • (optional) render and show
  • Changes

    • Multi-person-tracker is modified to receive image instead of image folder.
    • a dataset wrapper is added to convert single image into a pytorch dataset.
    • a rt_demo.py is added to demonstrate the usage.
    • ImageFolder dataset is modified
    • ImgInference dataset is modified
    • requirements are modified to freeze current tracker version. (Class in my repo inherits the tracker and changes its behavior)
  1. Decrease inference memory usage

  • The default batch_size in demo.py needs ~10GB GPU memory
  • Original demo.py needs large vibe_batch_size to keep GRU hidden states
  • Since the GRU hidden state was fixed now, lowering the memory usage won't harm the accuracy anymore.
  • With the default setting in this repo, inference occupies ~1.3GB memory, which makes it runable on low-end GPU.
  • This will slow down the inference a little. The current setting (batchsize==1) reflect actual realtime processing speed.
# Large batch causes OOM in low-end memory card
tracker_batch_size = 12 -> 1
vibe_batch_size = 450 -> 1

Other fixes

Remove seqlen. The seqlen in demo.py has no usage (GRU sequence length is decided in runtime and equals to batch_size). With the fix in this repo, it is safe to set batch_size to 1.

You might also like...
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

Build Type Linux MacOS Windows Build Status OpenPose has represented the first real-time multi-person system to jointly detect human body, hand, facia

Repository for the paper
Repository for the paper "PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation", CVPR 2021.

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation Code repository for the paper: PoseAug: A Differentiable Pose Augme

Face and Pose detector that emits MQTT events when a face or human body is detected and not detected.
Face and Pose detector that emits MQTT events when a face or human body is detected and not detected.

Face Detect MQTT Face or Pose detector that emits MQTT events when a face or human body is detected and not detected. I built this as an alternative t

pytorch implementation of openpose including Hand and Body Pose Estimation.
pytorch implementation of openpose including Hand and Body Pose Estimation.

pytorch-openpose pytorch implementation of openpose including Body and Hand Pose Estimation, and the pytorch model is directly converted from openpose

Monocular 3D pose estimation. OpenVINO. CPU inference or iGPU (OpenCL) inference.
Monocular 3D pose estimation. OpenVINO. CPU inference or iGPU (OpenCL) inference.

human-pose-estimation-3d-python-cpp RealSenseD435 (RGB) 480x640 + CPU Corei9 45 FPS (Depth is not used) 1. Run 1-1. RealSenseD435 (RGB) 480x640 + CPU

A large-scale video dataset for the training and evaluation of 3D human pose estimation models
A large-scale video dataset for the training and evaluation of 3D human pose estimation models

ASPset-510 ASPset-510 (Australian Sports Pose Dataset) is a large-scale video dataset for the training and evaluation of 3D human pose estimation mode

A large-scale video dataset for the training and evaluation of 3D human pose estimation models
A large-scale video dataset for the training and evaluation of 3D human pose estimation models

ASPset-510 (Australian Sports Pose Dataset) is a large-scale video dataset for the training and evaluation of 3D human pose estimation models. It contains 17 different amateur subjects performing 30 sports-related actions each, for a total of 510 action clips.

Expressive Body Capture: 3D Hands, Face, and Body from a Single Image
Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

Expressive Body Capture: 3D Hands, Face, and Body from a Single Image [Project Page] [Paper] [Supp. Mat.] Table of Contents License Description Fittin

Code for
Code for "3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop"

PyMAF This repository contains the code for the following paper: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop Hongwe

Releases(v1.0.0)
The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer"

Shuffle Transformer The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer" Introduction Very recently, window-

87 Nov 29, 2022
CS50's Introduction to Artificial Intelligence Test Scripts

CS50's Introduction to Artificial Intelligence Test Scripts 🤷‍♂️ What's this? 🤷‍♀️ This repository contains Python scripts to automate tests for mos

Jet Kan 2 Dec 28, 2022
HashNeRF-pytorch - Pure PyTorch Implementation of NVIDIA paper on Instant Training of Neural Graphics primitives

HashNeRF-pytorch Instant-NGP recently introduced a Multi-resolution Hash Encodin

Yash Sanjay Bhalgat 616 Jan 06, 2023
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

TensorFlowOnSpark TensorFlowOnSpark brings scalable deep learning to Apache Hadoop and Apache Spark clusters. By combining salient features from the T

Yahoo 3.8k Jan 04, 2023
Simple tutorials using Google's TensorFlow Framework

TensorFlow-Tutorials Introduction to deep learning based on Google's TensorFlow framework. These tutorials are direct ports of Newmu's Theano Tutorial

Nathan Lintz 6k Jan 06, 2023
Official code repository for Continual Learning In Environments With Polynomial Mixing Times

Official code for Continual Learning In Environments With Polynomial Mixing Times Continual Learning in Environments with Polynomial Mixing Times This

Sharath Raparthy 1 Dec 19, 2021
Code accompanying the paper Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs (Chen et al., CVPR 2020, Oral).

Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs This repository contains PyTorch implementation of our pa

Shizhe Chen 178 Dec 29, 2022
BMW TechOffice MUNICH 148 Dec 21, 2022
Applying CLIP to Point Cloud Recognition.

PointCLIP: Point Cloud Understanding by CLIP This repository is an official implementation of the paper 'PointCLIP: Point Cloud Understanding by CLIP'

Renrui Zhang 175 Dec 24, 2022
Released code for Objects are Different: Flexible Monocular 3D Object Detection, CVPR21

MonoFlex Released code for Objects are Different: Flexible Monocular 3D Object Detection, CVPR21. Work in progress. Installation This repo is tested w

Yunpeng 169 Dec 06, 2022
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Microsoft 8.4k Jan 01, 2023
Source code of AAAI 2022 paper "Towards End-to-End Image Compression and Analysis with Transformers".

Towards End-to-End Image Compression and Analysis with Transformers Source code of our AAAI 2022 paper "Towards End-to-End Image Compression and Analy

37 Dec 21, 2022
Neural Network Libraries

Neural Network Libraries Neural Network Libraries is a deep learning framework that is intended to be used for research, development and production. W

Sony 2.6k Dec 30, 2022
An implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch

This work has now been superseded by: https://github.com/sniklaus/revisiting-sepconv sepconv-slomo This is a reference implementation of Video Frame I

Simon Niklaus 984 Dec 16, 2022
Implementation of NĂśWA, state of the art attention network for text to video synthesis, in Pytorch

NĂśWA - Pytorch (wip) Implementation of NĂśWA, state of the art attention network for text to video synthesis, in Pytorch. This repository will be popul

Phil Wang 463 Dec 28, 2022
UnpNet - Rethinking 3-D LiDAR Point Cloud Segmentation(IEEE TNNLS)

UnpNet Citation Please cite the following paper if you use this repository in your reseach. @article {PMID:34914599, Title = {Rethinking 3-D LiDAR Po

Shijie Li 4 Jul 15, 2022
Read number plates with https://platerecognizer.com/

HASS-plate-recognizer Read vehicle license plates with https://platerecognizer.com/ which offers free processing of 2500 images per month. You will ne

Robin 69 Dec 30, 2022
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch

CoCa - Pytorch Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch. They were able to elegantly fit in contras

Phil Wang 565 Dec 30, 2022
SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning

Datasets | Website | Raw Data | OpenReview SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning Christopher

67 Dec 17, 2022
A template repository for submitting a job to the Slurm Cluster installed at the DISI - University of Bologna

Cluster di HPC con GPU per esperimenti di calcolo (draft version 1.0) Per poter utilizzare il cluster il primo passo è abilitare l'account istituziona

20 Dec 16, 2022