PhysGen: Rigid-Body Physics-Grounded
Image-to-Video Generation

ECCV, 2024
Shaowei Liu · Zhongzheng Ren · Saurabh Gupta* · Shenlong Wang*

This repository contains the PyTorch implementation for the paper PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation, ECCV 2024. In this paper, we present a novel training-free image-to-video generation pipeline that integrates physical simulation with a generative video diffusion prior.

Overview

Installation

Clone this repository:

git clone --recurse-submodules https://github.com/stevenlsw/physgen.git
cd physgen

Install the requirements with the following commands:

conda create -n physgen python=3.9
conda activate physgen
pip install -r requirements.txt

Colab Notebook

Run our Colab notebook for quick start!

Quick Demo

Run the image-space dynamics simulation in just 3 seconds, with no GPU, display device, or additional setup required!

export PYTHONPATH=$(pwd)
name="pool"
python simulation/animate.py --data_root data --save_root outputs --config data/${name}/sim.yaml

The output video should be saved to outputs/${name}/composite.mp4. Try setting name to domino, balls, pig_ball, or car to explore other scenes. The example outputs are shown below:

Input Image Simulation Output Video
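
To try every bundled scene in one go, the quick-demo command can be generated per scene. The following is a minimal sketch that only builds the argument lists. The helper name `demo_command` is hypothetical (it is not part of this repository), the scene names are the ones listed above, and the commands assume they are run from the repository root with PYTHONPATH set as shown.

```python
import shlex

# Scene names mentioned in the Quick Demo section.
SCENES = ["pool", "domino", "balls", "pig_ball", "car"]


def demo_command(name, data_root="data", save_root="outputs"):
    """Build the quick-demo argument list for one scene (hypothetical helper)."""
    return [
        "python", "simulation/animate.py",
        "--data_root", data_root,
        "--save_root", save_root,
        "--config", f"{data_root}/{name}/sim.yaml",
    ]


if __name__ == "__main__":
    for scene in SCENES:
        # Print each command; to execute it, pass the list to
        # subprocess.run(..., check=True) instead.
        print(shlex.join(demo_command(scene)))
```

Building the command as a list rather than a single string avoids shell-quoting issues if a scene name or path ever contains spaces.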

Perception

Please see perception/README.md for details.

Input	Segmentation	Normal	Albedo	Shading	Inpainting

Simulation

Simulation requires the following input for each image:

image folder/ 
  ├── original.png
  ├── mask.png  # segmentation mask
  ├── inpaint.png # background inpainting
  ├── sim.yaml # simulation configuration file

sim.yaml specifies the physical properties of each object and the initial conditions (force and speed on each object). Please see data/pig_ball/sim.yaml for an example. Set display to true to visualize the simulation process on a display device, and set save_snapshot to true to save simulation snapshots.
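
Before launching the simulation, it can help to confirm that an image folder contains all of the files listed above. The following is a minimal sketch using only the standard library. The function name `missing_sim_inputs` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Files listed in the simulation input layout above.
REQUIRED_SIM_INPUTS = ("original.png", "mask.png", "inpaint.png", "sim.yaml")


def missing_sim_inputs(image_folder):
    """Return the required input files that are absent from image_folder."""
    folder = Path(image_folder)
    return [name for name in REQUIRED_SIM_INPUTS if not (folder / name).is_file()]
```

For example, `missing_sim_inputs("data/pig_ball")` returns an empty list when the folder is complete, and otherwise returns the names of the files that still need to be produced.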

Run the simulation by the following command:

cd simulation
python animate.py --data_root ../data --save_root ../outputs --config ../data/${name}/sim.yaml

The outputs are saved in outputs/${name} as follows:

output folder/
  ├── history.pkl # simulation history
  ├── composite.mp4 # composite video
  ├── composite.pt # composite video tensor
  ├── mask_video.pt # foreground masked video tensor
  ├── trans_list.pt # objects transformation list tensor

Rendering

Relighting

Relighting requires the following input:

image folder/
  ├── normal.npy # normal map
  ├── shading.npy # shading map by intrinsic decomposition
previous output folder/
  ├── composite.pt # composite video tensor
  ├── mask_video.pt # foreground masked video tensor
  ├── trans_list.pt # objects transformation list tensor

perception_input is the image folder that contains the perception results. previous_output is the output folder from the previous simulation step.

Run the relighting by the following command:

cd relight
python relight.py --perception_input ../data/${name} --previous_output ../outputs/${name}

The outputs relight.mp4 and relight.pt are the relighted video and its tensor.
Comparison between the composite video and the relighted video:

Input Image Composite Video Relight Video

Video Diffusion Rendering

Download the SEINE model by following its instructions:

# install git-lfs beforehand
mkdir -p diffusion/SEINE/pretrained
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 diffusion/SEINE/pretrained/stable-diffusion-v1-4
wget -P diffusion/SEINE/pretrained https://huggingface.co/Vchitect/SEINE/resolve/main/seine.pt

The video diffusion rendering requires the following input:

image folder/
  ├── original.png # input image
  ├── sim.yaml # simulation configuration file (optional)
previous output folder/
  ├── relight.pt # relighted video tensor
  ├── mask_video.pt # foreground masked video tensor

Run the video diffusion rendering by the following command:
```
cd diffusion
python video_diffusion.py --perception_input ../data/${name} --previous_output ../outputs/${name} 
```
Both denoise_strength and prompt can be adjusted in the above script. denoise_strength controls the amount of noise added: 0 means no denoising, and 1 means denoising from scratch, which introduces a lot of variance relative to the input image. prompt is the text prompt for the video diffusion model; by default, we use the foreground object names from the perception model as the prompt.
The output final_video.mp4 is the rendered video.
Comparison between the relighted video and the diffusion-rendered video:

Input Image Relight Video Final Video
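
One common way to read a denoising strength is the SDEdit-style convention used by many image-to-image diffusion pipelines: the input is noised part of the way along the schedule, and only that fraction of the sampling steps is then run. The sketch below illustrates that convention. It is an assumption made for illustration, not the actual logic of SEINE or of this repository, and `steps_to_run` is a hypothetical name.

```python
def steps_to_run(num_inference_steps, denoise_strength):
    """Number of denoising steps executed under an SDEdit-style convention (assumed)."""
    if not 0.0 <= denoise_strength <= 1.0:
        raise ValueError("denoise_strength must be in [0, 1]")
    # Strength 0 runs no steps (input kept as-is); strength 1 runs the full schedule.
    return min(int(num_inference_steps * denoise_strength), num_inference_steps)
```

Under this reading, a lower denoise_strength stays closer to the relighted input, while a higher value gives the diffusion model more freedom to change it.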

All-in-One command

We integrate simulation, relighting, and video diffusion rendering into one script. Please follow the Video Diffusion Rendering section to download the SEINE model first.

bash scripts/run_demo.sh ${name}
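
For reference, the three stages documented in the sections above can also be chained from Python. This is a sketch assembled from the individual commands in this README. It is not the content of scripts/run_demo.sh, and `pipeline_commands` is a hypothetical helper name. Each working directory is relative to the repository root.

```python
def pipeline_commands(name, denoise_strength=None):
    """Return (working_dir, argv) pairs for simulation, relighting, and diffusion."""
    data = f"../data/{name}"
    out = f"../outputs/{name}"
    diffusion = ["python", "video_diffusion.py",
                 "--perception_input", data, "--previous_output", out]
    if denoise_strength is not None:
        # Optional flag, as used in the custom image section of this README.
        diffusion += ["--denoise_strength", str(denoise_strength)]
    return [
        ("simulation", ["python", "animate.py", "--data_root", "../data",
                        "--save_root", "../outputs", "--config", f"{data}/sim.yaml"]),
        ("relight", ["python", "relight.py",
                     "--perception_input", data, "--previous_output", out]),
        ("diffusion", diffusion),
    ]
```

To execute the stages in order, loop over the result and call `subprocess.run(argv, cwd=working_dir, check=True)` for each pair, so that a failure in one stage stops the rest.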

Evaluation

We compare our method against the open-source image-to-video models DynamiCrafter, I2VGen-XL, and SEINE, as well as collected reference videos (GT), in Sec. 4.3.

Install pytorch-fid:
```
pip install pytorch-fid
```
Download the evaluation data from here for all comparisons and unzip it into the evaluation directory. Set ${method_name} to one of DynamiCrafter, I2VGen-XL, SEINE, or ours.

Evaluate image FID:

python -m pytorch_fid evaluation/${method_name}/all evaluation/GT/all

Evaluate motion FID:

python -m pytorch_fid evaluation/${method_name}/all_flow evaluation/GT/all_flow

For motion FID, we use RAFT to compute optical flow between neighboring frames. The video processing scripts can be found here.
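
The neighboring-frame pairing used for motion FID can be sketched as follows. This shows only how consecutive frames are paired. The optical flow estimation itself (RAFT) is left out, and `neighbor_frame_pairs` is a hypothetical name that is not taken from the processing scripts.

```python
def neighbor_frame_pairs(frames):
    """Pair each frame with its successor: (f0, f1), (f1, f2), ..."""
    # frames is expected to be a list (or another sliceable sequence).
    return list(zip(frames, frames[1:]))
```

A video with N frames therefore yields N - 1 frame pairs, and one flow field would be computed per pair.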

Custom Image Video Generation

Our method should generally work for side-view and top-down view images. For custom images, please follow the perception, simulation, rendering pipeline to generate the video.
Critical steps (assuming the environment is set up properly):
Input:
```
image folder/ 
  ├── original.png
```

Perception:

cd perception/
python gpt_ram.py --img_path ${image_folder}
python run_gsam.py --input ${image_folder}
python run_depth_normal.py --input ${image_folder} --vis
python run_fg_bg.py --input ${image_folder} --vis_edge
python run_inpaint.py --input ${image_folder} --dilate_kernel_size 20
python run_albedo_shading.py --input ${image_folder} --vis

After the perception step, you should get:

image folder/ 
  ├── original.png
  ├── mask.png  # foreground segmentation mask
  ├── inpaint.png # background inpainting
  ├── normal.npy # normal map
  ├── shading.npy # shading map by intrinsic decomposition
  ├── edges.json # edges
  ├── physics.yaml # physics properties of foreground objects

Compose ${image_folder}/sim.yaml for simulation by specifying the initial conditions of each object (you can check the foreground object IDs in ${image_folder}/intermediate/fg_mask_vis.png). See data/pig_ball/sim.yaml for an example: copy the content of physics.yaml into sim.yaml and add the edge information from edges.json.

Run simulation:

cd simulation/
python animate.py --data_root ${image_folder} --save_root ${image_folder} --config ${image_folder}/sim.yaml

Run rendering:

cd relight/
python relight.py --perception_input ${image_folder} --previous_output ${image_folder}
cd ../diffusion/
python video_diffusion.py --perception_input ${image_folder} --previous_output ${image_folder} --denoise_strength ${denoise_strength}

We provide some custom images in the custom_data folder. You can experiment with each image by running the steps above to see different physical simulations.

Balls Shelf Boxes Kitchen Table Toy

Citation

If you find our work useful in your research, please cite:

@inproceedings{liu2024physgen,
  title={PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation},
  author={Liu, Shaowei and Ren, Zhongzheng and Gupta, Saurabh and Wang, Shenlong},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}

Acknowledgement

Grounded-Segment-Anything for segmentation in perception
GeoWizard for depth and normal estimation in perception
Intrinsic for intrinsic image decomposition in perception
Inpaint-Anything for image inpainting in perception
Pymunk for physics simulation in simulation
SEINE for video diffusion in rendering
