
<sub>NAVER Cloud, ImageVision</sub><br />

[![Paper](https://img.shields.io/badge/Paper-arxiv)](https://arxiv.org/pdf/2411.00626)
[![Page](https://img.shields.io/badge/Project_page-blue)](https://naver-ai.github.io/ZIM)
[![Demo](https://img.shields.io/badge/Demo-yellow)](https://huggingface.co/spaces/naver-iv/ZIM_Zero-Shot-Image-Matting)
[![Data](https://img.shields.io/badge/Data-gray)](https://huggingface.co/datasets/naver-iv/MicroMat-3K)


![Teaser](https://github.com/naver-ai/ZIM/releases/download/asset-v1/teaser.png)
![Model overview](https://github.com/naver-ai/ZIM/releases/download/asset-v1/method_overview.png)

## Introduction

The recent segmentation foundation model, Segment Anything Model (SAM), exhibits strong zero-shot segmentation capabilities, but it falls short in generating fine-grained precise masks. To address this limitation, we propose a novel zero-shot image matting model, called ZIM, with two key contributions: First, we develop a label converter that transforms segmentation labels into detailed matte labels, constructing the new SA1B-Matte dataset without costly manual annotations. Training SAM with this dataset enables it to generate precise matte masks while maintaining its zero-shot capability. Second, we design the zero-shot matting model equipped with a hierarchical pixel decoder to enhance mask representation, along with a prompt-aware masked attention mechanism to improve performance by enabling the model to focus on regions specified by visual prompts. We evaluate ZIM using the newly introduced MicroMat-3K test set, which contains high-quality micro-level matte labels. Experimental results show that ZIM outperforms existing methods in fine-grained mask generation and zero-shot generalization. Furthermore, we demonstrate the versatility of ZIM in various downstream tasks requiring precise masks, such as image inpainting and 3D NeRF. Our contributions provide a robust foundation for advancing zero-shot matting and its downstream applications across a wide range of computer vision tasks.

## Updates
- 2024.11.04: official ZIM code update


## Getting Started

Please refer [here](./INSTALL.md) for installation instructions and dataset preparation.

After installation is complete, you can use our model in just a few lines, as shown below. `ZimPredictor` is compatible with `SamPredictor`, providing the same interface such as `set_image()` and `predict()`.
```python
import torch

from zim import zim_model_registry, ZimPredictor

backbone = "vit_l"
ckpt_p = "results/zim_vit_l_2092"

model = zim_model_registry[backbone](checkpoint=ckpt_p)
if torch.cuda.is_available():
    model.cuda()

predictor = ZimPredictor(model)
predictor.set_image(<image>)
masks, _, _ = predictor.predict(<input_prompts>)
```
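
For reference, here is a minimal end-to-end sketch. It assumes the `SamPredictor`-style prompt arguments (`point_coords`, `point_labels`) carried over from SAM, and the image path is purely illustrative; adapt both to your setup.

```python
# Minimal sketch: a single positive point prompt.
# Assumptions: SamPredictor-style predict() arguments; the image path is hypothetical.
import cv2
import numpy as np
import torch

from zim import zim_model_registry, ZimPredictor

model = zim_model_registry["vit_l"](checkpoint="results/zim_vit_l_2092")
if torch.cuda.is_available():
    model.cuda()

predictor = ZimPredictor(model)

# Load an image and convert BGR (OpenCV default) to RGB.
image = cv2.cvtColor(cv2.imread("examples/image.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One positive click at pixel (x=512, y=384), following the SamPredictor convention.
masks, _, _ = predictor.predict(
    point_coords=np.array([[512, 384]]),
    point_labels=np.array([1]),
)
print(masks.shape)  # predicted matte mask(s) for the prompted region
```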

We also provide code for generating masks for an entire image and visualizing them:

```python
import torch

from zim import zim_model_registry, ZimAutomaticMaskGenerator
from zim.utils import show_mat_anns

backbone = "vit_l"
ckpt_p = "results/zim_vit_l_2092"

model = zim_model_registry[backbone](checkpoint=ckpt_p)
if torch.cuda.is_available():
    model.cuda()

mask_generator = ZimAutomaticMaskGenerator(model)
masks = mask_generator.generate(<image>) # Automatically generated masks
masks_vis = show_mat_anns(<image>, masks) # Visualize masks
```
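
As an illustrative continuation of the block above (reusing `mask_generator` and `show_mat_anns`; the image path is hypothetical and `show_mat_anns` is assumed to return an RGB image array), the visualization can be written to disk with OpenCV:

```python
import cv2

# Load an image and convert BGR (OpenCV default) to RGB; the path is illustrative.
image = cv2.cvtColor(cv2.imread("examples/image.jpg"), cv2.COLOR_BGR2RGB)

masks = mask_generator.generate(image)   # reuses the mask_generator defined above
masks_vis = show_mat_anns(image, masks)  # overlay visualization (assumed RGB)

# Convert back to BGR before saving with OpenCV.
cv2.imwrite("masks_vis.png", cv2.cvtColor(masks_vis, cv2.COLOR_RGB2BGR))
```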

Additionally, masks can be generated for images from the command line:
```bash
bash script/run_amg.sh
```

Moreover, we provide a Gradio demo in `demo/gradio_demo.py`. You can run the demo locally with:

```bash
python demo/gradio_demo.py
```


## Evaluation

We provide an evaluation script, which includes a comparison with SAM, in `script/run_eval.sh`. Make sure the dataset structure is prepared according to the [instructions](./INSTALL.md).

First, modify `data_root` in `script/run_eval.sh`:
```bash
...
data_root="/path/to/dataset/"
...
```

Then, run the evaluation script:
```bash
bash script/run_eval.sh
```

The evaluation results on the MicroMat-3K dataset should be as follows:

![Table](https://github.com/naver-ai/ZIM/releases/download/asset-v1/Table1.png)

## License

```
ZIM
Copyright (c) 2024-present NAVER Cloud Corp.
CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
```
