This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

MaskFormer Model Zoo and Baselines

Introduction

This file documents a collection of models reported in our paper. All numbers were obtained on Big Basin servers with 8 NVIDIA V100 GPUs and NVLink, except for the COCO panoptic segmentation models, which were trained with 64 NVIDIA V100 GPUs.

How to Read the Tables

  • The "Name" column contains a link to the config file. Running train_net.py --num-gpus 8 with this config file will reproduce the model, except for the COCO panoptic segmentation models, which were trained with distributed training on 64 NVIDIA V100 GPUs.
  • The "model id" column is provided for ease of reference. To let you check the integrity of a downloaded file, every model file on this page contains its md5 prefix in its file name.
  • Training curves and other statistics can be found in the metrics file for each model.
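
The integrity check described above can be sketched in a few lines of Python. This is a minimal sketch, not part of the repository: the helper names md5_hex and matches_md5_prefix are hypothetical, and the file name pattern (a hexadecimal prefix right before the .pkl extension) is an assumption that may need adjusting for the file you downloaded.

```python
import hashlib
import re
from pathlib import Path


def md5_hex(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_prefix(path):
    """Compare the file's digest with the hex prefix embedded in its name.

    Assumption: the name ends in "_<hex prefix>.pkl"; change the pattern
    if the downloaded checkpoint is named differently.
    """
    match = re.search(r"_([0-9a-f]{4,32})\.pkl$", Path(path).name)
    if match is None:
        raise ValueError(f"no md5 prefix found in file name: {path}")
    return md5_hex(path).startswith(match.group(1))
```

Calling matches_md5_prefix on a downloaded checkpoint returns True when the digest starts with the prefix in the file name, and False when the file is corrupted or truncated.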

Detectron2 ImageNet Pretrained Models

It's common to initialize from backbone models pre-trained on ImageNet classification tasks. The following backbone models are available:

Note: the pretrained models below are available in Detectron2 but are not used in our paper.

Third-party ImageNet Pretrained Models

Our paper also uses ImageNet pretrained models that are not part of Detectron2. Please refer to tools to obtain those pretrained models.

License

All models available for download through this document are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.

Semantic Segmentation Models

ADE20K Semantic Segmentation

| Name | Backbone | crop size | lr sched | train mem (MB) | mIoU | mIoU (ms+flip) | model id | download |
|---|---|---|---|---|---|---|---|---|
| PerPixelBaseline | R50 | 512x512 | 160k | 2451 | 39.2 | 40.9 | 40913338_1 | model \| metrics |
| PerPixelBaseline+ | R50 | 512x512 | 160k | 5817 | 41.9 | 42.9 | 40931736_2 | model \| metrics |
| MaskFormer | R50 | 512x512 | 160k | 4334 | 44.5 | 46.7 | 40931736_14 | model \| metrics |
| MaskFormer | R101 | 512x512 | 160k | 4905 | 45.5 | 47.2 | 40986936_1 | model \| metrics |
| MaskFormer | R101c | 512x512 | 160k | 4968 | 46.0 | 48.1 | 41703904_1 | model \| metrics |
| MaskFormer | Swin-T | 512x512 | 160k | 5292 | 46.7 | 48.8 | 40986951_3 | model \| metrics |
| MaskFormer | Swin-S | 512x512 | 160k | 6330 | 49.8 | 51.0 | 40846700_5 | model \| metrics |
| MaskFormer | Swin-B | 640x640 | 160k | 12928 | 52.7 | 53.9 | 40986951_0 | model \| metrics |
| MaskFormer | Swin-L | 640x640 | 160k | 18144 | 54.1 | 55.6 | 40846700_0 | model \| metrics |

COCO-Stuff-10K Semantic Segmentation

| Name | Backbone | lr sched | train mem (MB) | mIoU | mIoU (ms+flip) | model id | download |
|---|---|---|---|---|---|---|---|
| PerPixelBaseline | R50 | 60k | 6898 | 32.4 | 34.4 | 40941321_0 | model \| metrics |
| PerPixelBaseline+ | R50 | 60k | 18227 | 34.2 | 35.8 | 40941321_3 | model \| metrics |
| MaskFormer | R50 | 60k | 8618 | 37.1 | 38.9 | 40941321_6 | model \| metrics |
| MaskFormer | R101 | 60k | 10091 | 38.1 | 39.8 | 40986940_1 | model \| metrics |
| MaskFormer | R101c | 60k | 9927 | 38.0 | 39.3 | 41703904_3 | model \| metrics |

ADE20K-Full Semantic Segmentation

| Name | Backbone | lr sched | train mem (MB) | mIoU | model id | download |
|---|---|---|---|---|---|---|
| PerPixelBaseline | R50 | 200k | 8030 | 12.4 | 40986914_5 | model \| metrics |
| PerPixelBaseline+ | R50 | 200k | 26698 | 13.9 | 40986914_6 | model \| metrics |
| MaskFormer | R50 | 200k | 6529 | 16.0 | 40986914_1 | model \| metrics |
| MaskFormer | R101 | 200k | 6894 | 16.8 | 40986946_1 | model \| metrics |
| MaskFormer | R101c | 200k | 6904 | 17.4 | 41703904_6 | model \| metrics |

Cityscapes Semantic Segmentation

| Name | Backbone | lr sched | train mem (MB) | mIoU | mIoU (ms+flip) | model id | download |
|---|---|---|---|---|---|---|---|
| MaskFormer | R101 | 90k | 6960 | 78.5 | 80.3 | 41127351_1 | model \| metrics |
| MaskFormer | R101c | 90k | 7204 | 79.7 | 81.4 | 41630444_2 | model \| metrics |

Mapillary Vistas Semantic Segmentation

| Name | Backbone | lr sched | train mem (MB) | mIoU | mIoU (ms+flip) | model id | download |
|---|---|---|---|---|---|---|---|
| MaskFormer | R50 | 300k | 15761 | 53.1 | 55.4 | 42325118 | model \| metrics |

Panoptic Segmentation Models

COCO Panoptic Segmentation

| Name | Backbone | lr sched | train mem (MB) | PQ | model id | download |
|---|---|---|---|---|---|---|
| MaskFormer | R50 + 6 Enc | 554k | 22634 | 46.5 | 42747488_1 | model \| metrics |
| MaskFormer | R101 + 6 Enc | 554k | 27358 | 47.6 | 42747488_0 | model \| metrics |
| MaskFormer | Swin-T | 554k | 20023 | 47.7 | 41143190_0 | model \| metrics |
| MaskFormer | Swin-S | 554k | 21620 | 49.7 | 41270920 | model \| metrics |
| MaskFormer | Swin-B | 554k | 24411 | 51.8 | 41260906 | model \| metrics |
| MaskFormer | Swin-L | 554k | 23275 | 52.7 | 43219274 | model \| metrics |

Note:

  • All COCO panoptic segmentation models are trained with 64 NVIDIA V100 GPUs.
  • For the Swin-L model, we set MAX_SIZE_TRAIN=1000 due to memory constraints.
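
To illustrate why a smaller MAX_SIZE_TRAIN saves memory, the sketch below computes the training image shape under the assumption that MAX_SIZE_TRAIN caps the longer image side after shortest-edge resizing, following the arithmetic of Detectron2's ResizeShortestEdge transform. The function name resized_shape is hypothetical, and the input size and short-edge value in the usage note are illustrative only, not taken from the configs.

```python
def resized_shape(height, width, short_edge, max_size):
    """Return (new_height, new_width) after shortest-edge resizing.

    The shorter side is scaled to `short_edge`; if the longer side then
    exceeds `max_size`, both sides are shrunk so the longer side equals
    `max_size`. Aspect ratio is preserved up to rounding.
    """
    scale = short_edge / min(height, width)
    if height < width:
        new_h, new_w = short_edge, scale * width
    else:
        new_h, new_w = scale * height, short_edge
    if max(new_h, new_w) > max_size:
        shrink = max_size / max(new_h, new_w)
        new_h, new_w = new_h * shrink, new_w * shrink
    # Round to the nearest integer pixel count.
    return int(new_h + 0.5), int(new_w + 0.5)
```

For a 480x640 image with an illustrative short edge of 800, resized_shape(480, 640, 800, 1333) gives (800, 1067), while resized_shape(480, 640, 800, 1000) gives (750, 1000), about 12% fewer pixels per training image.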

ADE20K Panoptic Segmentation

| Name | Backbone | lr sched | train mem (MB) | PQ | model id | download |
|---|---|---|---|---|---|---|
| MaskFormer | R50 + 6 Enc | 720k | 15899 | 34.7 | 42746872_1 | model \| metrics |
| MaskFormer | R50 + 6 Enc | 720k | 16516 | 35.7 | 42747444 | model \| metrics |