Note: the access code for `baidu` links is `swin`.

ImageNet-1K and ImageNet-22K Pretrained Swin-V1 Models
name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | FPS | 22K model | 1K model |
---|---|---|---|---|---|---|---|---|---|
Swin-T | ImageNet-1K | 224x224 | 81.2 | 95.5 | 28M | 4.5G | 755 | - | github/baidu/config/log |
Swin-S | ImageNet-1K | 224x224 | 83.2 | 96.2 | 50M | 8.7G | 437 | - | github/baidu/config/log |
Swin-B | ImageNet-1K | 224x224 | 83.5 | 96.5 | 88M | 15.4G | 278 | - | github/baidu/config/log |
Swin-B | ImageNet-1K | 384x384 | 84.5 | 97.0 | 88M | 47.1G | 85 | - | github/baidu/config |
Swin-T | ImageNet-22K | 224x224 | 80.9 | 96.0 | 28M | 4.5G | 755 | github/baidu/config | github/baidu/config |
Swin-S | ImageNet-22K | 224x224 | 83.2 | 97.0 | 50M | 8.7G | 437 | github/baidu/config | github/baidu/config |
Swin-B | ImageNet-22K | 224x224 | 85.2 | 97.5 | 88M | 15.4G | 278 | github/baidu/config | github/baidu/config |
Swin-B | ImageNet-22K | 384x384 | 86.4 | 98.0 | 88M | 47.1G | 85 | github/baidu | github/baidu/config |
Swin-L | ImageNet-22K | 224x224 | 86.3 | 97.9 | 197M | 34.5G | 141 | github/baidu/config | github/baidu/config |
Swin-L | ImageNet-22K | 384x384 | 87.3 | 98.2 | 197M | 103.9G | 42 | github/baidu | github/baidu/config |
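As a quick sanity check, a pretrained Swin-V1 backbone can also be loaded through `timm`. A minimal sketch (the model name follows timm's registry and is an assumption here, not a link from this table; verify it against your installed timm version):

```python
import timm
import torch

# Minimal sketch: load an ImageNet-1K pretrained Swin-T via timm.
# "swin_tiny_patch4_window7_224" is a timm registry name, assumed here.
model = timm.create_model("swin_tiny_patch4_window7_224", pretrained=True)
model.eval()

# Dummy forward pass at the table's 224x224 resolution.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 1000])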
ImageNet-1K and ImageNet-22K Pretrained Swin-V2 Models

name | pretrain | resolution | window | acc@1 | acc@5 | #params | FLOPs | FPS | 22K model | 1K model |
---|---|---|---|---|---|---|---|---|---|---|
SwinV2-T | ImageNet-1K | 256x256 | 8x8 | 81.8 | 95.9 | 28M | 5.9G | 572 | - | github/baidu/config |
SwinV2-S | ImageNet-1K | 256x256 | 8x8 | 83.7 | 96.6 | 50M | 11.5G | 327 | - | github/baidu/config |
SwinV2-B | ImageNet-1K | 256x256 | 8x8 | 84.2 | 96.9 | 88M | 20.3G | 217 | - | github/baidu/config |
SwinV2-T | ImageNet-1K | 256x256 | 16x16 | 82.8 | 96.2 | 28M | 6.6G | 437 | - | github/baidu/config |
SwinV2-S | ImageNet-1K | 256x256 | 16x16 | 84.1 | 96.8 | 50M | 12.6G | 257 | - | github/baidu/config |
SwinV2-B | ImageNet-1K | 256x256 | 16x16 | 84.6 | 97.0 | 88M | 21.8G | 174 | - | github/baidu/config |
SwinV2-B* | ImageNet-22K | 256x256 | 16x16 | 86.2 | 97.9 | 88M | 21.8G | 174 | github/baidu/config | github/baidu/config |
SwinV2-B* | ImageNet-22K | 384x384 | 24x24 | 87.1 | 98.2 | 88M | 54.7G | 57 | github/baidu/config | github/baidu/config |
SwinV2-L* | ImageNet-22K | 256x256 | 16x16 | 86.9 | 98.0 | 197M | 47.5G | 95 | github/baidu/config | github/baidu/config |
SwinV2-L* | ImageNet-22K | 384x384 | 24x24 | 87.6 | 98.3 | 197M | 115.4G | 33 | github/baidu/config | github/baidu/config |
Note:
- SwinV2-B* (and SwinV2-L*) at input resolutions of 256x256 and 384x384 are both fine-tuned from the same model pre-trained at a smaller input resolution of 192x192.
- SwinV2-B* (384x384) achieves 78.08 acc@1 on ImageNet-1K-V2, while SwinV2-L* (384x384) achieves 78.31.
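SwinV2 variants are also registered in `timm`, with the window size encoded in the model name (the 8x8 vs 16x16 distinction in the table above). A hedged sketch; the names below are timm registry entries and should be confirmed against your installed version:

```python
import timm

# Sketch: compare the 8x8- and 16x16-window SwinV2-T variants at 256x256.
# Model names are assumed timm registry entries.
for name in ("swinv2_tiny_window8_256", "swinv2_tiny_window16_256"):
    model = timm.create_model(name, pretrained=False)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M params")
```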
ImageNet-1K Pretrained Swin MLP Models

name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | FPS | 1K model |
---|---|---|---|---|---|---|---|---|
Mixer-B/16 | ImageNet-1K | 224x224 | 76.4 | - | 59M | 12.7G | - | official repo |
ResMLP-S24 | ImageNet-1K | 224x224 | 79.4 | - | 30M | 6.0G | 715 | timm |
ResMLP-B24 | ImageNet-1K | 224x224 | 81.0 | - | 116M | 23.0G | 231 | timm |
Swin-T/C24 | ImageNet-1K | 256x256 | 81.6 | 95.7 | 28M | 5.9G | 563 | github/baidu/config |
SwinMLP-T/C24 | ImageNet-1K | 256x256 | 79.4 | 94.6 | 20M | 4.0G | 807 | github/baidu/config |
SwinMLP-T/C12 | ImageNet-1K | 256x256 | 79.6 | 94.7 | 21M | 4.0G | 792 | github/baidu/config |
SwinMLP-T/C6 | ImageNet-1K | 256x256 | 79.7 | 94.9 | 23M | 4.0G | 766 | github/baidu/config |
SwinMLP-B | ImageNet-1K | 224x224 | 81.3 | 95.3 | 61M | 10.4G | 409 | github/baidu/config |
Note: C24 means each head has 24 channels. Smaller per-head channel counts (C12, C6) give more heads, and slightly more parameters, at roughly the same FLOPs; see the sketch below.
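To make the C-notation concrete, here is a hedged sketch of SwinMLP-style spatial mixing with a grouped 1x1 convolution, where the number of heads is the embedding dimension divided by the per-head channel count. The dimensions are illustrative, not values taken from the table:

```python
import torch
import torch.nn as nn

# Illustrative sketch of SwinMLP-style grouped spatial mixing.
# With embed_dim=96 and 24 channels per head (the "C24" setting),
# there are 96 // 24 = 4 heads. All sizes here are for illustration only.
embed_dim, channels_per_head, window = 96, 24, 8
num_heads = embed_dim // channels_per_head  # 4
tokens = window * window                    # 64 tokens per window

# One grouped 1x1 conv mixes the spatial tokens independently per head.
spatial_mlp = nn.Conv1d(num_heads * tokens, num_heads * tokens,
                        kernel_size=1, groups=num_heads)

# x: (num_windows, num_heads * tokens, channels_per_head)
x = torch.randn(2, num_heads * tokens, channels_per_head)
out = spatial_mlp(x)
print(out.shape)  # torch.Size([2, 256, 24])
```

More heads means more groups, and each group carries its own token-mixing weight matrix, which is why the C6 variant is slightly larger than C24 at the same FLOPs.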
ImageNet-22K Pretrained Swin-MoE Models

name | #experts | k | router | resolution | window | IN-22K acc@1 | IN-1K/ft acc@1 | IN-1K/5-shot acc@1 | 22K model |
---|---|---|---|---|---|---|---|---|---|
Swin-MoE-S | 1 (dense) | - | - | 192x192 | 8x8 | 35.5 | 83.5 | 70.3 | github/baidu/config |
Swin-MoE-S | 8 | 1 | Linear | 192x192 | 8x8 | 36.8 | 84.5 | 75.2 | github/baidu/config |
Swin-MoE-S | 16 | 1 | Linear | 192x192 | 8x8 | 37.6 | 84.9 | 76.5 | github/baidu/config |
Swin-MoE-S | 32 | 1 | Linear | 192x192 | 8x8 | 37.4 | 84.7 | 75.9 | github/baidu/config |
Swin-MoE-S | 32 | 1 | Cosine | 192x192 | 8x8 | 37.2 | 84.3 | 75.2 | github/baidu/config |
Swin-MoE-S | 64 | 1 | Linear | 192x192 | 8x8 | 37.8 | 84.7 | 75.7 | - |
Swin-MoE-S | 128 | 1 | Linear | 192x192 | 8x8 | 37.4 | 84.5 | 75.4 | - |
Swin-MoE-B | 1 (dense) | - | - | 192x192 | 8x8 | 37.3 | 85.1 | 75.9 | config |
Swin-MoE-B | 8 | 1 | Linear | 192x192 | 8x8 | 38.1 | 85.3 | 77.2 | config |
Swin-MoE-B | 16 | 1 | Linear | 192x192 | 8x8 | 38.7 | 85.5 | 78.2 | config |
Swin-MoE-B | 32 | 1 | Linear | 192x192 | 8x8 | 38.6 | 85.5 | 77.9 | config |
Swin-MoE-B | 32 | 1 | Cosine | 192x192 | 8x8 | 38.5 | 85.3 | 77.3 | config |
Swin-MoE-B | 32 | 2 | Linear | 192x192 | 8x8 | 38.6 | 85.5 | 78.7 | - |
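The `router` and `k` columns describe top-k expert gating: each token is scored against all experts and dispatched to the k highest-scoring ones. Below is a hedged toy sketch of the two router flavors; it is a simplification for illustration, not the repo's actual Tutel-based MoE implementation, and `tau` is an illustrative temperature rather than a tuned value:

```python
import torch
import torch.nn.functional as F

def route(x, expert_weight, k=1, router="linear", tau=0.07):
    """Toy top-k gating. x: (tokens, dim); expert_weight: (experts, dim).

    "linear" scores tokens with a plain linear projection; "cosine" scores
    with normalized dot products divided by a temperature.
    """
    if router == "cosine":
        logits = F.normalize(x, dim=-1) @ F.normalize(expert_weight, dim=-1).T / tau
    else:
        logits = x @ expert_weight.T
    gates = logits.softmax(dim=-1)                # (tokens, experts)
    topk_gate, topk_idx = gates.topk(k, dim=-1)   # dispatch each token to k experts
    return topk_gate, topk_idx

x = torch.randn(4, 96)   # 4 tokens, 96-dim features
w = torch.randn(8, 96)   # 8 experts
gate, idx = route(x, w, k=1)
print(idx.squeeze(-1))   # chosen expert per token
```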
SimMIM Pretrained SwinV2 Models

Note: starting July 2024, all SimMIM pretrained SwinV2 models are hosted in the Hugging Face repository (see the loading sketch after the notes below).
- Model size only includes the backbone weights and excludes weights in the decoders/classification heads.
- Batch size for all models is set to 2048.
- Validation loss is calculated on the ImageNet-1K validation set.
- Fine-tuned acc@1 refers to the top-1 accuracy on the ImageNet-1K validation set after fine-tuning.
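A hedged sketch of fetching one of these checkpoints with `huggingface_hub`. The repository id and filename below are placeholders, since the exact Hugging Face paths are not reproduced in this table; substitute the values from the links above:

```python
import torch
from huggingface_hub import hf_hub_download

# Placeholder repo id and filename: both are hypothetical and must be
# replaced with the actual values from the Hugging Face links in the table.
ckpt_path = hf_hub_download(
    repo_id="microsoft/simmim-swinv2-base",   # hypothetical repo id
    filename="simmim_pretrain_500k.pth",      # hypothetical filename
)

# SimMIM checkpoints store backbone weights only (no decoder or
# classification head), matching the model sizes reported above.
state = torch.load(ckpt_path, map_location="cpu")
print(sorted(state.keys())[:5])
```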
name | model size | pre-train dataset | pre-train iterations | validation loss | fine-tuned acc@1 | pre-trained model | fine-tuned model |
---|---|---|---|---|---|---|---|
SwinV2-Small | 49M | ImageNet-1K 10% | 125k | 0.4820 | 82.69 | huggingface | huggingface |
SwinV2-Small | 49M | ImageNet-1K 10% | 250k | 0.4961 | 83.11 | huggingface | huggingface |
SwinV2-Small | 49M | ImageNet-1K 10% | 500k | 0.5115 | 83.17 | huggingface | huggingface |
SwinV2-Small | 49M | ImageNet-1K 20% | 125k | 0.4751 | 83.05 | huggingface | huggingface |
SwinV2-Small | 49M | ImageNet-1K 20% | 250k | 0.4722 | 83.56 | huggingface | huggingface |
SwinV2-Small | 49M | ImageNet-1K 20% | 500k | 0.4734 | 83.75 | huggingface | huggingface |
SwinV2-Small | 49M | ImageNet-1K 50% | 125k | 0.4732 | 83.04 | huggingface | huggingface |
SwinV2-Small | 49M | ImageNet-1K 50% | 250k | 0.4681 | 83.67 | huggingface | huggingface |
SwinV2-Small | 49M | ImageNet-1K 50% | 500k | 0.4646 | 83.96 | huggingface | huggingface |
SwinV2-Small | 49M | ImageNet-1K | 125k | 0.4728 | 82.92 | huggingface | huggingface |
SwinV2-Small | 49M | ImageNet-1K | 250k | 0.4674 | 83.66 | huggingface | huggingface |
SwinV2-Small | 49M | ImageNet-1K | 500k | 0.4641 | 84.08 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-1K 10% | 125k | 0.4822 | 83.33 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-1K 10% | 250k | 0.4997 | 83.60 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-1K 10% | 500k | 0.5112 | 83.41 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-1K 20% | 125k | 0.4703 | 83.86 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-1K 20% | 250k | 0.4679 | 84.37 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-1K 20% | 500k | 0.4711 | 84.61 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-1K 50% | 125k | 0.4683 | 84.04 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-1K 50% | 250k | 0.4633 | 84.57 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-1K 50% | 500k | 0.4598 | 84.95 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-1K | 125k | 0.4680 | 84.13 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-1K | 250k | 0.4626 | 84.65 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-1K | 500k | 0.4588 | 85.04 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-22K | 125k | 0.4695 | 84.11 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-22K | 250k | 0.4649 | 84.57 | huggingface | huggingface |
SwinV2-Base | 87M | ImageNet-22K | 500k | 0.4614 | 85.11 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-1K 10% | 125k | 0.4995 | 83.69 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-1K 10% | 250k | 0.5140 | 83.66 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-1K 10% | 500k | 0.5150 | 83.50 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-1K 20% | 125k | 0.4675 | 84.38 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-1K 20% | 250k | 0.4746 | 84.71 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-1K 20% | 500k | 0.4960 | 84.59 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-1K 50% | 125k | 0.4622 | 84.78 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-1K 50% | 250k | 0.4566 | 85.38 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-1K 50% | 500k | 0.4530 | 85.80 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-1K | 125k | 0.4611 | 84.98 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-1K | 250k | 0.4552 | 85.45 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-1K | 500k | 0.4507 | 85.91 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-22K | 125k | 0.4649 | 84.61 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-22K | 250k | 0.4586 | 85.39 | huggingface | huggingface |
SwinV2-Large | 195M | ImageNet-22K | 500k | 0.4536 | 85.81 | huggingface | huggingface |
SwinV2-Huge | 655M | ImageNet-1K 20% | 125k | 0.4789 | 84.35 | huggingface | huggingface |
SwinV2-Huge | 655M | ImageNet-1K 20% | 250k | 0.5038 | 84.16 | huggingface | huggingface |
SwinV2-Huge | 655M | ImageNet-1K 20% | 500k | 0.5071 | 83.44 | huggingface | huggingface |
SwinV2-Huge | 655M | ImageNet-1K 50% | 125k | 0.4549 | 85.09 | huggingface | huggingface |
SwinV2-Huge | 655M | ImageNet-1K 50% | 250k | 0.4511 | 85.64 | huggingface | huggingface |
SwinV2-Huge | 655M | ImageNet-1K 50% | 500k | 0.4559 | 85.69 | huggingface | huggingface |
SwinV2-Huge | 655M | ImageNet-1K | 125k | 0.4531 | 85.23 | huggingface | huggingface |
SwinV2-Huge | 655M | ImageNet-1K | 250k | 0.4464 | 85.90 | huggingface | huggingface |
SwinV2-Huge | 655M | ImageNet-1K | 500k | 0.4416 | 86.34 | huggingface | huggingface |
SwinV2-Huge | 655M | ImageNet-22K | 125k | 0.4564 | 85.14 | huggingface | huggingface |
SwinV2-Huge | 655M | ImageNet-22K | 250k | 0.4499 | 85.86 | huggingface | huggingface |
SwinV2-Huge | 655M | ImageNet-22K | 500k | 0.4444 | 86.27 | huggingface | huggingface |
SwinV2-Giant | 1.06B | ImageNet-1K 50% | 125k | 0.4534 | 85.44 | huggingface | huggingface |
SwinV2-Giant | 1.06B | ImageNet-1K 50% | 250k | 0.4515 | 85.76 | huggingface | huggingface |
SwinV2-Giant | 1.06B | ImageNet-1K 50% | 500k | 0.4719 | 85.51 | huggingface | huggingface |
SwinV2-Giant | 1.06B | ImageNet-1K | 125k | 0.4513 | 85.57 | huggingface | huggingface |
SwinV2-Giant | 1.06B | ImageNet-1K | 250k | 0.4442 | 86.12 | huggingface | huggingface |
SwinV2-Giant | 1.06B | ImageNet-1K | 500k | 0.4395 | 86.46 | huggingface | huggingface |
SwinV2-Giant | 1.06B | ImageNet-22K | 125k | 0.4544 | 85.39 | huggingface | huggingface |
SwinV2-Giant | 1.06B | ImageNet-22K | 250k | 0.4475 | 85.96 | huggingface | huggingface |
SwinV2-Giant | 1.06B | ImageNet-22K | 500k | 0.4416 | 86.53 | huggingface | huggingface |
ImageNet-1K Pre-trained and Fine-tuned Models
name | pre-train epochs | pre-train resolution | fine-tune resolution | acc@1 | pre-trained model | fine-tuned model |
---|---|---|---|---|---|---|
Swin-Base | 100 | 192x192 | 192x192 | 82.8 | google/config | google/config |
Swin-Base | 100 | 192x192 | 224x224 | 83.5 | google/config | google/config |
Swin-Base | 800 | 192x192 | 224x224 | 84.0 | google/config | google/config |
Swin-Large | 800 | 192x192 | 224x224 | 85.4 | google/config | google/config |
SwinV2-Huge | 800 | 192x192 | 224x224 | 85.7 | / | / |
SwinV2-Huge | 800 | 192x192 | 512x512 | 87.1 | / | / |
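When fine-tuning at a higher resolution than pre-training (e.g. 192x192 to 224x224 above), the window size changes and the relative position bias table must be resized to match. Below is a hedged sketch of the bicubic-interpolation approach; it is a simplification of what the repo's checkpoint-loading utilities do, and the head count, window sizes, and tensor layout here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def resize_rel_pos_bias(table, src_window, dst_window):
    """Bicubically resize a relative position bias table when the window
    size changes, e.g. an assumed pre-train window 6 -> fine-tune window 7.

    table: (num_heads, (2*src_window-1)**2); note that actual Swin
    checkpoints may store the transposed layout, so check before reusing.
    """
    num_heads = table.shape[0]
    src = 2 * src_window - 1
    dst = 2 * dst_window - 1
    # (heads, src*src) -> (1, heads, src, src) -> interpolate -> flatten back
    table_2d = table.view(1, num_heads, src, src)
    resized = F.interpolate(table_2d, size=(dst, dst),
                            mode="bicubic", align_corners=False)
    return resized.view(num_heads, dst * dst)

# Illustrative: 4 heads, window 6 -> window 7.
old = torch.randn(4, 11 * 11)
new = resize_rel_pos_bias(old, src_window=6, dst_window=7)
print(new.shape)  # torch.Size([4, 169])
```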