GluonCV toolkit v0.4.0
0.4.0 Release Note (pre-release)
Highlights
GluonCV v0.4 adds pose estimation models, Int8 quantization for Intel CPUs, FPN Faster/Mask-RCNN, wide SE-ResNeXt/ResNeXt models, and multiple usability improvements.
We strongly recommend using GluonCV 0.4.0 with MXNet >= 1.4.0 to avoid dependency issues. Some specific tasks may require an MXNet nightly build. See https://gluon-cv.mxnet.io/index.html
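The MXNet version recommendation can be checked programmatically. The following is a minimal sketch that is not part of GluonCV: the helper names parse_version, meets_minimum and check_installed_mxnet are hypothetical, and it assumes the installed version is available as mxnet.__version__. Only the leading major.minor[.patch] numbers are compared; any trailing suffix is ignored.

```python
import re

# Minimum MXNet version recommended for GluonCV 0.4.0 (from the release note).
MIN_MXNET = (1, 4, 0)


def parse_version(version: str) -> tuple:
    """Parse the leading major.minor[.patch] part of a version string.

    Any trailing suffix (for example a pre-release tag) is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def meets_minimum(version: str, minimum: tuple = MIN_MXNET) -> bool:
    """Return True if `version` is at least `minimum`."""
    return parse_version(version) >= minimum


def check_installed_mxnet() -> bool:
    """Check the locally installed MXNet.

    MXNet is imported lazily so this module can be loaded without it.
    """
    import mxnet  # assumption: MXNet is installed in the current environment

    return meets_minimum(mxnet.__version__)
```

Comparing integer tuples instead of raw strings avoids the pitfall where "1.10.0" sorts before "1.4.0" lexicographically.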
New Models released in 0.4
Model | Metric | 0.4 |
---|---|---|
simple_pose_resnet152_v1b | OKS AP* | 74.2 |
simple_pose_resnet50_v1b | OKS AP* | 72.2 |
ResNext50_32x4d | ImageNet Top-1 | 79.32 |
ResNext101_64x4d | ImageNet Top-1 | 80.69 |
SE_ResNext101_32x4d | ImageNet Top-1 | 79.95 |
SE_ResNext101_64x4d | ImageNet Top-1 | 81.01 |
yolo3_mobilenet1.0_coco | COCO mAP | 28.6 |
* Evaluated using ground-truth person detection results.
Int8 Quantization with Intel Deep Learning Boost
GluonCV now integrates Intel's Vector Neural Network Instructions (VNNI) to accelerate model inference.
Note that you need a capable Intel Skylake (or newer) CPU to see the full speedup.
Model | Dataset | Batch Size | C5.18xlarge FP32 throughput | C5.18xlarge INT8 throughput | Speedup | FP32 Acc | INT8 Acc |
---|---|---|---|---|---|---|---|
resnet50_v1 | ImageNet | 128 | 122.02 | 276.72 | 2.27 | 77.21%/93.55% | 76.86%/93.46% |
mobilenet1.0 | ImageNet | 128 | 375.33 | 1016.39 | 2.71 | 73.28%/91.22% | 72.85%/90.99% |
ssd_300_vgg16_atrous_voc* | VOC | 224 | 21.55 | 31.47 | 1.46 | 77.4 | 77.46 |
ssd_512_vgg16_atrous_voc* | VOC | 224 | 7.63 | 11.69 | 1.53 | 78.41 | 78.39 |
ssd_512_resnet50_v1_voc* | VOC | 224 | 17.81 | 34.55 | 1.94 | 80.21 | 80.16 |
ssd_512_mobilenet1.0_voc* | VOC | 224 | 31.13 | 48.72 | 1.57 | 75.42 | 75.04 |
* Evaluated with nms_thresh=0.45, nms_topk=200.
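The Speedup column is consistent with the ratio of INT8 to FP32 throughput (for example, 276.72 / 122.02 ≈ 2.27 for resnet50_v1). The short sketch below recomputes it from the table, assuming the two C5.18xlarge columns are throughputs measured in the same units, so that higher means faster.

```python
# (model, FP32 throughput, INT8 throughput, reported speedup), copied from the table above.
BENCHMARKS = [
    ("resnet50_v1", 122.02, 276.72, 2.27),
    ("mobilenet1.0", 375.33, 1016.39, 2.71),
    ("ssd_300_vgg16_atrous_voc", 21.55, 31.47, 1.46),
    ("ssd_512_vgg16_atrous_voc", 7.63, 11.69, 1.53),
    ("ssd_512_resnet50_v1_voc", 17.81, 34.55, 1.94),
    ("ssd_512_mobilenet1.0_voc", 31.13, 48.72, 1.57),
]


def speedup(fp32_throughput: float, int8_throughput: float) -> float:
    """Speedup of INT8 over FP32, assuming both numbers are throughputs (higher is faster)."""
    return int8_throughput / fp32_throughput


if __name__ == "__main__":
    # Print the computed ratio next to the value reported in the table.
    for name, fp32, int8, reported in BENCHMARKS:
        print(f"{name:28s} computed={speedup(fp32, int8):.2f} reported={reported:.2f}")
```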
Usage of int8
Int8 quantized models are used exactly like standard GluonCV models: simply append the suffix _int8 to the model name. For example, use resnet50_v1_int8 as the int8 quantized version of resnet50_v1.
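A minimal sketch of the naming rule, using gluoncv.model_zoo.get_model to load a model and its int8 counterpart. The helper names int8_name and load_fp32_and_int8 are hypothetical; GluonCV is imported lazily, pretrained weights are downloaded on first use, and the int8 models likely require an MKL-DNN-enabled MXNet build (such as mxnet-mkl), so check the GluonCV documentation for the exact requirement.

```python
def int8_name(model_name: str) -> str:
    """Return the model-zoo name of the int8 quantized variant (append '_int8')."""
    return model_name + "_int8"


def load_fp32_and_int8(model_name: str = "resnet50_v1"):
    """Load a model and its int8 counterpart from the GluonCV model zoo.

    GluonCV is imported lazily; pretrained weights are downloaded on first use.
    Assumes an '_int8' variant is published for `model_name`.
    """
    from gluoncv import model_zoo

    fp32_net = model_zoo.get_model(model_name, pretrained=True)
    int8_net = model_zoo.get_model(int8_name(model_name), pretrained=True)
    return fp32_net, int8_net
```

Because only the name changes, existing inference code can switch between the FP32 and int8 networks without other modifications, assuming the int8 variant exists for that model.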
Pruned ResNet
https://gluon-cv.mxnet.io/model_zoo/classification.html#pruned-resnet
Pruning the channels of convolution layers is a very effective way to reduce model redundancy, speeding up inference without a significant loss of accuracy. GluonCV 0.4 includes several pruned ResNets derived from GluonCV's state-of-the-art ImageNet ResNets.
Model | Top-1 | Top-5 | Hash | Speedup (vs. original ResNet) |
---|---|---|---|---|
resnet18_v1b_0.89 | 67.2 | 87.45 | 54f7742b | 2x |
resnet50_v1d_0.86 | 78.02 | 93.82 | a230c33f | 1.68x |
resnet50_v1d_0.48 | 74.66 | 92.34 | 0d3e69bb | 3.3x |
resnet50_v1d_0.37 | 70.71 | 89.74 | 9982ae49 | 5.01x |
resnet50_v1d_0.11 | 63.22 | 84.79 | 6a25eece | 8.78x |
resnet101_v1d_0.76 | 79.46 | 94.69 | a872796b | 1.8x |
resnet101_v1d_0.73 | 78.89 | 94.48 | 712fccb1 | 2.02x |
Scripts for pruning ResNets will be released in the future.
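Since each speedup in the table is measured against that model's own unpruned ResNet, a pruned variant should be chosen within a single base network. The sketch below picks the most accurate variant that meets a speedup target; the function pick_pruned is a hypothetical helper built only from the table data, not a GluonCV API.

```python
# (model name, Top-1 accuracy, speedup vs. the original ResNet), copied from the table above.
PRUNED_RESNETS = [
    ("resnet18_v1b_0.89", 67.2, 2.0),
    ("resnet50_v1d_0.86", 78.02, 1.68),
    ("resnet50_v1d_0.48", 74.66, 3.3),
    ("resnet50_v1d_0.37", 70.71, 5.01),
    ("resnet50_v1d_0.11", 63.22, 8.78),
    ("resnet101_v1d_0.76", 79.46, 1.8),
    ("resnet101_v1d_0.73", 78.89, 2.02),
]


def pick_pruned(base: str, min_speedup: float):
    """Return the most accurate pruned variant of `base` with speedup >= `min_speedup`.

    Speedups are relative to each model's own unpruned ResNet, so only
    variants of the same base network are compared. Returns None if none qualify.
    """
    candidates = [
        (top1, name)
        for name, top1, speed in PRUNED_RESNETS
        if name.rsplit("_", 1)[0] == base and speed >= min_speedup
    ]
    return max(candidates)[1] if candidates else None
```

The returned name could then be passed to gluoncv.model_zoo.get_model(name, pretrained=True), assuming the pruned models are registered in the model zoo under the names shown in the table.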
More GANs (thanks @husonchen)
SRGAN
A GluonCV implementation of SRGAN from "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network": https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/srgan
CycleGAN
Image-to-Image translation reproduced in GluonCV: https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/cycle_gan
Residual Attention Network (thanks @PistonY)
GluonCV implementation of https://arxiv.org/abs/1704.06904
New application: Human Pose Estimation
https://gluon-cv.mxnet.io/model_zoo/pose.html
Human Pose Estimation in GluonCV is a complete application set, including model definitions, training scripts, and loss and metric functions. We also include pre-trained models and usage tutorials.
Model | OKS AP | OKS AP (with flip) |
---|---|---|
simple_pose_resnet18_v1b | 66.3/89.2/73.4 | 68.4/90.3/75.7 |
simple_pose_resnet18_v1b | 52.8/83.6/57.9 | 54.5/84.8/60.3 |
simple_pose_resnet50_v1b | 71.0/91.2/78.6 | 72.2/92.2/79.9 |
simple_pose_resnet50_v1d | 71.6/91.3/78.7 | 73.3/92.4/80.8 |
simple_pose_resnet101_v1b | 72.4/92.2/79.8 | 73.7/92.3/81.1 |
simple_pose_resnet101_v1d | 73.0/92.2/80.8 | 74.2/92.4/82.0 |
simple_pose_resnet152_v1b | 72.4/92.1/79.6 | 74.2/92.3/82.1 |
simple_pose_resnet152_v1d | 73.4/92.3/80.7 | 74.6/93.4/82.1 |
simple_pose_resnet152_v1d | 74.8/92.3/82.0 | 76.1/92.4/83.2 |
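The OKS AP numbers follow COCO keypoint evaluation, where average precision is computed over Object Keypoint Similarity (OKS) thresholds from 0.50 to 0.95. As a reminder of what OKS measures, here is a minimal pure-Python sketch of the per-instance COCO definition, OKS = sum_i exp(-d_i^2 / (2 s^2 k_i^2)) [v_i > 0] / sum_i [v_i > 0], where d_i is the distance between the predicted and ground-truth keypoint i, s^2 is the object area, k_i is a per-keypoint constant and v_i is the visibility flag. This is not GluonCV's metric code, and the k_i values in any usage are placeholders rather than the official COCO constants.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def oks(pred: Sequence[Point], gt: Sequence[Point], visible: Sequence[int],
        area: float, kappas: Sequence[float]) -> float:
    """Object Keypoint Similarity between one predicted and one ground-truth pose."""
    num, den = 0.0, 0
    for (px, py), (gx, gy), v, k in zip(pred, gt, visible, kappas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        num += math.exp(-d2 / (2.0 * area * k ** 2))
        den += 1
    # Simplification: return 0.0 when no keypoint is labeled
    # (the official COCO evaluation handles this case differently).
    return num / den if den else 0.0
```

A perfect prediction scores 1.0, and each labeled keypoint's contribution decays with its squared error, scaled by object size and the keypoint's tolerance k_i.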
Feature Pyramid Network for Faster/Mask-RCNN
Model | bbox/seg mAP | Caffe bbox/seg mAP |
---|---|---|
faster_rcnn_fpn_resnet50_v1b_coco | 0.384/- | 0.379/- |
faster_rcnn_fpn_bn_resnet50_v1b_coco | 0.393/- | -/- |
faster_rcnn_fpn_resnet101_v1d_coco | 0.412/- | 0.398/- |
maskrcnn_fpn_resnet50_v1b_coco | 0.392/0.353 | 0.386/0.345 |
maskrcnn_fpn_resnet101_v1d_coco | 0.423/0.377 | 0.409/0.364 |
Bug fixes and Improvements
- All ResNet definitions in GluonCV now support Synchronized BatchNorm
- Pretrained object detection models now support reset_class, which reuses partial category knowledge so that some tasks no longer require fine-tuning: https://gluon-cv.mxnet.io/build/examples_detection/skip_fintune.html#sphx-glr-build-examples-detection-skip-fintune-py
- Fix some dataloader issues (requires MXNet >= 1.4.0)
- Fix some segmentation models that failed to hybridize
- Fix random NaN problems in some detection models (requires the latest MXNet nightly build, >= 20190315)
- Various other minor bug fixes
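The reset_class improvement can be sketched as follows, based on the linked skip-finetuning tutorial. The function name build_person_detector and the choice of ssd_512_mobilenet1.0_coco are illustrative assumptions; GluonCV (>= 0.4) and MXNet are imported lazily, and pretrained weights are downloaded on the first call.

```python
def build_person_detector(model_name: str = "ssd_512_mobilenet1.0_coco"):
    """Return a COCO-pretrained detector restricted to the 'person' class.

    reset_class keeps the pretrained weights of the reused class, so the
    resulting network can detect people without any fine-tuning.
    """
    from gluoncv import model_zoo  # lazy import: requires GluonCV >= 0.4

    net = model_zoo.get_model(model_name, pretrained=True)
    # Replace the output classes with ['person'] and reuse the weights learned for 'person'.
    net.reset_class(classes=["person"], reuse_weights=["person"])
    return net
```

Only categories already present in the pretrained dataset can be reused this way; genuinely new categories still require fine-tuning.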