
How about the result with SSD? #1

Open
bailvwangzi opened this issue Aug 31, 2017 · 35 comments

@bailvwangzi

I'm glad to see your work with focal loss. Have you gotten better performance with focal loss than with OHEM in SSD? Also, have you tested focal loss with your other project, MobileNet-SSD? Thanks!

@mychina75

I tested the solution; the loss computation may have errors.
The loss drops fast at first, but after a few hundred iterations it gets bigger and bigger.

@chuanqi305
Owner

@mychina75 I found the error and corrected it; for verification I checked the gradient with check_focal_diff.py.
Sorry for my mistake. Please check out the new code and test.
@bailvwangzi I tested it on MobileNet-SSD for 30000 iterations, and the mAP is ~0.717, a slight drop.
Now I'm training with some other gamma values, hoping to get better performance.
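
For anyone who wants to sanity-check the gradient themselves, here is a minimal sketch of a finite-difference check for the softmax focal loss. It is not the repository's check_focal_diff.py; all names and the layout are illustrative assumptions.

import numpy as np

ALPHA, GAMMA = 0.25, 2.0  # the usual focal-loss defaults

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def focal_loss(z, label):
    # FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t)
    pt = softmax(z)[label]
    return -ALPHA * (1.0 - pt) ** GAMMA * np.log(pt)

def analytic_grad(z, label):
    # dFL/dz_j = alpha * (1 - p_t)^(gamma - 1)
    #            * (gamma * p_t * log(p_t) - (1 - p_t)) * (1[j == t] - p_j)
    # (reduces to the usual softmax CE gradient p_j - 1[j == t] when
    #  alpha = 1, gamma = 0)
    p = softmax(z)
    pt = p[label]
    onehot = np.zeros_like(z)
    onehot[label] = 1.0
    common = ALPHA * (1.0 - pt) ** (GAMMA - 1.0) * (GAMMA * pt * np.log(pt) - (1.0 - pt))
    return common * (onehot - p)

def numeric_grad(z, label, eps=1e-5):
    # central differences, one logit at a time
    g = np.zeros_like(z)
    for j in range(z.size):
        zp, zm = z.copy(), z.copy()
        zp[j] += eps
        zm[j] -= eps
        g[j] = (focal_loss(zp, label) - focal_loss(zm, label)) / (2.0 * eps)
    return g

z = np.random.randn(5)
print(np.abs(analytic_grad(z, 2) - numeric_grad(z, 2)).max())  # should be tiny, ~1e-9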

@mychina75

The loss decreases steadily now, but the evaluation result of the model keeps getting worse...
##############
Line 7015: I0904 13:14:38.944118 13639 solver.cpp:546] Test net output #0: detection_eval = 0.0882107
Line 8234: I0904 14:54:06.567481 13639 solver.cpp:546] Test net output #0: detection_eval = 0.0710198
Line 9455: I0904 16:33:22.908924 13639 solver.cpp:546] Test net output #0: detection_eval = 0.0636553

@chuanqi305
Owner

@mychina75
What's the final loss value? In my test the evaluation is OK.

@bailvwangzi
Author

@chuanqi305 I tested your focal loss with SSD, not MobileNet-SSD. I merged your code and changed mining_type to NONE. The final loss decreases to 0.3, but detection_eval = 74%, worse than OHEM's 77%. Do you have other training tricks?

@chuanqi305
Owner

@bailvwangzi No, I did not get a higher mAP either. Just the same as OHEM.

@mychina75

@chuanqi305
I trained the model on COCO (80 classes), and it looks like the model is getting worse,
but the loss values are normal...
I've never run into this before, so I stopped training early. Any clue about this?

I0904 13:14:08.266741 13639 solver.cpp:433] Iteration 2000, Testing net (#0)
I0904 13:14:08.304054 13639 net.cpp:693] Ignoring source layer mbox_loss
W0904 13:14:38.931223 13639 solver.cpp:524] Missing true_pos for label: 30
W0904 13:14:38.931674 13639 solver.cpp:524] Missing true_pos for label: 32
W0904 13:14:38.932234 13639 solver.cpp:524] Missing true_pos for label: 35
W0904 13:14:38.934804 13639 solver.cpp:524] Missing true_pos for label: 43
W0904 13:14:38.941249 13639 solver.cpp:524] Missing true_pos for label: 65
W0904 13:14:38.941372 13639 solver.cpp:524] Missing true_pos for label: 71
W0904 13:14:38.944011 13639 solver.cpp:524] Missing true_pos for label: 77
W0904 13:14:38.944099 13639 solver.cpp:524] Missing true_pos for label: 79
W0904 13:14:38.944110 13639 solver.cpp:524] Missing true_pos for label: 80
I0904 13:14:38.944118 13639 solver.cpp:546] Test net output #0: detection_eval = 0.0882107
I0904 13:14:41.769762 13639 solver.cpp:243] Iteration 2000, loss = 2.7271
I0904 13:14:41.769806 13639 solver.cpp:259] Train net output #0: mbox_loss = 3.15971 (* 1 = 3.15971 loss)
########
I0904 14:53:46.069953 13639 solver.cpp:433] Iteration 4000, Testing net (#0)
I0904 14:53:46.070041 13639 net.cpp:693] Ignoring source layer mbox_loss
I0904 14:53:46.254638 13639 blocking_queue.cpp:50] Data layer prefetch queue empty
W0904 14:54:06.554822 13639 solver.cpp:524] Missing true_pos for label: 13
W0904 14:54:06.556159 13639 solver.cpp:524] Missing true_pos for label: 30
W0904 14:54:06.556406 13639 solver.cpp:524] Missing true_pos for label: 32
W0904 14:54:06.556752 13639 solver.cpp:524] Missing true_pos for label: 35
W0904 14:54:06.560189 13639 solver.cpp:524] Missing true_pos for label: 43
W0904 14:54:06.560204 13639 solver.cpp:524] Missing true_pos for label: 44
W0904 14:54:06.560220 13639 solver.cpp:524] Missing true_pos for label: 45
W0904 14:54:06.566228 13639 solver.cpp:524] Missing true_pos for label: 65
W0904 14:54:06.566258 13639 solver.cpp:524] Missing true_pos for label: 69
W0904 14:54:06.566284 13639 solver.cpp:524] Missing true_pos for label: 71
W0904 14:54:06.567383 13639 solver.cpp:524] Missing true_pos for label: 77
W0904 14:54:06.567459 13639 solver.cpp:524] Missing true_pos for label: 79
W0904 14:54:06.567472 13639 solver.cpp:524] Missing true_pos for label: 80
I0904 14:54:06.567481 13639 solver.cpp:546] Test net output #0: detection_eval = 0.0710198
I0904 14:54:09.375293 13639 solver.cpp:243] Iteration 4000, loss = 2.80046
I0904 14:54:09.375329 13639 solver.cpp:259] Train net output #0: mbox_loss = 2.84744 (* 1 = 2.84744 loss)
########
I0904 16:33:02.799624 13639 solver.cpp:433] Iteration 6000, Testing net (#0)
I0904 16:33:02.799713 13639 net.cpp:693] Ignoring source layer mbox_loss
I0904 16:33:05.091681 13639 blocking_queue.cpp:50] Data layer prefetch queue empty
W0904 16:33:22.895887 13639 solver.cpp:524] Missing true_pos for label: 13
W0904 16:33:22.897629 13639 solver.cpp:524] Missing true_pos for label: 25
W0904 16:33:22.897707 13639 solver.cpp:524] Missing true_pos for label: 30
W0904 16:33:22.897748 13639 solver.cpp:524] Missing true_pos for label: 32
W0904 16:33:22.897907 13639 solver.cpp:524] Missing true_pos for label: 35
W0904 16:33:22.897919 13639 solver.cpp:524] Missing true_pos for label: 36
W0904 16:33:22.900342 13639 solver.cpp:524] Missing true_pos for label: 43
W0904 16:33:22.900359 13639 solver.cpp:524] Missing true_pos for label: 44
W0904 16:33:22.900367 13639 solver.cpp:524] Missing true_pos for label: 45
W0904 16:33:22.906970 13639 solver.cpp:524] Missing true_pos for label: 65
W0904 16:33:22.906996 13639 solver.cpp:524] Missing true_pos for label: 69
W0904 16:33:22.907037 13639 solver.cpp:524] Missing true_pos for label: 71
W0904 16:33:22.908805 13639 solver.cpp:524] Missing true_pos for label: 77
W0904 16:33:22.908903 13639 solver.cpp:524] Missing true_pos for label: 79
W0904 16:33:22.908915 13639 solver.cpp:524] Missing true_pos for label: 80
I0904 16:33:22.908924 13639 solver.cpp:546] Test net output #0: detection_eval = 0.0636553
I0904 16:33:25.634253 13639 solver.cpp:243] Iteration 6000, loss = 2.73545
I0904 16:33:25.634300 13639 solver.cpp:259] Train net output #0: mbox_loss = 2.85784 (* 1 = 2.85784 loss)

@XiongweiWu

My implementation is almost the same as yours, apart from some minor differences that are described in the paper. I will test my function and investigate whether these differences are crucial.

@chuanqi305
Owner

@mychina75 That's too few iterations; you should evaluate after iteration 30000~50000.

@chuanqi305
Owner

@XiongweiWu Can you share some details? In my test the performance did not improve; focal loss is not better than OHEM.

@jinxuan777

@chuanqi305
I0907 19:47:47.503582 40167 solver.cpp:243] Iteration 0, loss = 538.726
I0907 19:47:47.503639 40167 solver.cpp:259] Train net output #0: mbox_loss = 538.726 (* 1 = 538.726 loss)
I0907 19:47:47.503693 40167 sgd_solver.cpp:138] Iteration 0, lr = 0.001
I0907 19:47:47.523170 40167 blocking_queue.cpp:50] Data layer prefetch queue empty
I0907 19:47:59.226004 40167 solver.cpp:243] Iteration 10, loss = 488.884
I0907 19:47:59.226058 40167 solver.cpp:259] Train net output #0: mbox_loss = 418.708 (* 1 = 418.708 loss)
I0907 19:47:59.226068 40167 sgd_solver.cpp:138] Iteration 10, lr = 0.001
I0907 19:48:11.308161 40167 solver.cpp:243] Iteration 20, loss = 412.334
I0907 19:48:11.308215 40167 solver.cpp:259] Train net output #0: mbox_loss = 393.423 (* 1 = 393.423 loss)
I0907 19:48:11.308225 40167 sgd_solver.cpp:138] Iteration 20, lr = 0.001
I0907 19:48:24.216085 40167 solver.cpp:243] Iteration 30, loss = 426.297
I0907 19:48:24.216294 40167 solver.cpp:259] Train net output #0: mbox_loss = 269.242 (* 1 = 269.242 loss)
I0907 19:48:24.216308 40167 sgd_solver.cpp:138] Iteration 30, lr = 0.001
I0907 19:48:36.642977 40167 solver.cpp:243] Iteration 40, loss = 449.73
I0907 19:48:36.643034 40167 solver.cpp:259] Train net output #0: mbox_loss = 424.498 (* 1 = 424.498 loss)
I0907 19:48:36.643045 40167 sgd_solver.cpp:138] Iteration 40, lr = 0.001
I0907 19:48:49.470823 40167 solver.cpp:243] Iteration 50, loss = 520.721
I0907 19:48:49.470880 40167 solver.cpp:259] Train net output #0: mbox_loss = 450.236 (* 1 = 450.236 loss)
I0907 19:48:49.470890 40167 sgd_solver.cpp:138] Iteration 50, lr = 0.001
I0907 19:49:01.526100 40167 solver.cpp:243] Iteration 60, loss = 470.837
I0907 19:49:01.526652 40167 solver.cpp:259] Train net output #0: mbox_loss = 504.9 (* 1 = 504.9 loss)
I0907 19:49:01.526669 40167 sgd_solver.cpp:138] Iteration 60, lr = 0.001
I0907 19:49:15.080325 40167 solver.cpp:243] Iteration 70, loss = 441.191
I0907 19:49:15.080377 40167 solver.cpp:259] Train net output #0: mbox_loss = 343.061 (* 1 = 343.061 loss)
I0907 19:49:15.080387 40167 sgd_solver.cpp:138] Iteration 70, lr = 0.001
I0907 19:49:27.861601 40167 solver.cpp:243] Iteration 80, loss = 416.44
I0907 19:49:27.861662 40167 solver.cpp:259] Train net output #0: mbox_loss = 524.938 (* 1 = 524.938 loss)
I0907 19:49:27.861677 40167 sgd_solver.cpp:138] Iteration 80, lr = 0.001
I0907 19:49:40.567715 40167 solver.cpp:243] Iteration 90, loss = 419.763
I0907 19:49:40.568455 40167 solver.cpp:259] Train net output #0: mbox_loss = 485.486 (* 1 = 485.486 loss)
I0907 19:49:40.568467 40167 sgd_solver.cpp:138] Iteration 90, lr = 0.001
I0907 19:49:52.489009 40167 solver.cpp:243] Iteration 100, loss = 496.385
I0907 19:49:52.489078 40167 solver.cpp:259] Train net output #0: mbox_loss = 598.885 (* 1 = 598.885 loss)
I0907 19:49:52.489092 40167 sgd_solver.cpp:138] Iteration 100, lr = 0.001
I0907 19:50:04.454450 40167 solver.cpp:243] Iteration 110, loss = 440.035
I0907 19:50:04.454507 40167 solver.cpp:259] Train net output #0: mbox_loss = 552.493 (* 1 = 552.493 loss)
I0907 19:50:04.454519 40167 sgd_solver.cpp:138] Iteration 110, lr = 0.001

Is this normal?

@chuanqi305
Owner

No, the loss should be < 10 after 10 iterations. Maybe there is a bug in your network structure?

@XiongweiWu

@chuanqi305 Sorry for the late reply. I ran a series of experiments on VOC07 with Fast R-CNN and a ZF backbone. The baseline is 57.1%. In your implementation, alpha is shared across all categories and only one (K+1)-way classifier is learned; the paper says K two-class classifiers are trained and alpha is class-dependent. Using your code directly I get 53.3% mAP in my settings, and when I replace all alphas with 1 the accuracy reaches 57.4%, slightly better than the baseline. However, when I use all proposals to train, the performance drops to 56.8% (worse than OHEM). The difficulty is the loss weight on the bounding-box regression loss, since we cannot use all samples to smooth it. I will test on SSD today; I hope you can also share some results.
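
To make the shared vs. class-dependent alpha distinction concrete, here is a minimal sketch (illustrative only, not code from this repo or the paper):

import numpy as np

def softmax_focal(logits, label, alpha, gamma=2.0):
    # logits: shape (K+1,), one shared softmax over background + K classes.
    # alpha may be a scalar (shared across classes, as in this repo) or a
    # length-(K+1) vector indexed by the label (class-dependent, as in the paper).
    p = np.exp(logits - logits.max())
    p /= p.sum()
    pt = p[label]
    a = alpha if np.isscalar(alpha) else alpha[label]
    return -a * (1.0 - pt) ** gamma * np.log(pt)

logits = np.random.randn(21)
print(softmax_focal(logits, 3, 0.25))               # shared scalar alpha
print(softmax_focal(logits, 3, np.full(21, 0.25)))  # per-class alpha (uniform here)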

@zhanglonghao1992

@XiongweiWu Hi, I changed a two-stage net, RON (very similar to FPN), into a one-stage net just like the paper did, and used all proposals to train. But my AP is too low. Do you have time to check my net? Thanks!

@zhanglonghao1992

@bailvwangzi I'm training normal SSD and SSD with focal loss side by side, using ResNet-101 as the baseline. The detection_eval of normal SSD is 0.68 at iteration 10000, but the detection_eval of SSD with focal loss is just 0.45 at iteration 20000. It seems that SSD with focal loss becomes very hard to train. Have you run into this during training?

@bailvwangzi
Author

@zhanglonghao1992 Same here. I get up to 74 mAP after 180k iterations. To avoid the effect of initialization, I use a normal SSD model (e.g. the normal SSD snapshot at iteration 10000) as the pre-trained model to finetune; it converges faster.
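
For reference, warm-starting like this can be done with pycaffe roughly as follows (a sketch; the solver and snapshot file names are assumptions):

import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver_focal.prototxt')  # focal-loss training config
solver.net.copy_from('ssd_iter_10000.caffemodel')  # normal-SSD snapshot as init
solver.solve()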

@XiongweiWu

@bailvwangzi @zhanglonghao1992 Hi, I just finished the ablation experiment on SSD with focal loss trained on the VOC07 dataset. The performance of SSD is not as good as the paper said ><. SSD's benchmark is 77.4% with and 62% without data augmentation, while my results with focal loss are 74.1% and 66%. I remember the original paper said they removed all data-augmentation tricks except mirroring. I need more time to investigate: maybe the dataset, maybe the learning parameters, maybe the implementation (though the implementation should be quite simple...).

@bailvwangzi
Author

@XiongweiWu Nice ablation work, thanks. Looking forward to your improved results!

@zhanglonghao1992

@bailvwangzi Hi, my mAP is still 0.6 after 180k iterations using SSD with focal loss. You said your mAP at 180k iterations is 0.74? How did you do that? Did you change the learning rate or use a normal SSD model to initialize?

@zhanglonghao1992

@chuanqi305 Hi, I used your code on SSD with ResNet-101 but the final result is 0.6. Did you change the learning rate or other parameters? What pre-trained model did you use?

@zhanglonghao1992

@chuanqi305 When I use VGG16, lr_rate=0.001 makes the loss go to NaN, but ResNet-101 is fine with 0.001. I have to set lr_rate=0.0001 to train VGG16.
Why? Do I have to change alpha and gamma?

@zhanglonghao1992

@XiongweiWu Hi, could you leave your QQ or e-mail address? I've run into some trouble training SSD with focal loss on VGG16 and ResNet-101.

@zhanglonghao1992

@mychina75 I have the same problem as you. Have you solved it?

@mychina75

@zhanglonghao1992 No... I cannot get a better result. Maybe some parameters need to change?

@zhanglonghao1992

@mychina75 It only happens when I use VGG16. The 'Missing true_pos for label' warning never appears when I use ResNet-101. I don't know why.

@pbdahzou

@chuanqi305 Thank you very much for sharing your focal loss implementation. I tested your code and also found no improvement over the original SSD. Maybe focal loss is not the key factor for RetinaNet?

@chuanqi305
Owner

@pbdahzou Maybe focal loss is similar to OHEM in its training effect. RetinaNet uses the FPN framework; maybe the key factor is the 'Deconvolution'.

@mathmanu

mathmanu commented Nov 24, 2017

Has anyone tried both kinds of losses together, i.e. something like:

layer {
  name: "mbox_loss"
  type: "MultiBoxLoss"
  bottom: "mbox_loc"
  bottom: "mbox_conf"
  bottom: "mbox_priorbox"
  bottom: "label"
  top: "mbox_loss"
  include {
    phase: TRAIN
  }
  propagate_down: true
  propagate_down: true
  propagate_down: false
  propagate_down: false
  loss_param {
    normalization: VALID
  }
  loss_weight: 0.5
  multibox_loss_param {
    loc_loss_type: SMOOTH_L1
    conf_loss_type: SOFTMAX
    loc_weight: 0.5
    num_classes: 21
    share_location: true
    match_type: PER_PREDICTION
    overlap_threshold: 0.5
    use_prior_for_matching: true
    background_label_id: 0
    use_difficult_gt: true
    neg_pos_ratio: 3.0
    neg_overlap: 0.5
    code_type: CENTER_SIZE
    ignore_cross_boundary_bbox: false
    mining_type: MAX_NEGATIVE
  }
}

layer {
  name: "mbox_focal_loss"
  type: "MultiBoxFocalLoss" # change the type
  bottom: "mbox_loc"
  bottom: "mbox_conf"
  bottom: "mbox_priorbox"
  bottom: "label"
  top: "mbox_focal_loss"
  include {
    phase: TRAIN
  }
  propagate_down: true
  propagate_down: true
  propagate_down: false
  propagate_down: false
  loss_param {
    normalization: VALID
  }
  loss_weight: 0.5
  focal_loss_param { # set alpha and gamma; defaults are alpha=0.25, gamma=2.0
    alpha: 0.25
    gamma: 2.0
  }
  multibox_loss_param {
    loc_loss_type: SMOOTH_L1
    conf_loss_type: SOFTMAX
    loc_weight: 1.0
    num_classes: 21
    share_location: true
    match_type: PER_PREDICTION
    overlap_threshold: 0.5
    use_prior_for_matching: true
    background_label_id: 0
    use_difficult_gt: true
    neg_pos_ratio: 3.0
    neg_overlap: 0.5
    code_type: CENTER_SIZE
    ignore_cross_boundary_bbox: false
    mining_type: NONE # do not use OHEM
  }
}

@mathmanu

It seems this implements a Softmax focal loss, whereas the original RetinaNet paper described using a sigmoid instead of a softmax to compute p (see equation 5 and the paragraph below it).

Also see this discussion: kuangliu/pytorch-retinanet#6

Has anyone tried sigmoid for the focal loss layer?
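
For concreteness, a minimal sketch of the sigmoid variant as the paper describes it (names are illustrative; this is not the repository's code):

import numpy as np

def sigmoid_focal(logits, targets, alpha=0.25, gamma=2.0):
    # logits, targets: shape (K,), one binary classifier per foreground class;
    # background is handled implicitly by all-zero targets.
    p = 1.0 / (1.0 + np.exp(-logits))
    pt = np.where(targets == 1, p, 1.0 - p)          # probability of the true outcome
    at = np.where(targets == 1, alpha, 1.0 - alpha)  # alpha for fg, 1 - alpha for bg
    return -(at * (1.0 - pt) ** gamma * np.log(pt)).sum()

logits = np.random.randn(20)
targets = np.zeros(20)
targets[3] = 1  # anchor matched to class 3
print(sigmoid_focal(logits, targets))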

@XiongweiWu

@mathmanu even worse.

@mathmanu

This work is useful for experimentation. Could you please add a license file?

Maybe the same license file as in Caffe:
https://github.com/BVLC/caffe/blob/master/LICENSE

Thank you.

@chuanqi305
Owner

@mathmanu I chose an MIT license; feel free to use this project.

@mathmanu

Thank you.

@zhohuiluo

zhohuiluo commented May 30, 2018

@zhanglonghao1992 When I try SSD with VGG16 and base_lr = 0.001, the loss becomes NaN. When I change base_lr to 0.0001, the loss decreases; after 10k iterations the loss is about 1.9, but the final detection_eval only reaches 0.29. This is similar to your problem. Have you solved it?

@PiyalGeorge

@zhanglonghao1992 How is the result? Does this help detect smaller objects with better accuracy?
