Hi, thanks for your nice work.
I ran the training script provided in this repo without changing any code, but there is a significant gap between the results I obtain and those reported in your paper (for example, 0.588 vs. 0.715 AUROC on TinyImageNet).
Should I tune some training hyperparameters to close the gap? I have tried adjusting the lr and the number of epochs, but it did not help. I am looking forward to your suggestions.
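For context, the kind of change I made when adjusting the lr and epochs looked roughly like the sketch below. It is illustrative only: the flag names simply mirror the config keys printed in the log further down, I am assuming the shell script wraps a Python entry point (main.py is my guess) that accepts them as command-line overrides, and the values are placeholders rather than the exact ones I used.

# Hypothetical overrides on top of scripts/train_tinyimagenet.sh (illustrative only;
# flag names mirror the printed config keys, entry point name is an assumption)
python main.py --dataset tiny_imagenet --mode train --lr 1e-4 --epochs 200

The full, unmodified run that produced the gap is below.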
(cvaecaposr) xx@xxx:~/cvaecaposr$ sh ./scripts/train_tinyimagenet.sh
{
"data_base_path": "./data",
"val_ratio": 0.2,
"seed": 1234,
"known_classes": [
2,
3,
13,
30,
44,
45,
64,
66,
76,
101,
111,
121,
128,
130,
136,
158,
167,
170,
187,
193
],
"unknown_classes": [
0,
1,
4,
5,
6,
7,
8,
9,
10,
11,
12,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43,
46,
47,
48,
49,
50,
51,
52,
53,
54,
55,
56,
57,
58,
59,
60,
61,
62,
63,
65,
67,
68,
69,
70,
71,
72,
73,
74,
75,
77,
78,
79,
80,
81,
82,
83,
84,
85,
86,
87,
88,
89,
90,
91,
92,
93,
94,
95,
96,
97,
98,
99,
100,
102,
103,
104,
105,
106,
107,
108,
109,
110,
112,
113,
114,
115,
116,
117,
118,
119,
120,
122,
123,
124,
125,
126,
127,
129,
131,
132,
133,
134,
135,
137,
138,
139,
140,
141,
142,
143,
144,
145,
146,
147,
148,
149,
150,
151,
152,
153,
154,
155,
156,
157,
159,
160,
161,
162,
163,
164,
165,
166,
168,
169,
171,
172,
173,
174,
175,
176,
177,
178,
179,
180,
181,
182,
183,
184,
185,
186,
188,
189,
190,
191,
192,
194,
195,
196,
197,
198,
199
],
"split_num": 0,
"batch_size": 32,
"num_workers": 0,
"dataset": "tiny_imagenet",
"z_dim": 128,
"lr": 5e-05,
"t_mu_shift": 10.0,
"t_var_scale": 0.01,
"alpha": 1.0,
"beta": 0.01,
"margin": 10.0,
"in_dim_caps": 16,
"out_dim_caps": 32,
"checkpoint": "",
"mode": "train",
"epochs": 100
}
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
| Name | Type | Params
--------------------------------------
0 | enc | ResNet34 | 21.3 M
1 | vae_cap | VaeCap | 23.5 M
2 | fc | Linear | 10.5 M
3 | dec | Decoder | 760 K
4 | t_mean | Embedding | 51.2 K
5 | t_var | Embedding | 51.2 K
--------------------------------------
56.1 M Trainable params
0 Non-trainable params
56.1 M Total params
224.552 Total estimated model params size (MB)
/xx/anaconda3/envs/cvaecaposr/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:52: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 80 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
warnings.warn(*args, **kwargs)
/xx/anaconda3/envs/cvaecaposr/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:52: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 80 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
warnings.warn(*args, **kwargs)
Epoch 20: 100%|██████████▉| 312/313 [00:38<00:00, 8.21it/s, loss=4.99e+03, v_num=0, train_acc=0.938, validation_acc=0.456]
Epoch 21: reducing learning rate of group 0 to 2.5000e-05.
Epoch 28: 100%|██████████▉| 312/313 [00:37<00:00, 8.35it/s, loss=3.31e+03, v_num=0, train_acc=0.906, validation_acc=0.460]
Epoch 29: reducing learning rate of group 0 to 1.2500e-05.
Epoch 42: 100%|██████████▉| 312/313 [00:38<00:00, 8.20it/s, loss=2.32e+03, v_num=0, train_acc=0.906, validation_acc=0.459]
Epoch 43: reducing learning rate of group 0 to 6.2500e-06.
Epoch 48: 100%|██████████▉| 312/313 [00:37<00:00, 8.35it/s, loss=1.88e+03, v_num=0, train_acc=0.969, validation_acc=0.474]
Epoch 49: reducing learning rate of group 0 to 3.1250e-06.
Epoch 54: 100%|██████████▉| 312/313 [00:37<00:00, 8.25it/s, loss=2.41e+03, v_num=0, train_acc=0.938, validation_acc=0.465]
Epoch 55: reducing learning rate of group 0 to 1.5625e-06.
Epoch 60: 100%|██████████▉| 312/313 [00:37<00:00, 8.35it/s, loss=1.72e+03, v_num=0, train_acc=1.000, validation_acc=0.468]
Epoch 61: reducing learning rate of group 0 to 7.8125e-07.
Epoch 66: 100%|██████████▉| 312/313 [00:37<00:00, 8.35it/s, loss=1.88e+03, v_num=0, train_acc=1.000, validation_acc=0.471]
Epoch 67: reducing learning rate of group 0 to 3.9063e-07.
Epoch 72: 100%|██████████▉| 312/313 [00:38<00:00, 8.21it/s, loss=1.62e+03, v_num=0, train_acc=1.000, validation_acc=0.466]
Epoch 73: reducing learning rate of group 0 to 1.9531e-07.
Epoch 78: 100%|██████████▉| 312/313 [00:37<00:00, 8.35it/s, loss=1.15e+03, v_num=0, train_acc=1.000, validation_acc=0.472]
Epoch 79: reducing learning rate of group 0 to 9.7656e-08.
Epoch 84: 100%|██████████▉| 312/313 [00:40<00:00, 7.73it/s, loss=1.48e+03, v_num=0, train_acc=0.969, validation_acc=0.470]
Epoch 85: reducing learning rate of group 0 to 4.8828e-08.
Epoch 90: 100%|██████████▉| 312/313 [00:38<00:00, 8.04it/s, loss=1.68e+03, v_num=0, train_acc=0.938, validation_acc=0.472]
Epoch 91: reducing learning rate of group 0 to 2.4414e-08.
Epoch 96: 100%|██████████▉| 312/313 [00:38<00:00, 8.09it/s, loss=1.82e+03, v_num=0, train_acc=0.938, validation_acc=0.471]
Epoch 97: reducing learning rate of group 0 to 1.2207e-08.
Epoch 99: 100%|███████████| 313/313 [00:38<00:00, 8.10it/s, loss=1.76e+03, v_num=0, train_acc=1.000, validation_acc=0.468]
Saving latest checkpoint...
Epoch 99: 100%|███████████| 313/313 [00:40<00:00, 7.73it/s, loss=1.76e+03, v_num=0, train_acc=1.000, validation_acc=0.468]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
/xx/anaconda3/envs/cvaecaposr/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:52: UserWarning: The dataloader, test dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 80 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
warnings.warn(*args, **kwargs)
Testing: 100%|██████████████████████████████████████████████████████████████████████████▊| 312/313 [00:23<00:00, 13.07it/s]
/xx/anaconda3/envs/cvaecaposr/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:52: UserWarning: Metric `AUROC` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
warnings.warn(*args, **kwargs)
Testing: 100%|███████████████████████████████████████████████████████████████████████████| 313/313 [00:23<00:00, 13.10it/s]
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_auroc': 0.5880855321884155}