
Signs of the two score components for OOD detection #24

Open
j-cb opened this issue Nov 16, 2021 · 2 comments

Comments


j-cb commented Nov 16, 2021

Hi,

In the scoring formula on page 7 of the paper, shouldn't the KL divergence of the classifier prediction from uniform be small for OOD inputs, and the rotation cross-entropy be large on OOD inputs, since the rotation head has not been trained to predict the original rotation on OOD inputs? I.e., one of the two terms should have a minus sign, right?

If I read it correctly, the code uses different signs for those terms:

```python
classification_loss = -1 * kl_div(class_uniform_dist, classification_smax)
rot_one_hot = torch.zeros_like(rot_smax).scatter_(1, target_rots.unsqueeze(1).cuda(), 1)
rot_loss = kl_div(rot_one_hot, rot_smax)
```

where the KL divergence is the (positive) cross-entropy minus the constant entropy of the uniform distribution U.
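
For concreteness, here is a minimal sketch of what I assume `kl_div` computes; the name and argument order are taken from the snippet above, but the body is my guess:

```python
import torch

def kl_div(p, q, eps=1e-12):
    # Row-wise KL(p || q) = sum_i p_i * (log p_i - log q_i) over a batch.
    # Since KL(p || q) = H(p, q) - H(p), taking p = U (uniform) makes the
    # entropy H(U) a constant, so KL(U || q) is the cross-entropy of q
    # against U up to a constant shift.
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=1)
```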


lygjwy commented Jan 22, 2022


I think the code is correct. For OOD inputs the whole score, i.e. rot_loss + classification_loss = kl_div(rot_one_hot, rot_smax) - kl_div(class_uniform_dist, classification_smax), is larger than for ID inputs.
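
Putting the two terms together, a small self-contained sketch of the combined score under that reading (names follow the snippet above; `kl_div` is the sketch from the earlier comment, and `.cuda()` is dropped so this runs on CPU):

```python
import torch

def kl_div(p, q, eps=1e-12):
    # Row-wise KL(p || q), as sketched in the comment above.
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=1)

def ood_score(classification_smax, rot_smax, target_rots):
    # Rotation term: KL of the rotation softmax from the one-hot true
    # rotation, which equals the cross-entropy since H(one-hot) = 0.
    # The rotation head was never trained on OOD data, so this term is
    # large for OOD inputs.
    rot_one_hot = torch.zeros_like(rot_smax).scatter_(
        1, target_rots.unsqueeze(1), 1)
    rot_loss = kl_div(rot_one_hot, rot_smax)

    # Classification term: negated KL of the class softmax from uniform.
    # ID inputs give confident (far-from-uniform) predictions, making this
    # strongly negative; OOD inputs give near-uniform predictions, making
    # it close to zero.
    class_uniform_dist = torch.full_like(
        classification_smax, 1.0 / classification_smax.size(1))
    classification_loss = -1 * kl_div(class_uniform_dist, classification_smax)

    # Both terms push the same way: a larger total score means more likely OOD.
    return rot_loss + classification_loss
```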

@zjysteven

The code is correct, while the formulation in the paper (Sec. 4.1 Method) is not.
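
Written out with the code's signs (my transcription, not the paper's notation; $U$ is the uniform distribution over classes and $r$ the ground-truth rotation):

$$s(x) = \underbrace{\mathrm{KL}\big(\text{one-hot}(r)\,\|\,p_{\text{rot}}(\cdot \mid x)\big)}_{\text{large for OOD}} \;-\; \underbrace{\mathrm{KL}\big(U\,\|\,p_{\text{cls}}(\cdot \mid x)\big)}_{\text{small for OOD}}$$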
