Policy out softmax with illegal moves #22
Comments
Does this answer match the intent of the question? For example:
# no output for 'pass'
policy_out = Dense(8*8, kernel_regularizer=l2(mc.l2_reg), activation="softmax", name="policy_out")(x)
↓
legal_mask = Input((8 * 8,))  # (0: illegal, 1: legal)
...
# no output for 'pass'
x = Dense(8*8, kernel_regularizer=l2(mc.l2_reg))(x)
x = Multiply()([x, legal_mask])
policy_out = Activation("softmax", name="policy_out")(x)
...
self.model = Model([in_x, legal_mask], [policy_out, value_out], name="reversi_model")
Input of
With legal_mask=(0,1,1), aren't the last two entries of softmax((0, -0.5, 0.5)) different from softmax((-0.5, 0.5))?
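The mismatch raised here can be checked numerically. The following is a minimal sketch in plain Python, not code from the repository: the `softmax` helper and the raw logit value `2.0` for the illegal move are assumptions chosen for illustration. Multiplying by the mask turns the illegal logit into 0, and exp(0) = 1 still contributes to the softmax denominator.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats (illustrative helper)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


legal_mask = [0, 1, 1]        # 0: illegal, 1: legal
logits = [2.0, -0.5, 0.5]     # the first value is arbitrary; it gets zeroed below

# Multiply-then-softmax, as in the proposed change.
multiplied = [v * k for v, k in zip(logits, legal_mask)]   # [0.0, -0.5, 0.5]
after_multiply = softmax(multiplied)

# Softmax over the legal moves only, which is what was asked for.
legal_only = softmax([-0.5, 0.5])

print("multiply then softmax:", after_multiply)
print("legal moves only:     ", legal_only)
```

In this example the illegal move keeps about 0.31 of the probability mass, so the two legal entries no longer sum to 1 and differ from the legal-only distribution.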
Oh, I was careless.
Right, that's exactly it. Thanks.
reversi-alpha-zero/src/reversi_zero/agent/model.py
Line 48 in 5ee2f33
I see that the policy softmax is calculated over all moves, including illegal ones.
How can the softmax be calculated over only the legal moves, for example by feeding in a placeholder for the legal moves?
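One common way to restrict a softmax to legal moves is to add a large negative number to the illegal logits before the softmax, so their exponentials underflow to zero. The sketch below is plain Python and is not taken from the repository: `masked_softmax`, `NEG_LARGE`, and the sample values are hypothetical names and numbers used only for illustration, and it assumes at least one move is legal.

```python
import math

NEG_LARGE = -1e9  # assumed constant; pushes illegal logits far below any real logit


def softmax(logits):
    """Numerically stable softmax over a list of floats (illustrative helper)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_softmax(logits, legal_mask):
    """Softmax over legal moves only; legal_mask holds 1 for legal, 0 for illegal.

    Assumes at least one entry of legal_mask is 1.
    """
    shifted = [v + (1 - k) * NEG_LARGE for v, k in zip(logits, legal_mask)]
    return softmax(shifted)


probs = masked_softmax([2.0, -0.5, 0.5], [0, 1, 1])
print(probs)  # the illegal move gets probability 0.0
```

Unlike multiplying by the mask, the additive mask leaves the legal logits untouched, so the result matches a softmax taken over the legal moves alone. The same add-then-softmax arithmetic can in principle be expressed with tensor operations placed before the softmax activation in a model.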