Policy out softmax with illegal moves #22

apollo-time · 2017-12-11T10:38:27Z

reversi-alpha-zero/src/reversi_zero/agent/model.py

Line 48 in 5ee2f33

    
           policy_out = Dense(8*8, kernel_regularizer=l2(mc.l2_reg), activation="softmax", name="policy_out")(x)

I see that the policy softmax is calculated over all moves, including illegal ones.
How can I calculate the softmax over only the legal moves, for example by adding a placeholder for the legal moves?

mokemokechicken · 2017-12-12T02:17:26Z

I see that the policy softmax is calculated over all moves, including illegal ones.
How can I calculate the softmax over only the legal moves, for example by adding a placeholder for the legal moves?

Does the following answer match the intent of your question?

For example,

  # no output for 'pass'
  policy_out = Dense(8*8, kernel_regularizer=l2(mc.l2_reg), activation="softmax", name="policy_out")(x)

↓

  legal_mask = Input((8 * 8,))  # (0: illegal, 1: legal)
  ...
  # no output for 'pass'
  x = Dense(8*8, kernel_regularizer=l2(mc.l2_reg))(x)
  x = Multiply()([x, legal_mask])
  policy_out = Activation("softmax", name="policy_out")(x)
  
  ...
  
  self.model = Model([in_x, legal_mask], [policy_out, value_out], name="reversi_model")

The legal_mask input must be computed for every sample in the training data.

apollo-time · 2017-12-12T08:52:09Z

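But softmax((0, -0.5, 0.5))[1:] is not equal to softmax((-0.5, 0.5)) when legal_mask=(0,1,1), is it? Multiplying by the mask only sets the illegal logit to 0, and exp(0) = 1 still takes a share of the probability.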
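A quick NumPy check can make the mismatch concrete. This is only a sketch: the `softmax` helper and the example logits are assumptions made for illustration and are not part of the repository.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array (hypothetical helper)."""
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, -0.5, 0.5])     # arbitrary example logits
legal_mask = np.array([0.0, 1.0, 1.0])  # first move is illegal

# Multiplying by the mask turns the logits into (0, -0.5, 0.5).
multiplied = softmax(logits * legal_mask)
# Reference: softmax over the legal moves only.
legal_only = softmax(logits[1:])

print(multiplied)  # the illegal move still gets a non-zero probability
print(legal_only)
```

The illegal entry keeps a noticeable probability, so the remaining two entries do not match the softmax over the legal moves alone.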

mokemokechicken · 2017-12-13T03:57:58Z

@apollo-time

Oh, I was careless.
How about this?

  legal_mask = Input((8 * 8,))  # (0: illegal, 1: legal)
  legal_mask_2 = Lambda(lambda x: (x-1)*1000000)(legal_mask)  # illegal -> -1000000, legal -> 0
  ...
  # no output for 'pass'
  x = Dense(8*8, kernel_regularizer=l2(mc.l2_reg))(x)
  x = Add()([x, legal_mask_2])
  policy_out = Activation("softmax", name="policy_out")(x)
  
  ...
  
  self.model = Model([in_x, legal_mask], [policy_out, value_out], name="reversi_model")

> softmax([-0.5, 0.5])
[ 0.26894142,  0.73105858]

> softmax([-0.5, 0.5, -1000000])
[ 0.26894142,  0.73105858,  0.        ]
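The same idea can be checked outside Keras with a small NumPy sketch. The function name `masked_softmax` and its `penalty` parameter are hypothetical, chosen here only to mirror the `(x-1)*1000000` offset above.

```python
import numpy as np

def masked_softmax(logits, legal_mask, penalty=1e6):
    """Softmax that pushes illegal moves to ~0 probability.

    legal_mask holds 1 for legal moves and 0 for illegal ones.
    Illegal logits receive an additive offset of -penalty before the softmax.
    """
    logits = np.asarray(logits, dtype=np.float64)
    legal_mask = np.asarray(legal_mask, dtype=np.float64)
    shifted = logits + (legal_mask - 1.0) * penalty  # illegal -> logit - penalty
    shifted = shifted - shifted.max()                # numerical stability
    e = np.exp(shifted)
    return e / e.sum()

probs = masked_softmax([2.0, -0.5, 0.5], [0, 1, 1])
print(probs)
```

With this mask the first entry underflows to zero and the other two match softmax([-0.5, 0.5]). The sketch assumes at least one legal move; a position with no legal moves (a pass) would need separate handling.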

apollo-time · 2017-12-13T06:14:34Z

Right, that's it. Thanks.
