I'm trying to implement action masking for the discrete CQL algorithm, i.e. I'd like to make some actions impossible to choose given some conditions on the current observation.
At inference time it should be easy, because predict_value can be used to get action values for every possible action, then the mask could be used to filter out the impossible actions, and finally argmax can be used to get the action to execute.
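A minimal sketch of that inference-time step, assuming the values for every action have already been collected into an array of shape (batch, n_actions). How they are gathered from predict_value is left out here, and the function name `masked_greedy_action` and the boolean `valid_mask` layout are hypothetical, not part of d3rlpy:

```python
import numpy as np


def masked_greedy_action(q_values, valid_mask):
    """Pick the highest-valued action among those the mask allows.

    q_values:   float array, shape (batch, n_actions)
    valid_mask: bool array, same shape, True where the action is allowed
    """
    q_values = np.asarray(q_values, dtype=np.float64)
    valid_mask = np.asarray(valid_mask, dtype=bool)

    # An all-False row would make argmax silently return action 0.
    if not valid_mask.any(axis=1).all():
        raise ValueError("every observation needs at least one valid action")

    # Invalid actions get -inf so they can never win the argmax.
    masked_q = np.where(valid_mask, q_values, -np.inf)
    return masked_q.argmax(axis=1)


q = np.array([[1.0, 5.0, 3.0],
              [0.2, 0.1, 0.9]])
mask = np.array([[True, False, True],
                 [True, True, False]])
actions = masked_greedy_action(q, mask)
```

Using -inf rather than a large negative constant keeps the mask correct regardless of the scale of the learned values.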
However, I'm unsure how to implement action masking during training. Is there any way to do that without changing the d3rlpy source code? If not, could you please give me some hints about which parts of the codebase would need to change to achieve this?
Thank you a lot in advance!
rgraziosi-fbk changed the title from "How to add an action mask to DiscreteCQL algorithm?" to "[QUESTION] How to add an action mask to DiscreteCQL algorithm?" on Dec 4, 2023
@rgraziosi-fbk Thanks for the issue. I assume that you want to mask actions when the bootstrap target is calculated. In that case, you need to modify the action selection here:
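For illustration only, masking at the bootstrap target could look roughly like the sketch below. It is not d3rlpy's code: it assumes a double-DQN-style target (the online network selects the next action, the target network evaluates it), and the function name `masked_double_dqn_target` and all of its parameters are hypothetical.

```python
import numpy as np


def masked_double_dqn_target(rewards, terminals, next_q_online,
                             next_q_target, next_valid_mask, gamma=0.99):
    """Compute r + gamma * (1 - done) * Q_target(s', a*), where a* is the
    best *valid* next action according to the online network.

    rewards, terminals:           shape (batch,)
    next_q_online, next_q_target: shape (batch, n_actions)
    next_valid_mask:              bool, shape (batch, n_actions)
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    terminals = np.asarray(terminals, dtype=np.float64)
    next_q_online = np.asarray(next_q_online, dtype=np.float64)
    next_q_target = np.asarray(next_q_target, dtype=np.float64)
    next_valid_mask = np.asarray(next_valid_mask, dtype=bool)

    # Action selection: invalid next actions are excluded with -inf.
    masked_online = np.where(next_valid_mask, next_q_online, -np.inf)
    next_actions = masked_online.argmax(axis=1)

    # Action evaluation: read the selected action's value from the target net.
    next_values = np.take_along_axis(
        next_q_target, next_actions[:, None], axis=1
    )[:, 0]

    # Terminal transitions contribute no bootstrap term.
    return rewards + gamma * (1.0 - terminals) * next_values


targets = masked_double_dqn_target(
    rewards=[1.0, 0.0],
    terminals=[0.0, 1.0],
    next_q_online=[[2.0, 9.0, 1.0], [1.0, 1.0, 1.0]],
    next_q_target=[[0.5, 7.0, 0.3], [4.0, 4.0, 4.0]],
    next_valid_mask=[[True, False, True], [True, True, True]],
    gamma=0.5,
)
```

In the first transition the online network prefers action 1, but the mask forbids it, so action 0 is selected and its target-network value is bootstrapped. The mask for the next observation has to be available for every transition in the batch, which in practice means storing it with the dataset or recomputing it from the next observation.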