random_act #194
When trying to use the PPO_RNN agent, I set random_act to some positive number to collect some initial samples. However, this triggers an error when calling |
Replies: 1 comment
Hi @patricknaughton01
As mentioned in #138 (comment), random samples are not implemented in on-policy algorithms (like PPO) yet.
The main reason is that, unlike in the off-policy algorithms, it is unclear what the appropriate `log_prob` values would be for the randomly generated actions, since the `log_prob` will be used to compare the distributions (KL) during optimization... and I haven't really thought about it in depth.
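
To make the issue concrete, here is a minimal sketch in plain Python. It is not the library's code: the helper names, the 1-D Gaussian policy, and the numbers are all hypothetical and chosen only for illustration. It compares two candidate choices for the stored `log_prob` of a uniformly sampled action and shows how each affects the PPO probability ratio `exp(new_log_prob - old_log_prob)`.

```python
import math


def gaussian_log_prob(action, mean, std):
    """Log-density of a 1-D Gaussian policy evaluated at `action`."""
    var = std ** 2
    return -0.5 * ((action - mean) ** 2 / var + math.log(2 * math.pi * var))


def uniform_log_prob(low, high):
    """Log-density of a uniform distribution on [low, high]."""
    return -math.log(high - low)


def ppo_ratio(new_log_prob, old_log_prob):
    """Probability ratio used in PPO's clipped surrogate objective."""
    return math.exp(new_log_prob - old_log_prob)


# Hypothetical values: an action drawn uniformly from [-1, 1] that the
# current policy (mean 0, std 0.1) would be very unlikely to produce.
action = 0.9
policy_mean, policy_std = 0.0, 0.1

# Option A (assumption): store the log-density of the uniform sampler,
# i.e. the distribution that actually generated the action.
old_log_prob_uniform = uniform_log_prob(-1.0, 1.0)

# Option B (assumption): store the log-density the policy assigns to the action.
old_log_prob_policy = gaussian_log_prob(action, policy_mean, policy_std)

# Log-density under the policy at optimization time (unchanged here).
new_log_prob = gaussian_log_prob(action, policy_mean, policy_std)

ratio_a = ppo_ratio(new_log_prob, old_log_prob_uniform)
ratio_b = ppo_ratio(new_log_prob, old_log_prob_policy)

print(f"ratio with uniform log_prob: {ratio_a:.3e}")
print(f"ratio with policy log_prob:  {ratio_b:.3f}")
```

With option A the ratio falls far below 1, so a clipped objective would treat the sample as strongly off-policy from the very first update. With option B the ratio starts at exactly 1, but the stored `log_prob` no longer describes the distribution that produced the action. Neither choice is obviously right, which is the open question described above.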