
random_act #194

Answered by Toni-SM
patricknaughton01 asked this question in Q&A

Hi @patricknaughton01

As mentioned in #138 (comment), random samples are not implemented in on-policy algorithms (like PPO) yet.

The main reason is that, unlike with off-policy algorithms, it is not clear what the appropriate log_prob values would be for the randomly generated actions. Those log_prob values are used to compare the distributions (KL) during optimization, and I haven't thought about it in depth yet.
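
To make the ambiguity concrete, here is a minimal sketch in plain Python. It is not skrl code: the 1-D action space, the Gaussian policy parameters, and the helper names `gaussian_log_prob` and `uniform_log_prob` are all hypothetical and chosen only for illustration. It shows two plausible log_prob values that could be stored for one uniformly sampled action, and how much a PPO-style probability ratio changes depending on which one is used.

```python
import math
import random


def gaussian_log_prob(action, mean, std):
    """Log density of a 1-D Gaussian policy N(mean, std^2) at `action`."""
    var = std ** 2
    return -0.5 * ((action - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def uniform_log_prob(low, high):
    """Log density of Uniform(low, high); constant for any action inside the bounds."""
    return -math.log(high - low)


# Hypothetical 1-D action space and policy parameters (illustrative values only)
low, high = -1.0, 1.0
mean, std = 0.2, 0.1

rng = random.Random(0)
action = rng.uniform(low, high)  # random action, NOT drawn from the policy

# Candidate 1: log density of the distribution that actually generated the action
logp_behavior = uniform_log_prob(low, high)
# Candidate 2: log density the current policy assigns to that same action
logp_policy = gaussian_log_prob(action, mean, std)

# Simulate a policy update by shifting the mean, then form the PPO-style ratio
new_logp = gaussian_log_prob(action, mean + 0.05, std)
ratio_vs_behavior = math.exp(new_logp - logp_behavior)
ratio_vs_policy = math.exp(new_logp - logp_policy)

print(f"action={action:.3f}")
print(f"log_prob (uniform sampler)={logp_behavior:.3f}, log_prob (policy)={logp_policy:.3f}")
print(f"ratio vs. uniform={ratio_vs_behavior:.3e}, ratio vs. policy={ratio_vs_policy:.3e}")
```

The two candidates generally give very different ratios, so the choice would directly affect the clipped objective and any KL estimate built from the stored log_prob values. Off-policy algorithms do not store a log_prob alongside each transition in this way, which is presumably why random actions are easier to support there.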

Replies: 1 comment

Answer selected by patricknaughton01