random_act #194

patricknaughton01 · 2024-08-29T15:59:08Z

When trying to use the PPO_RNN agent, I set random_act to some positive number to collect some initial samples. However, this triggers an error when calling record_transition since self._current_log_prob isn't set if random_act is called.
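A toy reproduction of the control flow described above may make the failure easier to see. This is only a sketch and not skrl's code: the class `ToyOnPolicyAgent`, the `random_timesteps` parameter, and the explicit `RuntimeError` are hypothetical stand-ins, and the error actually raised by the library may differ. Only the names `record_transition` and `_current_log_prob` come from the report.

```python
import math
import random


class ToyOnPolicyAgent:
    """Hypothetical stand-in for an on-policy agent; NOT skrl's implementation."""

    def __init__(self, random_timesteps=0):
        # Assumed name for the number of initial steps that use random actions.
        self.random_timesteps = random_timesteps
        self._current_log_prob = None

    def act(self, state, timestep):
        if timestep < self.random_timesteps:
            # Random branch: an action is returned, but no log-prob is computed.
            return random.uniform(-1.0, 1.0)
        # Policy branch: sample from a unit-variance Gaussian centred on the state
        # and remember the log-density of the sampled action.
        action = random.gauss(state, 1.0)
        self._current_log_prob = -0.5 * (action - state) ** 2 - 0.5 * math.log(2 * math.pi)
        return action

    def record_transition(self, state, action, reward):
        # An on-policy update needs the log-prob stored alongside the transition.
        if self._current_log_prob is None:
            raise RuntimeError("log_prob was never set for this action")
        return (state, action, reward, self._current_log_prob)


agent = ToyOnPolicyAgent(random_timesteps=5)
action = agent.act(state=0.0, timestep=0)  # falls into the random branch
try:
    agent.record_transition(0.0, action, reward=1.0)
    failed = False
except RuntimeError:
    failed = True
```

In this sketch the random branch skips the only place where `_current_log_prob` is assigned, so the first call to `record_transition` has nothing to store.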

Answered by Toni-SM

Toni-SM · 2024-08-30T13:07:35Z

Maintainer

Hi @patricknaughton01

As mentioned in #138 (comment), random samples are not implemented in on-policy algorithms (like PPO) yet.

The main reason, unlike for the off-policy algorithms, is that it is unclear what the appropriate log_prob values would be for the randomly generated actions, since the log_prob is used to compare the distributions (KL) during optimization. I haven't really thought about it in depth.
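To illustrate why the choice of log_prob is not obvious, here is a minimal sketch in plain Python. It assumes a 1-D Gaussian policy and actions drawn uniformly from [-1, 1]; all numbers and helper names (`gaussian_log_prob`, `uniform_log_prob`) are made up for illustration and are not part of skrl. It compares two candidate values that could be stored for a random action and the PPO probability ratio each one would produce.

```python
import math


def gaussian_log_prob(action, mean, std):
    """Log-density of a 1-D Gaussian policy at `action`."""
    var = std ** 2
    return -((action - mean) ** 2) / (2 * var) - math.log(std) - 0.5 * math.log(2 * math.pi)


def uniform_log_prob(low, high):
    """Log-density of a uniform distribution on [low, high]."""
    return -math.log(high - low)


# Hypothetical numbers: an action drawn uniformly from [-1, 1],
# and a policy that currently prefers actions near 0.8.
action = -0.9
low, high = -1.0, 1.0
policy_mean, policy_std = 0.8, 0.2

# Log-prob of the stored action under the current policy at update time.
new_log_prob = gaussian_log_prob(action, policy_mean, policy_std)

# Candidate A: store the density the action was actually sampled from.
old_log_prob_a = uniform_log_prob(low, high)
# Candidate B: store the policy's own log-prob of the random action.
old_log_prob_b = gaussian_log_prob(action, policy_mean, policy_std)

# PPO probability ratio: exp(new_log_prob - old_log_prob).
ratio_a = math.exp(new_log_prob - old_log_prob_a)
ratio_b = math.exp(new_log_prob - old_log_prob_b)

print(f"ratio with uniform log_prob: {ratio_a:.3e}")
print(f"ratio with policy log_prob:  {ratio_b:.3e}")
```

With candidate A the ratio is far from 1 before any update has happened, so a clipped objective would treat the sample as strongly off-policy. With candidate B the ratio starts at exactly 1, but the stored value no longer reflects how the action was sampled. Neither is offered as the right answer; the sketch only shows that the two options lead to very different optimization signals.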
