
clip_action and action_space #63

Answered by Toni-SM
HumbleLee asked this question in Q&A


Hi @HumbleLee

Both SAC and PPO, for example, use a stochastic policy.

In the current implementation (a Gaussian policy), the function approximator (an artificial neural network) returns deterministic values for the mean actions (mean_action) and for the natural logarithm of the standard deviation (log_std).
These values parametrize a Gaussian distribution N, where std = e^log_std, as indicated in the concept image.

The stochastic action is then sampled as action ~ N(mean_action, std).
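A minimal sketch of the sampling step above, assuming plain Python lists instead of tensors and fixed, hypothetical numbers in place of the network outputs. The helper name `sample_gaussian_action` is illustrative only and is not part of skrl's API. It shows how the deterministic outputs (mean_action, log_std) turn into a stochastic action.

```python
import math
import random


def sample_gaussian_action(mean_action, log_std, rng=random):
    """Hypothetical helper: draw one action per dimension from N(mean_action, e^log_std)."""
    actions = []
    for mu, ls in zip(mean_action, log_std):
        std = math.exp(ls)              # std = e^log_std
        eps = rng.gauss(0.0, 1.0)       # standard normal noise
        actions.append(mu + std * eps)  # sample from N(mu, std)
    return actions


# Hypothetical network outputs for a 2-D action space
mean_action = [0.5, -1.0]
log_std = [-1.0, 0.0]

rng = random.Random(0)  # seeded so the sketch is reproducible
print(sample_gaussian_action(mean_action, log_std, rng))
```

Calling it twice with the same mean_action and log_std gives different actions, which is what makes the policy stochastic. The smaller log_std is, the closer the samples stay to mean_action.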

The Gaussian mixin provided by skrl clips log_std (a parameter of the artificial neural network) to the range [-20, 2] by default.
Consequently, std is limited to [e^-20, e^2] ≈ [2.06e-09, 7.389].
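The clipping described above can be sketched as follows, assuming plain Python lists. The helper `clip_log_std` and the constant names are hypothetical, not skrl identifiers; a PyTorch implementation would typically use `torch.clamp` on the tensor instead. Because the clip is applied to log_std *before* exponentiation, std can never collapse to exactly zero or grow without bound.

```python
import math

MIN_LOG_STD = -20.0  # default lower bound quoted above
MAX_LOG_STD = 2.0    # default upper bound quoted above


def clip_log_std(log_std, min_log_std=MIN_LOG_STD, max_log_std=MAX_LOG_STD):
    """Hypothetical helper: clamp each log_std entry to [min_log_std, max_log_std]."""
    return [min(max(ls, min_log_std), max_log_std) for ls in log_std]


raw_log_std = [-35.0, 0.3, 4.0]      # hypothetical raw values, two out of range
clipped = clip_log_std(raw_log_std)
print(clipped)                        # [-20.0, 0.3, 2.0]

# Resulting bounds on std = e^log_std
print(f"{math.exp(MIN_LOG_STD):.3g} {math.exp(MAX_LOG_STD):.4g}")  # 2.06e-09 7.389
```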

If we plot…

Replies: 1 comment 2 replies

2 replies
@HumbleLee

@Toni-SM

Answer selected by Toni-SM
Category
Q&A
2 participants