Replies: 4 comments
-
Hi @IanWangg. With multiple environments (parallel environments in the case of Isaac Gym preview, Isaac Orbit and Omniverse Isaac Gym, or vectorized environments using OpenAI Gym or Farama Gymnasium), it is only necessary to reset all environments from the outside (i.e. from skrl's trainer) once, at the beginning of training/evaluation. In fact, the wrappers for those environment types only return the observations (and infos) in subsequent invocations, without resetting the environments. The trainer implementation of the released version (which is based on the basic Gym/Gymnasium API for a single environment) always checks and calls the reset method of the wrappers when the execution is terminated or truncated, regardless of whether the setup contains multiple environments or not. In the case of multiple environments this does not produce an effective reset, at least in subsequent calls, so the practice is unnecessary and adds computational overhead. Therefore, the trainer implementation in upcoming versions of skrl will handle this differently, as implemented in the unreleased skrl/skrl/trainers/torch/base.py, lines 201 to 209 in f6c7d71
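The idea behind that change can be sketched with a toy example (hypothetical names, not the actual skrl trainer code): the trainer resets the wrapped environment once before the loop and never again, because terminated sub-environments are restarted internally by the parallel/vectorized environment.

```python
import random


class DummyVecEnv:
    """Toy stand-in for a parallel/vectorized environment that
    restarts terminated sub-environments internally during step()."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.reset_calls = 0  # counts external reset() invocations

    def reset(self):
        self.reset_calls += 1
        return [0.0] * self.num_envs, {}

    def step(self, actions):
        states = [random.random() for _ in range(self.num_envs)]
        rewards = [1.0] * self.num_envs
        # sub-environments may terminate; they are restarted internally,
        # so the trainer never needs to call reset() for them
        terminated = [random.random() < 0.1 for _ in range(self.num_envs)]
        truncated = [False] * self.num_envs
        return states, rewards, terminated, truncated, {}


def train_sketch(env, timesteps):
    """Reset once at the start, then only step: no per-termination reset."""
    states, infos = env.reset()  # the only external (effective) reset
    for _ in range(timesteps):
        actions = [0] * env.num_envs  # placeholder policy
        states, rewards, terminated, truncated, infos = env.step(actions)
    return env.reset_calls
```

Regardless of how many sub-environments terminate during the run, the external reset is invoked exactly once.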
-
So, for a single-agent setup across all parallel environments in the Isaac family, the reset method only needs to be called once, at the beginning?
-
For both cases, the NVIDIA Isaac family and vectorized gym/gymnasium environments, the sub-environments are reset internally when their episodes end. That is why calling the wrapped environment's reset method has no effect after the first invocation. For example, the OpenAI Gym environment wrapper handles the vectorized environment as follows (similar to how the Isaac Gym wrapper does it, as you showed in your first post): skrl/skrl/envs/torch/wrappers.py, lines 460 to 464 in 6b8b70f
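That behavior can be illustrated with a small sketch (hypothetical class names and a stub inner environment, not the actual skrl wrapper): only the first reset() call effectively restarts the environments, while subsequent calls just return the latest observations and infos.

```python
class CachedResetWrapperSketch:
    """Toy wrapper sketch: the wrapped vectorized environment resets
    itself internally, so only the first external reset() is effective;
    later calls return the cached observations (and infos) unchanged."""

    def __init__(self, vec_env):
        self._env = vec_env
        self._reset_once = True  # True until the first effective reset
        self._obs = None
        self._info = {}

    def reset(self):
        if self._reset_once:
            # effective reset: performed only on the first call
            self._obs, self._info = self._env.reset()
            self._reset_once = False
        return self._obs, self._info

    def step(self, actions):
        self._obs, rewards, terminated, truncated, self._info = \
            self._env.step(actions)
        return self._obs, rewards, terminated, truncated, self._info


class StubVecEnv:
    """Minimal stand-in environment for demonstrating the wrapper."""

    def __init__(self):
        self.resets = 0  # counts effective resets

    def reset(self):
        self.resets += 1
        return [1.0, 2.0], {}

    def step(self, actions):
        return [3.0, 4.0], [0.0, 0.0], [False, False], [False, False], {}
```

Calling reset() on the wrapper repeatedly leaves the inner environment's reset counter at one.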
-
Thank you for your explanation!
-
Hi, I am looking into using skrl+isaacgym as future research tools. Many thanks to the authors for providing such a quality library.
I am a bit confused by the implementation of IsaacGymPreview4Wrapper and the trainers. The following are the reset function of the wrapper and its usage in the trainer:
It seems that, when using multiple environments, once one of them terminates, all of them will get reset? Or is there some mechanism on the Isaac Gym side that deals with this case, so that only the terminated ones get reset?
If I am correct (all of them get reset if one of them terminates), why is it designed like this? Not many algorithms can take advantage of multiple environments, and the PPO implementations that do usually do not reset everything at once.
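The per-environment reset mechanism the question alludes to can be sketched as a toy model (hypothetical names, loosely inspired by the reset_buf / reset_idx convention used in NVIDIA's example tasks; not skrl code): a per-environment buffer is flagged each step and only the flagged indices are restarted.

```python
class PerEnvResetSketch:
    """Toy model of a parallel task that resets sub-environments
    independently: only the indices flagged in reset_buf restart."""

    def __init__(self, num_envs, episode_lengths):
        self.num_envs = num_envs
        self.episode_lengths = episode_lengths  # per-env episode length
        self.progress = [0] * num_envs          # per-env step counter
        self.reset_buf = [False] * num_envs

    def reset_idx(self, env_ids):
        # restart only the listed sub-environments
        for i in env_ids:
            self.progress[i] = 0
            self.reset_buf[i] = False

    def step(self):
        for i in range(self.num_envs):
            self.progress[i] += 1
            if self.progress[i] >= self.episode_lengths[i]:
                self.reset_buf[i] = True  # flag this sub-environment
        # after stepping the physics: reset only the flagged environments
        self.reset_idx([i for i, done in enumerate(self.reset_buf) if done])
```

With two sub-environments of different episode lengths, each one restarts on its own schedule while the other keeps running.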
Thank you in advance for any explanation!