Replies: 4 comments
-
Hi @IanWangg. With multiple environments (parallel environments in the case of Isaac Gym preview, Isaac Orbit and Omniverse Isaac Gym, or vectorized environments using OpenAI Gym or Farama Gymnasium), it is only necessary to reset all environments from the outside (i.e. from skrl's trainer) once, at the beginning of training/evaluation. In fact, the wrappers for those environment types only return the observations (and infos) in subsequent invocations, without resetting the environments. The trainer implementation of the released version (which is based on the basic Gym/Gymnasium API for a single environment) always checks and calls the reset method of the wrappers when the execution is terminated or truncated, regardless of whether the setup contains multiple environments or not. In the case of multiple environments this does not produce an effective reset, at least in subsequent calls, so the practice is unnecessary and adds computational overhead. Therefore, the trainer implementation in upcoming versions of skrl will handle this differently, as implemented in the unreleased skrl/skrl/trainers/torch/base.py, lines 201 to 209 in f6c7d71
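The idea behind that change can be sketched with a toy example (hypothetical names, not the actual skrl trainer code): the trainer resets the wrapped environment once before the loop and never again, because terminated sub-environments are restarted internally by the parallel/vectorized environment.

```python
import random


class DummyVecEnv:
    """Toy stand-in for a parallel/vectorized environment that
    restarts terminated sub-environments internally during step()."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.reset_calls = 0  # counts external reset() invocations

    def reset(self):
        self.reset_calls += 1
        return [0.0] * self.num_envs, {}

    def step(self, actions):
        states = [random.random() for _ in range(self.num_envs)]
        rewards = [1.0] * self.num_envs
        # sub-environments may terminate; they are restarted internally,
        # so the trainer never needs to call reset() for them
        terminated = [random.random() < 0.1 for _ in range(self.num_envs)]
        truncated = [False] * self.num_envs
        return states, rewards, terminated, truncated, {}


def train_sketch(env, timesteps):
    """Reset once at the start, then only step: no per-termination reset."""
    states, infos = env.reset()  # the only external (effective) reset
    for _ in range(timesteps):
        actions = [0] * env.num_envs  # placeholder policy
        states, rewards, terminated, truncated, infos = env.step(actions)
    return env.reset_calls
```

Regardless of how many sub-environments terminate during the run, the external reset is invoked exactly once.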
-
So, for a single-agent setup across all parallel environments in the Isaac family, the reset method only needs to be called once, at the beginning?
-
For both cases, the NVIDIA Isaac family and vectorized gym/gymnasium environments, the sub-environments are reset internally when their episodes end. That is why calling the wrapped environment's reset method has no effect after the first invocation. For example, the OpenAI Gym environment wrapper handles the vectorized environment as follows (similar to how the Isaac Gym wrapper does it, as you showed in your first post): skrl/skrl/envs/torch/wrappers.py, lines 460 to 464 in 6b8b70f
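That behavior can be illustrated with a small sketch (hypothetical class names and a stub inner environment, not the actual skrl wrapper): only the first reset() call effectively restarts the environments, while subsequent calls just return the latest observations and infos.

```python
class CachedResetWrapperSketch:
    """Toy wrapper sketch: the wrapped vectorized environment resets
    itself internally, so only the first external reset() is effective;
    later calls return the cached observations (and infos) unchanged."""

    def __init__(self, vec_env):
        self._env = vec_env
        self._reset_once = True  # True until the first effective reset
        self._obs = None
        self._info = {}

    def reset(self):
        if self._reset_once:
            # effective reset: performed only on the first call
            self._obs, self._info = self._env.reset()
            self._reset_once = False
        return self._obs, self._info

    def step(self, actions):
        self._obs, rewards, terminated, truncated, self._info = \
            self._env.step(actions)
        return self._obs, rewards, terminated, truncated, self._info


class StubVecEnv:
    """Minimal stand-in environment for demonstrating the wrapper."""

    def __init__(self):
        self.resets = 0  # counts effective resets

    def reset(self):
        self.resets += 1
        return [1.0, 2.0], {}

    def step(self, actions):
        return [3.0, 4.0], [0.0, 0.0], [False, False], [False, False], {}
```

Calling reset() on the wrapper repeatedly leaves the inner environment's reset counter at one.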
-
Thank you for your explanation!
-
Hi, I am looking into using skrl+isaacgym as future research tools. Many thanks to the authors for providing such a quality library.
I am a bit confused by the implementation of IsaacGymPreview4Wrapper and the trainers. The following are the reset function of the wrapper and its usage in the trainer:
It seems that, when using multiple environments, once one of them terminates, all of them will get reset? Or is there some mechanism on the Isaac Gym side that deals with this case, so that only the terminated ones get reset?
If I am correct (all of them get reset if one of them terminates), why is it designed like this? Not many algorithms can take advantage of multiple environments, and the PPO implementations that do usually do not reset everything at once.
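The per-environment reset mechanism the question alludes to can be sketched as a toy model (hypothetical names, loosely inspired by the reset_buf / reset_idx convention used in NVIDIA's example tasks; not skrl code): a per-environment buffer is flagged each step and only the flagged indices are restarted.

```python
class PerEnvResetSketch:
    """Toy model of a parallel task that resets sub-environments
    independently: only the indices flagged in reset_buf restart."""

    def __init__(self, num_envs, episode_lengths):
        self.num_envs = num_envs
        self.episode_lengths = episode_lengths  # per-env episode length
        self.progress = [0] * num_envs          # per-env step counter
        self.reset_buf = [False] * num_envs

    def reset_idx(self, env_ids):
        # restart only the listed sub-environments
        for i in env_ids:
            self.progress[i] = 0
            self.reset_buf[i] = False

    def step(self):
        for i in range(self.num_envs):
            self.progress[i] += 1
            if self.progress[i] >= self.episode_lengths[i]:
                self.reset_buf[i] = True  # flag this sub-environment
        # after stepping the physics: reset only the flagged environments
        self.reset_idx([i for i, done in enumerate(self.reset_buf) if done])
```

With two sub-environments of different episode lengths, each one restarts on its own schedule while the other keeps running.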
Thank you in advance for any explanation!