You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm having a look at this example (because I'm trying to implement PPO as well), as well as others, and I can't figure out a way to have 2 inter-dependent networks for autodiff.
Entropy loss and value loss I can handle just fine, but pg_loss depends on ratio of post-softmax probabilities, which comes from the policy network AND advantages, which depend on the value network. (Python impl: pg_loss = torch.max(-mb_advantages * ratio, -mb_advantages * torch.clamp(ratio, 1 - args.clip_coef, 1 + args.clip_coef)).mean() )
I have 2 separate networks for those, is there any way to have the autodiff work only regarding the Value network on one side, and only regarding the Policy network on the other side? I know that tensorflow can do that somehow, but I don't know about this lib.
Thanks.
The text was updated successfully, but these errors were encountered:
I'm having a look at this example (because I'm trying to implement PPO as well), as well as others, and I can't figure out a way to have 2 inter-dependent networks for autodiff.
In the case of PPO, the formula is basically
loss = pg_loss - args.ent_coef * entropy_loss + v_loss * args.vf_coef
(https://github.com/vwxyzjn/ppo-implementation-details/blob/main/ppo.py#L297).Entropy loss and value loss I can handle just fine, but
pg_loss
depends on ratio of post-softmax probabilities, which comes from the policy network AND advantages, which depend on the value network. (Python impl:pg_loss = torch.max(-mb_advantages * ratio, -mb_advantages * torch.clamp(ratio, 1 - args.clip_coef, 1 + args.clip_coef)).mean()
)I have 2 separate networks for those, is there any way to have the autodiff work only regarding the Value network on one side, and only regarding the Policy network on the other side? I know that tensorflow can do that somehow, but I don't know about this lib.
Thanks.
The text was updated successfully, but these errors were encountered: