About value_norm in ppo #172
If Line 176 in 0b71fc4 Can anyone elaborate on the reason behind this? If I understand correctly, Another question: is there any example project implementing A3C with DI-Engine? Thank you.
Replies: 2 comments 4 replies
First of all, thank you very much for your question. The key insight of value normalization is that neural networks can more easily fit normalized data. Regarding the principle and experimental results of value normalization,
We use the normalized value in the critic loss calculation and the original unnormalized value in the advantage calculation, because we have another key The following mainly explains our implementation.

Here in this line: Line 173 in 0b71fc4 value is the original output of the value network, which is expected to lie in the normalized space for the aforementioned reason. Note that in our implementation we only normalize the value to unit variance, not to zero mean, because our experiments show that subtracting the mean has no particularly obvious benefit on most tasks.

To make the value network regress to the normalized values during value learning, in this line: Line 176 in 0b71fc4 Line 180 in 0b71fc4

Please also note that when we compute the ppo_loss in this line: Line 213 in 0b71fc4 value_loss is used to backpropagate the gradients that update the value network. So in the calculation of value_loss: DI-engine/ding/rl_utils/ppo.py Line 203 in 0b71fc4 value_old, i.e. batch['value'], should be the original output of the value network (i.e. the normalized value), so here in this line Line 185 in 0b71fc4 data['value'].
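To make the data flow above concrete, here is a minimal pure-Python sketch. It is not the DI-engine code: `RunningStd`, `gae`, and `prepare_batch` are hypothetical names made up for illustration, and the sketch assumes a population-variance running estimate and a standard GAE recursion. It shows the network outputs being scaled back by the running std before the advantage is computed, and the value and return being divided by the same std before they are used as critic targets.

```python
import math


class RunningStd:
    """Hypothetical helper: running std of unnormalized returns (Welford update)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, xs):
        for x in xs:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Fall back to 1.0 until there are enough samples for a useful estimate.
        if self.count < 2:
            return 1.0
        return max(math.sqrt(self.m2 / self.count), 1e-8)


def gae(rewards, values, next_values, dones, gamma=0.99, lam=0.95):
    """Standard GAE recursion on UNNORMALIZED values."""
    adv = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        adv[t] = running
    return adv


def prepare_batch(rewards, value_out, next_value_out, dones, running_std):
    """value_out / next_value_out are raw critic outputs (normalized space)."""
    std = running_std.std
    # 1. Denormalize: the advantage must live on the same scale as the reward.
    values = [v * std for v in value_out]
    next_values = [v * std for v in next_value_out]
    adv = gae(rewards, values, next_values, dones)
    unnormalized_returns = [a + v for a, v in zip(adv, values)]
    # 2. Normalize again: the critic regresses to targets in the normalized space.
    value_old = [v / std for v in values]
    return_target = [r / std for r in unnormalized_returns]
    # 3. Update the running statistics with the unnormalized returns only
    #    (scale by std, no mean subtraction).
    running_std.update(unnormalized_returns)
    return adv, value_old, return_target
```

In this sketch `value_old` ends up equal to the raw network output, which is why the clipped value loss can compare it directly with the new network output, while `adv` stays on the reward scale.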
The same applies to the normalization and denormalization of return and next_value. I hope this answers some of your questions. Thanks a lot.
As for A3C, we chose to implement reduce-gradient methods, so we dropped some sync-gradient methods such as A3C. Why do you want to use A3C? For faster training? We think other alternatives may be better. Can you offer more details about your needs?