To run all experiments and plot the figures for the report, run `bash run_all.sh`.

The predictions are least accurate for state dimension 17. This dimension changes almost monotonically, so the prediction errors point in the same direction and accumulate more dramatically at later steps. For the fluctuating dimensions, errors in opposite directions cancel out to some extent.
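
The accumulation argument above can be illustrated with a toy calculation. This is only a sketch, not the code used in the experiments: `EPS`, `STEPS`, and `accumulated_error` are hypothetical names, and the per-step error is an assumed constant rather than a measured value. It compares the drift when every step errs in the same direction with the drift when the error direction alternates.

```python
from itertools import accumulate


def accumulated_error(per_step_errors):
    """Running sum of per-step errors, i.e. the drift after each rollout step."""
    return list(accumulate(per_step_errors))


STEPS = 50
EPS = 0.01  # hypothetical per-step error magnitude, not a measured value

# Monotonic dimension (assumed): the model errs in the same direction each step.
same_sign = [EPS] * STEPS
# Fluctuating dimension (assumed): the error direction flips every step.
alternating = [EPS if t % 2 == 0 else -EPS for t in range(STEPS)]

monotonic_drift = [abs(e) for e in accumulated_error(same_sign)]
fluctuating_drift = [abs(e) for e in accumulated_error(alternating)]

print("final drift, same-sign errors:    ", monotonic_drift[-1])
print("largest drift, alternating errors:", max(fluctuating_drift))
```

Under these assumptions the same-sign drift grows linearly with the number of steps, while the alternating drift never exceeds a single step's error.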

| Metric | Random Policy | Trained Policy |
|---|---|---|
| ReturnAvg | -169.525 | 54.1579 |
| ReturnStd | 38.9955 | 23.4211 |
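
For reference, statistics like ReturnAvg and ReturnStd are typically the mean and standard deviation of the per-episode returns. The sketch below is an assumption about how such numbers could be computed, not the logging code behind the table: `summarize_returns` is a hypothetical helper, the population standard deviation is an assumed choice, and the example returns are made up for illustration.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, population standard deviation) of per-episode returns."""
    avg = statistics.mean(episode_returns)
    # Assumption: population std (divide by N); a sample std would use stdev().
    std = statistics.pstdev(episode_returns)
    return avg, std


# Made-up episode returns, for illustration only.
example_returns = [40.0, 60.0, 50.0, 70.0]
return_avg, return_std = summarize_returns(example_returns)
print(f"ReturnAvg: {return_avg:.4f}")
print(f"ReturnStd: {return_std:.4f}")
```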