CS294-112 HW 4: Model-Based Reinforcement Learning

Usage

To run all experiments and plot figures for the report, run

bash run_all.sh

Results

Problem 1

(a)

(b)

The predictions are least accurate for state dimension 17. This dimension changes almost monotonically, so the one-step prediction errors tend to point in the same direction and accumulate more dramatically at later steps. In the fluctuating dimensions, errors in opposite directions cancel out to some extent.
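The accumulation argument can be illustrated with a toy sketch. This is not the homework's dynamics model: the horizon, the one-step error size `bias`, and the helper `open_loop_error` are hypothetical, and the sketch assumes one-step errors simply add up along an open-loop rollout.

```python
import numpy as np

def open_loop_error(per_step_errors):
    """Accumulated prediction error when one-step errors add up over a rollout."""
    return np.cumsum(per_step_errors)

horizon = 100  # hypothetical number of open-loop prediction steps
bias = 0.01    # hypothetical magnitude of the one-step model error

# Monotonic dimension: the one-step error keeps the same sign at every step.
monotonic_errors = np.full(horizon, bias)

# Fluctuating dimension: the one-step error flips sign at every step.
signs = np.where(np.arange(horizon) % 2 == 0, 1.0, -1.0)
fluctuating_errors = bias * signs

monotonic_drift = open_loop_error(monotonic_errors)
fluctuating_drift = open_loop_error(fluctuating_errors)

print("final drift, monotonic dimension:  ", abs(monotonic_drift[-1]))
print("largest drift, fluctuating dimension:", np.max(np.abs(fluctuating_drift)))
```

Under these assumptions the same-sign errors grow linearly with the horizon, while the alternating errors never exceed a single step's error.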

Problem 2

(a)

Return      Random Policy   Trained Policy
ReturnAvg   -169.525        54.1579
ReturnStd   38.9955         23.4211
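For reference, statistics of this kind could be computed from per-rollout returns roughly as follows. This is a minimal sketch: the helper `summarize_returns` and the example values are made up for illustration and are not the experiment's data, and the population standard deviation (NumPy's default) is an assumption about how ReturnStd was computed.

```python
import numpy as np

def summarize_returns(returns):
    """Mean and standard deviation of a list of per-rollout returns."""
    returns = np.asarray(returns, dtype=float)
    return {
        "ReturnAvg": float(np.mean(returns)),
        # np.std defaults to the population standard deviation (ddof=0).
        "ReturnStd": float(np.std(returns)),
    }

# Hypothetical returns from three rollouts, for illustration only.
example_returns = [10.0, 20.0, 30.0]
stats = summarize_returns(example_returns)
print(stats)
```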

Problem 3a

(a)

Problem 3b

(a)

(b)

(c)