CS294-112 HW 4: Model-Based Reinforcement Learning

Usage

To run all experiments and plot figures for the report, run

bash run_all.sh

Results

Problem 1

(a)

(b)

The predictions are least accurate for state dimension 17. This dimension changes almost monotonically, so the per-step prediction errors tend to point in the same direction and accumulate more dramatically at later steps. For the fluctuating dimensions, errors in opposite directions cancel out to some extent.
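The accumulation argument can be illustrated with a toy calculation. This is only a sketch under a strong simplifying assumption (per-step errors add linearly along the rollout). The error values and the names `accumulated_error`, `same_sign`, and `alternating` are hypothetical and are not taken from the trained dynamics model.

```python
from itertools import accumulate


def accumulated_error(step_errors):
    """Running sum of per-step errors along a multi-step prediction."""
    return list(accumulate(step_errors))


horizon = 10

# Hypothetical per-step errors (illustrative values, not measured).
# Monotonic dimension: every step errs in the same direction.
same_sign = [0.1] * horizon
# Fluctuating dimension: the error flips sign from step to step.
alternating = [0.1 * (-1) ** t for t in range(horizon)]

drift_monotonic = accumulated_error(same_sign)
drift_fluctuating = accumulated_error(alternating)

# The same-sign errors grow with the horizon; the alternating ones stay bounded.
print("monotonic dimension, final error:", drift_monotonic[-1])
print("fluctuating dimension, largest error:", max(abs(e) for e in drift_fluctuating))
```

Under these assumptions the same-sign errors grow linearly with the horizon, while the alternating errors never exceed the size of a single step's error.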

Problem 2

(a)

Return	Random Policy	Trained Policy
ReturnAvg	-169.525	54.1579
ReturnStd	38.9955	23.4211
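For reference, summary statistics like ReturnAvg and ReturnStd can be computed from per-episode returns as sketched below. The returns listed here are made-up placeholders, not the data behind the table, and using the population standard deviation is an assumption about how the logger computes ReturnStd.

```python
from statistics import mean, pstdev


def summarize_returns(episode_returns):
    """Return (average, population standard deviation) of episode returns."""
    return mean(episode_returns), pstdev(episode_returns)


# Hypothetical per-episode returns, for illustration only.
example_returns = [50.0, 60.0, 40.0, 70.0]

return_avg, return_std = summarize_returns(example_returns)
print("ReturnAvg:", return_avg)
print("ReturnStd:", return_std)
```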
