Dreamer #151
base: main
Conversation
Hi @Rohan138, took a quick high-level glance and so far it looks good. I will start looking at the code in more detail soon. I noticed you added some changes to other files like the pre-commit config, the requirements, pyproject.toml, etc. Were you running into some errors? If that's the case, would you mind opening a separate PR for these?
@Rohan138 I'm trying to pull from your fork as shown below, but I run into access permission errors. Could you see if I can get read access?

```
git checkout -b Rohan138-dreamer main
git pull git@github.com:Rohan138/mbrl-lib.git dreamer
```
I can add you to my fork, but do you want to add my repo as a remote and try fetching from it instead?
Just sent you a contributor invite; maybe that will fix the access issue.
HTTP worked for me before accepting the invitation, thanks!
On the non-Dreamer fixes:
I can definitely move these to a different PR, though.
Another PR for these would be great, so that we can merge them w/o waiting for this more involved PR to be ready. Thanks!
@Rohan138 I'm planning to spend most of today and then Friday playing around with your code. Is there anything in particular you'd like me to focus on or help with? It seems I'm able to run Dreamer, but I haven't checked whether it learns correctly yet. What's the current status?
Hi! So I tried running it, but despite the losses dropping, it doesn't seem to learn right now. Here are the results; I'm planning to look through the prior implementations next.
Still looking into things, but left some initial comments. I'm also wondering about the way you are computing the value estimates in `_compute_return`, but I need to look into this more carefully. Are you confident about that part of the code? I was thinking of maybe spending some time on Friday writing a utility function for this (along the lines of the sketch below) and maybe adding some unit tests. Let me know if this would be useful.
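For reference, this is the recursion I have in mind, as a standalone sketch (not the PR's actual `_compute_return`; the `(T, B)` shapes and scalar `discount`/`lambda_` arguments are assumptions):

```python
import torch

def lambda_return(reward, value, bootstrap, discount=0.99, lambda_=0.95):
    """Lambda-returns computed backwards in time.

    reward, value: (T, B) tensors; bootstrap: (B,) value estimate for step T.
    Recursion: G_t = r_t + discount * ((1 - lambda_) * v_{t+1} + lambda_ * G_{t+1}),
    with G_T = bootstrap.
    """
    next_values = torch.cat([value[1:], bootstrap[None]], 0)
    returns = torch.empty_like(reward)
    last = bootstrap
    for t in reversed(range(reward.shape[0])):
        last = reward[t] + discount * ((1 - lambda_) * next_values[t] + lambda_ * last)
        returns[t] = last
    return returns
```

This would also be easy to unit-test at the extremes: `lambda_=0` should give one-step TD targets, and `lambda_=1` should give Monte Carlo returns with a bootstrapped tail.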
```python
    rewards,
) = self.planet_model(obs[:, 1:], actions[:, :-1], rewards[:, :-1])

for epoch in range(num_epochs):
```
I think the following might be clearer, since it doesn't seem that you use `beliefs` and `latents` except for initializing the state for unrolling.
```python
B, L, _ = beliefs.shape
for epoch in range(num_epochs):
    imag_beliefs = []
    imag_latents = []
    imag_actions = []
    imag_rewards = []
    states = {
        "belief": beliefs.reshape(B * L, -1),
        "latent": latents.reshape(B * L, -1),
    }
    for i in range(self.horizon):
        ...
```
mbrl/planning/dreamer_agent.py (Outdated)
```python
imag_rewards.append(rewards)

# I x (B*L) x _
imag_beliefs = torch.stack(imag_beliefs).to(self.device)
```
Is `to(self.device)` necessary? These are computed from tensors that should already be on `self.device` at this point.
mbrl/planning/dreamer_agent.py (Outdated)
```python
imag_beliefs = torch.stack(imag_beliefs).to(self.device)
imag_latents = torch.stack(imag_latents).to(self.device)
imag_actions = torch.stack(imag_actions).to(self.device)
freeze(self.critic)
```
Curious about the use of `freeze(self.critic)` instead of `with torch.no_grad()`, since the next line only calls the critic and no other parameters are being used.
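For what it's worth, a standalone sketch of the difference (illustrative code, not from the PR): freezing parameters still lets gradients flow through the critic to its inputs, e.g. imagined states produced by the actor, while `torch.no_grad()` builds no graph at all.

```python
import torch
import torch.nn as nn

critic = nn.Linear(8, 1)
states = torch.randn(4, 8, requires_grad=True)  # stand-in for imagined states

# Freezing parameters: no gradient accumulates on the critic's weights, but
# the graph is still built, so gradients reach the inputs.
for p in critic.parameters():
    p.requires_grad_(False)
critic(states).sum().backward()
print(states.grad is not None)     # True: inputs still receive gradients
print(critic.weight.grad is None)  # True: frozen parameters do not

# torch.no_grad(): no graph at all, so nothing upstream gets gradients.
other_states = torch.randn(4, 8, requires_grad=True)
with torch.no_grad():
    values = critic(other_states)
print(values.requires_grad)        # False: the output is detached
```

So if the actor loss backpropagates through these critic values, `freeze` and `no_grad` are not interchangeable; if it doesn't, `no_grad` would also save memory.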
mbrl/planning/dreamer_agent.py (Outdated)
""" | ||
next_values = torch.cat([value[1:], bootstrap[None]], 0) | ||
target = reward + discount * next_values * (1 - lambda_) | ||
timesteps = list(range(reward.shape[0] - 1, -1, -1)) |
No need to make a `list` here, since you only need the iterator (see the sketch below).
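For instance (shapes illustrative):

```python
import torch

reward = torch.zeros(10, 4)  # illustrative (T, B) reward tensor

# Reverse iteration over timesteps without materializing a list:
for t in reversed(range(reward.shape[0])):
    pass  # per-timestep return computation would go here
```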
```python
trainer.train(
    dataset, num_epochs=1, batch_callback=model_batch_callback, evaluate=False
)
agent.train(dataset, num_epochs=1, batch_callback=agent_batch_callback)
```
I'm wondering if we should be passing a different iterator for training the agent. If I understand the paper correctly, the Dreamer agent is trained on trajectories whose start states are sampled from the experience buffer, but where all subsequent states are obtained by rolling out the model. In this case, we only need to sample individual transitions to get start states, and not the full sequences that `dataset` would return here.

If what I said above is correct, then maybe the cleanest option would be to modify `DreamerAgent.train()` to directly receive `replay_buffer` and an additional parameter called `num_updates`. The agent train code can then loop `num_updates` times, each time doing: 1) `replay_buffer.sample(batch_size)`, 2) roll out the planet model with a batch of start states, 3) update the agent parameters; see the sketch below.

Does the above make sense? Let me know if I'm missing something or if anything is unclear. I guess your current code is serving more data to the Dreamer agent, but it seems like it'd be easier to make a mistake with the current implementation?
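A minimal sketch of what I mean, with hypothetical names throughout (`sample`, `encode`, `rollout_model`, and `update_actor_critic` are placeholders, not the library's actual API):

```python
def train(self, replay_buffer, num_updates: int, batch_size: int = 50):
    """Hypothetical DreamerAgent.train() under the proposal above."""
    for _ in range(num_updates):
        # 1) Sample individual transitions; observations serve as start states.
        batch = replay_buffer.sample(batch_size)
        # 2) Encode observations and roll out the planet model for
        #    self.horizon steps under the current policy.
        start_states = self.planet_model.encode(batch.obs)         # placeholder
        imagined = self.rollout_model(start_states, self.horizon)  # placeholder
        # 3) Update actor and critic from the imagined trajectories.
        self.update_actor_critic(imagined)                         # placeholder
```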
> I'm wondering if we should be passing a different iterator for training the agent. If I understand the paper correctly, the Dreamer agent is trained on trajectories whose start states are sampled from the experience buffer, but where all subsequent states are obtained by rolling out the model. In this case, we only need to sample individual transitions to get start states, and not full sequences, which is what `dataset` would return here.
I might have misunderstood the paper, but I'm not sure this is correct. In Algorithm 1 (page 3), they:

- Draw `B` data sequences (episodes) `{(a_t, o_t, r_t)}_{t=k}^{k+L}`. Here `k` is the outer variable looping over episodes, while `t` is the inner variable looping over timesteps in an episode.
- Compute model states `s_t` for all `t` in `[k, k + L)` and all `k` in the batch of size `B`, using the RSSM transition model.
- Imagine trajectories `{(s_\tau, a_\tau)}_{\tau=t}^{\tau=t+H}` from each state `s_t` in the batch, not just the initial state `s_k` of each episode.

I'm not sure if this explanation was clear, and I'll take another look at the prior implementations linked in the other comment to confirm.
We do currently have a minor divergence and performance hit: instead of computing the model states just once, as in the paper and references, we run the forward and backward passes on the model in `model.train()`, and then run the forward pass again in `self.planet_model._process_batch(...)` inside `agent.train()`. I haven't figured out a way to cleanly fix this yet. Maybe return the states from `model.train()`? Or append them to the `TransitionIterator` somehow?
Looking at the paper again, I think your interpretation is correct, because the "compute model states" step occurs for all sampled `o_t`, and trajectories are then imagined from all model states `s_t`. I find it a bit confusing how they are using the index `k`; I guess it increases in increments of size `L`? That is, the j-th trajectory goes from `t = L*(j-1)` to `L*j - 1`? In any case, confirming with prior implementations is a good idea.

Regarding the performance hit, one idea that wouldn't require a lot of changes would be to add get/set methods for the random state of the iterator, so that we can have it return the same set of samples for both the model and agent loops. We should then be able to use the model trainer callback to store all computed model states and pass them to the agent trainer in the correct order; a rough sketch is below.

Does that make sense?
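A hypothetical sketch of that idea (`get_rng_state`/`set_rng_state` don't exist on the iterator today, and `precomputed_states` is made up; all names are illustrative):

```python
# Capture the iterator's RNG state before the model epoch, so the agent loop
# can replay exactly the same batches in the same order.
rng_state = dataset.get_rng_state()  # hypothetical method

model_states = []

def model_batch_callback(batch, model_output):
    # Store the model states computed during the model update.
    model_states.append(model_output.states)  # illustrative field

trainer.train(dataset, num_epochs=1, batch_callback=model_batch_callback)

# Rewind the iterator and reuse the stored states instead of re-running the
# model's forward pass in the agent loop.
dataset.set_rng_state(rng_state)  # hypothetical method
agent.train(dataset, num_epochs=1, precomputed_states=model_states)
```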
```yaml
action_noise_std: 0.3
test_frequency: 25
num_episodes: 1000
dataset_size: 1000000
```
Missing newline at end of file.
I'm thinking about starting a project that builds off the Dreamer style of dynamics model. If I spend a couple of hours here and there on this PR, what would be most useful?
Sorry for the delay; I'll try to address all of the comments over the next day or so. I moved the non-Dreamer fixes to #161. @natolambert: the core Dreamer implementation is in `mbrl/planning/dreamer_agent.py`.
@luisenp @Rohan138 -- is there anything I or @RajGhugare19 can do to get this moving again?
Hi @natolambert. Unfortunately, it's almost impossible for me at this point to take the lead in development, due to other more pressing commitments. But I'm happy to support with reviews, general advice, and some amount of coding, if someone else is willing to drive this feature to completion.
Gotcha, so I'm guessing it's at the point where there are small issues left and we need to verify performance? @luisenp
There were some comments I left early on that I'm not sure were addressed (mostly high-level stuff). But leaving that aside, I don't think the implementation was fully working yet; @Rohan138 would have more details, though.
Great. I want to take a look, and I have chatted with @danijar, who didn't know it was being worked on. Let's see if I can un-stick it and, if needed, talk to Danijar.
Hello @natolambert -- I can take the lead on developing this. You can review and sanity-check the code afterwards. I will take a deeper look today at the code and what changes are still required.
Pitching in: I can help answer questions and debug the implementation over the weekend. The main function I'm unsure about is `DreamerAgent.train(...)`. There are also some minor conflicts due to gym versions and gym's type checking that seem to be breaking CI; there's an open PR, #161, for these.
Types of changes

Still a WIP, but I've managed to add most of Dreamer. The main thing left is computing the loss in `DreamerAgent.train()` in `planning/dreamer_agent.py`.