
[BUG] Batch inference deviates from single state inference #434

Open
jdesman1 opened this issue Nov 27, 2024 · 0 comments
Labels
bug Something isn't working

Comments

jdesman1 commented Nov 27, 2024

Describe the bug
Thanks @takuseno for creating such a great tool for offline RL. However, I have recently encountered an issue where running batch inference produces different results from inference on single states, i.e.:

Let ep.observations be a (100, 10) array (100 observations, state dimension 10):

batch_infer = model.predict(ep.observations)
list_infer = [model.predict(np.expand_dims(s, axis=0))[0] for s in ep.observations]

batch_infer and list_infer produce different results across a number of models I have tested.

See the reproduction code below. This issue emerged while we were running inference on a proprietary dataset, where there is an appreciable and meaningful difference between running inference on every state individually and running batch inference. I cannot share that dataset, but the minimal example below illustrates the unexpected differences between batch and single-state inference.

To Reproduce

import numpy as np
from d3rlpy.datasets import get_pendulum
from d3rlpy.algos import CQLConfig

# Setup
dataset, env = get_pendulum()
model = CQLConfig().create()
model.build_with_dataset(dataset)
model.fit(
    dataset,
    n_steps=100,
    n_steps_per_epoch=10
)

# Use only a single episode for reproducibility
ep = dataset.episodes[0]

# Infer actions
batch_infer = model.predict(ep.observations)
list_infer = [model.predict(np.expand_dims(s, axis=0))[0] for s in ep.observations]
batch_infer = np.array(batch_infer)
list_infer = np.array(list_infer)
diff = batch_infer - list_infer
print(diff)            ### Produces non-zero values! ###

# Infer values
batch_infer = model.predict_value(ep.observations, ep.actions)
list_infer = [model.predict_value(np.expand_dims(s, axis=0), np.expand_dims(a, axis=0))[0]
              for s, a in zip(ep.observations, ep.actions)]
batch_infer = np.array(batch_infer)
list_infer = np.array(list_infer)
diff = batch_infer - list_infer
print(diff)            ### Produces non-zero values! ###
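
To quantify the deviation, one can check the magnitude of diff against a floating-point tolerance. This is a minimal sketch continuing the script above; the tolerance values are arbitrary choices for illustration, not anything prescribed by d3rlpy:

# Hypothetical tolerance check: if batching were benign, the differences
# should sit within float32 rounding error rather than being "appreciable".
print(np.abs(diff).max())
print(np.allclose(batch_infer, list_infer, rtol=1e-5, atol=1e-6))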

Expected behavior
diff should be 0.

Additional context
In the tutorials section of the documentation, we see single-state inference: tutorial

However, in the algorithms section, batch inference appears to be supported: interface documentation
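
For clarity, the two call patterns being compared are sketched below, using model and ep as defined in the reproduction code above:

# Single-state inference, as shown in the tutorial: one observation at a
# time, with a batch dimension of 1 added explicitly.
single_action = model.predict(np.expand_dims(ep.observations[0], axis=0))[0]

# Batch inference, as described in the interface documentation: all
# observations passed at once as an (N, state_dim) array.
batch_actions = model.predict(ep.observations)

If both patterns are supported, batch_actions[0] and single_action would be expected to agree up to floating-point noise.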

jdesman1 added the bug label on Nov 27, 2024