Describe the bug
Thanks @takuseno for creating such a great tool for offline RL. However, I have recently encountered an issue where batch inference produces different results from inference on single states, i.e.:
Let ep.observations be a (100, 10) array (number of observations x state dimension):
batch_infer = model.predict(ep.observations)
list_infer = [model.predict(np.expand_dims(s, axis=0))[0] for s in ep.observations]
batch_infer and list_infer produce different results across a number of models I have tested.
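The single-state path simply reshapes each (10,) observation into a (1, 10) batch before calling predict, then takes the first row of the result, so the shape handling itself should not be the source of the discrepancy (a NumPy-only sketch, no d3rlpy needed; the array here is a stand-in):

```python
import numpy as np

# Stand-in for ep.observations: 100 observations, state dimension 10
states = np.random.rand(100, 10)

s = states[0]                             # single state, shape (10,)
single_batch = np.expand_dims(s, axis=0)  # one-element batch, shape (1, 10)

assert single_batch.shape == (1, 10)
assert np.array_equal(single_batch[0], s)  # round-trips exactly
```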
See the reproduction code below. This issue emerged while we were running inference on a proprietary dataset, where there is an appreciable and meaningful difference between running inference state by state and running it in a batch. I cannot provide that dataset, but the minimal example below illustrates the unexpected differences between batch and single-state inference.
To Reproduce
import numpy as np
from d3rlpy.datasets import get_pendulum
from d3rlpy.algos import CQLConfig
# Setup
dataset, env = get_pendulum()
model = CQLConfig().create()
model.build_with_dataset(dataset)
model.fit(
    dataset,
    n_steps=100,
    n_steps_per_epoch=10,
)
# Use only a single episode for reproducibility
ep = dataset.episodes[0]
# Infer actions
batch_infer = model.predict(ep.observations)
list_infer = [model.predict(np.expand_dims(s, axis=0))[0] for s in ep.observations]
batch_infer = np.array(batch_infer)
list_infer = np.array(list_infer)
diff = batch_infer - list_infer
print(diff) ### Produces non-zero values! ###
# Infer values
batch_infer = model.predict_value(ep.observations, ep.actions)
list_infer = [model.predict_value(np.expand_dims(s, axis=0), np.expand_dims(a, axis=0))[0]
              for s, a in zip(ep.observations, ep.actions)]
batch_infer = np.array(batch_infer)
list_infer = np.array(list_infer)
diff = batch_infer - list_infer
print(diff) ### Produces non-zero values! ###
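To quantify the discrepancy instead of eyeballing the printed array, a tolerance-based comparison along these lines can be used (a sketch independent of d3rlpy; report_mismatch is a hypothetical helper, and the arrays below are stand-ins for the two inference outputs):

```python
import numpy as np

def report_mismatch(batch_infer, list_infer, atol=1e-6):
    """Summarize elementwise differences between two inference outputs."""
    diff = np.abs(np.asarray(batch_infer) - np.asarray(list_infer))
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "allclose": bool(np.allclose(batch_infer, list_infer, atol=atol)),
    }

# Stand-in arrays illustrating a small, uniform discrepancy
a = np.zeros((4, 2))
b = a + 1e-3
print(report_mismatch(a, b))
# allclose is False here: the 1e-3 gap exceeds the 1e-6 tolerance
```

With real outputs, a max_abs_diff near float32 precision would suggest benign numerical noise, while larger values would point to genuinely different computation paths.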
Expected behavior
diff should be 0.
Additional context
In the tutorials section of the documentation, we see single-state inference: tutorial
However, in the algorithms section, batch inference appears to be supported: interface documentation