
[BUG] Batch inference deviates from single state inference #434

Open
jdesman1 opened this issue Nov 27, 2024 · 0 comments
Labels
bug Something isn't working

Comments

jdesman1 commented Nov 27, 2024

Describe the bug
Thanks @takuseno for creating such a great tool for offline RL. However, I have recently encountered an issue where running batch inference produces different results from inference on single states, i.e.:

Let ep.observations be a (100, 10) array (100 observations, state dimension 10):

batch_infer = model.predict(ep.observations)
list_infer = [model.predict(np.expand_dims(s, axis=0))[0] for s in ep.observations]

batch_infer and list_infer produce different results across a number of models I have tested.

See the reproduction code below. This issue emerged while we were running inference on a proprietary dataset, where there is an appreciable and meaningful difference between running inference on every state individually and running batch inference. I cannot share that dataset, but the minimal example below illustrates the unexpected differences between batch and single-state inference.

To Reproduce

import numpy as np
from d3rlpy.datasets import get_pendulum
from d3rlpy.algos import CQLConfig

# Setup
dataset, env = get_pendulum()
model = CQLConfig().create()
model.build_with_dataset(dataset)
model.fit(
    dataset,
    n_steps=100,
    n_steps_per_epoch=10
)

# Use only a single episode for reproducibility
ep = dataset.episodes[0]

# Infer actions
batch_infer = model.predict(ep.observations)
list_infer = [model.predict(np.expand_dims(s, axis=0))[0] for s in ep.observations]
batch_infer = np.array(batch_infer)
list_infer = np.array(list_infer)
diff = batch_infer - list_infer
print(diff)            ### Produces non-zero values! ###

# Infer values
batch_infer = model.predict_value(ep.observations, ep.actions)
list_infer = [model.predict_value(np.expand_dims(s, axis=0), np.expand_dims(a, axis=0))[0]
              for s, a in zip(ep.observations, ep.actions)]
batch_infer = np.array(batch_infer)
list_infer = np.array(list_infer)
diff = batch_infer - list_infer
print(diff)            ### Produces non-zero values! ###
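
To quantify the deviation, one can check the magnitude of diff against a floating-point tolerance. This is a minimal sketch continuing the script above; the tolerance values are arbitrary choices for illustration, not anything prescribed by d3rlpy:

# Hypothetical tolerance check: if batching were benign, the differences
# should sit within float32 rounding error rather than being "appreciable".
print(np.abs(diff).max())
print(np.allclose(batch_infer, list_infer, rtol=1e-5, atol=1e-6))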

Expected behavior
diff should be 0.

Additional context
In the tutorials section of the documentation, we see single-state inference: tutorial

However, in the algorithms section, batch inference appears to be supported: interface documentation
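
For clarity, the two call patterns being compared are sketched below, using model and ep as defined in the reproduction code above:

# Single-state inference, as shown in the tutorial: one observation at a
# time, with a batch dimension of 1 added explicitly.
single_action = model.predict(np.expand_dims(ep.observations[0], axis=0))[0]

# Batch inference, as described in the interface documentation: all
# observations passed at once as an (N, state_dim) array.
batch_actions = model.predict(ep.observations)

If both patterns are supported, batch_actions[0] and single_action would be expected to agree up to floating-point noise.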

jdesman1 added the bug label on Nov 27, 2024