I'm building an offline RL model from custom collected log data.
I'm not sure how to assess the trained model's performance. One way is to add evaluators such as TDErrorEvaluator; the other is to train a separate d3rlpy.ope.FQE on the test dataset and look at soft_opc or other metrics.
Since both approaches run on the test dataset and compute some metric, which one should I use to evaluate the model?
@takuseno I have similar doubts to those of @ericyue. Could you please elaborate a little more? The papers are interesting but very theoretical; could you provide a more practical example?
import d3rlpy
from d3rlpy.metrics import (
    TDErrorEvaluator,
    AverageValueEstimationEvaluator,
    InitialStateValueEstimationEvaluator,
)

# BCQConfig(...) only builds a config; create() instantiates the trainable algo.
model = d3rlpy.algos.BCQConfig(xxxx).create()
ret = model.fit(
    train_dataset,
    n_steps=N_STEPS,
    n_steps_per_epoch=N_STEPS_PER_EPOCH,
    logger_adapter=logger_adapter,
    save_interval=10,
    evaluators={
        # held-out metrics computed on test episodes each epoch
        "test_td_error": TDErrorEvaluator(episodes=test_dataset.episodes),
        "test_value_scale": AverageValueEstimationEvaluator(episodes=test_dataset.episodes),
        "test_init_value": InitialStateValueEstimationEvaluator(episodes=test_dataset.episodes),
    },
)