I'm building an offline RL model from custom collected log data.
I'm not sure how to assess the trained model's performance. One way is to add evaluators such as TDErrorEvaluator; the other is to train a separate d3rlpy.ope.FQE on the test dataset and look at soft_opc or other metrics.
Since both approaches run on the test dataset and compute some metric, which one should I use to evaluate the model?
@takuseno I have similar doubts to those of @ericyue. Could you please elaborate a little more? The papers are interesting but very theoretical; could you provide a more practical example?
import d3rlpy
from d3rlpy.metrics import (
    TDErrorEvaluator,
    AverageValueEstimationEvaluator,
    InitialStateValueEstimationEvaluator,
)

# BCQConfig(...) only builds a config; create() instantiates the trainable algo.
model = d3rlpy.algos.BCQConfig(xxxx).create()
ret = model.fit(
    train_dataset,
    n_steps=N_STEPS,
    n_steps_per_epoch=N_STEPS_PER_EPOCH,
    logger_adapter=logger_adapter,
    save_interval=10,
    evaluators={
        # held-out metrics computed on test episodes each epoch
        "test_td_error": TDErrorEvaluator(episodes=test_dataset.episodes),
        "test_value_scale": AverageValueEstimationEvaluator(episodes=test_dataset.episodes),
        "test_init_value": InitialStateValueEstimationEvaluator(episodes=test_dataset.episodes),
    },
)