The reward function encourages short episodes #40

MHamza-Y · 2021-11-13T11:26:48Z

I am trying to train a reinforcement learning agent to control the basal rate using the provided gym environment. The problem with the reward function is that it encourages episodes that are as short as possible. I have tried different algorithms and hyperparameter variations, but the policy always learns to output either 0 or the maximum basal value, so that the episode ends early and no further penalty accumulates over a long episode. Can the reward function be improved somehow?

lorenzobrigato · 2022-11-29T09:45:36Z

Any updates on this, or any solutions? I am also having issues across different algorithms and hyperparameters, and I am seeing similar behavior.

jxx123 · 2022-12-07T00:14:05Z

The documentation has a section showing how to use a custom reward function, https://github.com/jxx123/simglucose#openai-gym-usage, which serves exactly your purpose of tuning the reward function.

The default reward function is not intended to give you a nice reward (especially long-term reward), and you are supposed to define your own reward function.

But for posterity, it would be nice if anyone could share their insights and their carefully designed reward functions here. I could collect them and put them in the documentation for visibility (with your name shown, of course, to give you credit).
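As a starting point for that kind of custom reward, here is a minimal sketch that addresses the short-episode incentive from the original post: every step pays a non-negative reward, so surviving longer can never lower the return, and only a near-failure glucose reading is penalized. The name `survival_reward` and all thresholds are hypothetical illustration values, not part of simglucose and not clinically validated. It assumes the reward callable receives the recent blood glucose history as a sequence and is passed through the `reward_fun` keyword, as in the README section linked above; check there for the exact registration call.

```python
def survival_reward(bg_last_hour, low=70.0, high=180.0,
                    fail_low=40.0, fail_high=400.0, fail_penalty=-100.0):
    """Hypothetical per-step reward based on the latest blood glucose (mg/dL).

    All thresholds are assumptions chosen for illustration only.
    """
    bg = bg_last_hour[-1]  # most recent reading in the history
    if bg < fail_low or bg > fail_high:
        # Large one-off penalty near failure, so ending the episode
        # early is never cheaper than staying in range.
        return fail_penalty
    if low <= bg <= high:
        # Positive reward in the target range: longer episodes earn more.
        return 1.0
    # Out of target but not near failure: neutral, never negative.
    return 0.0


# Assumed usage, following the README's custom-reward example:
# pass the function as kwargs={'reward_fun': survival_reward, ...}
# when registering the environment.
```

With this shape, a long episode spent in range always scores higher than a short one that ends in a failure reading, which removes the incentive to terminate early. The size of the failure penalty relative to the per-step reward is a tuning choice.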
