I am trying to train a reinforcement learning algorithm to control the basal rate using the provided gym environment. The problem with the reward function is that it encourages episodes to be as short as possible. I have tried different algorithms and hyperparameter variations, but the policy always learns to output either 0 or the maximum basal value, so that the episode ends early and no further penalty accumulates. Can the reward function be improved somehow?
The default reward function is not intended to give you a well-shaped reward (especially a long-term one); you are expected to define your own reward function.
For posterity, though, it would be nice if anyone could share their insights and their carefully designed reward functions here. I could collect them and put them in the documentation for visibility (with your name shown, of course, to give you credit).
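As one possible starting point, here is a minimal sketch of a reward that does not favor short episodes. It is not the environment's built-in reward. The idea is to keep the per-step reward non-negative while the episode is alive and to apply a large penalty only on the step where glucose leaves the survivable range, so ending the episode early is never the cheaper option. The name `custom_reward`, the assumption that the function receives a list of recent blood glucose readings in mg/dL, and the failure bounds of 40 and 400 mg/dL are all illustrative assumptions; check how your environment passes data to a reward function and where it terminates episodes.

```python
def custom_reward(bg_history, low=70.0, high=180.0,
                  fail_low=40.0, fail_high=400.0):
    """Hypothetical reward sketch based on the latest glucose reading.

    bg_history: sequence of blood glucose values in mg/dL (assumed input).
    low/high: target range (70-180 mg/dL is the usual clinical target).
    fail_low/fail_high: assumed bounds beyond which the episode ends.
    """
    bg = bg_history[-1]

    # Large one-off penalty when the episode would terminate, so that
    # terminating is always worse than continuing.
    if bg < fail_low or bg > fail_high:
        return -100.0

    # Full reward inside the target range.
    if low <= bg <= high:
        return 1.0

    # Outside the target range but still alive: reward shrinks linearly
    # from 1 at the range edge to 0 at the failure bound.
    if bg < low:
        return (bg - fail_low) / (low - fail_low)
    return (fail_high - bg) / (fail_high - high)


if __name__ == "__main__":
    for reading in (120.0, 55.0, 290.0, 30.0):
        print(reading, custom_reward([reading]))
```

Because every surviving step earns at least 0 and termination costs 100, a policy that ends the episode by outputting 0 or the maximum basal value accumulates less return than one that keeps glucose in range. The penalty size and the linear falloff are arbitrary choices and will likely need tuning together with the discount factor.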