-
-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand non-fatal JNLP agent endpoint resolution to do retries #628
Conversation
Follow-up jenkinsci#449 When using `hudson.remoting.Engine.nonFatalJnlpAgentEndpointResolutionExceptions`, the failure to resolve JNLP endpoint is considered non-fatal, so there should be retries handled by remoting. This can help with cases where a flaky network causes initial JNLP endpoint lookup to fail. This commit adds two new system properties to additionally configure the retry attempts. - `nonFatalJnlpAgentResolutionExceptionsMaxRetries` the number of retries before failing - `nonFatalJnlpAgentResolutionExceptionsIntervalInMillis` - the interval in milliseconds before attempt to reconnect
@basil - one thing here would be that the default retry would be 2 (so 3 attempts in all) with an interval of 5 seconds in between each. Is that okay for your implementation, or would you like a |
So you are proposing a change in behavior and asking me if it would affect an existing consumer? The source code for that existing consumer is available at https://github.com/jenkinsci/swarm-plugin so it should be possible to judge for oneself. Whatever your use case may be, it is not described clearly in this PR. That would be enough to preclude the review of this PR even if this PR was not a self-described temporary workaround. |
I fully deserved that 😅. Thank you @basil - I will make the the appropriate changes and come back to you. |
Hi @basil, is this okay? |
Not sure we want to commit to this API. |
This would just be a stop gap until a long-term solution is found. It is non-breaking so I would hope that it would be okay. |
Superceded by #675 |
What?
This PR adds two new system properties which can be used to further configure the retry attempts when using the
nonFatalJnlpAgentEndpointResolutionExceptions
property.nonFatalJnlpAgentEndpointResolutionExceptionsMaxRetries
the number of retries before failing (defaults to 0 for backwards compatibility with the swarm plugin)
nonFatalJnlpAgentEndpointResolutionExceptionsIntervalInMillis
the interval in milliseconds before attempt to reconnect (defaults to 5000)
Why?
The
nonFatalJnlpAgentEndpointResolutionExceptions
is currently used in the swarm plugin. The swarm plugin manages its own retry strategy.In some envrionments with a flaky network, the JNLP endpoint resolution might simple fail due to a network issue. With standard incoming agents, the agent connection would fail at the first attempt and the build would fail.
These additional properties allow the configuration of retries in order to handle a flaky network.
References
Follow-up #449, and initial fix before #608 hits.