Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand non-fatal JNLP agent endpoint resolution to do retries #628

Closed
wants to merge 6 commits into from

Conversation

sboardwell
Copy link

@sboardwell sboardwell commented Feb 24, 2023

What?

This PR adds two new system properties which can be used to further configure the retry attempts when using the nonFatalJnlpAgentEndpointResolutionExceptions property.

  • nonFatalJnlpAgentEndpointResolutionExceptionsMaxRetries
    the number of retries before failing (defaults to 0 for backwards compatibility with the swarm plugin)
  • nonFatalJnlpAgentEndpointResolutionExceptionsIntervalInMillis
    the interval in milliseconds before attempt to reconnect (defaults to 5000)

Why?

The nonFatalJnlpAgentEndpointResolutionExceptions is currently used in the swarm plugin. The swarm plugin manages its own retry strategy.

In some envrionments with a flaky network, the JNLP endpoint resolution might simple fail due to a network issue. With standard incoming agents, the agent connection would fail at the first attempt and the build would fail.

These additional properties allow the configuration of retries in order to handle a flaky network.

References

Follow-up #449, and initial fix before #608 hits.

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

Follow-up jenkinsci#449

When using `hudson.remoting.Engine.nonFatalJnlpAgentEndpointResolutionExceptions`,
the failure to resolve JNLP endpoint is considered non-fatal, so there should be retries handled by remoting.

This can help with cases where a flaky network causes initial JNLP
endpoint lookup to fail.

This commit adds two new system properties to additionally configure the retry attempts.

- `nonFatalJnlpAgentResolutionExceptionsMaxRetries` the number of
  retries before failing
- `nonFatalJnlpAgentResolutionExceptionsIntervalInMillis` - the
  interval in milliseconds before attempt to reconnect
@sboardwell
Copy link
Author

@basil / @Vlatombe - can you have a look at this please? This would be a non-invasive solution until a fully fledged solution is discussed and finally implemented in #608.

@sboardwell
Copy link
Author

@basil - one thing here would be that the default retry would be 2 (so 3 attempts in all) with an interval of 5 seconds in between each. Is that okay for your implementation, or would you like a -1 value for infinite retries?

@basil
Copy link
Member

basil commented Feb 27, 2023

So you are proposing a change in behavior and asking me if it would affect an existing consumer? The source code for that existing consumer is available at https://github.com/jenkinsci/swarm-plugin so it should be possible to judge for oneself.

Whatever your use case may be, it is not described clearly in this PR. That would be enough to preclude the review of this PR even if this PR was not a self-described temporary workaround.

@sboardwell
Copy link
Author

I fully deserved that 😅.

Thank you @basil - I will make the the appropriate changes and come back to you.

@sboardwell
Copy link
Author

Hi @basil, is this okay?

@basil
Copy link
Member

basil commented Mar 14, 2023

Not sure we want to commit to this API.

@sboardwell
Copy link
Author

This would just be a stop gap until a long-term solution is found. It is non-breaking so I would hope that it would be okay.

@sboardwell
Copy link
Author

Superceded by #675

@sboardwell sboardwell closed this Oct 12, 2023
@sboardwell sboardwell deleted the add-retry-options branch October 12, 2023 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants