Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract inbound tcp and websocket connection flow to separate classes, unify retrying #773

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

Vlatombe
Copy link
Member

@Vlatombe Vlatombe commented Oct 25, 2024

Extracts Inbound TCP and websocket connection flows to separate classes, and unify the retrying logic, so that both Inbound TCP and Websocket use retries, with exponential retries.

Bumped requirement to Java 17, since we are past End-october, and records makes things pretttier.

Testing done

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

A more ambitious attempt to extract logic from Engine into separate
classes
@Vlatombe Vlatombe requested a review from jglick October 25, 2024 09:43
@jglick
Copy link
Member

jglick commented Nov 4, 2024

incremental diff

Comment on lines +95 to +101
private final RSAPublicKey publicKey;
private final Map<String, String> headers;

public EngineJnlpConnectionStateListener(RSAPublicKey publicKey, Map<String, String> headers) {
this.publicKey = publicKey;
this.headers = headers;
}
Copy link
Member

@jglick jglick Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a record. (see #749)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too bad JnlpConnectionStateListener is an abstract class (for no good reason)

@jglick
Copy link
Member

jglick commented Nov 4, 2024

Seems OK from a quick glance, though the diff is too long to readily see what is actually being edited.

@@ -146,7 +146,7 @@ public void error(Throwable t) {

private static class NoReconnectException extends RuntimeException {}

@Test(timeout = 30_000)
@Test(timeout = 10_000)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shortening timeout since the new retry logic is snappier

Comment on lines +95 to +101
private final RSAPublicKey publicKey;
private final Map<String, String> headers;

public EngineJnlpConnectionStateListener(RSAPublicKey publicKey, Map<String, String> headers) {
this.publicKey = publicKey;
this.headers = headers;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too bad JnlpConnectionStateListener is an abstract class (for no good reason)

@@ -269,7 +222,7 @@ public Engine(EngineListener listener, List<URL> hudsonUrls, String secretKey, S

public Engine(
EngineListener listener,
List<URL> hudsonUrls,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well... long overdue.

Comment on lines -547 to -585
SSLContext context;
// prepare our SSLContext
try {
context = SSLContext.getInstance("TLS");
} catch (NoSuchAlgorithmException e) {
throw new IllegalStateException("Java runtime specification requires support for TLS algorithm", e);
}
char[] password = "password".toCharArray();
KeyStore store;
try {
store = KeyStore.getInstance(KeyStore.getDefaultType());
} catch (KeyStoreException e) {
throw new IllegalStateException("Java runtime specification requires support for JKS key store", e);
}
try {
store.load(null, password);
} catch (NoSuchAlgorithmException e) {
throw new IllegalStateException("Java runtime specification requires support for JKS key store", e);
} catch (CertificateException e) {
throw new IllegalStateException("Empty keystore", e);
}
KeyManagerFactory kmf;
try {
kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
} catch (NoSuchAlgorithmException e) {
throw new IllegalStateException(
"Java runtime specification requires support for default key manager", e);
}
try {
kmf.init(store, password);
} catch (KeyStoreException | NoSuchAlgorithmException | UnrecoverableKeyException e) {
throw new IllegalStateException(e);
}
try {
context.init(kmf.getKeyManagers(), new TrustManager[] {agentTrustManager}, null);
} catch (KeyManagementException e) {
events.error(e);
return;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to SSLUtils

@SuppressFBWarnings(
value = {"REC_CATCH_EXCEPTION"},
justification = "checked exceptions were a mistake to begin with; connecting to Jenkins from agent")
private void runWebSocket() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to WebSocketConnector

Comment on lines -861 to -911
private static class ExponentialRetry {
final int factor;
final Instant beginning;
final Duration delay;
final Duration timeout;
final Duration incrementDelay;
final Duration maxDelay;

ExponentialRetry(Duration timeout) {
this(Duration.ofSeconds(0), timeout, 2, Duration.ofSeconds(1), Duration.ofSeconds(10));
}

ExponentialRetry(
Duration initialDelay, Duration timeout, int factor, Duration incrementDelay, Duration maxDelay) {
this.beginning = Instant.now();
this.delay = initialDelay;
this.timeout = timeout;
this.factor = factor;
this.incrementDelay = incrementDelay;
this.maxDelay = maxDelay;
}

ExponentialRetry(ExponentialRetry previous) {
beginning = previous.beginning;
factor = previous.factor;
timeout = previous.timeout;
incrementDelay = previous.incrementDelay;
maxDelay = previous.maxDelay;
delay = min(maxDelay, previous.delay.multipliedBy(previous.factor).plus(incrementDelay));
}

private static Duration min(Duration a, Duration b) {
return a.compareTo(b) < 0 ? a : b;
}

boolean timeoutExceeded() {
return Util.shouldBailOut(beginning, timeout);
}

ExponentialRetry next(EngineListenerSplitter events) throws InterruptedException {
var next = new ExponentialRetry(this);
if (next.timeoutExceeded()) {
events.status("Bailing out after " + DurationFormatter.format(next.timeout));
return null;
} else {
events.status("Waiting " + DurationFormatter.format(next.delay) + " before retry");
Thread.sleep(next.delay.toMillis());
}
return next;
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to RetryUtils

}
}

private void innerRun(IOHub hub, SSLContext context, ExecutorService service) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to InboundTCPConnector

@@ -1156,139 +585,13 @@ public static Engine current() {

private static final Logger LOGGER = Logger.getLogger(Engine.class.getName());

@SuppressFBWarnings(value = "PATH_TRAVERSAL_IN", justification = "File path is loaded from system properties.")
static KeyStore getCacertsKeyStore()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to SSLUtils

@@ -1312,53 +615,4 @@ public String getAgentName() {
public String getProtocolName() {
return this.protocolName;
}

private class EngineJnlpConnectionStateListener extends JnlpConnectionStateListener {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to InboundTCPConnector

}
value = {"HARD_CODE_PASSWORD", "REC_CATCH_EXCEPTION"},
justification = "Password doesn't need to be protected.; We need to catch all exceptions")
public void run() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method contains the key structural changes.

@Vlatombe Vlatombe marked this pull request as ready for review November 5, 2024 09:56
Comment on lines +162 to +172
var clientProtocols = new JnlpProtocolHandlerFactory(data.executor())
.withIOHub(hub)
.withSSLContext(context)
.withPreferNonBlockingIO(false) // we only have one connection, prefer blocking I/O
.handlers();
var negotiatedProtocols = clientProtocols.stream()
.filter(JnlpProtocolHandler::isEnabled)
.filter(p -> endpoint.isProtocolSupported(p.getName()))
.collect(Collectors.toSet());
var serverProtocols = endpoint.getProtocols() == null ? "?" : String.join(",", endpoint.getProtocols());
LOGGER.info(buildDebugProtocolsMessage(serverProtocols, clientProtocols, negotiatedProtocols));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gives clearer output to users on which protocols are supported on either side of the connection.

Copy link
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK AFAICT

pom.xml Show resolved Hide resolved
pom.xml Outdated Show resolved Hide resolved
@timja timja added the enhancement For changelog: An enhancement providing new capability. label Nov 6, 2024
@basil
Copy link
Member

basil commented Nov 6, 2024

See this document for a discussion of various retry strategies. Exponential backoff does appear to be a strict improvement over the status quo, as it balances retries against latency when the controller is unavailable for an unknown duration; however, in contrast with jitter-based solutions, it does not resolve contention for the controller. As in #676, testing of various retry strategies in real-world scenarios is needed to evaluate the effectiveness of those strategies. I would be fine with this PR if it was a simple refactoring that did not introduce exponential backoff. I would also be fine with introducing exponential backoff it was confirmed to have a positive impact in a functional test scenario.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement For changelog: An enhancement providing new capability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants