This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Support Enable/Disable NiFi Zookeeper client Zookeeper Ensemble Tracker #294

Closed

nathluu commented Apr 25, 2023

What this PR does / why we need it:

As of NiFi 1.20.0 (NIFI-10481), the ZooKeeper (zk) client's Ensemble Tracker can be enabled or disabled through configuration.
With the zk ensemble tracker enabled (the NiFi default), the NiFi zk client resolves the zk connection string to IP addresses. When all zk server pods are then restarted and come up with different IPs, the NiFi pods cannot reconnect to zk.
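
The failure mode described above can be illustrated with a small, self-contained model. This is a simplified sketch, not NiFi or Curator code: `FakeDns`, `PinnedIpClient`, `HostnameClient`, and the hostnames and IPs used with them are all hypothetical. It only assumes what the description states, namely that with the tracker enabled the client ends up holding IP addresses instead of hostnames.

```python
class FakeDns:
    """Hypothetical stand-in for cluster DNS: hostname -> current pod IP."""

    def __init__(self, records):
        self.records = dict(records)

    def resolve(self, host):
        return self.records[host]


class PinnedIpClient:
    """Models 'tracker enabled': hostnames are resolved once and the IPs are kept."""

    def __init__(self, dns, hosts):
        self.targets = [dns.resolve(h) for h in hosts]

    def connect_targets(self):
        return list(self.targets)


class HostnameClient:
    """Models 'tracker disabled': hostnames are kept and resolved on every attempt."""

    def __init__(self, dns, hosts):
        self.dns = dns
        self.hosts = list(hosts)

    def connect_targets(self):
        return [self.dns.resolve(h) for h in self.hosts]


def can_reconnect(client, live_ips):
    """A reconnect succeeds if at least one target IP belongs to a running pod."""
    return any(ip in live_ips for ip in client.connect_targets())
```

In this model, once every record in `FakeDns` is replaced with a new IP (all zk pods restarted), `PinnedIpClient` still points at the old addresses and `can_reconnect` returns `False`, while `HostnameClient` picks up the new addresses on its next attempt.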

Which issue this PR fixes

  • fixes #

Special notes for your reviewer:

Checklist

  • DCO signed
  • Chart Version bumped
  • Variables are documented in the README.md

banzo commented Apr 26, 2023

This also bumps NiFi to 1.20, nice, thanks!

Unfortunately one of the tests fails: https://github.com/cetic/helm-nifi/blob/253ad43b273075bf91165f320bac3794a2446d4f/.github/workflows/test-site-to-site.yml

2023-04-25 15:27:41,173 WARN [NiFi Site-to-Site Connection Pool Maintenance] o.a.n.r.SiteToSiteProvenanceReportingTask SiteToSiteProvenanceReportingTask[id=c131bf78-017e-1000-dbc1-a9b6b34a798b] Unable to refresh remote group peers due to: Connect to nifi.bravo.svc.cluster.local:8443 [nifi.bravo.svc.cluster.local/10.105.95.59] failed: Connection refused (Connection refused)
2023-04-25 15:27:41,186 WARN [Timer-Driven Process Thread-8] o.a.n.r.util.SiteToSiteRestApiClient Failed to get controller from https://nifi.bravo.svc.cluster.local:8443/nifi-api due to org.apache.http.conn.HttpHostConnectException: Connect to nifi.bravo.svc.cluster.local:8443 [nifi.bravo.svc.cluster.local/10.105.95.59] failed: Connection refused (Connection refused)
2023-04-25 15:27:41,186 ERROR [Timer-Driven Process Thread-8] o.a.n.r.SiteToSiteProvenanceReportingTask SiteToSiteProvenanceReportingTask[id=c131bf78-017e-1000-dbc1-a9b6b34a798b] Error running task SiteToSiteProvenanceReportingTask[id=c131bf78-017e-1000-dbc1-a9b6b34a798b] due to org.apache.nifi.processor.exception.ProcessException: Failed to send Provenance Events to destination due to IOException:Connect to nifi.bravo.svc.cluster.local:8443 [nifi.bravo.svc.cluster.local/10.105.95.59] failed: Connection refused (Connection refused)
2023-04-25 15:27:46,097 WARN [Timer-Driven Process Thread-4] o.a.n.r.util.SiteToSiteRestApiClient Failed to get controller from https://nifi.bravo.svc.cluster.local:8443/nifi-api due to org.apache.http.conn.HttpHostConnectException: Connect to nifi.bravo.svc.cluster.local:8443 [nifi.bravo.svc.cluster.local/10.105.95.59] failed: Connection refused (Connection refused)
2023-04-25 15:27:46,098 ERROR [Timer-Driven Process Thread-4] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=BravoInput,targets=https://nifi.bravo.svc.cluster.local:8443/nifi] failed to communicate with https://nifi.bravo.svc.cluster.local:8443/nifi due to org.apache.nifi.remote.exception.UnreachableClusterException: Unable to refresh details from any of the configured remote instances.
2023-04-25 15:27:46,134 ERROR [Timer-Driven Process Thread-4] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=BravoInput,targets=https://nifi.bravo.svc.cluster.local:8443/nifi] Processing failed
org.apache.nifi.processor.exception.ProcessException: org.apache.nifi.remote.exception.UnreachableClusterException: Unable to refresh details from any of the configured remote instances.
	at org.apache.nifi.remote.StandardRemoteGroupPort.onTrigger(StandardRemoteGroupPort.java:244)
	at org.apache.nifi.controller.AbstractPort.onTrigger(AbstractPort.java:255)
	at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:246)
	at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:102)
	at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.nifi.remote.exception.UnreachableClusterException: Unable to refresh details from any of the configured remote instances.
	at org.apache.nifi.remote.client.socket.EndpointConnectionPool.getEndpointConnection(EndpointConnectionPool.java:154)
	at org.apache.nifi.remote.client.socket.SocketClient.createTransaction(SocketClient.java:127)
	at org.apache.nifi.remote.StandardRemoteGroupPort.onTrigger(StandardRemoteGroupPort.java:224)
	... 10 common frames omitted
Caused by: org.apache.http.conn.HttpHostConnectException: Connect to nifi.bravo.svc.cluster.local:8443 [nifi.bravo.svc.cluster.local/10.105.95.59] failed: Connection refused (Connection refused)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:156)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
	at org.apache.nifi.remote.util.SiteToSiteRestApiClient.execute(SiteToSiteRestApiClient.java:1169)
	at org.apache.nifi.remote.util.SiteToSiteRestApiClient.execute(SiteToSiteRestApiClient.java:1211)
	at org.apache.nifi.remote.util.SiteToSiteRestApiClient.fetchController(SiteToSiteRestApiClient.java:417)
	at org.apache.nifi.remote.util.SiteToSiteRestApiClient.getController(SiteToSiteRestApiClient.java:392)
	at org.apache.nifi.remote.util.SiteToSiteRestApiClient.getController(SiteToSiteRestApiClient.java:359)
	at org.apache.nifi.remote.client.SiteInfoProvider.refreshRemoteInfo(SiteInfoProvider.java:69)
	at org.apache.nifi.remote.client.SiteInfoProvider.getActiveClusterUrl(SiteInfoProvider.java:247)
	at org.apache.nifi.remote.client.socket.EndpointConnectionPool.getEndpointConnection(EndpointConnectionPool.java:152)
	... 12 common frames omitted
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
	at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
	at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
	at java.base/java.net.Socket.connect(Unknown Source)
	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:368)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
	... 29 common frames omitted
2023-04-25 15:27:46,175 WARN [NiFi Site-to-Site Connection Pool Maintenance] o.a.n.r.util.SiteToSiteRestApiClient Failed to get controller from https://nifi.bravo.svc.cluster.local:8443/nifi-api due to org.apache.http.conn.HttpHostConnectException: Connect to nifi.bravo.svc.cluster.local:8443 [nifi.bravo.svc.cluster.local/10.105.95.59] failed: Connection refused (Connection refused)
2023-04-25 15:27:46,175 WARN [NiFi Site-to-Site Connection Pool Maintenance] o.apache.nifi.remote.client.PeerSelector Unable to refresh remote group peers due to: Connect to nifi.bravo.svc.cluster.local:8443 [nifi.bravo.svc.cluster.local/10.105.95.59] failed: Connection refused (Connection refused)
2023-04-25 15:27:46,179 WARN [NiFi Site-to-Site Connection Pool Maintenance] o.a.n.r.util.SiteToSiteRestApiClient Failed to get controller from https://nifi.bravo.svc.cluster.local:8443/nifi-api due to org.apache.http.conn.HttpHostConnectException: Connect to nifi.bravo.svc.cluster.local:8443 [nifi.bravo.svc.cluster.local/10.105.95.59] failed: Connection refused (Connection refused)
2023-04-25 15:27:46,179 WARN [NiFi Site-to-Site Connection Pool Maintenance] o.apache.nifi.remote.client.PeerSelector Unable to refresh remote group peers due to: Connect to nifi.bravo.svc.cluster.local:8443 [nifi.bravo.svc.cluster.local/10.105.95.59] failed: Connection refused (Connection refused)
2023-04-25 15:27:46,180 WARN [NiFi Site-to-Site Connection Pool Maintenance] o.a.n.r.SiteToSiteProvenanceReportingTask SiteToSiteProvenanceReportingTask[id=c131bf78-017e-1000-dbc1-a9b6b34a798b] Unable to refresh remote group peers due to: Connect to nifi.bravo.svc.cluster.local:8443 [nifi.bravo.svc.cluster.local/10.105.95.59] failed: Connection refused (Connection refused)
2023-04-25 15:27:46,244 WARN [Timer-Driven Process Thread-8] o.a.n.r.util.SiteToSiteRestApiClient Failed to get controller from https://nifi.bravo.svc.cluster.local:8443/nifi-api due to org.apache.http.conn.HttpHostConnectException: Connect to nifi.bravo.svc.cluster.local:8443 [nifi.bravo.svc.cluster.local/10.105.95.59] failed: Connection refused (Connection refused)
2023-04-25 15:27:46,244 ERROR [Timer-Driven Process Thread-8] o.a.n.r.SiteToSiteProvenanceReportingTask SiteToSiteProvenanceReportingTask[id=c131bf78-017e-1000-dbc1-a9b6b34a798b] Error running task SiteToSiteProvenanceReportingTask[id=c131bf78-017e-1000-dbc1-a9b6b34a798b] due to org.apache.nifi.processor.exception.ProcessException: Failed to send Provenance Events to destination due to IOException:Connect to nifi.bravo.svc.cluster.local:8443 [nifi.bravo.svc.cluster.local/10.105.95.59] failed: Connection refused (Connection refused)

banzo added the "enhancement" and "help wanted" labels Apr 26, 2023
banzo commented May 4, 2023

@nathluu

Thank you. Can you please update the PR title and description to reflect the change in NiFi version? This will help autogenerate the release notes.

Also, I am curious: what is the difference between the cert-manager.io and jetstack versions of cert-manager?

nathluu changed the title from "Disable zkClientEnsembleTracker" to "Support Enable/Disable NiFi Zookeeper client Zookeeper Ensemble Tracker" May 4, 2023
nathluu commented May 4, 2023

Hi @banzo,
"Also I am curious what is the difference between the cert-manager.io and jetstack versions for cert-manager"
I reverted the change, though I can see no difference between them.

@wknickless

@nathluu

"I reverted the change, though I can see no difference between them."

When you point a web browser to https://github.com/jetstack/cert-manager it automatically redirects to https://github.com/cert-manager/cert-manager so I would recommend reverting to the cert-manager.io URL, as it seems to be the long-term supported reference to that project.

cmctl -n alpha renew nifi-0
kubectl -n alpha rollout restart statefulset/nifi
Contributor

This defeats the purpose of the test, which was to confirm that NiFi is automatically detecting the certificate has changed and restarting the TLS modules. If we want to disable this test because NiFi is broken, then I would recommend commenting it out with a comment rather than forcing a restart.

Contributor Author

Add a TODO to remove this block of code when the NiFi issue is fixed.
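
The check the reviewer describes (confirming that the served certificate changes after a renewal, without a forced restart) could be sketched roughly as follows. This is an illustration only, not the repository's test: `fingerprint`, `wait_for_new_certificate`, and the `fetch_pem` callable are hypothetical names, and the polling interval and attempt count are arbitrary assumptions.

```python
import hashlib
import time


def fingerprint(pem: str) -> str:
    """SHA-256 over the PEM text; enough to tell two certificates apart."""
    return hashlib.sha256(pem.encode("utf-8")).hexdigest()


def wait_for_new_certificate(fetch_pem, old_fingerprint,
                             attempts=30, delay_seconds=10.0,
                             sleep=time.sleep):
    """Poll until the served certificate differs from old_fingerprint.

    fetch_pem is any zero-argument callable returning the current
    certificate as PEM text (hypothetical hook, supplied by the caller).
    Returns True if a different certificate is seen within `attempts`
    polls, False otherwise.
    """
    for _ in range(attempts):
        if fingerprint(fetch_pem()) != old_fingerprint:
            return True
        sleep(delay_seconds)
    return False
```

In a real check, `fetch_pem` might wrap the standard library's `ssl.get_server_certificate((host, port))`, called once before `cmctl renew` to record the old fingerprint and then polled afterwards. Which host and port to probe is left as an assumption here.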

nathluu and others added 14 commits May 19, 2023 18:59
Signed-off-by: Tan Luu <[email protected]>
* Add session affinity FAQ
---------

Co-authored-by: Chengjun Fu <[email protected]>
* Update nifi.properties - Add values - nifi.security.user.oidc.preferred.jwsalgorithm={{.Values.auth.oidc.preferredJwsalgorithm}}
* Update values.yaml - Add auth.oidc.preferredJwsalgorithm value
* Update README.md - Add info about nifi.security.user.oidc.preferred.jwsalgorithm
Signed-off-by: joseph.ybh <[email protected]>
Co-authored-by: joseph.ybh <[email protected]>
* Fix s2s test

Signed-off-by: Tan Luu <[email protected]>
Signed-off-by: Tan Luu <[email protected]>
…m:nathluu/helm-nifi into nathluu/zkclient-ensemble-tracker-disable
nathluu commented Nov 3, 2023

Closing #294 unmerged; created #319 instead.

nathluu closed this Nov 3, 2023