This is a follow-up to #90 and #93, which made it possible to use orchard behind an nginx ingress in Kubernetes.
Just to recap, the orchard client started failing to forward ports once it was put behind an nginx ingress. Initially, I thought it could be related to the lack of a certificate when using certificate-less bootstrap tokens, but @edigaryev was spot on that the missing piece of the puzzle was enabling the GRPCS backend for the /Controller endpoint on the nginx ingress that proxies traffic to the orchard controller¹.
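For reference, the relevant part looks roughly like this (a minimal sketch, not the exact manifest; the service name and port are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orchard-controller
  annotations:
    # Proxy this path to the backend using gRPC over TLS instead of plain HTTP
    nginx.ingress.kubernetes.io/backend-protocol: "GRPCS"
spec:
  ingressClassName: nginx
  rules:
    - host: orchard.example.internal
      http:
        paths:
          - path: /Controller
            pathType: Prefix
            backend:
              service:
                name: orchard-controller  # placeholder service name
                port:
                  number: 6120            # placeholder port
```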
This solution only worked intermittently, with nginx producing error messages such as:
2023/06/27 12:30:38 [error] 2282#2282: *83097928 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 10.10.10.100, server: orchard.example.internal, request: "POST /Controller/Watch HTTP/2.0", upstream: "grpcs://10.10.10.100:443", host: "orchard.example.internal:443"
or
2023/06/28 11:51:03 [error] 1253#1253: *1194409 no connection data found for keepalive http2 connection while sending request to upstream, client: 10.10.10.100, server: orchard.example.internal, request: "POST /Controller/Watch HTTP/2.0", upstream: "grpcs://10.10.10.100:443", host: "orchard.example.internal:443"
After spending a bit more time investigating the root cause, it became apparent that many other users were facing a similar issue: nginx struggling to multiplex HTTP/1.1 (and perhaps HTTP/2.0 too) and gRPC streams.
One of the suggestions I've seen so far is to limit the number of keepalive requests to upstreams to 1, effectively requiring a new connection to the backend for every request that nginx proxies. Unfortunately, the nginx ingress on Kubernetes uses a single upstream {} block managed by a Lua script, so this setting has the downside of affecting all upstream connections. For high-volume systems, this isn't a long-term solution.
The configuration needs to be applied via the nginx ingress controller's ConfigMap:
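Roughly along these lines (a sketch; the ConfigMap name and namespace depend on how ingress-nginx was installed):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller  # name/namespace depend on the ingress-nginx installation
  namespace: ingress-nginx
data:
  # Serve at most one request per upstream keepalive connection
  upstream-keepalive-requests: "1"
```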
This will add keepalive_requests 1; to the global upstream block.
Once nginx had applied this change, I started receiving a different error when using port forwarding: this time, the gRPC stream would be terminated by RST_STREAM, suggesting the connection was being aborted.
{"level": "warn", "ts": 1687962870.3219981, "msg": "failed to watch RPC: rpc error: code - Internal desc - stream terminated by RST_STREAM with error code: INTERNAL_ERROR"}
It turns out that for bi-directional gRPC streams, nginx's default timeout of 60 seconds is too low, meaning the upstream gRPC stream would be closed after that period of time. To fix this, I had to increase a few settings:
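Roughly the following, with illustrative values, on the /Controller ingress (ingress-nginx should translate these proxy timeouts into the corresponding grpc_*_timeout directives for a GRPCS backend):

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPCS"
    # For a gRPC backend these become grpc_connect_timeout / grpc_send_timeout / grpc_read_timeout
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
```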
I'm still getting occasional 502s for which I don't have a good explanation yet. One option would be to retry on a 502, but I'm unsure whether grpc_next_upstream works with a single backend (I'm only running a single orchard-controller instance).
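If I end up experimenting with retries, it would look roughly like this (purely hypothetical, via a configuration-snippet annotation, which the controller has to allow; I haven't verified it does anything useful with a single backend):

```yaml
metadata:
  annotations:
    # Untested: retry the upstream on errors and 502s; with a single
    # orchard-controller instance there may be no "next" upstream to try.
    nginx.ingress.kubernetes.io/configuration-snippet: |
      grpc_next_upstream error timeout http_502;
      grpc_next_upstream_tries 2;
```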
Overall, I believe orchard-controller would benefit from an optional segregation between its REST and gRPC APIs to isolate traffic and avoid potential downsides from unexpected proxy behaviour.
¹ The orchard controller exposes a gin-gonic REST API and a gRPC server on the same port and then uses the Content-Type header to direct traffic to one or the other.
Would it be possible for you to provide a minimal reproducible Kubernetes configuration (YAMLs) that triggers the 502s?
I really think this would be a much better way of solving your issue, because it would reveal the root cause in the nginx configuration (or even the code).
With regards to adding an option for the gRPC server to listen on a separate port, I do not entirely understand how it would help in your case: if the upstream {} block is global for all upstream connections, it wouldn't matter whether the Orchard Controller exposed two ports, because the global upstream {} settings would still apply to both of them. Or am I missing something here?
Hi @edigaryev,