I've found a bug, and:
- The documentation does not mention anything about my problem
- There are no open or closed issues that are related to my problem
Description
Issue: Self-Hosted Runners on GHA Workflows with Kubernetes Driver
Background
We have configured our GitHub Actions (GHA) workflows to use self-hosted runners. Our typical workflow involves:
- Installing buildx
- Building, pushing, and caching with buildx
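As a rough sketch of that setup (the registry, image name, and cache ref are placeholders, assuming the official docker/setup-buildx-action and docker/build-push-action):

```yaml
jobs:
  build:
    runs-on: self-hosted  # our runners live on the k8s cluster
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
        with:
          driver: kubernetes  # BuildKit runs as pods in the cluster
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: registry.example.com/app:latest  # placeholder
          cache-from: type=registry,ref=registry.example.com/app:buildcache
          cache-to: type=registry,ref=registry.example.com/app:buildcache,mode=max
```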
Problem
We are encountering an intermittent error when using the Kubernetes (k8s) driver for our builds. Our self-hosted runners are deployed on our k8s cluster. The BuildKit container logs show the following error:
time="2023-11-30T22:28:11Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Canceled desc = context canceled"
Hypothesis
We suspect that the issue might be related to our runners being behind a VPN; buildx may not be handling the network latency associated with a VPN connection well.
Observations
The issue is isolated to our runners or to the k8s driver/buildx combination: switching to GitHub's hosted runners resolves it, which rules out problems with our workflow or Dockerfile.
The failure isn't consistent; approximately 1 in 5 actions hits this issue, while the rest complete successfully.
Seeking insights or suggestions to resolve this intermittent failure with our self-hosted runners in GHA workflows. For additional context, see this related issue.
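Since roughly 4 in 5 runs succeed, one stopgap we are considering is retrying the flaky step. A minimal sketch of a generic retry wrapper (the image name in the usage comment is a placeholder, not our real registry):

```shell
#!/bin/sh
# Retry a command up to N times, sleeping briefly between attempts.
retry() {
  attempts=$1; shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "command failed after $n attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep 2
  done
}

# Usage (hypothetical image name):
# retry 3 docker buildx build --push -t registry.example.com/app:latest .
```

This only papers over the intermittent cancellation, but it keeps the pipeline moving while the root cause is investigated.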
Expected Behavior
When using self-hosted runners in GitHub Actions workflows with the Kubernetes (k8s) driver for buildx, we expect the following:
Stable Connection to Build Services: The runners should maintain a stable connection to the BuildKit builder, regardless of being behind a VPN. Network latency typically associated with VPN connections should not disrupt the build process.
Consistent Build Process: Each action initiated by the workflow should complete successfully without intermittent failures. The build, push, and cache processes via buildx should be executed reliably.
Error-Free Operation: The buildx command, especially when interacting with Kubernetes, should execute without returning errors like /moby.buildkit.v1.Control/Solve returned error: rpc error: code = Canceled desc = context canceled.
Consistency with GitHub Hosted Runners: The performance and reliability of builds using self-hosted runners should be comparable to those observed with GitHub's hosted runners.
The expectation is that the self-hosted runners on our Kubernetes cluster should work as efficiently and reliably as GitHub's hosted runners, ensuring a smooth CI/CD pipeline.
Actual Behavior
When using self-hosted runners in GitHub Actions workflows with the Kubernetes (k8s) driver for buildx, we are encountering the following issues:
Unstable Connection to Build Services: The runners, especially when operating behind a VPN, experience unstable connections to the BuildKit builder. This is evident from frequent connection cancellations and errors during the build process.
Inconsistent Build Process: Actions initiated by the workflow do not complete consistently. Approximately 20% of runs (1 in 5) fail intermittently, showing a lack of reliability in the build, push, and cache processes via buildx.
Frequent Errors: We are frequently encountering errors such as /moby.buildkit.v1.Control/Solve returned error: rpc error: code = Canceled desc = context canceled. These errors suggest issues with the interaction between buildx and Kubernetes.
Disparity with GitHub Hosted Runners: Unlike the smooth operation observed with GitHub's hosted runners, our self-hosted runners exhibit inconsistent and error-prone behavior, leading to a disrupted CI/CD pipeline.
In summary, our self-hosted runners on the Kubernetes cluster are not performing as efficiently or reliably as expected, particularly in comparison to GitHub's hosted runners.
It is also important to note that this job only ever cancels during the build-and-push step. We use actions for other tasks, and those actions never cancel for no reason.
We're seeing the same issues, both with and without buildx, and can't pinpoint an exact cause. We're on AWS behind a VPC/transit gateway, etc., but no VPN. Platform: amd64.
Repository URL: No response
Workflow run URL: No response
YAML workflow
Workflow logs: No response
BuildKit logs: No response