
30% of runner instantiations fail due to timeout #86

Closed

jappyjan opened this issue Dec 17, 2021 · 7 comments
@jappyjan

```
Run machulav/ec2-github-runner@v2
GitHub Registration Token is received
AWS EC2 instance i-0eeae9ef28dcd04e9 is started
AWS EC2 instance i-0eeae9ef28dcd04e9 is up and running
Waiting 30s for the AWS EC2 instance to be registered in GitHub as a new self-hosted runner
Checking every 10s if the GitHub self-hosted runner is registered
Checking...
...
Checking...
Error: GitHub self-hosted runner registration error
Checking...
Error: A timeout of 5 minutes is exceeded. Your AWS EC2 instance was not able to register itself in GitHub as a new self-hosted runner.
```

This is the error I receive for roughly 30% of my runners.
What could cause this, and how can I increase the percentage of successful instantiations?

@Preen
Collaborator

Preen commented May 19, 2022

Would also like to know if there's something one can do to limit these situations...

@farvour

farvour commented Jun 24, 2022

I ran into this issue when previous runners didn't clean themselves up in the GitHub API. When I looked at the cloud-init log of the configure command, it was asking whether I wanted to replace the previous runner:

https://github.com/actions/runner/blob/main/src/Runner.Listener/CommandSettings.cs#L193

Edit: I followed up here, and it seems one can pass --replace to the config.sh script. I could fork and cut a PR for this, but I was wondering if it should be behind a flag, since this shouldn't normally happen on a clean stop of the instance (which sometimes isn't guaranteed).

```
Jun 24 20:41:24 ip-10-x-0-61 cloud-init[2458]: This runner will have the following labels: 'self-hosted', 'Linux', 'X64'
Jun 24 20:41:24 ip-10-x-0-61 cloud-init[2458]: Enter any additional labels (ex. label-1,label-2): [press Enter to skip]
```
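
For anyone who wants to try this before it lands in the action, here is a minimal sketch of what the registration step in the instance user data could look like with --replace added. The runner directory and the GH_OWNER / GH_REPO / RUNNER_TOKEN / RUNNER_LABEL variables below are placeholders, not the action's actual template.

```bash
#!/bin/bash
# Hypothetical user-data excerpt. Directory and variable values are placeholders
# you would supply yourself; this is not the action's built-in script.
cd /home/ubuntu/actions-runner

# --unattended suppresses interactive prompts (like the "replace?" question in the log above);
# --replace re-registers over a stale runner that still holds the same name.
./config.sh \
  --url "https://github.com/${GH_OWNER}/${GH_REPO}" \
  --token "${RUNNER_TOKEN}" \
  --labels "${RUNNER_LABEL}" \
  --unattended \
  --replace

./run.sh
```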

@pwo3

pwo3 commented Sep 26, 2022

Hello!

Any update on this? We are also facing a lot of unstarted runners...

@farvour do you plan to open a PR, or is there a way to apply your solution on our side?

Thank you!

@davegravy

We're also experiencing this, requiring periodic manual re-run of our CI jobs.

@machulav
Owner

machulav commented Nov 9, 2022

> A timeout of 5 minutes is exceeded

This error usually means that your new EC2 runner cannot communicate with GitHub and register itself as a new runner. Based on the tests, 5 minutes is more than enough for the EC2 runner to register itself, so I recommend double-checking that outbound traffic on port 443 is always open for your EC2 runner.

If that does not help, please provide more information about your action configuration and AWS infrastructure setup so I can help you with the issue.
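
For anyone unsure how to verify that, a rough sketch with the AWS CLI (the security group ID below is a placeholder; use the one attached to your runner instance):

```bash
# Inspect the egress rules of the security group attached to the runner instance.
# sg-0123456789abcdef0 is a placeholder ID.
aws ec2 describe-security-groups \
  --group-ids sg-0123456789abcdef0 \
  --query "SecurityGroups[0].IpPermissionsEgress"

# From the instance itself (via SSH or SSM Session Manager), confirm HTTPS to GitHub works:
curl -sS -o /dev/null -w "%{http_code}\n" https://api.github.com
```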

@davegravy

davegravy commented Nov 9, 2022

> A timeout of 5 minutes is exceeded
>
> This error usually means that your new EC2 runner cannot communicate with GitHub and register itself as a new runner. Based on the tests, 5 minutes is more than enough for the EC2 runner to register itself, so I recommend double-checking that outbound traffic on port 443 is always open for your EC2 runner.
>
> If that does not help, please provide more information about your action configuration and AWS infrastructure setup so I can help you with the issue.

I'll have to do some probing to see what's going on with 443 traffic. However, some observations that may be relevant:

  • For successful registrations (under 5 minutes), the distribution of registration times is quite wide. I'm not sure why, but a decent number of registrations land around the 4-5 minute mark, while others complete almost immediately. This seems independent of how long it takes the instance to reach an "OK" status as reported by aws ec2 describe-instance-status (see the sketch at the end of this comment).

  • None of my other EC2 instances in this subnet have any obvious networking issues. All my security policies are fully open in the outbound direction.

I'm using a t3.mini instance with an AMI based on bare Ubuntu 20.04, prepared with the apt equivalent of the README instructions and nothing else.
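
As a side note, for anyone who wants to compare these timings themselves, a rough sketch of polling the status checks with the AWS CLI (the instance ID is the one from the log at the top of this thread; substitute your own):

```bash
# Poll the instance status check every 10s until it reports "ok".
# Replace the instance ID with your own; this is only an illustration.
INSTANCE_ID="i-0eeae9ef28dcd04e9"

while true; do
  STATUS=$(aws ec2 describe-instance-status \
    --instance-ids "$INSTANCE_ID" \
    --query "InstanceStatuses[0].InstanceStatus.Status" \
    --output text)
  echo "$(date -u +%H:%M:%S) instance status: ${STATUS}"
  [ "$STATUS" = "ok" ] && break
  sleep 10
done
```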

@jeverling

Hi, I think this might be the same issue where the hostname is used as the runner name and hostnames are reused between instances: #128
Maybe using the --replace option is a good idea, though!
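
If name collisions are indeed the cause, another workaround besides --replace would be to give each runner a unique name, e.g. the EC2 instance ID from the instance metadata service. This is only a sketch of what that could look like in the registration script, not something the action does out of the box (same placeholder variables as above):

```bash
# Hypothetical tweak to the registration step: use the instance ID as the runner name
# so reused hostnames can no longer collide with stale registrations.
# (IMDSv1 shown for brevity; IMDSv2 requires fetching a session token first.)
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

./config.sh \
  --url "https://github.com/${GH_OWNER}/${GH_REPO}" \
  --token "${RUNNER_TOKEN}" \
  --labels "${RUNNER_LABEL}" \
  --name "${INSTANCE_ID}" \
  --unattended \
  --replace
```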

@jappyjan closed this as not planned on Mar 23, 2024.