
30% of runner instantiations fail due to timeout #86

Closed

jappyjan opened this issue Dec 17, 2021 · 7 comments
@jappyjan

```
Run machulav/ec2-github-runner@v2
GitHub Registration Token is received
AWS EC2 instance i-0eeae9ef28dcd04e9 is started
AWS EC2 instance i-0eeae9ef28dcd04e9 is up and running
Waiting 30s for the AWS EC2 instance to be registered in GitHub as a new self-hosted runner
Checking every 10s if the GitHub self-hosted runner is registered
Checking...
...
Checking...
Error: GitHub self-hosted runner registration error
Checking...
Error: A timeout of 5 minutes is exceeded. Your AWS EC2 instance was not able to register itself in GitHub as a new self-hosted runner.
```

This is the error I receive for roughly 30% of my runners.
What could cause this, and how can I increase the percentage of successful instantiations?

@Preen
Collaborator

Preen commented May 19, 2022

Would also like to know if there's something one can do to limit these situations...

@farvour

farvour commented Jun 24, 2022

I ran into this issue when previous runners didn't clean themselves up in the GitHub API. When I looked at the cloud-init log of the configure command, it was asking whether I wanted to replace the previous runner:

https://github.com/actions/runner/blob/main/src/Runner.Listener/CommandSettings.cs#L193

Edit: I followed up here, and it seems one can pass --replace to the config.sh script. I could fork and cut a PR for this, but I was wondering if it should be behind a flag, since this shouldn't normally happen on a clean stop of the instance (which sometimes isn't guaranteed).

```
Jun 24 20:41:24 ip-10-x-0-61 cloud-init[2458]: This runner will have the following labels: 'self-hosted', 'Linux', 'X64'
Jun 24 20:41:24 ip-10-x-0-61 cloud-init[2458]: Enter any additional labels (ex. label-1,label-2): [press Enter to skip]
```
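
For anyone who wants to try this before it lands in the action, here is a minimal sketch of what the registration step in the instance user data could look like with --replace added. The runner directory and the GH_OWNER / GH_REPO / RUNNER_TOKEN / RUNNER_LABEL variables below are placeholders, not the action's actual template.

```bash
#!/bin/bash
# Hypothetical user-data excerpt. Directory and variable values are placeholders
# you would supply yourself; this is not the action's built-in script.
cd /home/ubuntu/actions-runner

# --unattended suppresses interactive prompts (like the "replace?" question in the log above);
# --replace re-registers over a stale runner that still holds the same name.
./config.sh \
  --url "https://github.com/${GH_OWNER}/${GH_REPO}" \
  --token "${RUNNER_TOKEN}" \
  --labels "${RUNNER_LABEL}" \
  --unattended \
  --replace

./run.sh
```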

@pwo3

pwo3 commented Sep 26, 2022

Hello!

Any update on this? We are also facing a lot of unstarted runners...

@farvour do you plan to open a PR, or is there a way to apply your solution on our side?

Thank you!

@davegravy

We're also experiencing this, requiring periodic manual re-run of our CI jobs.

@machulav
Owner

machulav commented Nov 9, 2022

> A timeout of 5 minutes is exceeded

This error usually means that your new EC2 runner cannot communicate with GitHub and register itself as a new runner. Based on the tests, 5 minutes is more than enough for the EC2 runner to register itself, so I recommend double-checking that outbound traffic on port 443 is always open for your EC2 runner.

If that does not help, please provide more information about your action configuration and AWS infrastructure setup so I can help you with the issue.
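
For anyone unsure how to verify that, a rough sketch with the AWS CLI (the security group ID below is a placeholder; use the one attached to your runner instance):

```bash
# Inspect the egress rules of the security group attached to the runner instance.
# sg-0123456789abcdef0 is a placeholder ID.
aws ec2 describe-security-groups \
  --group-ids sg-0123456789abcdef0 \
  --query "SecurityGroups[0].IpPermissionsEgress"

# From the instance itself (via SSH or SSM Session Manager), confirm HTTPS to GitHub works:
curl -sS -o /dev/null -w "%{http_code}\n" https://api.github.com
```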

@davegravy

davegravy commented Nov 9, 2022

> A timeout of 5 minutes is exceeded
>
> This error usually means that your new EC2 runner cannot communicate with GitHub and register itself as a new runner. Based on the tests, 5 minutes is more than enough for the EC2 runner to register itself, so I recommend double-checking that outbound traffic on port 443 is always open for your EC2 runner.
>
> If that does not help, please provide more information about your action configuration and AWS infrastructure setup so I can help you with the issue.

I'll have to do some probing to see what's going on with 443 traffic. However, some observations that may be relevant:

  • For successful registrations (under 5 minutes), the distribution of registration times is quite wide. I'm not sure why, but a decent number of registrations land around the 4-5 minute mark, while others complete almost immediately. This seems independent of how long it takes the instance to reach an "OK" status as reported by aws ec2 describe-instance-status (see the sketch at the end of this comment).

  • None of my other EC2 instances in this subnet have any obvious networking issues. All my security policies are fully open in the outbound direction.

I'm using a t3.mini instance with an AMI based on bare Ubuntu 20.04, prepared with the apt equivalent of the README instructions and nothing else.
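
As a side note, for anyone who wants to compare these timings themselves, a rough sketch of polling the status checks with the AWS CLI (the instance ID is the one from the log at the top of this thread; substitute your own):

```bash
# Poll the instance status check every 10s until it reports "ok".
# Replace the instance ID with your own; this is only an illustration.
INSTANCE_ID="i-0eeae9ef28dcd04e9"

while true; do
  STATUS=$(aws ec2 describe-instance-status \
    --instance-ids "$INSTANCE_ID" \
    --query "InstanceStatuses[0].InstanceStatus.Status" \
    --output text)
  echo "$(date -u +%H:%M:%S) instance status: ${STATUS}"
  [ "$STATUS" = "ok" ] && break
  sleep 10
done
```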

@jeverling

Hi, I think this might be the same issue where the hostname is used as the runner name and hostnames are reused between instances: #128
Maybe using the --replace option is a good idea, though!
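
If name collisions are indeed the cause, another workaround besides --replace would be to give each runner a unique name, e.g. the EC2 instance ID from the instance metadata service. This is only a sketch of what that could look like in the registration script, not something the action does out of the box (same placeholder variables as above):

```bash
# Hypothetical tweak to the registration step: use the instance ID as the runner name
# so reused hostnames can no longer collide with stale registrations.
# (IMDSv1 shown for brevity; IMDSv2 requires fetching a session token first.)
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

./config.sh \
  --url "https://github.com/${GH_OWNER}/${GH_REPO}" \
  --token "${RUNNER_TOKEN}" \
  --labels "${RUNNER_LABEL}" \
  --name "${INSTANCE_ID}" \
  --unattended \
  --replace
```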

@jappyjan closed this as not planned on Mar 23, 2024.