
Worker Node group doesn't join the EKS cluster #10

Open
hinddeep opened this issue Sep 27, 2022 · 4 comments

hinddeep commented Sep 27, 2022

  1. I've set up the infrastructure using open5gs-infra.yaml
  2. I've configured the bastion host and run step 5 properly (by providing the correct ARN value)
  3. I've initialised the DocumentDB
  4. I've updated the CoreDNS ConfigMap and restarted the coredns pods
  5. I then ran the CloudFormation YAML file to create the worker node group
  6. However, the worker node group doesn't join the cluster. I've double-checked the parameters that I feed to the CloudFormation template. I've even tried to edit the aws-auth ConfigMap manually after the worker node group was created so that the worker nodes could join the cluster, but that doesn't work.

Since there are no worker nodes, no pods can be scheduled and the cluster is unusable. What can I do to make the worker node group join the cluster?

jungy-aws (Contributor) commented Sep 27, 2022

Please refer to "Run the CloudFormation for worker node group creation (open5gs-worker.yaml)", bullet point #3. The cluster-join step is supposed to be handled by the ConfigMapUpdate custom resource Lambda in "open5gs-worker-xxx.yaml". You can debug why it did not succeed via the CloudWatch logs of the related Lambda function (named eks-auth-update-hook-${AWS::StackName}), or you can simply update the aws-auth ConfigMap manually. See https://docs.aws.amazon.com/eks/latest/userguide/launch-workers.html (the AWS Management Console tab) for the steps to configure aws-auth-cm.yaml.
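For reference, the manual update on the linked page amounts to applying an aws-auth ConfigMap along the lines of the sketch below (with `kubectl apply -f aws-auth-cm.yaml`). The rolearn value is a placeholder, not taken from this repo; it must be replaced with the worker nodes' instance role ARN (not the instance profile ARN) from the worker node stack outputs:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: <worker node instance role ARN>   # placeholder, replace before applying
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
```

If this mapping is missing or the ARN is wrong, the kubelet on each node cannot authenticate to the API server, which shows up exactly as nodes never joining the cluster.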

hinddeep (Author) commented Oct 5, 2022

I've performed the aws-auth update manually, but that has not resolved the issue. I also checked the logs of the Lambda function responsible for joining the worker nodes to the cluster, but didn't find any errors.

  1. I've used this script to troubleshoot the issue: https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-awssupport-troubleshooteksworkernode.html

  2. I got one error. The security group policies applied to the cluster were highly restrictive and did not allow traffic to flow from the worker nodes to the cluster. This was the only error; all the other tests passed.

  3. I modified the security group to allow all inbound traffic from everywhere. I re-ran the script and the error was fixed. I then redeployed my worker node group but somehow they still didn't join the cluster.

  4. I used the network path analyser in AWS VPC to test three paths:
    a. user_plane worker node as the source, control_plane worker node as the destination
    b. control_plane worker node as the source, bastion host as the destination
    c. user_plane worker node as the source, bastion host as the destination
    All three paths were functional.
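For anyone hitting the same security-group symptom: rather than opening all inbound traffic, EKS self-managed nodes need the control plane to accept HTTPS from the node security group, and the nodes to accept kubelet/ephemeral-port traffic back from the control plane. A CloudFormation sketch of the minimal ingress rules follows; the resource and parameter names (NodeSecurityGroup, ClusterControlPlaneSecurityGroup) are assumptions for illustration, not names from open5gs-worker.yaml:

```yaml
# Sketch only: substitute the actual security group IDs/refs from your stack.
ControlPlaneIngressFromWorkerNodes:
  Type: AWS::EC2::SecurityGroupIngress
  Properties:
    Description: Allow worker nodes to reach the EKS API server
    GroupId: !Ref ClusterControlPlaneSecurityGroup
    SourceSecurityGroupId: !Ref NodeSecurityGroup
    IpProtocol: tcp
    FromPort: 443
    ToPort: 443

NodeIngressFromControlPlane:
  Type: AWS::EC2::SecurityGroupIngress
  Properties:
    Description: Allow the control plane to reach kubelets and ephemeral ports on nodes
    GroupId: !Ref NodeSecurityGroup
    SourceSecurityGroupId: !Ref ClusterControlPlaneSecurityGroup
    IpProtocol: tcp
    FromPort: 1025
    ToPort: 65535
```

These mirror the rules in AWS's reference amazon-eks-nodegroup.yaml template; if the cluster was created with a more restrictive group, adding equivalent rules is usually enough without opening traffic from everywhere.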

@nicolas-gagnon

@hinddeep Have you fixed the issue?

@mgonzalezo

I faced the same issue and fixed it as follows:

  1. Check the AWS console/CLI user you are using for CloudFormation stack creation. If possible, create a new AWS account for this specific task to avoid IAM role errors later on.
  2. Optional: Restart coredns pods again once nodes are attached to the cluster in case they are not being scheduled correctly (I didn't have to do this after attaching nodes to cluster, though).

Following the two steps above, the Open5GS cluster should be working with a pair of worker nodes:

NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-10-1-2-158.ec2.internal Ready 121m v1.18.20-eks-c9f1ce 10.1.2.158 Amazon Linux 2 4.14.268-205.500.amzn2.x86_64 docker://20.10.13
ip-10-1-2-231.ec2.internal Ready 121m v1.18.20-eks-c9f1ce 10.1.2.231 Amazon Linux 2 4.14.268-205.500.amzn2.x86_64 docker://20.10.13
