ManagedNodeGroup is assigned to incorrect security group by default #1275
Comments
Thank you for filing this issue, @JustASquid. Could you add a quick repro program to make it easier to reproduce this exact issue? Thank you.
@JustASquid, thank you for reporting this issue and providing detailed information. I am able to repro the problem using the example in our repository: Managed NodeGroups Example - with
You've correctly identified that the two different security groups are causing communication issues. However, it's actually the managed node group that's using the security group created by EKS, while the default node group uses the security group managed by Pulumi.
When the EKS provider is set to not skip the default node group creation, we create a security group that only allows intra-node communication within that group. Reference: Security Group Configuration.
The EKS managed node group uses the default cluster security group created by AWS. Even if an additional security group is specified during cluster creation (which is used by the default node group), it won't be attached to the managed node group instances.
To enable communication between these node groups, you need to use a custom launch template for the ManagedNodeGroup to specify the security group created by Pulumi. Here’s a TypeScript example of setting this up:
import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";
import * as pulumi from "@pulumi/pulumi";

const cluster = new eks.Cluster("cluster", {
    // ... (other configurations)
    skipDefaultNodeGroup: false,
});

// Create Managed Node Group with custom launch template to use the security group that the default node group uses.
function createUserData(cluster: aws.eks.Cluster, extraArgs: string): pulumi.Output<string> {
    const userdata = pulumi
        .all([
            cluster.name,
            cluster.endpoint,
            cluster.certificateAuthority.data,
        ])
        .apply(([clusterName, clusterEndpoint, clusterCertAuthority]) => {
            // Multipart MIME user data. The blank lines separate each part's headers from its body.
            return `MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
/etc/eks/bootstrap.sh --apiserver-endpoint "${clusterEndpoint}" --b64-cluster-ca "${clusterCertAuthority}" "${clusterName}" ${extraArgs}

--==MYBOUNDARY==--`;
        });
    // Encode the user data as base64.
    return userdata.apply((ud) => Buffer.from(ud, "utf-8").toString("base64"));
}

const lt = new aws.ec2.LaunchTemplate("my-mng-lt", {
    imageId: "ami-0cfd96d646e5535a8",
    instanceType: "t3.medium",
    vpcSecurityGroupIds: [cluster.defaultNodeGroup!.nodeSecurityGroup.id], // <- This is where we define the SG to be used by the MNG.
    userData: createUserData(cluster.core.cluster, "--kubelet-extra-args --node-labels=mylabel=myvalue"), // This is required to enable instances to join the cluster.
});

const mng = new eks.ManagedNodeGroup("cluster-my-mng", {
    // ... (other configurations)
    cluster: cluster,
    launchTemplate: {
        id: lt.id,
        // The launch template reports its version as a number; interpolate it into a string.
        version: pulumi.interpolate`${lt.latestVersion}`,
    },
});
Alternatively, as you mentioned, you could skip creating the default node group, and everything should work as expected. Please let us know if this resolves your issue or if you need further assistance!
Thank you @rquitales, you are exactly right. I got the order back to front in my original post: the MNGs are indeed using the EKS-created SG. I was able to work around the issue by skipping the default node group.
I do feel that this behavior is non-ideal though, as the path of least resistance when setting up a cluster is to use the default node group. It's easy to run into the case where it cannot communicate with any subsequent MNGs, and the problem can manifest in a very non-obvious way (in my case, no DNS resolution on the MNG nodes). Going down the road of specifying a custom launch template is not trivial, not least because you need to fetch an AMI ID.
So I guess the question is, is there a particular reason it works this way? Why not just have the default NG be assigned the cluster's EKS-created security group?
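On the AMI ID point: one way to avoid hard-coding an image ID might be to read the EKS optimized AMI ID that AWS publishes as a public SSM parameter. The following is only a sketch; the helper name eksOptimizedAmiParameter and the Al2Variant type are hypothetical, and it assumes the Amazon Linux 2 AMI family (the one that ships /etc/eks/bootstrap.sh).

```typescript
// Amazon Linux 2 variants published under the EKS optimized AMI SSM namespace.
type Al2Variant = "amazon-linux-2" | "amazon-linux-2-gpu" | "amazon-linux-2-arm64";

// Hypothetical helper: builds the SSM parameter name that holds the
// recommended EKS optimized AMI ID for a given Kubernetes minor version.
function eksOptimizedAmiParameter(
    kubernetesVersion: string,
    variant: Al2Variant = "amazon-linux-2",
): string {
    // Only "major.minor" versions (for example "1.29") are valid in this path.
    if (!/^\d+\.\d+$/.test(kubernetesVersion)) {
        throw new Error(`expected a "major.minor" Kubernetes version, got "${kubernetesVersion}"`);
    }
    return `/aws/service/eks/optimized-ami/${kubernetesVersion}/${variant}/recommended/image_id`;
}

// Assumed usage inside a Pulumi program (not executed here):
//   const amiId = aws.ssm.getParameterOutput({ name: eksOptimizedAmiParameter("1.29") }).value;
//   new aws.ec2.LaunchTemplate("my-mng-lt", { imageId: amiId, /* ... */ });
```

The Kubernetes version passed in should match the cluster's version, otherwise the nodes may fail to join.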
Another possible workaround is setting up the necessary security group rules to allow the different node groups to communicate. Like this for example:
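A minimal sketch of what such rules might look like (this is not the snippet originally attached to this comment). The interface IngressRuleArgs and the helper crossGroupIngressRules are hypothetical names; the field names mirror the arguments of aws.ec2.SecurityGroupRule in @pulumi/aws, and the helper is generic so the IDs can be plain strings or pulumi.Output<string> values.

```typescript
// Plain-data description of one ingress rule. Field names follow the
// arguments accepted by aws.ec2.SecurityGroupRule.
interface IngressRuleArgs<T> {
    type: "ingress";
    description: string;
    fromPort: number;
    toPort: number;
    protocol: string;
    securityGroupId: T;        // group that receives the traffic
    sourceSecurityGroupId: T;  // group the traffic comes from
}

// Hypothetical helper: returns one "allow everything" ingress rule in each
// direction, so members of both groups can reach each other.
function crossGroupIngressRules<T>(pulumiManagedSg: T, eksManagedSg: T): IngressRuleArgs<T>[] {
    const allowAll = (target: T, source: T, description: string): IngressRuleArgs<T> => ({
        type: "ingress",
        description,
        fromPort: 0,
        toPort: 0,
        protocol: "-1", // all protocols and ports
        securityGroupId: target,
        sourceSecurityGroupId: source,
    });
    return [
        allowAll(pulumiManagedSg, eksManagedSg, "Allow all traffic from the EKS-managed security group"),
        allowAll(eksManagedSg, pulumiManagedSg, "Allow all traffic from the Pulumi-managed security group"),
    ];
}

// Assumed usage inside a Pulumi program (not executed here); the property
// paths below are assumptions about how the two group IDs are exposed:
//   const rules = crossGroupIngressRules(
//       cluster.nodeSecurityGroup.id,
//       cluster.core.cluster.vpcConfig.clusterSecurityGroupId,
//   );
//   rules.forEach((args, i) => new aws.ec2.SecurityGroupRule(`cross-ng-${i}`, args));
```

Opening all traffic in both directions matches how each group already treats traffic within itself; narrower port ranges could be substituted if that is too permissive.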
I'm gonna check what it would take to add this to the component itself. I'm not necessarily concerned about security implications here because we already have open firewalls within the Pulumi managed security group: pulumi-eks/nodejs/eks/securitygroup.ts Lines 118 to 130 in c5fd959
Equally, the EKS managed security group also allows all communication within itself: https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html
What happened?
I noticed an interesting issue whenever spinning up a ManagedNodeGroup. Nodes in the group were not able to access the DNS server or otherwise connect to other nodes in the default NG in the cluster.
The problem was because the default NG created by EKS, not Pulumi, is assigned to a security group that is different to subsequent MNGs. Indeed, if we take a look at the cluster in EKS, we can see that there are 2 security groups associated. One of them has the following description:
And the other is simply
The default NG is assigned to the EKS-created SG, while any subsequent MNGs are assigned to the Pulumi-created one. This means that MNG nodes cannot communicate with core cluster resources.
It may be possible to work around the issue by skipping the default node group creation and having all nodes handled by MNGs; however, I haven't tested this yet.
Example
N/A
Output of pulumi about
CLI
Version 3.120.0
Go Version go1.22.4
Go Compiler gc
Plugins
KIND NAME VERSION
resource aws 6.41.0
resource awsx 2.12.3
resource docker 4.5.4
resource docker 3.6.1
resource eks 2.7.1
resource kubernetes 4.13.1
language nodejs unknown
Host
OS ubuntu
Version 22.04
Arch x86_64
This project is written in nodejs: executable='/home/daniel/.nvm/versions/node/v21.6.1/bin/node' version='v21.6.1'
Additional context
No response