AWS IMDS Token is not being reused until it expires #1990

harshkiprofile · 2024-11-04T06:45:32Z

The script currently generates an IMDS token with a 6-hour (21,600 seconds) validity, but it lacks logic to reuse the token until it expires. Instead, it fetches a new token on every call, which could be optimized.

harshkiprofile · 2024-11-04T06:46:49Z

* Introduce a new shell function to reuse IMDS token * Utilize the get_token function to reuse the token * Move token renewal function to aws.sh for reuse in AWS agent scripts

thimslugga · 2024-11-06T22:43:49Z

This new change introduces security concerns as it's writing the session token to a file under /var/run and care must be taken to keep this safe and ensure permissions set correctly for read/write i.e. umask, etc.

Ideally the session token ttl time should be much lower i.e. ~10 min or less and not using the max ttl of 6 hours.

I have a fork of the aws-vpc-move-ip, which introduces some changes and attempts to get the instance ID from local file and falls back to the IMDS if that file is not accessible e.g. non-nitro based EC2 instance, etc.

main...thimslugga:resource-agents:aws-dev

oalbrigt · 2024-11-07T08:13:58Z

This new change introduces security concerns as it's writing the session token to a file under /var/run and care must be taken to keep this safe and ensure permissions set correctly for read/write i.e. umask, etc.

We can just chmod 600 the file after saving the token.

thimslugga · 2024-11-07T16:16:24Z

We can also retrieve the instance ID from the /sys/devices/virtual/dmi/id/board_asset_tag file on Nitro-based EC2 instances. This would prevent always having to make curl requests to retrieve this info and we can fallback to IMDS.

https://github.com/thimslugga/resource-agents/blob/cadd53c8a5c019f1679bda47d3d5b70471af4d3b/heartbeat/aws-vpc-move-ip#L286-L293

The sleep and wait time should also be bumped from the current 3 and 1 to something a bit higher in event the IMDS is temporarily unavailable e.g. ~10s.

oalbrigt · 2024-11-08T08:00:12Z

Nice. I'll make a patch with the suggestions next week.

oalbrigt · 2024-11-12T10:05:03Z

I've made a patch based on your suggestions: #1995

gguifelixamz · 2024-11-13T22:44:40Z

What's the actual difference between reading it from a file and making a request to the IMDS metadata? Before merging this I would like to hear from the requester @harshkiprofile what's the use-case and what's the level of optimization achieved from this change.

Were any issues observed by fetching a token every time?

Writing a plain text token (in fact, any kind of secret) to a file is generally a bad idea, even if you set exclusive permissions.

gguifelixamz · 2024-11-14T00:41:04Z

@oalbrigt can we revert PR1991until we have the requestor to clear up a few questions about the use-case? I don't think we should upstream this feature without some background context with a change of security and usability issues.

harshkiprofile · 2024-11-14T02:26:13Z

The decision to reuse the token by storing it locally is based on the fact that during AWS EC2 hypervisor maintenance events, the IMDS metadata API call to fetch token can sometimes become temporarily unavailable, even if the EC2 instance itself remains operational. If a new token were fetched with every request during such events, metadata token retrieval could fail. By storing the metadata token locally (temporarily), we ensure uninterrupted access to metadata, regardless of IMDS availability during maintenance windows, thus enhancing system reliability.

This approach prioritizes fault tolerance over pure performance optimization. In the event of AWS maintenance, local token storage allows continued metadata access, reducing downtime and mitigating the impact of transient IMDS issues.

While storing tokens or secrets in files does introduce potential risks, @oalbrigt has implemented restricted file permissions to minimize these concerns.

Performance-wise, no significant issues have been observed with fetching a token for each request. However, in environments where transient IMDS issues occur frequently, the local token storage method provides increased robustness, eliminating the need for constant retries to acquire a token.

That was the rationale behind this change. I would love to hear your feedback on this one! I will explore on what can we done to strengthen the security as I believe caching the token improves stability.

harshkiprofile · 2024-11-14T03:09:54Z

Another key rationale behind this change is that it helps prevent failures caused by AWS API call throttling.

oalbrigt · 2024-11-14T08:02:36Z

@gguifelixamz The requests adds up when you have multiple resources. Say you have one or more fence agents (e.g. for different zones), then one or more of awseip, awsvip, aws-vpc-move-ip, and aws-vpc-route53.

We can revert it if we find a reason to undo it. If not we can do improvements based on your suggestions or pull requests.

Next release should be in january/february, so we got time to fix any issues before then.

thimslugga · 2024-11-14T22:15:39Z

By storing the metadata token locally (temporarily), we ensure uninterrupted access to metadata, regardless of IMDS availability during maintenance windows, thus enhancing system reliability.

The session token is used to make subsequent calls to the IMDS service. Having it stored locally would still result in calls to the IMDS failing if the service was unavailable...

Another key rationale behind this change is that it helps prevent failures caused by AWS API call throttling.

The IMDS != AWS EC2 API. The other functions in the script that make use of the aws-cli, will still result in the aws-cli requesting a session token from IMDS, retrieving credentials from the IMDS and then making the actual call to the AWS EC2 API.

The requests adds up when you have multiple resources. Say you have one or more fence agents (e.g. for different zones), then one or more of awseip, awsvip, aws-vpc-move-ip, and aws-vpc-route53.

See above, do not confuse the IMDS and AWS EC2 API as they are not the same thing.

harshkiprofile · 2024-11-15T03:04:06Z

The IMDS != AWS EC2 API. The other functions in the script that make use of the aws-cli, will still result in the aws-cli requesting a session token from IMDS, retrieving credentials from the IMDS and then making the actual call to the AWS EC2 API.

PPS limit != EC2 API
Actually, this isn't related to AWS EC2 API throttling; by "via request throttling," I mean that certain limits (PPS), when reached, cause requests to be throttled. If you look at the highlighted text in the screenshot, you'll see it refers to the packets-per-second (PPS) limits being exceeded. This solution significantly helps reduce the consumed PPS limit as it reduce the recurring calls to IMDS to get the token.

As a user of resource agents, I encountered several challenges in my environment that led me to propose this suggestion. Implementing the recommendation has significantly enhanced the stability of the environment.

oalbrigt · 2024-11-18T10:15:34Z

@gguifelixamz I've added a commit to be able to refresh the token e.g. if the file is corrupt for some reason:
b8d3ecc

oalbrigt mentioned this issue Nov 12, 2024

aws.sh: chmod 600 $TOKEN_FILE, add get_instance_id() with DMI support, and use get_instance_id() in AWS agents #1995

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AWS IMDS Token is not being reused until it expires #1990

AWS IMDS Token is not being reused until it expires #1990

harshkiprofile commented Nov 4, 2024

harshkiprofile commented Nov 4, 2024

thimslugga commented Nov 6, 2024 •

edited

Loading

oalbrigt commented Nov 7, 2024 •

edited

Loading

thimslugga commented Nov 7, 2024 •

edited

Loading

oalbrigt commented Nov 8, 2024

oalbrigt commented Nov 12, 2024

gguifelixamz commented Nov 13, 2024

gguifelixamz commented Nov 14, 2024

harshkiprofile commented Nov 14, 2024

harshkiprofile commented Nov 14, 2024

oalbrigt commented Nov 14, 2024

thimslugga commented Nov 14, 2024

harshkiprofile commented Nov 15, 2024

oalbrigt commented Nov 18, 2024

AWS IMDS Token is not being reused until it expires #1990

AWS IMDS Token is not being reused until it expires #1990

Comments

harshkiprofile commented Nov 4, 2024

harshkiprofile commented Nov 4, 2024

thimslugga commented Nov 6, 2024 • edited Loading

oalbrigt commented Nov 7, 2024 • edited Loading

thimslugga commented Nov 7, 2024 • edited Loading

oalbrigt commented Nov 8, 2024

oalbrigt commented Nov 12, 2024

gguifelixamz commented Nov 13, 2024

gguifelixamz commented Nov 14, 2024

harshkiprofile commented Nov 14, 2024

harshkiprofile commented Nov 14, 2024

oalbrigt commented Nov 14, 2024

thimslugga commented Nov 14, 2024

harshkiprofile commented Nov 15, 2024

oalbrigt commented Nov 18, 2024

thimslugga commented Nov 6, 2024 •

edited

Loading

oalbrigt commented Nov 7, 2024 •

edited

Loading

thimslugga commented Nov 7, 2024 •

edited

Loading