add warnings for throttling exceptions #1555
d2155c7 to e1761af
alarm_description = "Have received a throttling exception in the last minute"
comparison_operator = "GreaterThanOrEqualToThreshold"
evaluation_periods = "1"
metric_name = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name
The metric_name should be a string value, but it seems to be referencing an array element. Ensure that aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name is correctly defined and accessible.
True that we'd have a problem if those [0] array elements didn't exist, but that has nothing to do with the type of metric_name, and it's the pattern we use everywhere (and we only declare all these resources if cloudwatch_enabled is set, so they all work together).
comparison_operator = "GreaterThanOrEqualToThreshold"
evaluation_periods = "1"
metric_name = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name
namespace = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].namespace
The namespace should be a string value, but it seems to be referencing an array element. Ensure that aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].namespace is correctly defined and accessible.
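For readers following the [0] discussion, here is a minimal sketch of how the first alarm could be declared in full, assembled from the diff fragments above and the attributes shown in the staging plan. It assumes the gating flag is a boolean variable named cloudwatch_enabled and that the warning SNS topic ARN arrives through a hypothetical variable named sns_alert_warning_arn; the real module may wire both differently.

```hcl
# Hypothetical sketch: var.cloudwatch_enabled and var.sns_alert_warning_arn are assumed names.
resource "aws_cloudwatch_metric_alarm" "throttling-exception-warning" {
  # Declared only when CloudWatch is enabled; the metric filter uses the same
  # count, so the [0] references below always resolve when this alarm exists.
  count = var.cloudwatch_enabled ? 1 : 0

  alarm_name          = "throttling-exception-warning"
  alarm_description   = "Have received a throttling exception in the last minute"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1

  # Read the metric name and namespace from the log metric filter so they stay in sync.
  # metric_transformation is a single nested block, addressed as element [0].
  metric_name = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name
  namespace   = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].namespace

  period             = 60 # seconds
  statistic          = "Sum"
  threshold          = 1
  treat_missing_data = "notBreaching" # no log matches means no alarm
  alarm_actions      = [var.sns_alert_warning_arn]
}
```

Because the alarm carries the same count condition as the metric filter, Terraform only evaluates the [0] references when both resources exist, which is the point made in reply to the bot: the indexing affects existence, not the string type of metric_name or namespace.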
alarm_description = "Have received 100 throttling exception in the last minute"
comparison_operator = "GreaterThanOrEqualToThreshold"
evaluation_periods = "1"
metric_name = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name
The metric_name should be a string value, but it seems to be referencing an array element. Ensure that aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name is correctly defined and accessible.
comparison_operator = "GreaterThanOrEqualToThreshold"
evaluation_periods = "1"
metric_name = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name
namespace = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].namespace
The namespace should be a string value, but it seems to be referencing an array element. Ensure that aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].namespace is correctly defined and accessible.
aws/eks/cloudwatch_log.tf (outdated)
namespace = "LogMetrics"
value = "1"
}
}
Add a newline at the end of the file to follow best practices and ensure compatibility with various tools and editors.
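The fragment above shows only the tail of the new metric filter. A sketch of the full resource, reconstructed from the staging plan output, might look like the following. It assumes the same cloudwatch_enabled gating and that the log group is the existing aws_cloudwatch_log_group.notification-canada-ca-eks-application-logs resource listed in the Conftest output; the actual log group reference in aws/eks/cloudwatch_log.tf may differ.

```hcl
# Hypothetical sketch: the gating variable and the log group reference are assumptions.
resource "aws_cloudwatch_log_metric_filter" "throttling-exceptions" {
  count = var.cloudwatch_enabled ? 1 : 0

  name = "throttling-exceptions"
  # A single unquoted term matches any log event that contains it.
  pattern        = "ThrottlingException"
  log_group_name = aws_cloudwatch_log_group.notification-canada-ca-eks-application-logs[0].name

  metric_transformation {
    name      = "throttling-exceptions"
    namespace = "LogMetrics"
    # Each matching event contributes 1, so a Sum over a 60-second period
    # counts the throttling exceptions logged in that minute.
    value = "1"
  }
}
```

With this shape, the two alarms differ only in threshold (1 versus 100) and description, while both read the same throttling-exceptions metric in the LogMetrics namespace.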
Staging: eks ✅
Terraform Init:
Plan: 4 to add, 1 to change, 1 to destroy
Resource actions are indicated with the following symbols:
+ create
~ update in-place
-/+ destroy and then create replacement
Terraform will perform the following actions:
# aws_cloudwatch_log_metric_filter.throttling-exceptions[0] will be created
+ resource "aws_cloudwatch_log_metric_filter" "throttling-exceptions" {
+ id = (known after apply)
+ log_group_name = "/aws/containerinsights/notification-canada-ca-staging-eks-cluster/application"
+ name = "throttling-exceptions"
+ pattern = "ThrottlingException"
+ metric_transformation {
+ name = "throttling-exceptions"
+ namespace = "LogMetrics"
+ unit = "None"
+ value = "1"
}
}
# aws_cloudwatch_metric_alarm.many-throttling-exceptions-warning[0] will be created
+ resource "aws_cloudwatch_metric_alarm" "many-throttling-exceptions-warning" {
+ actions_enabled = true
+ alarm_actions = [
+ "arn:aws:sns:ca-central-1:239043911459:alert-warning",
]
+ alarm_description = "Have received 100 throttling exception in the last minute"
+ alarm_name = "many-throttling-exceptions-warning"
+ arn = (known after apply)
+ comparison_operator = "GreaterThanOrEqualToThreshold"
+ evaluate_low_sample_count_percentiles = (known after apply)
+ evaluation_periods = 1
+ id = (known after apply)
+ metric_name = "throttling-exceptions"
+ namespace = "LogMetrics"
+ period = 60
+ statistic = "Sum"
+ tags_all = (known after apply)
+ threshold = 100
+ treat_missing_data = "notBreaching"
}
# aws_cloudwatch_metric_alarm.service-callback-too-many-failures-critical[0] must be replaced
-/+ resource "aws_cloudwatch_metric_alarm" "service-callback-too-many-failures-critical" {
~ alarm_name = "service-callback-too-many-failures-warning" -> "service-callback-too-many-failures-critical" # forces replacement
~ arn = "arn:aws:cloudwatch:ca-central-1:239043911459:alarm:service-callback-too-many-failures-warning" -> (known after apply)
- datapoints_to_alarm = 0 -> null
- dimensions = {} -> null
+ evaluate_low_sample_count_percentiles = (known after apply)
~ id = "service-callback-too-many-failures-warning" -> (known after apply)
- insufficient_data_actions = [] -> null
- ok_actions = [] -> null
- tags = {} -> null
~ tags_all = {} -> (known after apply)
# (11 unchanged attributes hidden)
}
# aws_cloudwatch_metric_alarm.service-callback-too-many-failures-warning[0] will be updated in-place
~ resource "aws_cloudwatch_metric_alarm" "service-callback-too-many-failures-warning" {
~ alarm_actions = [
- "arn:aws:sns:ca-central-1:239043911459:alert-critical",
+ "arn:aws:sns:ca-central-1:239043911459:alert-warning",
]
~ alarm_description = "Service reached the max number of callback retries 100 times in 10 minutes" -> "Service reached the max number of callback retries 25 times in 5 minutes"
id = "service-callback-too-many-failures-warning"
~ period = 600 -> 300
tags = {}
~ threshold = 100 -> 25
# (14 unchanged attributes hidden)
}
# aws_cloudwatch_metric_alarm.throttling-exception-warning[0] will be created
+ resource "aws_cloudwatch_metric_alarm" "throttling-exception-warning" {
+ actions_enabled = true
+ alarm_actions = [
+ "arn:aws:sns:ca-central-1:239043911459:alert-warning",
]
+ alarm_description = "Have received a throttling exception in the last minute"
+ alarm_name = "throttling-exception-warning"
+ arn = (known after apply)
+ comparison_operator = "GreaterThanOrEqualToThreshold"
+ evaluate_low_sample_count_percentiles = (known after apply)
+ evaluation_periods = 1
+ id = (known after apply)
+ metric_name = "throttling-exceptions"
+ namespace = "LogMetrics"
+ period = 60
+ statistic = "Sum"
+ tags_all = (known after apply)
+ threshold = 1
+ treat_missing_data = "notBreaching"
}
Plan: 4 to add, 1 to change, 1 to destroy.
─────────────────────────────────────────────────────────────────────────────
Saved the plan to: plan.tfplan
To perform exactly these actions, run the following command to apply:
terraform apply "plan.tfplan"
Conftest results:
WARN - plan.json - main - Missing Common Tags: ["aws_acm_certificate.client_vpn"]
WARN - plan.json - main - Missing Common Tags: ["aws_acm_certificate.notification-canada-ca"]
WARN - plan.json - main - Missing Common Tags: ["aws_acm_certificate.notification-canada-ca-alt[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_acmpca_certificate_authority.client_vpn"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb.notification-canada-ca"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_listener.internal_alb_tls"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_listener.notification-canada-ca"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.internal_nginx_http"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-admin"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-api"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-document"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-document-api"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-documentation"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.notification-canada-ca-eks-application-logs[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.notification-canada-ca-eks-cluster-logs[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.notification-canada-ca-eks-prometheus-logs[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.admin-evicted-pods[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.admin-pods-high-cpu-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.admin-pods-high-memory-warning[0]"]
WARN - plan.json - main - Missing Common Tags:...
Updating alarms ⏰? Great! Please update the Google Sheet and add a 👍 to this message after 🙏
Yes, looks OK.
Summary
Add two warnings for ThrottlingExceptions (i.e., we've sent AWS too many SMS at once).
Related Issues
Before merging this PR
Read the code suggestions left by the cds-ai-codereviewer bot. Address valid suggestions and briefly write down reasons for not addressing the others. To help with classifying the comments, please use these reactions on each of the comments made by the AI review:
The classifications will be extracted and summarized into an analysis of how helpful the AI code review really is.
Test instructions
Trigger in staging by uploading a large SMS send to 613-555-01** numbers.
Release Instructions
None.
Reviewer checklist