Add Pinpoint metrics and alarms #1354
…-terraform into add-pinpoint-alarms
Updating alarms ⏰? Great! Please update the Google Sheet and add a 👍 to this message after 🙏
…-terraform into add-pinpoint-alarms
@@ -39,6 +40,7 @@ inputs = {
sqs_deliver_receipts_queue_arn = dependency.common.outputs.sqs_deliver_receipts_queue_arn
pinpoint_to_sqs_sms_callbacks_ecr_repository_url = dependency.ecr.outputs.pinpoint_to_sqs_sms_callbacks_ecr_repository_url
pinpoint_to_sqs_sms_callbacks_ecr_arn = dependency.ecr.outputs.pinpoint_to_sqs_sms_callbacks_ecr_arn
pinpoint_monthly_spend_limit = dependency.common.outputs.sns_monthly_spend_limit
So we're using the SNS monthly limit as the Pinpoint limit as well? Assuming these are two different figures, should we sum the two costs and compare the total against the overall SMS limit?
🤷 That's probably a good idea. Just keep one alarm but make it a combined one.
OK, for now I left the existing SNS alarms alone but made the new Pinpoint alarm a combined alarm. Once we're happy that the new alarm is correct, we can get rid of the old alarm.
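To make the combined alarm discussed in this thread concrete, here is a minimal Python sketch of the check it performs: sum the SNS and Pinpoint month-to-date spend and compare the total against a fraction of the monthly limit. The function name and parameters are hypothetical, and the assumption that the threshold is `fraction * monthly limit` is mine; this page only shows thresholds of 0.8 and 0.9 and a staging limit output of 100, not how one is derived from the other.

```python
def total_spend_breached(
    sns_spend_usd: float,
    pinpoint_spend_usd: float,
    monthly_limit_usd: float,
    fraction: float,
) -> bool:
    """Return True when combined SMS spend reaches the given fraction of the limit.

    Mirrors the "sns_spend + pinpoint_spend" expression and the
    GreaterThanOrEqualToThreshold comparison. The threshold derivation
    (fraction * limit) is an assumption, not taken from the module source.
    """
    total_spend = sns_spend_usd + pinpoint_spend_usd
    return total_spend >= fraction * monthly_limit_usd


if __name__ == "__main__":
    # Hypothetical figures: 50 USD on SNS, 35 USD on Pinpoint, limit of 100 USD.
    print(total_spend_breached(50, 35, 100, 0.8))  # warning level
    print(total_spend_breached(50, 35, 100, 0.9))  # critical level
```

Neither service alone crosses 80% in this example, which is the case a per-service alarm would miss and the combined one catches.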
99dc429 to 010cec8
Staging: common ✅ Terraform Init: Plan: 0 to add, 0 to change, 0 to destroy
Changes to Outputs:
+ sns_monthly_spend_limit = 100
You can apply this plan to save these new output values to the Terraform
state, without changing any real infrastructure.
Warning: Argument is deprecated
with aws_s3_bucket.csv_bucket,
on s3.tf line 5, in resource "aws_s3_bucket" "csv_bucket":
5: resource "aws_s3_bucket" "csv_bucket" {
Use the aws_s3_bucket_server_side_encryption_configuration resource instead
(and 69 more similar warnings elsewhere)
─────────────────────────────────────────────────────────────────────────────
Saved the plan to: plan.tfplan
To perform exactly these actions, run the following command to apply:
terraform apply "plan.tfplan"
Conftest results:
WARN - plan.json - main - Missing Common Tags: ["aws_athena_workgroup.ad_hoc"]
WARN - plan.json - main - Missing Common Tags: ["aws_athena_workgroup.build_tables"]
WARN - plan.json - main - Missing Common Tags: ["aws_athena_workgroup.primary"]
WARN - plan.json - main - Missing Common Tags: ["aws_athena_workgroup.support"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_event_rule.aws_health[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.route53_resolver_query_log[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.sns_deliveries[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.sns_deliveries_failures[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.sns_deliveries_failures_us_west_2[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.sns_deliveries_us_west_2[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.bulk-bulk-not-being-processed-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.bulk-bulk-not-being-processed-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.bulk-inflights-not-being-processed-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.bulk-inflights-not-being-processed-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.bulk-not-being-processed-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.bulk-not-being-processed-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.contact-3-500-error-15-minutes-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.document-download-bucket-size-warning[0]"]
WARN - plan.json - main - Missing Common Tags:... |
Staging: quicksight ✅ Terraform Init: Plan: 0 to add, 1 to change, 0 to destroy
Resource actions are indicated with the following symbols:
~ update in-place
Terraform will perform the following actions:
# aws_s3_object.manifest_file will be updated in-place
~ resource "aws_s3_object" "manifest_file" {
~ etag = "4f558e8d8cdbbf914a95755cbda61968" -> "221f592f333f2fc284626cfdb8c4bc80"
id = "quicksight/s3-manifest-sms-usage.json"
tags = {}
+ version_id = (known after apply)
# (11 unchanged attributes hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
─────────────────────────────────────────────────────────────────────────────
Saved the plan to: plan.tfplan
To perform exactly these actions, run the following command to apply:
terraform apply "plan.tfplan"
Conftest results:
WARN - plan.json - main - Missing Common Tags: ["aws_cloudformation_stack.sms-usage-notifications"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_policy.quicksight-rds"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_policy.quicksight-s3-usage"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_policy.quicksight_vpc_connection_ec2"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_policy.quicksight_vpc_connection_iam"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_role.quicksight"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_role.vpc_connection_role"]
WARN - plan.json - main - Missing Common Tags: ["aws_quicksight_data_set.jobs"]
WARN - plan.json - main - Missing Common Tags: ["aws_quicksight_data_set.login_events"]
WARN - plan.json - main - Missing Common Tags: ["aws_quicksight_data_set.notifications"]
WARN - plan.json - main - Missing Common Tags: ["aws_quicksight_data_set.organisation"]
WARN - plan.json - main - Missing Common Tags: ["aws_quicksight_data_set.send_rate"]
WARN - plan.json - main - Missing Common Tags: ["aws_quicksight_data_set.services"]
WARN - plan.json - main - Missing Common Tags: ["aws_quicksight_data_set.sms_usage"]
WARN - plan.json - main - Missing Common Tags: ["aws_quicksight_data_set.templates"]
WARN - plan.json - main - Missing Common Tags: ["aws_quicksight_data_set.users"]
WARN - plan.json - main - Missing Common Tags: ["aws_quicksight_data_source.rds"]
WARN - plan.json - main - Missing Common Tags: ["aws_quicksight_data_source.s3_sms_usage"]
WARN - plan.json - main - Missing Common Tags: ["aws_quicksight_vpc_connection.rds"]
WARN - plan.json - main - Missing Common Tags: ["aws_s3_object.manifest_file"]
39 tests, 19 passed, 20 warnings, 0 failures, 0 exceptions
Staging: pinpoint_to_sqs_sms_callbacks ✅ Terraform Init: Plan: 12 to add, 0 to change, 0 to destroy
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# aws_cloudwatch_log_metric_filter.pinpoint-sms-blocked-as-spam[0] will be created
+ resource "aws_cloudwatch_log_metric_filter" "pinpoint-sms-blocked-as-spam" {
+ id = (known after apply)
+ log_group_name = "sns/ca-central-1/239043911459/PinPointDirectPublishToPhoneNumber/Failure"
+ name = "pinpoint-sms-blocked-as-spam"
+ pattern = "{ $.messageStatus = \"SPAM\" }"
+ metric_transformation {
+ default_value = "0"
+ name = "pinpoint-sms-blocked-as-spam"
+ namespace = "LogMetrics"
+ unit = "None"
+ value = "1"
}
}
# aws_cloudwatch_log_metric_filter.pinpoint-sms-failures[0] will be created
+ resource "aws_cloudwatch_log_metric_filter" "pinpoint-sms-failures" {
+ id = (known after apply)
+ log_group_name = "sns/ca-central-1/239043911459/PinPointDirectPublishToPhoneNumber/Failure"
+ name = "pinpoint-sms-failures"
+ pattern = "{ ($.isFinal IS TRUE) && ( ($.messageStatus != \"SUCCESSFUL\") && ($.messageStatus != \"DELIVERED\") ) }"
+ metric_transformation {
+ default_value = "0"
+ name = "pinpoint-sms-failures"
+ namespace = "LogMetrics"
+ unit = "None"
+ value = "1"
}
}
# aws_cloudwatch_log_metric_filter.pinpoint-sms-phone-carrier-unavailable[0] will be created
+ resource "aws_cloudwatch_log_metric_filter" "pinpoint-sms-phone-carrier-unavailable" {
+ id = (known after apply)
+ log_group_name = "sns/ca-central-1/239043911459/PinPointDirectPublishToPhoneNumber/Failure"
+ name = "pinpoint-sms-phone-carrier-unavailable"
+ pattern = "{ $.messageStatus = \"CARRIER_UNREACHABLE\" }"
+ metric_transformation {
+ default_value = "0"
+ name = "pinpoint-sms-phone-carrier-unavailable"
+ namespace = "LogMetrics"
+ unit = "None"
+ value = "1"
}
}
# aws_cloudwatch_log_metric_filter.pinpoint-sms-rate-exceeded[0] will be created
+ resource "aws_cloudwatch_log_metric_filter" "pinpoint-sms-rate-exceeded" {
+ id = (known after apply)
+ log_group_name = "sns/ca-central-1/239043911459/PinPointDirectPublishToPhoneNumber/Failure"
+ name = "pinpoint-sms-rate-exceeded"
+ pattern = "{ $.messageStatusDescription = \"Rate exceeded.\" }"
+ metric_transformation {
+ default_value = "0"
+ name = "pinpoint-sms-rate-exceeded"
+ namespace = "LogMetrics"
+ unit = "None"
+ value = "1"
}
}
# aws_cloudwatch_log_metric_filter.pinpoint-sms-successes[0] will be created
+ resource "aws_cloudwatch_log_metric_filter" "pinpoint-sms-successes" {
+ id = (known after apply)
+ log_group_name = "sns/ca-central-1/239043911459/PinPointDirectPublishToPhoneNumber"
+ name = "pinpoint-sms-successes"
+ pattern = "{ ($.isFinal IS TRUE) && ( ($.messageStatus = \"SUCCESSFUL\") || ($.messageStatus = \"DELIVERED\") ) }"
+ metric_transformation {
+ default_value = "0"
+ name = "pinpoint-sms-successes"
+ namespace = "LogMetrics"
+ unit = "None"
+ value = "1"
}
}
# aws_cloudwatch_metric_alarm.pinpoint-sms-blocked-as-spam-warning[0] will be created
+ resource "aws_cloudwatch_metric_alarm" "pinpoint-sms-blocked-as-spam-warning" {
+ actions_enabled = true
+ alarm_actions = [
+ "arn:aws:sns:ca-central-1:239043911459:alert-warning",
]
+ alarm_description = "More than 10 Pinpoint SMS have been blocked as spam over 12 hours"
+ alarm_name = "pinpoint-sms-blocked-as-spam-warning"
+ arn = (known after apply)
+ comparison_operator = "GreaterThanOrEqualToThreshold"
+ evaluate_low_sample_count_percentiles = (known after apply)
+ evaluation_periods = 1
+ id = (known after apply)
+ metric_name = "pinpoint-sms-blocked-as-spam"
+ namespace = "LogMetrics"
+ period = 43200
+ statistic = "Sum"
+ tags_all = (known after apply)
+ threshold = 10
+ treat_missing_data = "notBreaching"
}
# aws_cloudwatch_metric_alarm.pinpoint-sms-phone-carrier-unavailable-warning[0] will be created
+ resource "aws_cloudwatch_metric_alarm" "pinpoint-sms-phone-carrier-unavailable-warning" {
+ actions_enabled = true
+ alarm_actions = [
+ "arn:aws:sns:ca-central-1:239043911459:alert-warning",
]
+ alarm_description = "More than 100 Pinpoint SMS failed because a phone carrier is unavailable over 3 hours"
+ alarm_name = "pinpoint-sms-phone-carrier-unavailable-warning"
+ arn = (known after apply)
+ comparison_operator = "GreaterThanOrEqualToThreshold"
+ evaluate_low_sample_count_percentiles = (known after apply)
+ evaluation_periods = 1
+ id = (known after apply)
+ metric_name = "pinpoint-sms-phone-carrier-unavailable"
+ namespace = "LogMetrics"
+ period = 10800
+ statistic = "Sum"
+ tags_all = (known after apply)
+ threshold = 100
+ treat_missing_data = "notBreaching"
}
# aws_cloudwatch_metric_alarm.pinpoint-sms-rate-exceeded-warning[0] will be created
+ resource "aws_cloudwatch_metric_alarm" "pinpoint-sms-rate-exceeded-warning" {
+ actions_enabled = true
+ alarm_actions = [
+ "arn:aws:sns:ca-central-1:239043911459:alert-warning",
]
+ alarm_description = "At least 1 Pinpoint SMS rate exceeded error in 5 minutes"
+ alarm_name = "pinpoint-sms-rate-exceeded-warning"
+ arn = (known after apply)
+ comparison_operator = "GreaterThanOrEqualToThreshold"
+ evaluate_low_sample_count_percentiles = (known after apply)
+ evaluation_periods = 1
+ id = (known after apply)
+ metric_name = "pinpoint-sms-rate-exceeded"
+ namespace = "LogMetrics"
+ period = 300
+ statistic = "Sum"
+ tags_all = (known after apply)
+ threshold = 1
+ treat_missing_data = "notBreaching"
}
# aws_cloudwatch_metric_alarm.pinpoint-sms-success-rate-canadian-numbers-critical[0] will be created
+ resource "aws_cloudwatch_metric_alarm" "pinpoint-sms-success-rate-canadian-numbers-critical" {
+ actions_enabled = true
+ alarm_actions = [
+ "arn:aws:sns:ca-central-1:239043911459:alert-critical",
]
+ alarm_description = "Pinpoint SMS success rate to Canadian numbers is below 25% over 2 consecutive periods of 12 hours"
+ alarm_name = "pinpoint-sms-success-rate-canadian-numbers-critical"
+ arn = (known after apply)
+ comparison_operator = "LessThanThreshold"
+ datapoints_to_alarm = 2
+ evaluate_low_sample_count_percentiles = (known after apply)
+ evaluation_periods = 2
+ id = (known after apply)
+ ok_actions = [
+ "arn:aws:sns:ca-central-1:239043911459:alert-ok",
]
+ tags_all = (known after apply)
+ threshold = 0.25
+ treat_missing_data = "notBreaching"
+ metric_query {
+ id = "failures"
+ return_data = false
+ metric {
+ metric_name = "pinpoint-sms-failures"
+ namespace = "LogMetrics"
+ period = 43200
+ stat = "Sum"
+ unit = "Count"
}
}
+ metric_query {
+ id = "successes"
+ return_data = false
+ metric {
+ metric_name = "pinpoint-sms-successes"
+ namespace = "LogMetrics"
+ period = 43200
+ stat = "Sum"
+ unit = "Count"
}
}
+ metric_query {
+ expression = "successes / (successes + failures)"
+ id = "success_rate"
+ label = "Success Rate"
+ return_data = true
}
}
# aws_cloudwatch_metric_alarm.pinpoint-sms-success-rate-warning[0] will be created
+ resource "aws_cloudwatch_metric_alarm" "pinpoint-sms-success-rate-warning" {
+ actions_enabled = true
+ alarm_actions = [
+ "arn:aws:sns:ca-central-1:239043911459:alert-warning",
]
+ alarm_description = "Pinpoint SMS success rate is below 60% over 2 consecutive periods of 12 hours"
+ alarm_name = "pinpoint-sms-success-rate-warning"
+ arn = (known after apply)
+ comparison_operator = "LessThanThreshold"
+ datapoints_to_alarm = 2
+ evaluate_low_sample_count_percentiles = (known after apply)
+ evaluation_periods = 2
+ id = (known after apply)
+ tags_all = (known after apply)
+ threshold = 0.6
+ treat_missing_data = "notBreaching"
+ metric_query {
+ id = "failures"
+ return_data = false
+ metric {
+ metric_name = "pinpoint-sms-failures"
+ namespace = "LogMetrics"
+ period = 43200
+ stat = "Sum"
+ unit = "Count"
}
}
+ metric_query {
+ id = "successes"
+ return_data = false
+ metric {
+ metric_name = "pinpoint-sms-successes"
+ namespace = "LogMetrics"
+ period = 43200
+ stat = "Sum"
+ unit = "Count"
}
}
+ metric_query {
+ expression = "successes / (successes + failures)"
+ id = "success_rate"
+ label = "Success Rate"
+ return_data = true
}
}
# aws_cloudwatch_metric_alarm.total-sms-spending-critical[0] will be created
+ resource "aws_cloudwatch_metric_alarm" "total-sms-spending-critical" {
+ actions_enabled = true
+ alarm_actions = [
+ "arn:aws:sns:ca-central-1:239043911459:alert-warning",
]
+ alarm_description = "SMS spending reached 90% of limit this month"
+ alarm_name = "total-sms-spending-critical"
+ arn = (known after apply)
+ comparison_operator = "GreaterThanOrEqualToThreshold"
+ evaluate_low_sample_count_percentiles = (known after apply)
+ evaluation_periods = 1
+ id = (known after apply)
+ tags_all = (known after apply)
+ threshold = 0.9
+ treat_missing_data = "notBreaching"
+ metric_query {
+ id = "pinpoint_spend"
+ return_data = false
+ metric {
+ metric_name = "TextMessageMonthlySpend"
+ namespace = "AWS/SMSVoice"
+ period = 300
+ stat = "Maximum"
+ unit = "Count"
}
}
+ metric_query {
+ id = "sns_spend"
+ return_data = false
+ metric {
+ metric_name = "SMSMonthToDateSpentUSD"
+ namespace = "AWS/SNS"
+ period = 300
+ stat = "Maximum"
+ unit = "Count"
}
}
+ metric_query {
+ expression = "sns_spend + pinpoint_spend"
+ id = "total_spend"
+ label = "Total SMS Monthly Spend"
+ return_data = true
}
}
# aws_cloudwatch_metric_alarm.total-sms-spending-warning[0] will be created
+ resource "aws_cloudwatch_metric_alarm" "total-sms-spending-warning" {
+ actions_enabled = true
+ alarm_actions = [
+ "arn:aws:sns:ca-central-1:239043911459:alert-warning",
]
+ alarm_description = "SMS spending reached 80% of limit this month"
+ alarm_name = "total-sms-spending-warning"
+ arn = (known after apply)
+ comparison_operator = "GreaterThanOrEqualToThreshold"
+ evaluate_low_sample_count_percentiles = (known after apply)
+ evaluation_periods = 1
+ id = (known after apply)
+ tags_all = (known after apply)
+ threshold = 0.8
+ treat_missing_data = "notBreaching"
+ metric_query {
+ id = "pinpoint_spend"
+ return_data = false
+ metric {
+ metric_name = "TextMessageMonthlySpend"
+ namespace = "AWS/SMSVoice"
+ period = 300
+ stat = "Maximum"
+ unit = "Count"
}
}
+ metric_query {
+ id = "sns_spend"
+ return_data = false
+ metric {
+ metric_name = "SMSMonthToDateSpentUSD"
+ namespace = "AWS/SNS"
+ period = 300
+ stat = "Maximum"
+ unit = "Count"
}
}
+ metric_query {
+ expression = "sns_spend + pinpoint_spend"
+ id = "total_spend"
+ label = "Total SMS Monthly Spend"
+ return_data = true
}
}
Plan: 12 to add, 0 to change, 0 to destroy.
─────────────────────────────────────────────────────────────────────────────
Saved the plan to: plan.tfplan
To perform exactly these actions, run the following command to apply:
terraform apply "plan.tfplan"
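The five metric filter patterns in the plan above can be read as simple predicates over a delivery log event. The Python sketch below restates them so their overlap is easier to see; `matching_metrics` is a hypothetical helper, not part of the module. It ignores that the plan attaches the success filter and the failure filters to different log groups, and it does not try to reproduce CloudWatch's exact handling of missing fields.

```python
from typing import Any, Dict, List


def matching_metrics(event: Dict[str, Any]) -> List[str]:
    """Return the names of the plan's metric filters that this event would match."""
    status = event.get("messageStatus")
    description = event.get("messageStatusDescription")
    is_final = event.get("isFinal") is True
    delivered = status in ("SUCCESSFUL", "DELIVERED")

    matches: List[str] = []
    if status == "SPAM":
        matches.append("pinpoint-sms-blocked-as-spam")
    if status == "CARRIER_UNREACHABLE":
        matches.append("pinpoint-sms-phone-carrier-unavailable")
    if description == "Rate exceeded.":
        matches.append("pinpoint-sms-rate-exceeded")
    # A final event counts as a failure unless its status is SUCCESSFUL or DELIVERED.
    if is_final and not delivered:
        matches.append("pinpoint-sms-failures")
    if is_final and delivered:
        matches.append("pinpoint-sms-successes")
    return matches


if __name__ == "__main__":
    # Hypothetical event: a final SPAM status counts as both spam and a failure.
    print(matching_metrics({"isFinal": True, "messageStatus": "SPAM"}))
```

Under these assumptions a single final event can increment both a specific metric (spam, carrier unavailable) and the general failure metric, while non-final events feed neither the success nor the failure count.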
Conftest results:
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.pinpoint_deliveries"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.pinpoint_deliveries_failures"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.pinpoint_to_sqs_sms_callbacks_log_group[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.lambda-image-pinpoint-delivery-receipts-errors-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.lambda-image-pinpoint-delivery-receipts-errors-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.logs-1-500-error-1-minute-warning-pinpoint_to_sqs_sms_callbacks-api[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.logs-10-500-error-5-minutes-critical-pinpoint_to_sqs_sms_callbacks-api[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-blocked-as-spam-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-phone-carrier-unavailable-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-rate-exceeded-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-success-rate-canadian-numbers-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-success-rate-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.total-sms-spending-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.total-sms-spending-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_policy.pinpoint_logs"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_role.pinpoint_logs"]
35 tests, 19 passed, 16 warnings, 0 failures, 0 exceptions
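For reviewers checking the two success-rate alarms in the plan above, this is a rough Python sketch of the arithmetic they describe: the rate is `successes / (successes + failures)` per 12-hour period, and the alarm needs 2 breaching datapoints out of 2 evaluation periods. The function names are hypothetical. Treating a period with no messages as not breaching is my simplification of `treat_missing_data = "notBreaching"`, not a precise model of CloudWatch's evaluation.

```python
from typing import List, Optional, Tuple


def success_rate(successes: float, failures: float) -> Optional[float]:
    """Return successes / (successes + failures), or None when there is no traffic."""
    total = successes + failures
    if total == 0:
        return None
    return successes / total


def alarm_fires(
    periods: List[Tuple[float, float]],
    threshold: float,
    datapoints_to_alarm: int = 2,
) -> bool:
    """Return True when the most recent periods all fall below the threshold.

    Each entry in periods is a (successes, failures) pair for one 12-hour period,
    oldest first. Periods without data are treated as not breaching (assumption).
    """
    recent = periods[-datapoints_to_alarm:]
    if len(recent) < datapoints_to_alarm:
        return False
    for successes, failures in recent:
        rate = success_rate(successes, failures)
        if rate is None or rate >= threshold:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical counts: rates of 0.10 and 0.20 in the last two periods.
    print(alarm_fires([(50, 50), (10, 90), (20, 80)], threshold=0.25))
```

With a threshold of 0.25 the critical alarm only fires after two consecutive bad 12-hour periods, so a single poor period followed by a recovery does not page anyone.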
LGTM!
Summary | Résumé
We need to replicate the existing SNS SMS alarms and dashboards for Pinpoint.
Related Issues | Cartes liées
Test instructions | Instructions pour tester la modification
Tested on dev (turning on cloudwatch_enabled so they'd get created).
Release Instructions | Instructions pour le déploiement
None.
Reviewer checklist | Liste de vérification du réviseur