Fix alarms with missing data #1510

Merged (2 commits) on Sep 3, 2024

@sastels commented Sep 3, 2024

Summary | Résumé

The pinpoint-sms-success-rate-warning alarm doesn't have any data (see discussion here)

When I opened the alarm in CloudWatch Metrics and removed the unit parameter, the graph showed data. The unit field is optional, so I believe deleting it should be fine (especially if it fixes the metrics / alarms!)
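
For reference, a minimal sketch of the success-rate alarm with the unit dropped. The metric names, namespace, period, stat, and expression are taken from the plan output in this PR. The alarm_name, comparison_operator, evaluation_periods, threshold, and treat_missing_data values are assumptions for illustration only, and the real resource is created with count (hence the [0] in the plan). As far as I understand, CloudWatch only returns datapoints whose unit matches the one requested, so asking for Count on a metric that was published with a different unit (or none) can leave the alarm with no data.

```hcl
# Sketch only: values marked "assumption" are placeholders, not copied from the repo.
resource "aws_cloudwatch_metric_alarm" "pinpoint-sms-success-rate-warning" {
  alarm_name          = "pinpoint-sms-success-rate-warning"
  comparison_operator = "LessThanThreshold" # assumption
  evaluation_periods  = 1                   # assumption
  threshold           = 0.85                # assumption
  treat_missing_data  = "notBreaching"      # assumption

  # Metric math: only this query returns data to the alarm.
  metric_query {
    id          = "success_rate"
    expression  = "successes / (successes + failures)"
    label       = "Success Rate"
    return_data = true
  }

  metric_query {
    id          = "successes"
    return_data = false

    metric {
      metric_name = "pinpoint-sms-successes"
      namespace   = "LogMetrics"
      period      = 43200
      stat        = "Sum"
      # unit = "Count" was here; with it set, the graph showed no data.
    }
  }

  metric_query {
    id          = "failures"
    return_data = false

    metric {
      metric_name = "pinpoint-sms-failures"
      namespace   = "LogMetrics"
      period      = 43200
      stat        = "Sum"
      # unit omitted here as well.
    }
  }
}
```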

Also fixed the total-sms-spending-* alarms: the Pinpoint spending (AWS/SMSVoice) was not appearing. Removing the unit fixed these too.
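
A similar sketch for the spending alarm, again hedged: the two metric names, their namespaces, the period, the stat, and the expression come from the plan output in this PR, while alarm_name, comparison_operator, evaluation_periods, and threshold are made-up placeholders. The spend metrics are dollar amounts rather than counts, which may be why filtering on Count hid the Pinpoint datapoints, but that is a guess rather than something confirmed here.

```hcl
# Sketch only: values marked "assumption" are placeholders, not copied from the repo.
resource "aws_cloudwatch_metric_alarm" "total-sms-spending-warning" {
  alarm_name          = "total-sms-spending-warning"
  comparison_operator = "GreaterThanOrEqualToThreshold" # assumption
  evaluation_periods  = 1                               # assumption
  threshold           = 1000                            # assumption (USD)

  # Sum of the SNS and Pinpoint month-to-date spend.
  metric_query {
    id          = "total_spend"
    expression  = "sns_spend + pinpoint_spend"
    label       = "Total SMS Monthly Spend"
    return_data = true
  }

  metric_query {
    id          = "sns_spend"
    return_data = false

    metric {
      metric_name = "SMSMonthToDateSpentUSD"
      namespace   = "AWS/SNS"
      period      = 300
      stat        = "Maximum"
      # no unit: let CloudWatch return whatever was published
    }
  }

  metric_query {
    id          = "pinpoint_spend"
    return_data = false

    metric {
      metric_name = "TextMessageMonthlySpend"
      namespace   = "AWS/SMSVoice"
      period      = 300
      stat        = "Maximum"
      # no unit: this is the query that previously showed nothing
    }
  }
}
```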

Related Issues | Cartes liées

Didn't make one 😬

Test instructions | Instructions pour tester la modification

Check that data appears. There should be significant spending in total-sms-spending-warning (with the Pinpoint spending now counted). You can view the alarm in Metrics and plot all the metrics used in the calculations.

Release Instructions | Instructions pour le déploiement

Other metrics might be affected similarly...

Reviewer checklist | Liste de vérification du réviseur

  • This PR does not break existing functionality.
  • This PR does not violate GCNotify's privacy policies.
  • This PR does not raise new security concerns. Refer to our GC Notify Risk Register document on our Google drive.
  • This PR does not significantly alter performance.
  • Additional required documentation resulting from these changes is covered (such as the README, setup instructions, a related ADR or the technical documentation).

⚠ If boxes cannot be checked off before merging the PR, they should be moved to the "Release Instructions" section with appropriate steps required to verify before release. For example, changes to celery code may require tests on staging to verify that performance has not been affected.

github-actions bot commented Sep 3, 2024

Staging: pinpoint_to_sqs_sms_callbacks

✅   Terraform Init: success
✅   Terraform Validate: success
✅   Terraform Format: success
✅   Terraform Plan: success
✅   Conftest: success

Plan: 0 to add, 4 to change, 0 to destroy
Summary

CHANGE   NAME
update   aws_cloudwatch_metric_alarm.pinpoint-sms-success-rate-critical[0]
update   aws_cloudwatch_metric_alarm.pinpoint-sms-success-rate-warning[0]
update   aws_cloudwatch_metric_alarm.total-sms-spending-critical[0]
update   aws_cloudwatch_metric_alarm.total-sms-spending-warning[0]

Plan
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_cloudwatch_metric_alarm.pinpoint-sms-success-rate-critical[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "pinpoint-sms-success-rate-critical" {
        id                        = "pinpoint-sms-success-rate-critical"
        tags                      = {}
        # (15 unchanged attributes hidden)

      - metric_query {
          - id          = "failures" -> null
          - period      = 0 -> null
          - return_data = false -> null

          - metric {
              - dimensions  = {} -> null
              - metric_name = "pinpoint-sms-failures" -> null
              - namespace   = "LogMetrics" -> null
              - period      = 43200 -> null
              - stat        = "Sum" -> null
              - unit        = "Count" -> null
            }
        }
      - metric_query {
          - id          = "successes" -> null
          - period      = 0 -> null
          - return_data = false -> null

          - metric {
              - dimensions  = {} -> null
              - metric_name = "pinpoint-sms-successes" -> null
              - namespace   = "LogMetrics" -> null
              - period      = 43200 -> null
              - stat        = "Sum" -> null
              - unit        = "Count" -> null
            }
        }
      - metric_query {
          - expression  = "successes / (successes + failures)" -> null
          - id          = "success_rate" -> null
          - label       = "Success Rate" -> null
          - period      = 0 -> null
          - return_data = true -> null
        }
      + metric_query {
          + id          = "failures"
          + return_data = false

          + metric {
              + metric_name = "pinpoint-sms-failures"
              + namespace   = "LogMetrics"
              + period      = 43200
              + stat        = "Sum"
            }
        }
      + metric_query {
          + id          = "successes"
          + return_data = false

          + metric {
              + metric_name = "pinpoint-sms-successes"
              + namespace   = "LogMetrics"
              + period      = 43200
              + stat        = "Sum"
            }
        }
      + metric_query {
          + expression  = "successes / (successes + failures)"
          + id          = "success_rate"
          + label       = "Success Rate"
          + return_data = true
        }
    }

  # aws_cloudwatch_metric_alarm.pinpoint-sms-success-rate-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "pinpoint-sms-success-rate-warning" {
        id                        = "pinpoint-sms-success-rate-warning"
        tags                      = {}
        # (15 unchanged attributes hidden)

      - metric_query {
          - id          = "failures" -> null
          - period      = 0 -> null
          - return_data = false -> null

          - metric {
              - dimensions  = {} -> null
              - metric_name = "pinpoint-sms-failures" -> null
              - namespace   = "LogMetrics" -> null
              - period      = 43200 -> null
              - stat        = "Sum" -> null
              - unit        = "Count" -> null
            }
        }
      - metric_query {
          - id          = "successes" -> null
          - period      = 0 -> null
          - return_data = false -> null

          - metric {
              - dimensions  = {} -> null
              - metric_name = "pinpoint-sms-successes" -> null
              - namespace   = "LogMetrics" -> null
              - period      = 43200 -> null
              - stat        = "Sum" -> null
              - unit        = "Count" -> null
            }
        }
      - metric_query {
          - expression  = "successes / (successes + failures)" -> null
          - id          = "success_rate" -> null
          - label       = "Success Rate" -> null
          - period      = 0 -> null
          - return_data = true -> null
        }
      + metric_query {
          + id          = "failures"
          + return_data = false

          + metric {
              + metric_name = "pinpoint-sms-failures"
              + namespace   = "LogMetrics"
              + period      = 43200
              + stat        = "Sum"
            }
        }
      + metric_query {
          + id          = "successes"
          + return_data = false

          + metric {
              + metric_name = "pinpoint-sms-successes"
              + namespace   = "LogMetrics"
              + period      = 43200
              + stat        = "Sum"
            }
        }
      + metric_query {
          + expression  = "successes / (successes + failures)"
          + id          = "success_rate"
          + label       = "Success Rate"
          + return_data = true
        }
    }

  # aws_cloudwatch_metric_alarm.total-sms-spending-critical[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "total-sms-spending-critical" {
        id                        = "total-sms-spending-critical"
        tags                      = {}
        # (15 unchanged attributes hidden)

      - metric_query {
          - id          = "pinpoint_spend" -> null
          - period      = 0 -> null
          - return_data = false -> null

          - metric {
              - dimensions  = {} -> null
              - metric_name = "TextMessageMonthlySpend" -> null
              - namespace   = "AWS/SMSVoice" -> null
              - period      = 300 -> null
              - stat        = "Maximum" -> null
              - unit        = "Count" -> null
            }
        }
      - metric_query {
          - id          = "sns_spend" -> null
          - period      = 0 -> null
          - return_data = false -> null

          - metric {
              - dimensions  = {} -> null
              - metric_name = "SMSMonthToDateSpentUSD" -> null
              - namespace   = "AWS/SNS" -> null
              - period      = 300 -> null
              - stat        = "Maximum" -> null
              - unit        = "Count" -> null
            }
        }
      - metric_query {
          - expression  = "sns_spend + pinpoint_spend" -> null
          - id          = "total_spend" -> null
          - label       = "Total SMS Monthly Spend" -> null
          - period      = 0 -> null
          - return_data = true -> null
        }
      + metric_query {
          + id          = "pinpoint_spend"
          + return_data = false

          + metric {
              + metric_name = "TextMessageMonthlySpend"
              + namespace   = "AWS/SMSVoice"
              + period      = 300
              + stat        = "Maximum"
            }
        }
      + metric_query {
          + id          = "sns_spend"
          + return_data = false

          + metric {
              + metric_name = "SMSMonthToDateSpentUSD"
              + namespace   = "AWS/SNS"
              + period      = 300
              + stat        = "Maximum"
            }
        }
      + metric_query {
          + expression  = "sns_spend + pinpoint_spend"
          + id          = "total_spend"
          + label       = "Total SMS Monthly Spend"
          + return_data = true
        }
    }

  # aws_cloudwatch_metric_alarm.total-sms-spending-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "total-sms-spending-warning" {
        id                        = "total-sms-spending-warning"
        tags                      = {}
        # (15 unchanged attributes hidden)

      - metric_query {
          - id          = "pinpoint_spend" -> null
          - period      = 0 -> null
          - return_data = false -> null

          - metric {
              - dimensions  = {} -> null
              - metric_name = "TextMessageMonthlySpend" -> null
              - namespace   = "AWS/SMSVoice" -> null
              - period      = 300 -> null
              - stat        = "Maximum" -> null
              - unit        = "Count" -> null
            }
        }
      - metric_query {
          - id          = "sns_spend" -> null
          - period      = 0 -> null
          - return_data = false -> null

          - metric {
              - dimensions  = {} -> null
              - metric_name = "SMSMonthToDateSpentUSD" -> null
              - namespace   = "AWS/SNS" -> null
              - period      = 300 -> null
              - stat        = "Maximum" -> null
              - unit        = "Count" -> null
            }
        }
      - metric_query {
          - expression  = "sns_spend + pinpoint_spend" -> null
          - id          = "total_spend" -> null
          - label       = "Total SMS Monthly Spend" -> null
          - period      = 0 -> null
          - return_data = true -> null
        }
      + metric_query {
          + id          = "pinpoint_spend"
          + return_data = false

          + metric {
              + metric_name = "TextMessageMonthlySpend"
              + namespace   = "AWS/SMSVoice"
              + period      = 300
              + stat        = "Maximum"
            }
        }
      + metric_query {
          + id          = "sns_spend"
          + return_data = false

          + metric {
              + metric_name = "SMSMonthToDateSpentUSD"
              + namespace   = "AWS/SNS"
              + period      = 300
              + stat        = "Maximum"
            }
        }
      + metric_query {
          + expression  = "sns_spend + pinpoint_spend"
          + id          = "total_spend"
          + label       = "Total SMS Monthly Spend"
          + return_data = true
        }
    }

Plan: 0 to add, 4 to change, 0 to destroy.

─────────────────────────────────────────────────────────────────────────────

Saved the plan to: plan.tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "plan.tfplan"

Conftest results
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.pinpoint_deliveries"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.pinpoint_deliveries_failures"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.pinpoint_to_sqs_sms_callbacks_log_group[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.lambda-image-pinpoint-delivery-receipts-errors-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.lambda-image-pinpoint-delivery-receipts-errors-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.logs-1-500-error-1-minute-warning-pinpoint_to_sqs_sms_callbacks-api[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.logs-10-500-error-5-minutes-critical-pinpoint_to_sqs_sms_callbacks-api[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-blocked-as-spam-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-phone-carrier-unavailable-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-rate-exceeded-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-success-rate-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-success-rate-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.total-sms-spending-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.total-sms-spending-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_policy.pinpoint_logs"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_role.pinpoint_logs"]

35 tests, 19 passed, 16 warnings, 0 failures, 0 exceptions

@sastels marked this pull request as ready for review September 3, 2024 17:58
@sastels requested a review from a team September 3, 2024 17:58
github-actions bot commented Sep 3, 2024

Updating alarms ⏰? Great! Please update the Google Sheet and add a 👍 to this message after 🙏

@sastels changed the title from "remove optional unit param" to "Fix alarms with missing data" Sep 3, 2024
@cds-snc deleted a comment from github-actions bot Sep 3, 2024
@sastels merged commit 2c56186 into main Sep 3, 2024
27 checks passed
@sastels deleted the fix-pinpoint-sms-success-alarm-metrics branch September 3, 2024 18:16
@sastels mentioned this pull request Sep 4, 2024
5 tasks