add warnings for throttling exceptions #1555

Merged on Sep 26, 2024 (5 commits)

@sastels sastels commented Sep 25, 2024

Summary | Résumé

Add two warnings for ThrottlingExceptions (i.e., we've sent AWS too many SMS at once).
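
As a rough sketch of what the two warnings amount to, the alarms below use the attribute values shown in the staging plan for this PR. The variable `sns_alert_warning_arn` is a hypothetical placeholder for the warning topic, and `metric_name` and `namespace` are written as literal strings here for brevity, whereas the PR itself reads them from the metric filter.

```hcl
# Hypothetical input: ARN of the SNS topic that receives warning alerts.
variable "sns_alert_warning_arn" {
  type = string
}

# Fires as soon as one ThrottlingException is logged within a minute.
resource "aws_cloudwatch_metric_alarm" "throttling-exception-warning" {
  alarm_name          = "throttling-exception-warning"
  alarm_description   = "Have received a throttling exception in the last minute"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  metric_name         = "throttling-exceptions" # literal here; the PR references the filter
  namespace           = "LogMetrics"            # literal here; the PR references the filter
  period              = 60
  statistic           = "Sum"
  threshold           = 1
  treat_missing_data  = "notBreaching"
  alarm_actions       = [var.sns_alert_warning_arn]
}

# Same metric, but only fires at 100 or more exceptions within a minute.
resource "aws_cloudwatch_metric_alarm" "many-throttling-exceptions-warning" {
  alarm_name          = "many-throttling-exceptions-warning"
  alarm_description   = "Have received 100 throttling exception in the last minute"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  metric_name         = "throttling-exceptions"
  namespace           = "LogMetrics"
  period              = 60
  statistic           = "Sum"
  threshold           = 100
  treat_missing_data  = "notBreaching"
  alarm_actions       = [var.sns_alert_warning_arn]
}
```

Since the metric filter emits a value of 1 per matching log line, the `Sum` statistic over a 60-second period is the number of throttling exceptions in that minute, so the two alarms differ only in their threshold.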

Related Issues | Cartes liées

Before merging this PR

Read the code suggestions left by the cds-ai-codereviewer bot. Address the valid
suggestions and briefly note the reasons for not addressing the others. To help
classify the comments, please use these reactions on each comment made by the
AI review:

| Classification      | Reaction | Emoticon |
| ------------------- | -------- | -------- |
| Useful              | +1       | 👍       |
| Noisy               | eyes     | 👀       |
| Hallucination       | confused | 😕       |
| Wrong but teachable | rocket   | 🚀       |
| Wrong and incorrect | -1       | 👎       |

The classifications will be extracted and summarized into an analysis of how helpful
or not the AI code review really is.

Test instructions | Instructions pour tester la modification

Trigger in staging by uploading a large SMS send to 613-555-01** numbers.

Release Instructions | Instructions pour le déploiement

None.

Reviewer checklist | Liste de vérification du réviseur

  • This PR does not break existing functionality.
  • This PR does not violate GCNotify's privacy policies.
  • This PR does not raise new security concerns. Refer to our GC Notify Risk Register document on our Google drive.
  • This PR does not significantly alter performance.
  • Additional documentation required as a result of these changes is covered (such as the README, setup instructions, a related ADR or the technical documentation).

⚠ If boxes cannot be checked off before merging the PR, they should be moved to the "Release Instructions" section with appropriate steps required to verify before release. For example, changes to celery code may require tests on staging to verify that performance has not been affected.

alarm_description = "Have received a throttling exception in the last minute"
comparison_operator = "GreaterThanOrEqualToThreshold"
evaluation_periods = "1"
metric_name = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name

The metric_name should be a string value, but it seems to be referencing an array element. Ensure that aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name is correctly defined and accessible.

Contributor Author

True, we'd have a problem if those [0] array elements didn't exist, but that has nothing to do with the type of metric_name, and it's the pattern we use everywhere. (We only declare all of these resources if cloudwatch_enabled is set, so they all work together.)
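
To make the `[0]` pattern concrete, here is a minimal sketch of how a `count`-gated resource is typically referenced in Terraform. It assumes the toggle is exposed as `var.cloudwatch_enabled` and uses a placeholder `var.log_group_name`; neither name is confirmed by this thread, and the `output` block is added purely for illustration.

```hcl
# Assumed toggle; the thread only says these resources are declared
# when cloudwatch_enabled is set.
variable "cloudwatch_enabled" {
  type    = bool
  default = true
}

# Placeholder for the application log group name.
variable "log_group_name" {
  type = string
}

# count turns the resource into a list with zero or one instance.
resource "aws_cloudwatch_log_metric_filter" "throttling-exceptions" {
  count          = var.cloudwatch_enabled ? 1 : 0
  name           = "throttling-exceptions"
  pattern        = "ThrottlingException"
  log_group_name = var.log_group_name

  metric_transformation {
    name      = "throttling-exceptions"
    namespace = "LogMetrics"
    value     = "1"
  }
}

# Illustration only: [0] picks the single instance, metric_transformation[0]
# picks its single nested block, and .name is then an ordinary string.
# This reference only resolves while the filter's count is 1.
output "throttling_metric_name" {
  value = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name
}
```

If the alarms carry the same `count` gate as the filter, their `[0]` references should only be evaluated when the filter instance exists, which appears to be the point of declaring everything together.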

comparison_operator = "GreaterThanOrEqualToThreshold"
evaluation_periods = "1"
metric_name = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name
namespace = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].namespace

The namespace should be a string value, but it seems to be referencing an array element. Ensure that aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].namespace is correctly defined and accessible.

alarm_description = "Have received 100 throttling exception in the last minute"
comparison_operator = "GreaterThanOrEqualToThreshold"
evaluation_periods = "1"
metric_name = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name

The metric_name should be a string value, but it seems to be referencing an array element. Ensure that aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name is correctly defined and accessible.

comparison_operator = "GreaterThanOrEqualToThreshold"
evaluation_periods = "1"
metric_name = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].name
namespace = aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].namespace

The namespace should be a string value, but it seems to be referencing an array element. Ensure that aws_cloudwatch_log_metric_filter.throttling-exceptions[0].metric_transformation[0].namespace is correctly defined and accessible.

namespace = "LogMetrics"
value = "1"
}
}

Add a newline at the end of the file to follow best practices and ensure compatibility with various tools and editors.

Staging: eks

✅   Terraform Init: success
✅   Terraform Validate: success
✅   Terraform Format: success
✅   Terraform Plan: success
✅   Conftest: success

⚠️   Warning: resources will be destroyed by this change!

Plan: 4 to add, 1 to change, 1 to destroy
Show summary
CHANGE    NAME
update    aws_cloudwatch_metric_alarm.service-callback-too-many-failures-warning[0]
recreate  aws_cloudwatch_metric_alarm.service-callback-too-many-failures-critical[0]
add       aws_cloudwatch_log_metric_filter.throttling-exceptions[0]
          aws_cloudwatch_metric_alarm.many-throttling-exceptions-warning[0]
          aws_cloudwatch_metric_alarm.throttling-exception-warning[0]
Show plan
Resource actions are indicated with the following symbols:
  + create
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # aws_cloudwatch_log_metric_filter.throttling-exceptions[0] will be created
  + resource "aws_cloudwatch_log_metric_filter" "throttling-exceptions" {
      + id             = (known after apply)
      + log_group_name = "/aws/containerinsights/notification-canada-ca-staging-eks-cluster/application"
      + name           = "throttling-exceptions"
      + pattern        = "ThrottlingException"

      + metric_transformation {
          + name      = "throttling-exceptions"
          + namespace = "LogMetrics"
          + unit      = "None"
          + value     = "1"
        }
    }

  # aws_cloudwatch_metric_alarm.many-throttling-exceptions-warning[0] will be created
  + resource "aws_cloudwatch_metric_alarm" "many-throttling-exceptions-warning" {
      + actions_enabled                       = true
      + alarm_actions                         = [
          + "arn:aws:sns:ca-central-1:239043911459:alert-warning",
        ]
      + alarm_description                     = "Have received 100 throttling exception in the last minute"
      + alarm_name                            = "many-throttling-exceptions-warning"
      + arn                                   = (known after apply)
      + comparison_operator                   = "GreaterThanOrEqualToThreshold"
      + evaluate_low_sample_count_percentiles = (known after apply)
      + evaluation_periods                    = 1
      + id                                    = (known after apply)
      + metric_name                           = "throttling-exceptions"
      + namespace                             = "LogMetrics"
      + period                                = 60
      + statistic                             = "Sum"
      + tags_all                              = (known after apply)
      + threshold                             = 100
      + treat_missing_data                    = "notBreaching"
    }

  # aws_cloudwatch_metric_alarm.service-callback-too-many-failures-critical[0] must be replaced
-/+ resource "aws_cloudwatch_metric_alarm" "service-callback-too-many-failures-critical" {
      ~ alarm_name                            = "service-callback-too-many-failures-warning" -> "service-callback-too-many-failures-critical" # forces replacement
      ~ arn                                   = "arn:aws:cloudwatch:ca-central-1:239043911459:alarm:service-callback-too-many-failures-warning" -> (known after apply)
      - datapoints_to_alarm                   = 0 -> null
      - dimensions                            = {} -> null
      + evaluate_low_sample_count_percentiles = (known after apply)
      ~ id                                    = "service-callback-too-many-failures-warning" -> (known after apply)
      - insufficient_data_actions             = [] -> null
      - ok_actions                            = [] -> null
      - tags                                  = {} -> null
      ~ tags_all                              = {} -> (known after apply)
        # (11 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.service-callback-too-many-failures-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "service-callback-too-many-failures-warning" {
      ~ alarm_actions             = [
          - "arn:aws:sns:ca-central-1:239043911459:alert-critical",
          + "arn:aws:sns:ca-central-1:239043911459:alert-warning",
        ]
      ~ alarm_description         = "Service reached the max number of callback retries 100 times in 10 minutes" -> "Service reached the max number of callback retries 25 times in 5 minutes"
        id                        = "service-callback-too-many-failures-warning"
      ~ period                    = 600 -> 300
        tags                      = {}
      ~ threshold                 = 100 -> 25
        # (14 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.throttling-exception-warning[0] will be created
  + resource "aws_cloudwatch_metric_alarm" "throttling-exception-warning" {
      + actions_enabled                       = true
      + alarm_actions                         = [
          + "arn:aws:sns:ca-central-1:239043911459:alert-warning",
        ]
      + alarm_description                     = "Have received a throttling exception in the last minute"
      + alarm_name                            = "throttling-exception-warning"
      + arn                                   = (known after apply)
      + comparison_operator                   = "GreaterThanOrEqualToThreshold"
      + evaluate_low_sample_count_percentiles = (known after apply)
      + evaluation_periods                    = 1
      + id                                    = (known after apply)
      + metric_name                           = "throttling-exceptions"
      + namespace                             = "LogMetrics"
      + period                                = 60
      + statistic                             = "Sum"
      + tags_all                              = (known after apply)
      + threshold                             = 1
      + treat_missing_data                    = "notBreaching"
    }

Plan: 4 to add, 1 to change, 1 to destroy.

─────────────────────────────────────────────────────────────────────────────

Saved the plan to: plan.tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "plan.tfplan"
Show Conftest results
WARN - plan.json - main - Missing Common Tags: ["aws_acm_certificate.client_vpn"]
WARN - plan.json - main - Missing Common Tags: ["aws_acm_certificate.notification-canada-ca"]
WARN - plan.json - main - Missing Common Tags: ["aws_acm_certificate.notification-canada-ca-alt[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_acmpca_certificate_authority.client_vpn"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb.notification-canada-ca"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_listener.internal_alb_tls"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_listener.notification-canada-ca"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.internal_nginx_http"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-admin"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-api"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-document"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-document-api"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-documentation"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.notification-canada-ca-eks-application-logs[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.notification-canada-ca-eks-cluster-logs[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.notification-canada-ca-eks-prometheus-logs[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.admin-evicted-pods[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.admin-pods-high-cpu-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.admin-pods-high-memory-warning[0]"]
WARN - plan.json - main - Missing Common Tags:...

@sastels sastels marked this pull request as ready for review September 26, 2024 18:58
Updating alarms ⏰? Great! Please update the Google Sheet and add a 👍 to this message after 🙏

@sastels sastels requested review from whabanks and a team September 26, 2024 18:59
@P0NDER0SA P0NDER0SA left a comment

Yes looks ok

@cds-snc cds-snc deleted a comment from github-actions bot Sep 26, 2024
@sastels sastels merged commit 74166db into main Sep 26, 2024
29 checks passed
@sastels sastels deleted the throttling-exception-warning branch September 26, 2024 19:06
@whabanks whabanks mentioned this pull request Oct 1, 2024
5 tasks