Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/3959 bg monitoring fixed #2423

Merged
merged 6 commits into from
Nov 28, 2024

Conversation

richard-jones
Copy link
Contributor

Replaces PR: #2419

update bg monitoring settings for mission critical monitoring changes

Applies tighter monitoring constraints on the background jobs, as per this spreadsheet: https://docs.google.com/spreadsheets/d/1XWq-7R9hZmZsBA27eOgs9cO-SfJnLEE0dW50DVtKhpI/edit?gid=543591434#gid=543591434

This PR...

  • has scripts to run
  • has migrations to run
  • adds new infrastructure
  • changes the CI pipeline
  • affects the public site
  • affects the editorial area
  • affects the publisher area
  • affects the monitoring

Developer Checklist

Developers should review and confirm each of these items before requesting review

  • Code meets acceptance criteria from issue
  • Unit tests are written and all pass
  • User Test Scripts (if required) are written and have been run through
  • Project's coding standards are met
    • No deprecated methods are used
    • No magic strings/numbers - all strings are in constants or messages files
    • ES queries are wrapped in a Query object rather than inlined in the code
    • Where possible our common library functions have been used (e.g. dates manipulated via dates)
    • Cleaned up commented out code, etc
    • Urls are constructed with url_for not hard-coded
  • Code documentation and related non-code documentation has all been updated
  • Migation has been created and tested
  • There is a recent merge from develop

Reviewer Checklist

Reviewers should review and confirm each of these items before approval
If there are multiple reviewers, this section should be duplicated for each reviewer

  • Code meets acceptance criteria from issue
  • Unit tests are written and all pass
  • User Test Scripts (if required) are written and have been run through
  • Project's coding standards are met
    • No deprecated methods are used
    • No magic strings/numbers - all strings are in constants or messages files
    • ES queries are wrapped in a Query object rather than inlined in the code
    • Where possible our common library functions have been used (e.g. dates manipulated via dates)
    • Cleaned up commented out code, etc
    • Urls are constructed with url_for not hard-coded
  • Code documentation and related non-code documentation has all been updated
  • Migation has been created and tested
  • There is a recent merge from develop

Testing

This tightens the monitoring on a number of bg jobs, so would be good to validate that this won't cause any major surge in unnecessary reporting.

Probably need to send it live and then adjust.

Deployment

Configuration changes

Adds new values in:

  • BG_MONITOR_QUEUED_CONFIG
  • BG_MONITOR_ERRORS_CONFIG
  • BG_MONITOR_DEFAULT_CONFIG

Monitoring

This may trigger new types of "unstable" alerts in the background jobs reporting, so we should ensure that those alerts come through

Copy link
Contributor

@RK206 RK206 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test cases failing from local. Can you please check. Are there any configuration changes required to test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failed tests on my local. Not sure if these are specific to my local. Fixed import issue on my local in portality/bll/services/background_task_status.py before testing.

doajtest/unit/test_background_task_status.py:178 (TestBackgroundTaskStatus.test_create_background_status__error_in_period_not_found)
self = <doajtest.unit.test_background_task_status.TestBackgroundTaskStatus testMethod=test_create_background_status__error_in_period_not_found>

    @apply_test_case_config(bg_monitor_errors_config__a)
    def test_create_background_status__error_in_period_not_found(self):
        save_mock_bgjob(JournalCSVBackgroundTask.__action__,
                        status=constants.BGJOB_STATUS_ERROR,
                        created_before_sec=1000000000)
    
        status_dict = background_task_status.create_background_status()
    
        journal_csv_dict = status_dict['queues']['scheduled_short']['errors'].get(JournalCSVBackgroundTask.__action__, {})
    
>       assert is_stable(status_dict['status'])
E       AssertionError: assert False
E        +  where False = is_stable('unstable')

test_background_task_status.py:189: AssertionError
doajtest/unit/test_background_task_status.py:106 (TestBackgroundTaskStatus.test_create_background_status__invalid_last_completed__events_queue)
self = <doajtest.unit.test_background_task_status.TestBackgroundTaskStatus testMethod=test_create_background_status__invalid_last_completed__events_queue>

    @apply_test_case_config(bg_monitor_last_completed__now)
    def test_create_background_status__invalid_last_completed__events_queue(self):
        save_mock_bgjob(JournalCSVBackgroundTask.__action__,
                        queue_id=constants.BGJOB_QUEUE_ID_EVENTS,
                        status=constants.BGJOB_STATUS_COMPLETE, )
    
        status_dict = background_task_status.create_background_status()
    
        assert not is_stable(status_dict['status'])
>       self.assert_unstable_dict(status_dict['queues'].get('events', {}))

test_background_task_status.py:116: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

val = {'err_msgs': [], 'errors': {'application_autochecks': {'err_msgs': [], 'in_monitoring_period': 0, 'status': 'stable', ...table'}, 'ingest_articles': {'err_msgs': [], 'last_run': None, 'last_run_status': None, 'status': 'stable'}, ...}, ...}

    @staticmethod
    def assert_unstable_dict(val):
>       assert not is_stable(val.get('status'))
E       AssertionError: assert not True
E        +  where True = is_stable('stable')
E        +    where 'stable' = <built-in method get of dict object at 0x165c907c0>('status')
E        +      where <built-in method get of dict object at 0x165c907c0> = {'err_msgs': [], 'errors': {'application_autochecks': {'err_msgs': [], 'in_monitoring_period': 0, 'status': 'stable', 'total': 0}, 'article_bulk_create': {'err_msgs': [], 'in_monitoring_period': 0, 'status': 'stable', 'total': 0}, 'article_bulk_delete': {'err_msgs': [], 'in_monitoring_period': 0, 'status': 'stable', 'total': 0}, 'ingest_articles': {'err_msgs': [], 'in_monitoring_period': 0, 'status': 'stable', 'total': 0}, ...}, 'last_completed_job': '2024-11-04T04:46:47Z', 'last_run_successful': {'application_autochecks': {'err_msgs': [], 'last_run': None, 'last_run_status': None, 'status': 'stable'}, 'article_bulk_create': {'err_msgs': [], 'last_run': None, 'last_run_status': None, 'status': 'stable'}, 'article_bulk_delete': {'err_msgs': [], 'last_run': None, 'last_run_status': None, 'status': 'stable'}, 'ingest_articles': {'err_msgs': [], 'last_run': None, 'last_run_status': None, 'status': 'stable'}, ...}, ...}.get

test_background_task_status.py:104: AssertionError

doajtest/unit/test_background_task_status.py:232 (TestBackgroundTaskStatus.test_create_background_status__queued_valid_oldest)
self = <doajtest.unit.test_background_task_status.TestBackgroundTaskStatus testMethod=test_create_background_status__queued_valid_oldest>

    @apply_test_case_config(bg_monitor_queued_config__a)
    def test_create_background_status__queued_valid_oldest(self):
        save_mock_bgjob(JournalCSVBackgroundTask.__action__,
                        status=constants.BGJOB_STATUS_QUEUED, )
    
        status_dict = background_task_status.create_background_status()
        print(json.dumps(status_dict, indent=4))
    
        journal_csv_dict = status_dict['queues']['scheduled_short']['queued'].get(JournalCSVBackgroundTask.__action__, {})
    
>       assert is_stable(status_dict['status'])
E       AssertionError: assert False
E        +  where False = is_stable('unstable')

test_background_task_status.py:243: AssertionError
doajtest/unit/test_background_task_status.py:205 (TestBackgroundTaskStatus.test_create_background_status__queued_valid_total)
self = <doajtest.unit.test_background_task_status.TestBackgroundTaskStatus testMethod=test_create_background_status__queued_valid_total>

    @apply_test_case_config(bg_monitor_queued_config__zero_total)
    def test_create_background_status__queued_valid_total(self):
        save_mock_bgjob(JournalCSVBackgroundTask.__action__,
                        status=constants.BGJOB_STATUS_COMPLETE, )
    
        status_dict = background_task_status.create_background_status()
    
        journal_csv_dict = status_dict['queues']['scheduled_short']['queued'].get(JournalCSVBackgroundTask.__action__, {})
    
>       assert is_stable(status_dict['status'])
E       AssertionError: assert False
E        +  where False = is_stable('unstable')

test_background_task_status.py:215: AssertionError
doajtest/unit/test_background_task_status.py:130 (TestBackgroundTaskStatus.test_create_background_status__valid_last_completed)
self = <doajtest.unit.test_background_task_status.TestBackgroundTaskStatus testMethod=test_create_background_status__valid_last_completed>

    @apply_test_case_config(bg_monitor_last_completed__a)
    def test_create_background_status__valid_last_completed(self):
        save_mock_bgjob(JournalCSVBackgroundTask.__action__,
                        queue_id=constants.BGJOB_QUEUE_ID_EVENTS,
                        status=constants.BGJOB_STATUS_COMPLETE, )
        save_mock_bgjob(AnonExportBackgroundTask.__action__,
                        queue_id=constants.BGJOB_QUEUE_ID_SCHEDULED_LONG,
                        status=constants.BGJOB_STATUS_COMPLETE, )
    
        status_dict = background_task_status.create_background_status()
    
>       assert is_stable(status_dict['status'])
E       AssertionError: assert False
E        +  where False = is_stable('unstable')

test_background_task_status.py:142: AssertionError

doajtest/unit/test_background_task_status.py:145 (TestBackgroundTaskStatus.test_create_background_status__valid_last_completed__no_record)
self = <doajtest.unit.test_background_task_status.TestBackgroundTaskStatus testMethod=test_create_background_status__valid_last_completed__no_record>

    @apply_test_case_config(bg_monitor_last_completed__now)
    def test_create_background_status__valid_last_completed__no_record(self):
        status_dict = background_task_status.create_background_status()
>       assert is_stable(status_dict['status'])
E       AssertionError: assert False
E        +  where False = is_stable('unstable')

test_background_task_status.py:149: AssertionError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RK206 The testing was a bit confused now there is an additional monitoring stream, so I have replaced it with a parameterised test that gets all the variations and that appears to be working fine.

@@ -4,7 +4,9 @@
import itertools
from typing import Iterable

from portality.constants import BGJOB_QUEUE_ID_LONG, BGJOB_QUEUE_ID_MAIN, BGJOB_STATUS_ERROR, BGJOB_STATUS_QUEUED, \
import constants
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it import from portality?

@@ -4,7 +4,9 @@
import itertools
from typing import Iterable

from portality.constants import BGJOB_QUEUE_ID_LONG, BGJOB_QUEUE_ID_MAIN, BGJOB_STATUS_ERROR, BGJOB_STATUS_QUEUED, \
import constants
from constants import BGJOB_STATUS_COMPLETE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above. import from portality?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's a pycharm thing, it sometimes does the imports for you, and it does them relative, which seems to work locally but not anywhere else. Very annoying!

Copy link
Contributor

@Steven-Eardley Steven-Eardley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few changes for @RK206 due to recent deployment changes.

Note the queues have been added in addition to the old ones (main and long_running) so we don't need to touch the old tasks, and we only need to add the symlinks for deployment for the new consumers. The release will therefore be fairly straightforward.

@@ -0,0 +1,9 @@
[program:huey-events]
command=/home/cloo/doaj/bin/python /home/cloo/doaj/bin/huey_consumer.py -v portality.tasks.consumer_events_queue.events_queue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated deployment - changed binary paths /home/cloo/doaj/**venv/**bin/python and /home/cloo/doaj/**venv/**bin/huey_consumer.py

@@ -0,0 +1,9 @@
[program:huey-scheduled-long]
command=/home/cloo/doaj/bin/python /home/cloo/doaj/bin/huey_consumer.py -v portality.tasks.consumer_scheduled_long_queue.scheduled_long_queue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

venv path again

@@ -0,0 +1,9 @@
[program:huey-scheduled-short]
command=/home/cloo/doaj/bin/python /home/cloo/doaj/bin/huey_consumer.py -v portality.tasks.consumer_scheduled_short_queue.scheduled_short_queue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

venv path

long_running.always_eager = True
main_queue.immediate = True
long_running.immediate = True
events_queue.always_eager = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax change for new huey - always_eager is now immediate

@Steven-Eardley
Copy link
Contributor

I retract my suggestion for @RK206 to make the above changes for new deployment - I'll take care of that and fix the conflicts

@Steven-Eardley Steven-Eardley merged commit 9fd67b1 into develop Nov 28, 2024
1 of 2 checks passed
@Steven-Eardley Steven-Eardley deleted the feature/3959_bg_monitoring_fixed branch November 28, 2024 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants