Reduce the frequency of reconcile and parameterize the reconcile settings #637

Open
leochr opened this issue Oct 11, 2024 · 2 comments

leochr (Member) commented Oct 11, 2024

Reduce the reconcile frequency so the operator is not overloaded and reconcile queues do not grow long when there are many CR instances.

  • Implement smart reconcile to reduce the frequency in certain scenarios.
  • For example, if the error status conditions/messages haven't changed over the last 2 reconciles, increase the reconcile interval (e.g., from 5 seconds to 10 seconds). If the error status hasn't changed over the last 4 reconciles, increase the interval from 10 seconds to 20 seconds, and so on, perhaps up to a maximum of 4 minutes.
  • Adjust the reconcile interval for a successful status in a similar way, but perhaps cap it at 2 minutes.
  • Parameterize the base reconcile interval (15 seconds by default) and the increase percentage (e.g., 100%, meaning 15 seconds grows to 30 seconds) so that users can set them in the Operator's ConfigMap if needed.
halim-lee self-assigned this Oct 11, 2024
leochr changed the title from "Reduce the frequency of reconcile and parameterize the reconcile intervals" to "Reduce the frequency of reconcile and parameterize the reconcile settings" on Oct 11, 2024
leochr (Member, Author) commented Nov 1, 2024

Meeting summary - Friday, November 1, 2024 (Leo, Melissa):

  • Smart reconcile improved the overall reconcile and ready times: 90+ instances became ready in 3 minutes, and a new (101st) instance became ready in 31 seconds. The controller was run locally.

Next steps:

  • Investigate whether anything needs to be done for the 'Warning' status condition.
  • Handle invalid user configuration for smart reconcile in the Operator ConfigMap: output a warning and fall back to the default values.

halim-lee (Member) commented Nov 7, 2024

The controller was run on the cloud using the driver from OLO OnePipeline build 627.

The tests were run on a large OCP cluster. The measured times are longer than those from the locally run controller.

Tests with all instances in a single namespace:
1-namespace-100-new-working.log
1-namespace-100-working-1-new-working.log
1-namespace-101-working-10-nonworking.log

  • It took 5m11s to fully reconcile 100 instances, all in the same namespace.
  • It took 2m4s to fully reconcile 1 new instance, with 100 working instances.
  • It took 1m19s to fully reconcile 1 wrongly configured instance after it was corrected, with 101 working instances and 9 non-working instances running.

Tests with each instance in its own namespace:
each-new-namespace-100-new-working.log
each-new-namespace-100-working-1-new-working.log
each-new-namespace-101-working-10-non-working.log

  • These took longer than the single-namespace tests.
  • It took 10m to fully reconcile 100 instances, each in its own namespace.
  • It took 2m17s to fully reconcile 1 new instance, with 100 working instances.
  • It took 2m29s to fully reconcile 1 wrongly configured instance after it was corrected, with 101 working instances and 9 non-working instances running.
