
Prevent unnecessary reconciles on provider restarts #696

Open
gravufo opened this issue May 22, 2024 · 6 comments
Labels
enhancement New feature or request performance

Comments

@gravufo

gravufo commented May 22, 2024

What problem are you facing?

Today, when provider pods are restarted (whether due to a provider version update or simply Kubernetes nodes shuffling), the in-memory cache is lost entirely, forcing the controllers to reconcile all MRs regardless of when they were last synced and of the configured poll frequency.

Depending on the provider, this can be fine and fast, but it can sometimes be very slow and even problematic (think external API rate limiting). The specific use case I am referring to is provider-upjet-azure, which easily gets rate-limited by the Azure Resource Manager API if it is restarted while managing a large number of MRs.

How could Crossplane help solve your problem?

Since we have no choice but to requeue all objects (this makes sense), a nice improvement would be to check when each object (assuming Synced and Ready are both true) was last reconciled and, based on the poll frequency, decide whether it needs to be reconciled right now or can instead be requeued until the poll frequency would next be hit.

This would prevent mega bursts of external calls every time we need to restart provider pods, regardless of the reason.

@gravufo gravufo added the enhancement New feature or request label May 22, 2024
@gravufo
Author

gravufo commented Jul 25, 2024

I have done a basic implementation of this in a custom provider by adding the following fields to our MR CRDs' status:

	SpecHash              string                      `json:"specHash,omitempty"`
	LastExternalReconcile metav1.Time                 `json:"lastExternalReconcile,omitempty"`

The following logic was done in the reconciler:

func (c *External) Observe(ctx context.Context, mg resource.Managed) (managed.ExternalObservation, error) {
	// Calculate the hash of the spec of the MR
	hash, err := ComputeHash(mg)
	if err != nil {
		return managed.ExternalObservation{}, err
	}

	// On first run, the saved hash will be empty and/or the LastExternalReconcile will be empty
	firstRun := GetSpecHash(mg) == "" || GetLastExternalReconcile(mg).IsZero()
	// If the spec has changed or the sync period has passed, we need to resync
	needsResync := GetSpecHash(mg) != hash || GetLastExternalReconcile(mg).Add(*constants.SyncPeriod).Before(time.Now()) || mg.GetCondition(xpv1.TypeReady).Reason == xpv1.Deleting().Reason
	if firstRun || needsResync {
		result, err := c.Handler.Observe(ctx, mg)
		if err != nil {
			return result, errors.Wrap(err, "error in fetching resource")
		}

		if !result.ResourceExists {
			return result, nil
		}

		// Set the calculated hash in the MR status
		SetSpecHash(mg, hash)
		// Set the last external reconcile time in the MR status
		SetLastExternalReconcile(mg, time.Now())

		mg.SetConditions(xpv1.Available())
		return result, nil
	}

	return managed.ExternalObservation{ResourceExists: true, ResourceUpToDate: true}, nil
}

This makes it so a reconcile is done if the MR spec changes or if the Sync time has passed since the last external reconcile.
Hope this gives more detail about what kind of implementation could be done.
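For illustration, the ComputeHash helper referenced above could be approximated with the standard library, hashing the JSON encoding of the spec. The real implementation used the hashstructure library; the Spec type here is a made-up stand-in:

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
	"strconv"
)

// Spec is a made-up stand-in for a managed resource's spec; a real provider
// would use its generated CRD struct.
type Spec struct {
	Location string `json:"location"`
	Size     string `json:"size"`
}

// ComputeHash approximates the helper used above with the standard library:
// an FNV-64a hash over the JSON encoding of the spec. Collisions are possible
// but, as noted later in this thread, the hash only needs to be best-effort.
func ComputeHash(spec any) (string, error) {
	b, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	h := fnv.New64a()
	h.Write(b) // hash.Hash's Write never returns an error
	return strconv.FormatUint(h.Sum64(), 10), nil
}

func main() {
	h1, _ := ComputeHash(Spec{Location: "eastus", Size: "Standard_D2s_v3"})
	h2, _ := ComputeHash(Spec{Location: "westus", Size: "Standard_D2s_v3"})
	fmt.Println(h1, h2) // different specs yield different short decimal hashes
}
```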

@jbw976
Member

jbw976 commented Jul 31, 2024

Thanks for your efforts to describe this issue and for showing up to community meetings to advocate for it. We appreciate the way you're engaging on this @gravufo 🙇‍♂️

I think the general idea is reasonable to me, so that large deployments can make the conscious trade-off of sync frequency (and drift detection) vs. control plane performance and health. A few thoughts:

  • should probably be opt-in as opposed to the default behavior, so the minority of users that know they have a lot of resources and potential performance issues can turn it on, while other folks with smaller deployments get the current behavior
  • these tracking fields like last sync and spec hash could fit better as annotations so that all these objects across the crossplane ecosystem don't have to deal with upgrading their schemas
  • is this sync period a new value, or would an existing value be reused like syncInterval or pollInterval? https://github.com/crossplane-contrib/provider-upjet-aws/blob/main/cmd/provider/ec2/zz_main.go#L65-L66

@blakeromano

  • these tracking fields like last sync and spec hash could fit better as annotations so that all these objects across the crossplane ecosystem don't have to deal with upgrading their schemas

I wonder how big the hash actually is? I think we have moved to SSA, so annotation size shouldn't really matter, but I do wonder whether adding the hash as an annotation would hinder the experience of dealing with annotations like external-name.

@gravufo
Author

gravufo commented Jul 31, 2024

Thank you for your feedback @jbw976!

Here are my thoughts about every point you brought up:

  • should probably be opt-in as opposed to the default behavior, so the minority of users that know they have a lot of resources and potential performance issues can turn it on, while other folks with smaller deployments get the current behavior

I'm definitely not against making this behaviour opt-in, but I would argue that it would benefit everyone, including small-scale users. As a user (large or small scale), I expect my resource to be synced immediately when it is initially created or when I update the spec, and I expect it to be reconciled regularly based on the syncInterval set on the provider.

From there on, it would be clear that the pollInterval simply ensures we haven't missed an update on the MR at the Kubernetes level, while the syncInterval would guarantee that your MR is reconciled at that frequency and no faster (unless something changes in the spec).

Not having this guarantee makes the syncInterval parameter quite awkward, and its real impact hard to understand.

  • these tracking fields like last sync and spec hash could fit better as annotations so that all these objects across the crossplane ecosystem don't have to deal with upgrading their schemas

I think this is a possibility; however, I see it more like a Condition, just as the Synced and Ready states are set. That is just my impression though, no strong opinion on where it is saved.

  • is this sync period a new value, or would an existing value be reused like syncInterval or pollInterval?

It's actually the syncInterval. I did not find it exposed anywhere for us to use in the Observe function, so I had to save it in a global var during the initial setup of the controller, but that is just a quirk 😅 Ideally we could use the syncInterval directly, and that's probably what we would want if this were done in this repo.

About the comment from @blakeromano:

I wonder how big the hash actually is? I think we have moved to SSA, so annotation size shouldn't really matter, but I do wonder whether adding the hash as an annotation would hinder the experience of dealing with annotations like external-name.

In our implementation, we used hashstructure, which generates small hashes that look like this: specHash: "12301606614063621924". Note that the repo was archived just a few days ago... we'll definitely need to stop using it 😄
That said, I'm pretty sure there's a better way to determine whether something has changed and requires an external reconcile; we just used a basic hash of the spec as a quick-and-dirty way of achieving our goal.
Note that the hash doesn't have to be bulletproof. The goal is just a best-effort way of reducing external calls as much as possible, because those calls are costly and have an impact. We could technically use the resource generation or resourceVersion as the change indicator and save an observedGeneration in the status. This is how the Kubernetes Deployment controller decides whether a new ReplicaSet is necessary, though I'm not sure exactly how it is implemented.

I appreciate the feedback and I hope we can have a good discussion about this! Thanks everyone 👍

@negz
Member

negz commented Jul 31, 2024

I'm supportive of addressing the issue. I'm not sure about the proposed solution though. Some initial thoughts:

  • I'd definitely prefer to use status.observedGeneration instead of adding a hash of spec. There's a desire to support that more generally. Work started per Enable user to understand when resource is reconciled crossplane#4655, but we only got around to adding the field to all MRs (and XRs? I forget). I don't think we have any code that actually propagates the generation to the observed generation.

  • I'm ambivalent about making this opt-in. I agree it'd benefit everyone, but it is also a behavior change. I'm not sure how impactful the behavior change would be though. Would anyone be relying on the burst of reconciles at startup time and be surprised if it was more smoothed out?

  • I don't think we can rely on the status conditions to know when the resource was last reconciled. Their lastTransitionTime only tells you the last time a state transitioned (e.g. the MR moved between ready and unready, or reconcile error and reconcile success). Maybe you weren't suggesting this? I don't see it in your example code snippet.

  • We need to be careful about recording the last reconcile time, since writing that reconcile time will queue a new reconcile. At a glance I don't think your example code would be subject to that issue though since it'd typically return early from the queued reconcile.

Is there anything we could do to the existing controller-runtime / client-go reconcile rate-limiting machinery to smooth out large bursts of reconciles on startup without needing to add tracking fields, additional logic, etc? I wonder if there's any prior art for addressing this issue in other Kubernetes controllers, especially controller-runtime based controllers. I imagine others must have faced similar problems.

@gravufo
Author

gravufo commented Jul 31, 2024

  • I'm ambivalent about making this opt-in. I agree it'd benefit everyone, but it is also a behavior change. I'm not sure how impactful the behavior change would be though. Would anyone be relying on the burst of reconciles at startup time and be surprised if it was more smoothed out?

You're right, it is indeed a behaviour change. I may have been too optimistic about it being enabled by default. I really don't mind if there's a flag, but aiming to have this behaviour by default would make more sense to me, because that's how I initially assumed it behaved before I realized it didn't. Not sure how we could go about that though... feature-flag it and change the default in a major release?

  • I don't think we can rely on the status conditions to know when the resource was last reconciled. Their lastTransitionTime only tells you the last time a state transitioned (e.g. the MR moved between ready and unready, or reconcile error and reconcile success). Maybe you weren't suggesting this? I don't see it in your example code snippet.

I actually don't save it as a condition for that specific reason, I simply saved it as a field straight under status. As mentioned in my previous comment and like you suggested, using something similar to status.observedGeneration may make more sense.

  • We need to be careful about recording the last reconcile time, since writing that reconcile time will queue a new reconcile. At a glance I don't think your example code would be subject to that issue though since it'd typically return early from the queued reconcile.

  Is there anything we could do to the existing controller-runtime / client-go reconcile rate-limiting machinery to smooth out large bursts of reconciles on startup without needing to add tracking fields, additional logic, etc? I wonder if there's any prior art for addressing this issue in other Kubernetes controllers, especially controller-runtime based controllers. I imagine others must have faced similar problems.

I agree that it would be nice to avoid the extra reconcile loop if possible, but exiting early mitigates most of the drawbacks, especially considering that the extra reconcile only happens when the external API is actually called, which is not that frequent (depending on syncInterval and/or spec changes).

I tried to think of other products/controllers in the Kubernetes ecosystem that could have the same type of problem, and the only thing I could come up with is maybe ArgoCD, since it reconciles apps every 3 minutes by default and has to sync external git and/or Helm repos. Not sure if it's exactly the same problem though.

Projects
Status: Backlog
Development

No branches or pull requests

4 participants