Prevent unnecessary reconciles on provider restarts #696

gravufo · 2024-05-22T16:44:13Z

What problem are you facing?

Today, when provider pods are restarted (whether it's due to updates to the provider version or simply k8s nodes shuffling), the cache is completely lost forcing the controllers to reconcile all MRs regardless of the last time they were synced and the configured poll frequency.

Depending on the provider, this can be fine and fast, but can sometimes be very slow and even problematic (think external APIs rate limiting). The specific use-case I am referring to is provider-upjet-azure which easily gets rate-limited by the Azure RM API if it gets restarted when managing a large number of MRs.

How could Crossplane help solve your problem?

Since we have no choice but to requeue all objects (this makes sense), I think a nice improvement would be to validate the last time this object (assuming Synced and Ready are both true) was reconciled and calculating based on the poll frequency if it needs to be reconciled again right now or if it should be requeued at the time the poll frequency would be hit.

This would prevent mega bursts of external calls every time we need to restart provider pods, regardless of the reason.

The text was updated successfully, but these errors were encountered:

gravufo · 2024-07-25T18:22:34Z

I have done a basic implementation of this on a custom provider by adding the following fields to our MR CRDs:

	SpecHash              string                      `json:"specHash,omitempty"`
	LastExternalReconcile metav1.Time                 `json:"lastExternalReconcile,omitempty"`

The following logic was done in the reconciler:

func (c *External) Observe(ctx context.Context, mg resource.Managed) (managed.ExternalObservation, error) {
	// Calculate the hash of the spec of the MR
	hash, err := ComputeHash(mg)
	if err != nil {
		return managed.ExternalObservation{}, err
	}

	// On first run, the saved hash will be empty and/or the LastExternalReconcile will be empty
	firstRun := GetSpecHash(mg) == "" || GetLastExternalReconcile(mg).IsZero()
	// If the spec has changed or the sync period has passed, we need to resync
	needsResync := GetSpecHash(mg) != hash || GetLastExternalReconcile(mg).Add(*constants.SyncPeriod).Before(time.Now()) || mg.GetCondition(xpv1.TypeReady).Reason == xpv1.Deleting().Reason
	if firstRun || needsResync {
		result, err := c.Handler.Observe(ctx, mg)
		if err != nil {
			return result, errors.Wrap(err, "error in fetching resource")
		}

		if !result.ResourceExists {
			return result, nil
		}

                // Set the calculated hash in the MR status
		SetSpecHash(mg, hash)
                // Set the last external reconcile time in the MR status
		SetLastExternalReconcile(mg, time.Now())

		mg.SetConditions(xpv1.Available())
		return result, nil
	}

	return managed.ExternalObservation{ResourceExists: true, ResourceUpToDate: true}, nil
}

This makes it so a reconcile is done if the MR spec changes or if the Sync time has passed since the last external reconcile.
Hope this gives more detail about what kind of implementation could be done.

jbw976 · 2024-07-31T16:26:54Z

Thanks for your efforts to describe this issue and for showing up to community meetings to advocate for it. We appreciate the way you're engaging on this @gravufo 🙇‍♂️

I think the general idea is reasonable to me, so that large deployments can make the conscious trade-off of sync frequency (and drift detection) vs. control plane performance and health. A few thoughts:

should probably be opt-in as opposed to the default behavior, so the minority of users that know they have a lot of resources and potential performance issues can turn it on, while other folks with smaller deployments get the current behavior
these tracking fields like last sync and spec hash could fit better as annotations so that all these objects across the crossplane ecosystem don't have to deal with upgrading their schemas
is this sync period a new value, or would an existing value be reused like syncInterval or pollInterval? https://github.com/crossplane-contrib/provider-upjet-aws/blob/main/cmd/provider/ec2/zz_main.go#L65-L66

blakeromano · 2024-07-31T16:33:44Z

these tracking fields like last sync and spec hash could fit better as annotations so that all these objects across the crossplane ecosystem don't have to deal with upgrading their schemas

I wonder how big the hash actually looks? I think we are moved to SSA so things like size of annotation size shouldn't really matter but I do wonder if we hinder the experience of dealing with annotations relating to external-name, by adding the hash in annotations.

gravufo · 2024-07-31T16:45:35Z

Thank you for your feedback @jbw976!

Here are my thoughts about every point you brought up:

should probably be opt-in as opposed to the default behavior, so the minority of users that know they have a lot of resources and potential performance issues can turn it on, while other folks with smaller deployments get the current behavior

I'm definitely not against making this behaviour opt-in, but I would argue that this would benefit everyone, including users in small scale. As a user (large or small scale), I expect my resource to be synced immediately when initially created or when I update the spec and I expect it to be regularly reconciled based on the syncInterval that is set on the provider.

From there on, it would be clear that the pollInterval is simply to ensure we haven't missed an update on the MR at the Kubernetes level and the syncInterval would garantee that your MR will be reconciled at that frequency and no faster (unless something changes in the spec).

Not having this garantee makes the syncInterval parameter quite awkward and hard to understand the real impact.

these tracking fields like last sync and spec hash could fit better as annotations so that all these objects across the crossplane ecosystem don't have to deal with upgrading their schemas

I think this is a possibility, however I see it more like a Condition, just as the Synced and Ready states are set. That is just my impression though, no strong opinion on where it is saved.

is this sync period a new value, or would an existing value be reused like syncInterval or pollInterval? https://github.com/crossplane-contrib/provider-upjet-aws/blob/main/cmd/provider/ec2/zz_main.go#L65-L66

It's actually the syncInterval. I did not seem to find it exposed anywhere for us to use in the Observe function, so I had to save it in a global var at the initial setup of the controller, but this is just a quirk 😅 Ideally we could just use the syncInterval directly and that's probably what we would want if it were to be done in this repo.

About the comment from @blakeromano:

I wonder how big the hash actually looks? I think we are moved to SSA so things like size of annotation size shouldn't really matter but I do wonder if we hinder the experience of dealing with annotations relating to external-name, by adding the hash in annotations.

In our implementation, we used hashstructure which generates small hashes that look like this: specHash: "12301606614063621924". Note that the repo has just been archived a few days ago...Definitely will need to stop using that 😄
That said, I'm pretty sure there's a better way to determine if something has changed and requires an external reconcile. We just used a basic hashing of the spec value as a "quick and dirty way" of achieving our goal.
Note that the hash doesn't have to be bulletproof, the goal is just to get a best-effort way of reducing the external calls as much as possible, because those are costly and have an impact. We could technically just use the resource generation or resourceVersion as indicators of changes and save observedGeneration in the status. This is how the k8s Deployment controller works and decides whether a new ReplicaSet is necessary or not. Not sure how they implemented it exactly though.

I appreciate the feedback and I hope we can have a good discussion about this! Thanks everyone 👍

negz · 2024-07-31T18:00:02Z

I'm supportive of addressing the issue. I'm not sure about the proposed solution though. Some initial thoughts:

I'd definitely prefer to use status.observedGeneration instead of adding a hash of spec. There's a desire to support that more generally. Work started per Enable user to understand when resource is reconciled crossplane#4655, but we only got around to adding the field to all MRs (and XRs? I forget). I don't think we have any code that actually propagates the generation to the observed generation.
I'm ambivalent about making this opt-in. I agree it'd benefit everyone, but it is also a behavior change. I'm not sure how impactful the behavior change would be though. Would anyone be relying on the burst of reconciles at startup time and be surprised if it was more smoothed out?
I don't think we can rely on the status conditions to know when the resource was last reconciled. Their lastTransitionTime only tells you the last time a state transitioned (e.g. the MR moved between ready and unready, or reconcile error and reconcile success). Maybe you weren't suggesting this? I don't see it in your example code snippet.
We need to be careful about recording the last reconcile time, since writing that reconcile time will queue a new reconcile. At a glance I don't think your example code would be subject to that issue though since it'd typically return early from the queued reconcile.

Is there anything we could do to the existing controller-runtime / client-go reconcile rate-limiting machinery to smooth out large bursts of reconciles on startup without needing to add tracking fields, additional logic, etc? I wonder if there's any prior art for addressing this issue in other Kubernetes controllers, especially controller-runtime based controllers. I imagine others must have faced similar problems.

gravufo · 2024-07-31T23:25:57Z

I'm ambivalent about making this opt-in. I agree it'd benefit everyone, but it is also a behavior change. I'm not sure how impactful the behavior change would be though. Would anyone be relying on the burst of reconciles at startup time and be surprised if it was more smoothed out?

You're right, it is indeed a behaviour change. I may have been too optimistic about it being enabled by default. I really don't mind if there's a flag, but I think aiming to have this behaviour by default would make more sense to me, because that's how I initially thought it would behave before I actually realized it wasn't. Not sure how we could go about that though...feature flag it and in a major change the default?

I don't think we can rely on the status conditions to know when the resource was last reconciled. Their lastTransitionTime only tells you the last time a state transitioned (e.g. the MR moved between ready and unready, or reconcile error and reconcile success). Maybe you weren't suggesting this? I don't see it in your example code snippet.

I actually don't save it as a condition for that specific reason, I simply saved it as a field straight under status. As mentioned in my previous comment and like you suggested, using something similar to status.observedGeneration may make more sense.

We need to be careful about recording the last reconcile time, since writing that reconcile time will queue a new reconcile. At a glance I don't think your example code would be subject to that issue though since it'd typically return early from the queued reconcile.
Is there anything we could do to the existing controller-runtime / client-go reconcile rate-limiting machinery to smooth out large bursts of reconciles on startup without needing to add tracking fields, additional logic, etc? I wonder if there's any prior art for addressing this issue in other Kubernetes controllers, especially controller-runtime based controllers. I imagine others must have faced similar problems.

I agree that it would be nice to avoid the extra reconcile loop if possible, but exiting early mitigates most of the drawbacks, especially considering that the extra reconcile would only happen when the external api is called only, which is not that frequent (depending on syncInterval and/or spec changes).

I tried to think of other products/controllers in the kubernetes ecosystem that could have the same type of problem and the only thing I thought of is maybe ArgoCD since it reconciles apps every 3 minutes by default and has to sync external git repos and/or helm repos. Not sure if it's exactly the same though.

gravufo added the enhancement New feature or request label May 22, 2024

jbw976 added this to Crossplane Roadmap Jul 25, 2024

jbw976 added the performance label Jul 25, 2024

jbw976 mentioned this issue Jul 25, 2024

[Bug]: UserAssignedIdentities and FederatedIdentityCredentials are not able to sync since v1.0.0 crossplane-contrib/provider-upjet-azure#740

Closed

1 task

jbw976 moved this to Backlog in Crossplane Roadmap Jul 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent unnecessary reconciles on provider restarts #696

Prevent unnecessary reconciles on provider restarts #696

gravufo commented May 22, 2024

gravufo commented Jul 25, 2024

jbw976 commented Jul 31, 2024

blakeromano commented Jul 31, 2024

gravufo commented Jul 31, 2024 •

edited

Loading

negz commented Jul 31, 2024 •

edited

Loading

gravufo commented Jul 31, 2024

Prevent unnecessary reconciles on provider restarts #696

Prevent unnecessary reconciles on provider restarts #696

Comments

gravufo commented May 22, 2024

What problem are you facing?

How could Crossplane help solve your problem?

gravufo commented Jul 25, 2024

jbw976 commented Jul 31, 2024

blakeromano commented Jul 31, 2024

gravufo commented Jul 31, 2024 • edited Loading

negz commented Jul 31, 2024 • edited Loading

gravufo commented Jul 31, 2024

gravufo commented Jul 31, 2024 •

edited

Loading

negz commented Jul 31, 2024 •

edited

Loading