Skip to content

Conversation

@kurlov
Copy link
Contributor

@kurlov kurlov commented Oct 20, 2025

Follow up for #459

Add test for WithSelector using controller manager to verify the reconciler only processes CRs that match the configured selector.

@codecov-commenter
Copy link

codecov-commenter commented Oct 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.71%. Comparing base (08ab7fb) to head (9149213).
⚠️ Report is 125 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (08ab7fb) and HEAD (9149213). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (08ab7fb) HEAD (9149213)
2 1
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #496       +/-   ##
===========================================
- Coverage   85.06%   74.71%   -10.35%     
===========================================
  Files          19       31       +12     
  Lines        1346     2037      +691     
===========================================
+ Hits         1145     1522      +377     
- Misses        125      433      +308     
- Partials       76       82        +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@porridge porridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much better than the previous one, and it will actually fail when we accidentally stop passing the predicate, so it's useful.

However, I'm afraid that this test passing does not guarantee much. AFAICT there is no way to be sure whether the controller will never schedule the unwanted CR for reconciliation, or if it just did not get round to doing that yet. 🤔

Perhaps we can improve our confidence a bit by creating the unlabeled CR before the labeled one?

Comment on lines 1544 to 1553
controllerName := fmt.Sprintf("%v-controller", strings.ToLower(gvk.Kind))
Expect(r.addDefaults(mgr, controllerName)).To(Succeed())
r.setupScheme(mgr)

c, err := controller.New(controllerName, mgr, controller.Options{
Reconciler: reconciler,
MaxConcurrentReconciles: 1,
})
Expect(err).ToNot(HaveOccurred())
Expect(r.setupWatches(mgr, c)).To(Succeed())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates a bunch of code from SetupWithManager. I don't think we should be doing this, it makes the test brittle. I guess the reason is that you want to append to reconciledCRs? Perhaps consider using a hook for this instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a great idea. I changed the test to use hooks

Eventually(func() []string {
reconciledCRsMutex.Lock()
defer reconciledCRsMutex.Unlock()
return reconciledCRs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that since this returns a slice pointing at the same backing array, the caller will race with other accesses 🤔
Perhaps instead check the existence of desired element in the Eventually, in the scope of the lock, and then make Should just check for true return value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might not understand the comment correctly but there is mutex in write/read to prevent data race. And the Eventually() matches synchronously after the function returns but before the mutex is unlocked. Also tests run with -race flag and during the development I had a few data race issues but all of them resolved. So I believe it should be fine.

In addition the using of slice/map is more accurate (not perfect though) than bool var because bool in Eventually() will match even when reconciler reconciled two different label selectors

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Eventually() matches synchronously after the function returns but before the mutex is unlocked.

I don't think this is true. Mutex is released when the func completes, and then the Eventually works on a copy of the returned slice with the same backing array as the mutex-protected slice. This program shows the data corruption that can happen. After running it a few times I got it to produce:

Image

I don't think I get the point behind slice vs bool.. you can have the bool mean anything you need inside the func...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got your point and moved assertions inside Eventually() and just check the predicate with .Should(BeTrue())

var preds []predicate.Predicate

if r.labelSelector.Size() > 0 {
if len(r.labelSelector.MatchLabels) > 0 || len(r.labelSelector.MatchExpressions) > 0 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor thing, I guess it's a bit better instead of pb generated method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better in what sense? I don't think computational performance is important in this (initialization) phase. The goal of using Size() was to make it robust against addition of new fields to the metav1.LabelSelector size.

Copy link
Contributor Author

@kurlov kurlov Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that Size() will be > 0 for empty LabelSelector{} struct but it's not the case. Changed back to Size()

})
})

It("should reconcile CRs independently when using two managers with different label selectors", func() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added a test with two managers to simulate the original bug when two managers with different label selectors reconciles each other CRs

@kurlov kurlov requested a review from porridge November 27, 2025 07:49
var preds []predicate.Predicate

if r.labelSelector.Size() > 0 {
if len(r.labelSelector.MatchLabels) > 0 || len(r.labelSelector.MatchExpressions) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better in what sense? I don't think computational performance is important in this (initialization) phase. The goal of using Size() was to make it robust against addition of new fields to the metav1.LabelSelector size.

Eventually(func() []string {
reconciledCRsMutex.Lock()
defer reconciledCRsMutex.Unlock()
return reconciledCRs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Eventually() matches synchronously after the function returns but before the mutex is unlocked.

I don't think this is true. Mutex is released when the func completes, and then the Eventually works on a copy of the returned slice with the same backing array as the mutex-protected slice. This program shows the data corruption that can happen. After running it a few times I got it to produce:

Image

I don't think I get the point behind slice vs bool.. you can have the bool mean anything you need inside the func...

}
return nil
})
_ = setupManagerWithSelectorAndPostHook(ctx, postHook, anotherMatchingLabels)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering... now that we have all these managers and reconcilers, how do we make sure they do not affect other Describes? Or that we don't get affected by other ones? I cannot see them being stopped directly anywhere. Do they run indefinitely? 🤔

Copy link
Contributor Author

@kurlov kurlov Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added managers shut down. Also after experiencing a few issues with cache/leftover CRs I decided to not share CRs between tests and even name them differently (there is CR cleanup but still it's really easy to get not obvious issues). It is a few repeated lines in the tests but I believe it is worth it. It will make life a bit easier for person who will touch these tests in the future.

@kurlov kurlov requested a review from porridge December 1, 2025 09:44
Copy link
Member

@porridge porridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

})

By("verifying only the labeled CR was reconciled", func() {
Eventually(func() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible that the non-labeled object gets reconciled straight after and we don't catch it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unfortunately there does not seem to be a way to know whether controller-runtime will never do something, or just didn't get round to doing it yet :-(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth calling this out in a comment somewhere. In case we hit a flake, we can get the context.

mgr, done := setupManagerWithSelectorAndPostHook(ctx, postHook, matchingLabels)
doneChans = append(doneChans, done)

postHook2 := makePostHook(&mu, &anotherReconciledCRs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're using the same mutex across the two managers. Could this synchronization cause problems? would it be better to use different mutexes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants