@xrstf
Contributor

@xrstf xrstf commented Sep 4, 2025

Summary

This PR implements a new feature for the operator: it can now provision RBAC inside kcp, granting permissions to users and revoking them again.

This is the first PR to make use of the internal-proxy (#87): since admins can configure an arbitrary workspace path or cluster name, the operator needs to be able to provision on any shard and, more importantly, figure out which shard to talk to. To solve this, we use our internal proxy ("internal" still means a standalone Deployment, of course).

Each Kubeconfig object can now hold a workspace and a desired list of permissions inside that workspace. The operator will reconcile these RBAC resources accordingly and also clean up when a Kubeconfig is removed or changed (users can change the workspace that RBAC should be placed in; the operator will then first clean up the old cluster and then provision the new one).

To keep track of where RBAC has been deployed, a new field has been introduced in the Kubeconfig status. We discussed this and decided that this is a safe place to do so, as anyone with permissions to manage Kubeconfigs is technically an admin, so end users cannot (and should not) fiddle with Kubeconfigs. If they could, the operator would currently have no way of defending against malicious changes.
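
A minimal sketch of the cleanup-then-provision flow described above, assuming the status records a single cluster. The names `status`, `ProvisionedCluster`, `rbacTarget` and `reconcile` are hypothetical and chosen for illustration; the operator's real types and field names may differ.

```go
package main

import "fmt"

// status stands in for the new Kubeconfig status field: it remembers where
// RBAC was last provisioned. The field name is hypothetical.
type status struct {
	ProvisionedCluster string
}

// rbacTarget abstracts access to logical clusters (hypothetical interface).
type rbacTarget interface {
	Provision(cluster string) error
	Cleanup(cluster string) error
}

// reconcile moves RBAC from the cluster recorded in st to the desired one.
// An empty desired cluster means that no RBAC should exist at all.
func reconcile(t rbacTarget, st *status, desired string) error {
	// Clean up the old location first if the target changed.
	if st.ProvisionedCluster != "" && st.ProvisionedCluster != desired {
		if err := t.Cleanup(st.ProvisionedCluster); err != nil {
			return fmt.Errorf("failed to clean up %q: %w", st.ProvisionedCluster, err)
		}
		st.ProvisionedCluster = ""
	}

	if desired == "" {
		return nil
	}

	if err := t.Provision(desired); err != nil {
		return fmt.Errorf("failed to provision %q: %w", desired, err)
	}
	st.ProvisionedCluster = desired

	return nil
}

// recorder is a fake target that only records the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) Provision(c string) error {
	r.calls = append(r.calls, "provision "+c)
	return nil
}

func (r *recorder) Cleanup(c string) error {
	r.calls = append(r.calls, "cleanup "+c)
	return nil
}

func main() {
	rec := &recorder{}
	st := &status{ProvisionedCluster: "root:team-a"}

	// The workspace changed from root:team-a to root:team-b.
	if err := reconcile(rec, st, "root:team-b"); err != nil {
		panic(err)
	}

	fmt.Println(rec.calls, st.ProvisionedCluster)
}
```

Because the old location is read from the status rather than from the spec, the sketch can still find and remove the old RBAC after the spec has already been changed.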

Each Kubeconfig manages its own RBAC, and all resources inside kcp are named based on the UID of the Kubeconfig object. This ensures uniqueness all around and avoids having to merge desired RBAC into one ClusterRole(Binding) and untangle it again when RBAC for one Kubeconfig is removed.
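
To illustrate why UID-based names avoid any merging, here is a small sketch. The function `rbacName` and its name format are assumptions made for this example, not the operator's actual naming scheme, and the UIDs are made up.

```go
package main

import "fmt"

// rbacName derives the name of one RBAC object from the owning Kubeconfig's
// UID and the ClusterRole it binds. The format is a hypothetical choice.
func rbacName(kubeconfigUID, clusterRole string) string {
	return fmt.Sprintf("kubeconfig-%s-%s", kubeconfigUID, clusterRole)
}

func main() {
	// Fake workspace contents: binding name -> bound ClusterRole.
	bindings := map[string]string{}

	uidA := "0b9f6c1e-1111-4a6e-9d55-2f0c8a7d0001" // made-up UIDs
	uidB := "7a3d2e90-2222-4c1b-8e44-5b1d9c6e0002"

	// Both Kubeconfigs request the same role in the same workspace.
	bindings[rbacName(uidA, "view")] = "view"
	bindings[rbacName(uidB, "view")] = "view"

	// Removing Kubeconfig A deletes only its own object; B is untouched.
	delete(bindings, rbacName(uidA, "view"))

	fmt.Println(len(bindings), bindings[rbacName(uidB, "view")])
}
```

Since no two Kubeconfigs ever share an object, removing one Kubeconfig is a plain delete rather than an edit of a shared binding.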

Notably, since the kcp-operator now has to talk with shards and the front-proxy, this PR modifies the local e2e setup to work like the CI e2e test: build an operator image and deploy it into kind, rather than running the operator on the host machine. This is a bit sad for quick debugging tests, but saves us from having to either dynamically expose the pods through kind to the host or rewrite URLs in the operator.

What Type of PR Is This?

/kind feature

Related Issue(s)

Fixes #49

Release Notes

Add authorization options to Kubeconfigs: the operator can now grant permissions to access a workspace to make newly created Kubeconfigs immediately useful.

@kcp-ci-bot kcp-ci-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the DCO. labels Sep 4, 2025
@kcp-ci-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from xrstf. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kcp-ci-bot kcp-ci-bot added do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 4, 2025
@kcp-ci-bot kcp-ci-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 10, 2025
@kcp-ci-bot kcp-ci-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 20, 2025
xrstf added 13 commits October 29, 2025 10:35
On-behalf-of: @SAP christoph.mewes@sap.com
…ervice inside kind

On-behalf-of: @SAP christoph.mewes@sap.com
xrstf added 3 commits October 29, 2025 13:51
On-behalf-of: @SAP christoph.mewes@sap.com
@kcp-ci-bot kcp-ci-bot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Oct 29, 2025
@kcp-ci-bot kcp-ci-bot removed the release-note-none Denotes a PR that doesn't merit a release note. label Oct 29, 2025
xrstf added 2 commits October 29, 2025 16:36
On-behalf-of: @SAP christoph.mewes@sap.com
@xrstf
Contributor Author

xrstf commented Oct 29, 2025

/kind feature

@kcp-ci-bot kcp-ci-bot added kind/feature Categorizes issue or PR as related to a new feature. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Oct 29, 2025
@xrstf
Contributor Author

xrstf commented Nov 3, 2025

/retest

Comment on lines +62 to +65
clusterRoles:
  items:
    type: string
  type: array

Does clusterRoles only reference objects that already exist in the cluster? I am wondering if it could be made possible to also configure RBAC inside the cluster?

Contributor Author

For now this is only binding to pre-existing ClusterRoles, yes.
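
As a rough sketch of what "only binding" means: each entry in `clusterRoles` becomes a reference to a ClusterRole by name, and no ClusterRole is ever created. The types below are simplified stand-ins for the `rbac.authorization.k8s.io/v1` shapes, and the subject (a plain user name) is an assumption for illustration.

```go
package main

import "fmt"

// roleRef and binding loosely mirror the fields of a ClusterRoleBinding;
// they are simplified stand-ins, not the real API types.
type roleRef struct {
	Kind string
	Name string
}

type binding struct {
	RoleRef roleRef
	Subject string // assumed: the user the kubeconfig authenticates as
}

// bindingsFor returns one binding per requested ClusterRole, since a
// ClusterRoleBinding carries exactly one roleRef. It never creates a
// ClusterRole itself; the referenced roles must already exist.
func bindingsFor(subject string, clusterRoles []string) []binding {
	out := make([]binding, 0, len(clusterRoles))
	for _, role := range clusterRoles {
		out = append(out, binding{
			RoleRef: roleRef{Kind: "ClusterRole", Name: role},
			Subject: subject,
		})
	}

	return out
}

func main() {
	for _, b := range bindingsFor("jane", []string{"cluster-admin", "view"}) {
		fmt.Printf("%s -> %s/%s\n", b.Subject, b.RoleRef.Kind, b.RoleRef.Name)
	}
}
```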

On-behalf-of: @SAP christoph.mewes@sap.com
@xrstf
Contributor Author

xrstf commented Nov 7, 2025

/retest

4 similar comments
@xrstf
Contributor Author

xrstf commented Nov 7, 2025

/retest

@xrstf
Contributor Author

xrstf commented Nov 7, 2025

/retest

@xrstf
Contributor Author

xrstf commented Nov 7, 2025

/retest

@xrstf
Contributor Author

xrstf commented Nov 17, 2025

/retest

On-behalf-of: @SAP christoph.mewes@sap.com
@xrstf xrstf changed the title WIP - Kubeconfig RBAC Provision RBAC for Kubeconfigs inside kcp Nov 17, 2025
@kcp-ci-bot kcp-ci-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 17, 2025
@xrstf
Contributor Author

xrstf commented Nov 17, 2025

/merge-method squash

@xrstf xrstf added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Nov 17, 2025
//go:build e2e

/*
Copyright 2025 The KCP Authors.
Contributor

You mean kcp?

Contributor Author

Hrhrhrhrhrhrhrhr got me. However kcp-dev/kcp#3665 says

I did not touch the boilerplate header as I am not sure about the CNCF ramifications, but I would personally of course also change them. If we can.

@kcp-ci-bot kcp-ci-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 17, 2025
@kcp-ci-bot
Contributor

LGTM label has been added.

Git tree hash: 654f43b28b5bb40f956a6f3e363d93b10caa1d77

Comment on lines +1 to +140
/*
Copyright 2025 The KCP Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package client

import (
	"context"
	"fmt"

	"github.com/kcp-dev/logicalcluster/v3"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/rest"
	ctrlruntimeclient "sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/kcp-dev/kcp-operator/internal/resources"
	operatorv1alpha1 "github.com/kcp-dev/kcp-operator/sdk/apis/operator/v1alpha1"
)

func NewRootShardClient(ctx context.Context, c ctrlruntimeclient.Client, rootShard *operatorv1alpha1.RootShard, cluster logicalcluster.Name, scheme *runtime.Scheme) (ctrlruntimeclient.Client, error) {
	baseUrl := fmt.Sprintf("https://%s.%s.svc.cluster.local:6443", resources.GetRootShardServiceName(rootShard), rootShard.Namespace)

	if !cluster.Empty() {
		baseUrl = fmt.Sprintf("%s/clusters/%s", baseUrl, cluster.String())
	}

	return newClient(ctx, c, baseUrl, scheme, rootShard, nil, nil)
}

func NewRootShardProxyClient(ctx context.Context, c ctrlruntimeclient.Client, rootShard *operatorv1alpha1.RootShard, cluster logicalcluster.Name, scheme *runtime.Scheme) (ctrlruntimeclient.Client, error) {
	baseUrl := fmt.Sprintf("https://%s.%s.svc.cluster.local:6443", resources.GetRootShardProxyServiceName(rootShard), rootShard.Namespace)

	if !cluster.Empty() {
		baseUrl = fmt.Sprintf("%s/clusters/%s", baseUrl, cluster.String())
	}

	return newClient(ctx, c, baseUrl, scheme, rootShard, nil, nil)
}

func NewShardClient(ctx context.Context, c ctrlruntimeclient.Client, shard *operatorv1alpha1.Shard, cluster logicalcluster.Name, scheme *runtime.Scheme) (ctrlruntimeclient.Client, error) {
	baseUrl := fmt.Sprintf("https://%s.%s.svc.cluster.local:6443", resources.GetShardServiceName(shard), shard.Namespace)

	if !cluster.Empty() {
		baseUrl = fmt.Sprintf("%s/clusters/%s", baseUrl, cluster.String())
	}

	return newClient(ctx, c, baseUrl, scheme, nil, shard, nil)
}

func newClient(
	ctx context.Context,
	c ctrlruntimeclient.Client,
	url string,
	scheme *runtime.Scheme,
	// only one of these three should be provided, the others nil
	rootShard *operatorv1alpha1.RootShard,
	shard *operatorv1alpha1.Shard,
	frontProxy *operatorv1alpha1.FrontProxy,
) (ctrlruntimeclient.Client, error) {
	tlsConfig, err := getTLSConfig(ctx, c, rootShard, shard, frontProxy)
	if err != nil {
		return nil, fmt.Errorf("failed to determine TLS settings: %w", err)
	}

	cfg := &rest.Config{
		Host:            url,
		TLSClientConfig: tlsConfig,
	}

	return ctrlruntimeclient.New(cfg, ctrlruntimeclient.Options{Scheme: scheme})
}

// +kubebuilder:rbac:groups=core,resources=secrets,verbs=get

func getTLSConfig(ctx context.Context, c ctrlruntimeclient.Client, rootShard *operatorv1alpha1.RootShard, shard *operatorv1alpha1.Shard, frontProxy *operatorv1alpha1.FrontProxy) (rest.TLSClientConfig, error) {
	rootShard, err := getRootShard(ctx, c, rootShard, shard, frontProxy)
	if err != nil {
		return rest.TLSClientConfig{}, fmt.Errorf("failed to determine effective RootShard: %w", err)
	}

	// get the secret for the kcp-operator client cert
	key := types.NamespacedName{
		Namespace: rootShard.Namespace,
		Name:      resources.GetRootShardCertificateName(rootShard, operatorv1alpha1.OperatorCertificate),
	}

	certSecret := &corev1.Secret{}
	if err := c.Get(ctx, key, certSecret); err != nil {
		return rest.TLSClientConfig{}, fmt.Errorf("failed to get root shard proxy Secret: %w", err)
	}

	return rest.TLSClientConfig{
		CAData:   certSecret.Data["ca.crt"],
		CertData: certSecret.Data["tls.crt"],
		KeyData:  certSecret.Data["tls.key"],
	}, nil
}

// +kubebuilder:rbac:groups=operator.kcp.io,resources=rootshards,verbs=get

func getRootShard(ctx context.Context, c ctrlruntimeclient.Client, rootShard *operatorv1alpha1.RootShard, shard *operatorv1alpha1.Shard, frontProxy *operatorv1alpha1.FrontProxy) (*operatorv1alpha1.RootShard, error) {
	if rootShard != nil {
		return rootShard, nil
	}

	var (
		ref       *corev1.LocalObjectReference
		namespace string
	)

	switch {
	case shard != nil:
		ref = shard.Spec.RootShard.Reference
		namespace = shard.Namespace

	case frontProxy != nil:
		ref = frontProxy.Spec.RootShard.Reference
		namespace = frontProxy.Namespace

	default:
		panic("Must be called with either RootShard, Shard or FrontProxy.")
	}

	// The reference is local, so the RootShard lives in the namespace of the
	// referencing object; the freshly allocated rootShard has no namespace yet.
	rootShard = &operatorv1alpha1.RootShard{}
	if err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: ref.Name}, rootShard); err != nil {
		return nil, fmt.Errorf("failed to get RootShard: %w", err)
	}

	return rootShard, nil
}
Contributor

Uh. So if somebody misconfigures one reference, we get into a panic loop? Can we error here? Let's say we add a cache server in the future, forget to update this switch and boom...

Contributor Author

I think you misread the code a bit (it's not my prettiest, but it's the best I could do).

Any misconfigured references will be caught and reported as errors in NewInternalKubeconfigClient. This is where the refs are checked and the appropriate functions are being called. It's impossible for the code to panic just because of a misconfigured object. The panic occurs when a developer calls the function wrong. getRootShard is only called by getTLSConfig, which is only called by the 3 explicit helper functions. And calling any of them with nil is also still a developer error and not something that should be reported as a runtime issue.

add a cache server in the future, forget to update this switch and boom...

That is exactly why it's a panic. So we do not forget.

Contributor

@mjudeikis mjudeikis left a comment

Can we remove that one panic from the reconciler? :)

Labels

dco-signoff: yes Indicates the PR's author has signed the DCO. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Development

Successfully merging this pull request may close these issues.

feature: allow RBAC bootstrapping for user identities in Kubeconfigs

5 participants