Skip to content

Conversation

xrstf
Copy link
Contributor

@xrstf xrstf commented Jul 11, 2025

Summary

This PR implements a new feature for kcp: per-workspace authentication. Admins can configure authentication configurations (as known from Kubernetes) for each workspace type and if a request targets a logicalcluster for which the type has auth configs, then this new authentication mechanism will form a union with the existing kcp authentication.

This feature is guarded by a new alpha feature gate WorkspaceAuthentication, disabled by default. It can be set on each shard and each front-proxy independently.

Feature Overview

The primary intended usecase for this is inside the front-proxy. The proxy has a global view and can reliably resolve the necessary objects across all shards. If desired admins can even setup a dedicated front-proxy that has this new feature enabled.

However, to make e2e testing more convenient, the feature is also part of a shard's localproxy (if the feature is enabled on that shard, of course). In this case, the authenticator will only work if all of AuthConfig, WorkspaceType, Workspace and LogicalCluster are on the same shard. It purposefully does not try to gain a global view across all shards and this is meant to be done by using front-proxies.

API

The new API type WorkspaceAuthenticationConfiguration is based entirely on kube-apiserver's AuthenticationConfiguration, which could not be used because of a) the heavy dependency and b) because it doesn't have any json tags.

There is a primitive set of conversion functions to convert from our API to Kube API, in order to then construct the OIDC authenticators.

Architecture

kcp's front-proxy already comes with an index (a collection of maps of shards, clusters, workspaces etc.) and an index_controller (responsible for connecting dynamically to each shard, watch these resources on them and feed the index). This index is used to resolve incoming workspace paths and/or logicalclusters to their target shard.

This index was extended to also keep track of the workspace type for each logicalcluster (thankfully the fully-qualified workspace type is stored as an annotation on each LogicalCluster) and will include this information when calling the Lookup() function.

The index and its controller are cloned for this new feature, forming the authIndex and authIndexController. These do similar things, but keep track of WorkspaceTypes and AuthConfigs. The authIndex builds and starts the authenticators for each AuthConfig, since the OIDC authentication in Kubernetes is stateful and requires discovery first, so it's not possible to just statelessly build an authenticator and use it only for one request.

The authIndex then ultimately also has a Lookup(workspaceType) function, which will resolve the type, find the auth configs and then assemble them into a single authenticator union.

Important: For a token to be accepted, it has to contain both the configured kcp audience (from the CLI flags) and the audiences configured in the AuthenticationConfiguration objects. It's simply not possible in Kube to have the "core" audiences be ignored because of the way Kube handles audience validation. I think this also makes sense to require tokens to at least specify that they are meant to be used in kcp.

Kube does audience validation in a weird way. It stores the *REQUIRED* audiences in the request context, but named `audiences`, leading you to believe these are the audiences that came with the request. The existence of this context field will make Kube validate that the incoming token has this audience set. There is no way to disable this behaviour when constructing a Kube OIDC authenticator.

Even worse: Even if you hack around this and replace the context of a request entirely before calling our own per-workspace authenticator, then this authenticator's response will only contain the audiences configured in the AuthenticationConfigurations. This means the core kcp audience will be missing from the auth result.

BUT Kube's authenticator is built to check that after calling the underlying authenticator (us), the audiences match what is in the request context. This context it has remembered and is out of our reach entirely. So if we cheated with a custom context, the surrounding check by Kube would fail again.

Request Handling

Importantly, the front-proxy has the concept of optional authentication, where the authenticator is only called if additional authn flags were configured on the front-proxy. Otherwise unauthenticated requests are passed on to the shards.

With this feature, the existence of an authenticator is not a static decision anymore, but depends on the target logicalcluster.

To allow for this, this PR...

  • refactors the proxy's mapping handler to not perform the index lookup and forwarding in the same code, but instead have a new lookup middleware that is only responsible for calling index.Lookup() and storing the logicalcluster, shard and now workspacetype in the request's context. It's necessary to have this information in the context because...
  • a new AuthResolver middleware is added early (but after lookup) into the handler chain of both the front-proxy and the kcp shard server. This new middleware will take the looked up workspacetype and ask the authIndex for the relevant authenticator. If one is found, it is stored in the request context. Then...
  • in the front-proxy, the optionalAuthentication middleware now takes the optional authenticator out of the context and combines it smartly with the other CLI-based auth configs,
  • in the kcp shard server, the existing Authenticator is unioned with a WorkspaceAuthenticator, which in itself just takes the authenticator out of the context and calls it.

Notable, in the shard/localproxy scenario, there is no authIndexController managing the watches per shard, since there is only the local shard. This means the kcp server has to start one single ShardWatcher directly, whereas the front-proxy has the authIndexController to manage ShardWatchers dynamically. ShardWatchers as an abstraction only exist because of this stripped-down behaviour in the localproxy.

Testing

I've added two new e2e tests, covering the basics of this feature. For testing I decided to use mockoidc, which allows to run an OIDC server in-process, which is just super convenient. However the upstream module has some bugs, so I had to fork it and fix those.

(This makes me wonder if httest should also come as a Go library, so our authorizer tests do have to manage fewer external processes...)

What Type of PR Is This?

/kind feature

Related Issue(s)

Fixes #3428

Release Notes

Add per-workspace authentication feature (behind the disabled by default feature gate `WorkspaceAuthentication`), allowing to configure additional authenticators (JWT/OIDC at the moment) for workspace types in order to admit external users into logical clusters.

@kcp-ci-bot kcp-ci-bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. dco-signoff: yes Indicates the PR's author has signed the DCO. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 11, 2025
@xrstf xrstf changed the title WIP - Per workspace authn WIP - Per-workspace Authentication Jul 11, 2025
@xrstf xrstf force-pushed the per-workspace-authn branch 3 times, most recently from f5ec0ce to 2092bcc Compare August 2, 2025 21:13
@xrstf
Copy link
Contributor Author

xrstf commented Aug 2, 2025

/retest

@xrstf xrstf force-pushed the per-workspace-authn branch from 2092bcc to 51a4db2 Compare August 3, 2025 15:25
@xrstf
Copy link
Contributor Author

xrstf commented Aug 3, 2025

/retest

2 similar comments
@xrstf
Copy link
Contributor Author

xrstf commented Aug 3, 2025

/retest

@xrstf
Copy link
Contributor Author

xrstf commented Aug 3, 2025

/retest

Copy link
Member

@embik embik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of notes. In addition, I have two important topics I couldn't quite place in the code:

  1. We should make sure we attach a scope to authenticated users in a workspace, so that their identity is only valid in the workspace they have authenticated for.
  2. What happens with the mini-front-proxy in the server when a WorkspaceType / AuthenticationConfig in a different shard is referenced? Is that (not) supposed to work?

@xrstf xrstf force-pushed the per-workspace-authn branch from f1ad557 to 1c72f76 Compare August 6, 2025 16:54
@xrstf
Copy link
Contributor Author

xrstf commented Aug 6, 2025

  1. I've added that functionality and added an e2e test to verify it.
  2. Yes, that's correct. The localproxy auth functionality only works if Workspace, WorkspaceType, LogicalCluste and WorkspaceAuthentication exist on the local shard. There is purposefully no intention to change this, as I would simply recommend using the front-proxy if you want this feature across all your shards. That's at least how I intended it.

@kcp-ci-bot kcp-ci-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Aug 6, 2025
@xrstf xrstf changed the title WIP - Per-workspace Authentication Per-workspace Authentication Aug 7, 2025
@kcp-ci-bot kcp-ci-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 7, 2025
@xrstf
Copy link
Contributor Author

xrstf commented Aug 8, 2025

/retest

Copy link
Contributor

@mjudeikis mjudeikis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More questions than issues :) looks very promising!

@@ -354,7 +355,7 @@ test-run-sharded-server: LOG_DIR ?= $(WORK_DIR)/.kcp
test-run-sharded-server:
mkdir -p "$(LOG_DIR)" "$(WORK_DIR)/.kcp"
rm -f "$(WORK_DIR)/.kcp/ready-to-test"
UNSAFE_E2E_HACK_DISABLE_ETCD_FSYNC=true NO_GORUN=1 ./bin/sharded-test-server --quiet --v=2 --log-dir-path="$(LOG_DIR)" --work-dir-path="$(WORK_DIR)" --shard-run-virtual-workspaces=false --shard-feature-gates=$(TEST_FEATURE_GATES) $(TEST_SERVER_ARGS) --number-of-shards=2 2>&1 & PID=$$!; echo "PID $$PID" && \
UNSAFE_E2E_HACK_DISABLE_ETCD_FSYNC=true NO_GORUN=1 ./bin/sharded-test-server --quiet --v=2 --log-dir-path="$(LOG_DIR)" --work-dir-path="$(WORK_DIR)" --shard-run-virtual-workspaces=false --shard-feature-gates=$(TEST_FEATURE_GATES) --proxy-feature-gates=$(PROXY_FEATURE_GATES) $(TEST_SERVER_ARGS) --number-of-shards=2 2>&1 & PID=$$!; echo "PID $$PID" && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With second flag this looks strange now. 2 featureflags flags. Should we maybe get ticket to deprecate --shard-feature-gates in favor of --feature-gates and use use one everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I needed to be able to control the feature independently between shard and front-proxy, because some tests are meant to ensure that the front-proxy does the authentication. That's why the sharded-test-server has a new CLI flag. The front-proxy's flag is called --feature-gate, like the shard's CLI flag is. Only on the test server do we need 2 sets of feature gates, hence 2 CLI flags.


Workspace types then reference a set of auth configs, and their configuration will apply to all their workspaces/logicalclusters. Compared to many other settings in a `WorkspaceType` that work only as a preset for _new_ workspaces, the configured auth configs will continue to affect workspaces, so when a `WorkspaceType` is changed, this will impact existing workspaces, too.

For every incoming HTTPS request, kcp will then resolve the logicalcluster, determine the used workspace type, assemble the list of auth configs and create an authenticator suitable for exactly the one logicalcluster targeted by the request. This workspace authenticator is an *alternative* to kcp's regular authentication (i.e. it forms a union with it).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is behaviour when you dont have set any core authentication and only AuthConfig?
One might not even have auth flags in kcp and just AuthConfig?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do not have any auth flags configured on the front-proxy, it will pass through all requests targetting clusters without AuthConfigs. For clusters with AuthConfigs (via their types), the front-proxy will perform authentication and reject the request if it cannot be authenticated.

Without any auth flags, it will be difficult to access the root workspace I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without any auth flags, it will be difficult to access the root workspace I think.
you could just use admin kubeconfig as admin. Its a security feature - you know there is no OIDC access to anything else apart workspaces, using it.

subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: oidc:admins
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your example does not have oidc: prefix. This will not work, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can one create group system:masters or system:kcp:admin` in external authentication OIDC and so get wider access than one should?

Thinking if we should force everybody to use prefix for authentication configs? Or prevent certain groups in these authConfigs.

@embik wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, good question. We might have to do that indeed. I wonder if we should have a prefix prepended that is simply the cluster ID, even though that would be massively annoying for creating RBAC ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is a problem. Yes the example could show the username claim with the "oidc:" prefix, but if you have AuthConfigs that do not configure a claim prefix, then yes, users could get any arbitrary group assigned to them. But their authentication is limitted (via the scope annotation thingy on the UserInfo) to just one single logicalcluster, so it should be fine. And who are we to prevent kcp users from giving someone poweruser permissions inside their clusters?

But I'm open to explicit rules you guys want to see here.

Copy link
Member

@embik embik Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to avoid privileged identities (the one @mjudeikis mentioned) because they can reach past the workspace and circumvent authorization. I'm not sure that scopes are enough for keeping them in check. Maybe the authenticator should reject anything with a system: prefix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the point, They might be scoped to workspace, but Im not even sure what other "holes" they could poke with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to reject non-prefixed AuthSettings or eventually forbid the WorkspaceAuthenticator from returning a set of protected groups, regardless of the OIDC settings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a wrapper that removes system: groups and rejects system: names outright. A new e2e test ensures that this works.

@kcp-ci-bot kcp-ci-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 14, 2025
xrstf added 4 commits August 14, 2025 16:12
On-behalf-of: @SAP christoph.mewes@sap.com
On-behalf-of: @SAP christoph.mewes@sap.com
On-behalf-of: @SAP christoph.mewes@sap.com
On-behalf-of: @SAP christoph.mewes@sap.com
xrstf added 16 commits August 14, 2025 16:12
…re for localproxy

My previous attempt at indexing the workspace types was not actually properly shard-aware. This
commit fixes that and moves the code to pkg/authentication/ since it wil be soon be re-used by the
localproxy.

On-behalf-of: @SAP christoph.mewes@sap.com
…also have the feature working

This is mainly thought to make `make test-e2e` work properly. My main idea is still to enable this
feature on front-proxies, not per-shard.

On-behalf-of: @SAP christoph.mewes@sap.com
On-behalf-of: @SAP christoph.mewes@sap.com
On-behalf-of: @SAP christoph.mewes@sap.com
* the service account resolver needs to run before our authentication
* additionalAuth alone is not alone; it actually depends on the target cluster whether authentication
  is supposed to happen, i.e. it's request-dependent. Rework to introduce a new middleware that finds
  the authenticator before the authentication middleware, so it can make an informed decision.

On-behalf-of: @SAP christoph.mewes@sap.com
On-behalf-of: @SAP christoph.mewes@sap.com
On-behalf-of: @SAP christoph.mewes@sap.com
On-behalf-of: @SAP christoph.mewes@sap.com
On-behalf-of: @SAP christoph.mewes@sap.com
On-behalf-of: @SAP christoph.mewes@sap.com
On-behalf-of: @SAP christoph.mewes@sap.com
@xrstf xrstf force-pushed the per-workspace-authn branch from b6cd80d to 46b1c0c Compare August 15, 2025 09:26
@kcp-ci-bot kcp-ci-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 15, 2025
@xrstf xrstf requested review from embik and mjudeikis August 15, 2025 09:39
Copy link
Member

@embik embik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@kcp-ci-bot kcp-ci-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 21, 2025
@kcp-ci-bot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 1961af325ace6f7c1b219e0d74737b78ec5fc0f1

@kcp-ci-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: embik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kcp-ci-bot kcp-ci-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 21, 2025
@embik
Copy link
Member

embik commented Aug 21, 2025

/retest

@kcp-ci-bot kcp-ci-bot merged commit 339e173 into kcp-dev:main Aug 21, 2025
14 checks passed
@xrstf xrstf deleted the per-workspace-authn branch August 25, 2025 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has signed the DCO. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feature: per-workspace authentication configuration
4 participants