feat: Extended OIDC support to extract groups & namespaces and token injection with multiple methods #6089
Thanks a lot for this 🙌 We’re currently using a bit of a temporary hack by overriding Really appreciate the work here! 💜

Thank you for this work. Could you add an override for the "current_user" parameter? Our OIDC server uses the UPN instead of preferred_username and I'm currently working on a monkey patch solution for this. But it would be much more convenient to have this in the official release.
| value: quay.io/feastdev/feature-server:0.61.0 | ||
| - name: RELATED_IMAGE_CRON_JOB | ||
| value: quay.io/openshift/origin-cli:4.17 | ||
| - name: OIDC_ISSUER_URL |
🔴 OIDC_ISSUER_URL env var missing value: field in bundle CSV makes OLM deployment spec invalid
In the operator's OLM bundle CSV at line 273, the OIDC_ISSUER_URL env entry is missing a value: key. Compare the correct form in config/manager/manager.yaml:82-83 which has value: "". Without value:, the YAML parser treats the next line (image: quay.io/...) as a key within the env var map entry rather than a sibling container-level field. This makes the container spec structurally invalid: the container loses its image field entirely. OLM deployments using this CSV will fail to create the operator pod.
Expected vs actual YAML structure
Expected (from config/manager/manager.yaml):
- name: OIDC_ISSUER_URL
  value: ""
image: quay.io/feastdev/feast-operator:0.61.0
Actual (bundle CSV):
- name: OIDC_ISSUER_URL
image: quay.io/feastdev/feast-operator:0.61.0
`operator-sdk generate bundle` strips `value: ""` from env vars by design. This is known operator-sdk behavior: it treats `value: ""` as equivalent to an absent value and removes it during YAML serialization. The source manager.yaml has `value: ""` and kustomize build (install.yaml) preserves it, but operator-sdk strips it from the CSV.
- Add OIDC_ISSUER_URL to manager Deployment env (base)
- Add param to ODH/RHOAI params.env with empty default
- Kustomize replacements from feast-operator-parameters ConfigMap
- Document Open Data Hub operator integration

Pairs with opendatahub-operator injecting OIDC_ISSUER_URL at reconcile time from GatewayConfig when the cluster uses external OIDC (RHOAIENG-55767).

Made-with: Cursor
Signed-off-by: Aniket Paluskar <[email protected]>
…r Secret-less OIDC configuration Signed-off-by: Aniket Paluskar <[email protected]>
…-signed OIDC provider TLS verification Signed-off-by: Aniket Paluskar <[email protected]>
Signed-off-by: Aniket Paluskar <[email protected]>
…mization.yaml to upstream default Signed-off-by: Aniket Paluskar <[email protected]>
Signed-off-by: Aniket Paluskar <[email protected]>
…fix mypy attr-defined error Signed-off-by: Aniket Paluskar <[email protected]>
… CA path into non-ODH OIDC config Signed-off-by: Aniket Paluskar <[email protected]>
… path, pass ca_cert_path to client token fetch, and update secrets baseline Signed-off-by: Aniket Paluskar <[email protected]>
…3-tier priority Signed-off-by: Aniket Paluskar <[email protected]>
… token parser Signed-off-by: Aniket Paluskar <[email protected]>
Signed-off-by: Aniket Paluskar <[email protected]>
| try: | ||
| await self._validate_token(access_token) | ||
| logger.debug("Token successfully validated.") | ||
| except Exception as e: | ||
| if self._is_ssl_error(e): | ||
| logger.error( | ||
| "OIDC provider SSL certificate verification failed. " | ||
| "If using a self-signed certificate, set verify_ssl: false " | ||
| "or provide a CA certificate via ca_cert_path." | ||
| ) | ||
| logger.error(f"Token validation failed: {e}") | ||
| raise AuthenticationError(f"Invalid token: {e}") |
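The diff above calls `self._is_ssl_error(e)` without showing its body. As a minimal sketch only, and not the PR's actual implementation, one plausible way to write such a check is to walk the exception chain looking for `ssl.SSLError`. The standalone function name `is_ssl_error` is an assumption made for illustration.

```python
import ssl


def is_ssl_error(exc: BaseException) -> bool:
    """Return True if exc, or any exception chained beneath it, is an ssl.SSLError.

    Hypothetical helper: HTTP and JWKS clients usually wrap the underlying
    TLS failure, so the chain is followed instead of checking only the top level.
    """
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against cyclic chains
        if isinstance(current, ssl.SSLError):
            return True
        # Prefer the explicit cause ("raise ... from err"), then the implicit context.
        current = current.__cause__ or current.__context__
    return False
```

A chain-walking check like this lets the handler log the `verify_ssl` / `ca_cert_path` hint even when the TLS failure is buried under a library-specific wrapper exception.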
🔴 OidcTokenParser._decode_token called redundantly even though unverified decode already happened
In OidcTokenParser.user_details_from_access_token (oidc_token_parser.py:186), _decode_token calls PyJWKClient and jwt.decode to fully decode the token. However, the method already did an unverified jwt.decode at line 152 to get unverified. The unverified payload is discarded and the token is decoded again via _decode_token, but the _validate_token call at line 174 already checked it with OAuth2AuthorizationCodeBearer. When _decode_token is called, PyJWKClient makes an HTTP call to the JWKS endpoint using an ssl_context derived from config — but _validate_token at line 54-60 does not use the same ssl_context. So _validate_token may fail with an SSL error on self-signed certs (using system trust), while _decode_token would succeed (with custom CA). The SSL context configuration is only applied to the JWKS fetch in _decode_token, not to the OAuth2AuthorizationCodeBearer validation which internally hits the OIDC discovery endpoints without the custom ssl_context/verify_ssl settings.
Prompt for agents
The _validate_token method at line 45-60 in oidc_token_parser.py uses OAuth2AuthorizationCodeBearer which calls the OIDC discovery endpoints (token_url, authorization_url, refresh_url) from OIDCDiscoveryService. While OIDCDiscoveryService._fetch_discovery_data does respect verify_ssl/ca_cert_path, the OAuth2AuthorizationCodeBearer __call__ method from FastAPI does NOT use the custom SSL context. It simply uses the default httpx/requests SSL verification. This means that for self-signed OIDC providers, the _validate_token step will fail with SSL errors even though verify_ssl is set to false or a ca_cert_path is provided. The _decode_token method correctly uses an ssl_context, but _validate_token does not pass one through. This issue makes the verify_ssl and ca_cert_path settings ineffective for the token validation step on the server side. Consider either removing the _validate_token step (since _decode_token already verifies the signature and expiry) or patching the OAuth2 scheme to use the custom SSL context.
- The unverified decode is intentional for routing (SA token detection, intra-comm check).
- `OAuth2AuthorizationCodeBearer.__call__()` does not make network calls, so there is no SSL mismatch.
- `_validate_token` being a lightweight header check is pre-existing behavior, not introduced by this PR, and is tracked as a future improvement.
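To make the routing point in this reply concrete, here is a rough stdlib-only sketch of an unverified decode used purely to choose a validation path. It is not the code in `oidc_token_parser.py`: the function names and the returned labels (`"token_review"`, `"jwks"`) are assumptions. The `system:serviceaccount:<namespace>:<name>` form is the standard Kubernetes service account username.

```python
import base64
import json


def unverified_claims(token: str) -> dict:
    """Decode the JWT payload WITHOUT checking the signature (routing only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWT")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def route_token(token: str) -> str:
    """Pick a validation path; the labels returned here are hypothetical."""
    claims = unverified_claims(token)
    if "kubernetes.io" in claims:
        return "token_review"  # SA token: delegate validation to a TokenReview
    return "jwks"  # user token: verify the signature against the JWKS


def namespace_from_sa_username(username: str) -> str:
    """Extract the namespace from 'system:serviceaccount:<namespace>:<name>'."""
    parts = username.split(":")
    if len(parts) != 4 or parts[:2] != ["system", "serviceaccount"]:
        raise ValueError(f"not a service account identity: {username!r}")
    return parts[2]


def fake_jwt(claims: dict) -> str:
    """Build an unsigned demo token (illustration only, never a real credential)."""
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(claims)}.sig"
```

Nothing in the unverified payload is trusted for authorization in this sketch; it only decides which verifier sees the token next.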
| apimeta.RemoveStatusCondition(&authz.Handler.FeatureStore.Status.Conditions, feastKubernetesAuthConditions[metav1.ConditionTrue].Type) | ||
|
|
||
| if authz.isOidcAuth() { | ||
| if err := authz.createFeastClusterRole(); err != nil { |
`createFeastClusterRole` uses `setFeastClusterRole`, which gives OIDC-only deployments unnecessary extra permissions. Since those are read-only, I consider this a non-blocker that can be handled as a follow-up.
This also leaves the ClusterRole and ClusterRoleBinding orphaned when a user switches from OIDC auth to no auth (or to Kubernetes auth).
Will address this in follow up PR. Thanks for review.
ntkathole
left a comment
Overall the changes look good. This is a big PR with a lot of functionality.
@aniketpalu Please make sure to follow up with the changes required for the non-blocking issues.
# [0.61.0](v0.60.0...v0.61.0) (2026-04-07)

### Bug Fixes

* Add grpcio dependency group to transformation server Dockerfile ([2c2150a](2c2150a))
* Add https readiness check for rest-registry tests ([ea85e63](ea85e63))
* Add website build check for PRs and fix blog frontmatter YAML error ([#6079](#6079)) ([30a3a43](30a3a43))
* Added missing jackc/pgx/v5 entries ([94ad0e7](94ad0e7))
* Added MLflow metric charts across feature selection ([#6080](#6080)) ([a403361](a403361))
* Check duplicate names for feature view across types ([#5999](#5999)) ([95b9af8](95b9af8))
* Fix integration tests ([#6046](#6046)) ([02d5548](02d5548))
* Fix missing error handling for resource_counts endpoint ([d9706ce](d9706ce))
* Fix non-specific label selector on metrics service ([a1a160d](a1a160d))
* fix path feature_definitions.py ([7d7df68](7d7df68))
* Fix regstry Rest API tests intermittent failure ([d53a339](d53a339))
* Fixed IntegrityError on SqlRegistry ([#6047](#6047)) ([325e148](325e148))
* Fixed intermittent failures in get_historical_features ([c335ec7](c335ec7))
* Fixed pre-commit check ([114b7db](114b7db))
* Fixed the intermittent FeatureViewNotFoundException ([661ecc7](661ecc7))
* Fixed uv cache permission error for docker build on mac ([ad807be](ad807be))
* Fixes a `PydanticDeprecatedSince20` warning for trino_offline_store ([#5991](#5991)) ([abfd18a](abfd18a))
* Handle existing RBAC role gracefully in namespace registry ([b46a62b](b46a62b))
* Ignore ipynb files during apply ([#6151](#6151)) ([4ea123d](4ea123d))
* Integration test failures ([#6040](#6040)) ([9165870](9165870))
* Mount TLS volumes for init container ([080a9b5](080a9b5))
* **postgres:** Use end_date in synthetic entity_df for non-entity retrieval ([#6110](#6110)) ([088a802](088a802)), closes [#6066](#6066)
* Ray offline store tests are duplicated across 3 workflows ([54f705a](54f705a))
* Reenable tests ([#6036](#6036)) ([82ee7f8](82ee7f8))
* SSL/TLS mode by default for postgres connection ([4844488](4844488))
* Use commitlint pre-commit hook instead of a separate action ([35a81e7](35a81e7))

### Features

* Add Claude Code agent skills for Feast ([#6081](#6081)) ([1e5b60f](1e5b60f)), closes [#5976](#5976) [#6007](#6007)
* Add complex type support (Map, JSON, Struct) with schema validation ([#5974](#5974)) ([1200dbf](1200dbf))
* Add decimal to supported feature types ([#6029](#6029)) ([#6226](#6226)) ([cff6fbf](cff6fbf))
* Add feast apply init container to automate registry population on pod start ([#6106](#6106)) ([6b31a43](6b31a43))
* Add feature view versioning support to PostgreSQL and MySQL online stores ([#6193](#6193)) ([940e0f0](940e0f0)), closes [#6168](#6168) [#6169](#6169) [#2728](#2728)
* Add materialization, feature freshness, request latency, and push metrics to feature server ([2c6be18](2c6be18))
* Add metadata statistics to registry api ([ef1d4fc](ef1d4fc))
* Add non-entity retrieval support for ClickHouse offline store ([4d08ddc](4d08ddc)), closes [#5835](#5835)
* Add OnlineStore for MongoDB ([#6025](#6025)) ([bf4e3fa](bf4e3fa)), closes [golang/go#74462](golang/go#74462)
* Add Oracle DB as Offline store in python sdk & operator ([#6017](#6017)) ([9d35368](9d35368))
* Add RBAC aggregation labels to FeatureStore ClusterRoles ([daf77c6](daf77c6))
* Add ServiceMonitor auto-generation for Prometheus discovery ([#6126](#6126)) ([56e6d21](56e6d21))
* Add typed_features field to grpc write request (([#6117](#6117)) ([#6118](#6118)) ([eeaa6db](eeaa6db)), closes [#6116](#6116)
* Add UUID and TIME_UUID as feature types ([#5885](#5885)) ([#5951](#5951)) ([5d6e311](5d6e311))
* Add version indicators to lineage graph nodes ([#6187](#6187)) ([73805d3](73805d3))
* Add version tracking to FeatureView ([#6101](#6101)) ([ed4a4f2](ed4a4f2))
* Added Agent skills for AI Agents ([#6007](#6007)) ([99008c8](99008c8))
* Added CodeQL SAST scanning and detect-secrets pre-commit hook ([547b516](547b516))
* Added odfv transformations metrics ([8b5a526](8b5a526))
* Adding optional name to Aggregation (feast-dev[#5994](#5994)) ([#6083](#6083)) ([56469f7](56469f7))
* Created DocEmbedder class ([#5973](#5973)) ([0719c06](0719c06))
* Extended OIDC support to extract groups & namespaces and token injection with multiple methods ([#6089](#6089)) ([7c04026](7c04026))
* Feature Server High-Availability on Kubernetes ([#6028](#6028)) ([9c07b4c](9c07b4c)), closes [Hi#Availability](https://github.com/Hi/issues/Availability) [Hi#Availability](https://github.com/Hi/issues/Availability)
* **go:** Implement metrics and tracing for http and grpc servers ([#5925](#5925)) ([2b4ec9a](2b4ec9a))
* Horizontal scaling support to the Feast operator ([#6000](#6000)) ([3ec13e6](3ec13e6))
* Making feature view source optional (feast-dev[#6074](#6074)) ([#6075](#6075)) ([76917b7](76917b7))
* Replace ORJSONResponse with Pydantic response models for faster JSON serialization ([65cf03c](65cf03c))
* Support arm docker build ([#6061](#6061)) ([1e1f5d9](1e1f5d9))
* Support distinct count aggregation [[#6116](#6116)] ([3639570](3639570))
* Support HTTP in MCP ([#6109](#6109)) ([e72b983](e72b983))
* Support nested collection types (Array/Set of Array/Set) ([#5947](#5947)) ([#6132](#6132)) ([ab61642](ab61642))
* Support podAnnotations on Deployment pod template ([1b3cdc1](1b3cdc1))
* Use orjson for faster JSON serialization in feature server ([6f5203a](6f5203a))
* Utilize date partition column in BigQuery ([#6076](#6076)) ([4ea9b32](4ea9b32))

### Performance Improvements

* Online feature response construction in a single pass over read rows ([113fb04](113fb04))
* Optimize protobuf parsing in Redis online store ([#6023](#6023)) ([59dfdb8](59dfdb8))
* Optimize timestamp conversion in _convert_rows_to_protobuf ([33a2e95](33a2e95))
* Parallelize DynamoDB batch reads in sync online_read ([#6024](#6024)) ([9699944](9699944))
* Remove redundant entity key serialization in online_read ([d87283f](d87283f))
# [0.62.0](v0.61.0...v0.62.0) (2026-04-08)

### Bug Fixes

* Added missing jackc/pgx/v5 entries ([94ad0e7](94ad0e7))
* Fix missing error handling for resource_counts endpoint ([d9706ce](d9706ce))
* fix path feature_definitions.py ([7d7df68](7d7df68))
* Fix regstry Rest API tests intermittent failure ([d53a339](d53a339))
* Fixed intermittent failures in get_historical_features ([c335ec7](c335ec7))
* Fixed the intermittent FeatureViewNotFoundException ([661ecc7](661ecc7))
* Handle existing RBAC role gracefully in namespace registry ([b46a62b](b46a62b))
* Ignore ipynb files during apply ([#6151](#6151)) ([4ea123d](4ea123d))
* Mount TLS volumes for init container ([080a9b5](080a9b5))
* **postgres:** Use end_date in synthetic entity_df for non-entity retrieval ([#6110](#6110)) ([088a802](088a802)), closes [#6066](#6066)
* SSL/TLS mode by default for postgres connection ([4844488](4844488))
* Sync v0.61-branch so v0.61.0 tag is reachable from master ([af66878](af66878))

### Features

* Add Claude Code agent skills for Feast ([#6081](#6081)) ([1e5b60f](1e5b60f)), closes [#5976](#5976) [#6007](#6007)
* Add decimal to supported feature types ([#6029](#6029)) ([#6226](#6226)) ([cff6fbf](cff6fbf))
* Add feast apply init container to automate registry population on pod start ([#6106](#6106)) ([6b31a43](6b31a43))
* Add feature view versioning support to PostgreSQL and MySQL online stores ([#6193](#6193)) ([940e0f0](940e0f0)), closes [#6168](#6168) [#6169](#6169) [#2728](#2728)
* Add metadata statistics to registry api ([ef1d4fc](ef1d4fc))
* Add Oracle DB as Offline store in python sdk & operator ([#6017](#6017)) ([9d35368](9d35368))
* Add RBAC aggregation labels to FeatureStore ClusterRoles ([daf77c6](daf77c6))
* Add ServiceMonitor auto-generation for Prometheus discovery ([#6126](#6126)) ([56e6d21](56e6d21))
* Add typed_features field to grpc write request (([#6117](#6117)) ([#6118](#6118)) ([eeaa6db](eeaa6db)), closes [#6116](#6116)
* Add UUID and TIME_UUID as feature types ([#5885](#5885)) ([#5951](#5951)) ([5d6e311](5d6e311))
* Add version indicators to lineage graph nodes ([#6187](#6187)) ([73805d3](73805d3))
* Add version tracking to FeatureView ([#6101](#6101)) ([ed4a4f2](ed4a4f2))
* Added Agent skills for AI Agents ([#6007](#6007)) ([99008c8](99008c8))
* Added odfv transformations metrics ([8b5a526](8b5a526))
* Created DocEmbedder class ([#5973](#5973)) ([0719c06](0719c06))
* Extended OIDC support to extract groups & namespaces and token injection with multiple methods ([#6089](#6089)) ([7c04026](7c04026))
* Replace ORJSONResponse with Pydantic response models for faster JSON serialization ([65cf03c](65cf03c))
* Support distinct count aggregation [[#6116](#6116)] ([3639570](3639570))
* Support HTTP in MCP ([#6109](#6109)) ([e72b983](e72b983))
* Support nested collection types (Array/Set of Array/Set) ([#5947](#5947)) ([#6132](#6132)) ([ab61642](ab61642))
* Support podAnnotations on Deployment pod template ([1b3cdc1](1b3cdc1))
* Utilize date partition column in BigQuery ([#6076](#6076)) ([4ea9b32](4ea9b32))

### Performance Improvements

* Online feature response construction in a single pass over read rows ([113fb04](113fb04))
What this PR does / why we need it:
Extends OIDC authentication in Feast to support `GroupBasedPolicy` and `NamespaceBasedPolicy` for OIDC users, adds flexible client token injection, and wires up operator support for OIDC deployments.

Previously, the OIDC token parser only extracted `preferred_username` and `resource_access` roles from the JWT. `GroupBasedPolicy`, `NamespaceBasedPolicy`, and `CombinedGroupNamespacePolicy` could never grant access for OIDC users because the `User` object was always created with empty groups and namespaces.

Server Side Changes
Files: `oidc_token_parser.py`, `oidc_service.py`, `utils.py`

When `auth.type: oidc`, the Feast server now handles two types of incoming tokens:
- OIDC user tokens: the parser extracts `preferred_username` (with `upn` fallback for Azure AD / Entra ID), the `groups` claim, and `resource_access.<client_id>.roles`. `GroupBasedPolicy` uses the groups. `client_id` is optional; when absent, roles default to empty and groups still work.
- Kubernetes service account (SA) tokens: detected by the `kubernetes.io` claim in an initial unverified decode. These are delegated to a lightweight TokenReview that validates the token and extracts the namespace from the SA identity. `NamespaceBasedPolicy` uses the namespace. No RBAC queries are performed; only `tokenreviews/create` is needed.

Additional server improvements:
- The `verify_ssl` field (default `true`) controls TLS verification for OIDC discovery and JWKS endpoints. Set to `false` for self-signed certificates.
- The `ca_cert_path` field allows specifying a custom CA certificate for the OIDC provider when `verify_ssl: true`.
- `PyJWKClientError` is now caught (previously only `InvalidTokenError` was caught), preventing 500 responses when JWKS endpoints are unreachable or keys are not found.

Server Configuration (`feature_store.yaml`)

Keycloak Setup Required
The OIDC client in Keycloak must have a Groups claim mapper configured:
Client > Mappers > Add "Group Membership" mapper > claim name: `groups`

Without this mapper, JWTs will not contain the `groups` claim and `GroupBasedPolicy` will deny all access.

Client Side Changes
Files: `oidc_authentication_client_manager.py`, `auth_model.py`, `repo_config.py`

`OidcAuthClientManager.get_token()` now supports multiple token sources with a clear priority:
1. `token` field in config: static JWT, returned directly, no network calls
2. `token_env_var` field: reads from the named environment variable on every call (supports token refresh). If the variable is configured but missing, raises `PermissionError` immediately with no fallback.
3. `client_secret` field (with `auth_discovery_url` and `client_id`): fetches token from the IDP via client_credentials or ROPG flow
4. `FEAST_OIDC_TOKEN` environment variable: last resort fallback for bare `{type: oidc}` configs
5. `/var/run/secrets/kubernetes.io/serviceaccount/token`: for workbench pods running inside Kubernetes

`token`, `token_env_var`, and `client_secret` are mutually exclusive (enforced by a Pydantic validator at config load time).

Client Configuration Examples
Zero config (workbench pods or kube-authkit users):
Picks up the `FEAST_OIDC_TOKEN` env var if set, otherwise reads the mounted SA token.

Named environment variable (supports external token refresh):
Static token (testing / CI):
Client credentials flow (service to service):
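Taken together, the four configurations differ only in which token source they select. The following is a minimal sketch of that selection order and of the mutual-exclusivity rule, written against plain dictionaries, not the real `OidcAuthClientManager`. The function names, the injected `fetch_from_idp` callable, and the final "nothing available" error are assumptions made for illustration.

```python
import os
from typing import Callable, Mapping

SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def check_exclusive(cfg: Mapping[str, str]) -> None:
    """Mirror the described validator: at most one explicit token source."""
    chosen = [k for k in ("token", "token_env_var", "client_secret") if cfg.get(k)]
    if len(chosen) > 1:
        raise ValueError(f"mutually exclusive fields set: {chosen}")


def resolve_token(
    cfg: Mapping[str, str],
    env: Mapping[str, str],
    fetch_from_idp: Callable[[Mapping[str, str]], str],
    sa_token_path: str = SA_TOKEN_PATH,
) -> str:
    """Walk the token sources in the documented priority order."""
    check_exclusive(cfg)
    if cfg.get("token"):  # 1. static JWT, no network call
        return cfg["token"]
    if cfg.get("token_env_var"):  # 2. named variable, re-read on every call
        name = cfg["token_env_var"]
        if not env.get(name):  # unset (or empty, in this sketch): no fallback
            raise PermissionError(f"{name} is configured but not set")
        return env[name]
    if cfg.get("client_secret"):  # 3. fetch from the IDP
        return fetch_from_idp(cfg)
    if env.get("FEAST_OIDC_TOKEN"):  # 4. fallback for bare {type: oidc}
        return env["FEAST_OIDC_TOKEN"]
    if os.path.isfile(sa_token_path):  # 5. mounted service account token
        with open(sa_token_path) as fh:
            return fh.read().strip()
    # Assumption: the source does not say what happens when nothing matches.
    raise PermissionError("no OIDC token source available")
```

For example, `resolve_token({"type": "oidc", "token_env_var": "MY_TOKEN"}, os.environ, fetch)` re-reads `MY_TOKEN` on each call, which is what lets an external process rotate the token without restarting the client.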
Operator Changes
Files: `featurestore_types.go`, `repo_config.go`, `tls.go`, `services_types.go`, `authz.go`

The operator now generates separate server and client OIDC configs:
- Server config: `auth_discovery_url` (resolved from three sources), optional `client_id`/`client_secret` from Secret, `verify_ssl`, and `ca_cert_path`.
- Client config: `{type: oidc}` with an optional `token_env_var`. Client pods never see server credentials.

New CRD Fields on `OidcAuthz`
- `issuerUrl`: the operator appends `/.well-known/openid-configuration` to derive the discovery endpoint.
- `secretRef`: Secret reference (`auth_discovery_url`, `client_id`, `client_secret`).
- `secretKeyName`
- `tokenEnvVar`
- `verifySSL`
- `caCertConfigMap`: ConfigMap reference (`name`, `key`).

Discovery URL Resolution Priority
1. `issuerUrl` field (derives discovery URL by appending `/.well-known/openid-configuration`)
2. `auth_discovery_url` key in the Secret (used verbatim)
3. `OIDC_ISSUER_URL` environment variable on the operator pod (for RHOAI/ODH zero config)

CA Certificate Resolution Priority
1. `caCertConfigMap` field (user specified ConfigMap, mounted at `/etc/pki/tls/oidc-ca/ca.crt`)
2. `odh-trusted-ca-bundle` ConfigMap with `odh-ca-bundle.crt` key (RHOAI/ODH clusters)

RBAC
When `authz: oidc` is configured, the operator provisions a ClusterRole and ClusterRoleBinding granting the Feast server SA `tokenreviews/create` permission. This enables the SA token delegation path for workbench pods.

Operator CR Example
Upstream (with issuerUrl):
ODH (OpenDataHub) zero config (platform injects OIDC_ISSUER_URL):
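The discovery URL resolution priority listed earlier can be sketched in a few lines. The operator itself is written in Go; this Python version is only an illustration of the ordering, with hypothetical names. Two details are assumptions not stated in the description: that a trailing slash is trimmed before appending the suffix, and that the `OIDC_ISSUER_URL` value gets the same `/.well-known/openid-configuration` suffix as `issuerUrl`.

```python
from typing import Mapping, Optional

WELL_KNOWN = "/.well-known/openid-configuration"


def resolve_discovery_url(
    issuer_url: Optional[str],
    secret_data: Mapping[str, str],
    operator_env: Mapping[str, str],
) -> Optional[str]:
    """Return the discovery URL from the highest-priority source that is set."""
    if issuer_url:  # 1. CR field: derive by appending the well-known path
        return issuer_url.rstrip("/") + WELL_KNOWN
    if secret_data.get("auth_discovery_url"):  # 2. Secret key: used verbatim
        return secret_data["auth_discovery_url"]
    env_issuer = operator_env.get("OIDC_ISSUER_URL")
    if env_issuer:  # 3. operator pod env (assumed to be an issuer URL)
        return env_issuer.rstrip("/") + WELL_KNOWN
    return None  # nothing configured; how the operator reports this is not shown


# Hypothetical usage: the CR field wins even when a Secret value is present.
url = resolve_discovery_url(
    "https://keycloak.example.com/realms/feast",
    {"auth_discovery_url": "https://other.example.com/custom"},
    {},
)
```

Under these assumptions, a zero-config ODH deployment falls through to the third branch, since neither the CR field nor the Secret key is set.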
Permission Examples
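As a rough illustration of how the extracted claims feed the group and namespace policies, the sketch below models the flow in plain Python. It does not use Feast's actual `User` or policy classes: the `User` dataclass, `user_from_claims`, and the two `*_policy_allows` helpers are hypothetical, and "any overlap grants access" is an assumed matching rule.

```python
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class User:
    """Hypothetical stand-in for the authenticated user object."""
    username: str
    roles: List[str] = field(default_factory=list)
    groups: List[str] = field(default_factory=list)
    namespaces: List[str] = field(default_factory=list)


def user_from_claims(claims: Mapping, client_id: Optional[str] = None) -> User:
    """Build a user from verified JWT claims, with the upn fallback."""
    username = claims.get("preferred_username") or claims.get("upn")
    if not username:
        raise ValueError("token has neither preferred_username nor upn")
    roles: List[str] = []
    if client_id:  # roles are read only when a client_id is configured
        roles = claims.get("resource_access", {}).get(client_id, {}).get("roles", [])
    return User(username, list(roles), list(claims.get("groups", [])))


def group_policy_allows(user: User, allowed_groups: List[str]) -> bool:
    """Assumed rule: access is granted if the user is in any allowed group."""
    return bool(set(user.groups) & set(allowed_groups))


def namespace_policy_allows(user: User, allowed_namespaces: List[str]) -> bool:
    """Assumed rule: access is granted if any user namespace is allowed."""
    return bool(set(user.namespaces) & set(allowed_namespaces))
```

With empty `groups` and `namespaces`, both helpers always return `False`, which matches the behavior the description reports before this change.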
Which issue(s) this PR fixes:
#6088
Misc