Skip to content

feat: Add ServiceMonitor auto-generation for Prometheus discovery#6126

Merged
ntkathole merged 2 commits intofeast-dev:masterfrom
ntkathole:service_monitor
Mar 25, 2026
Merged

feat: Add ServiceMonitor auto-generation for Prometheus discovery#6126
ntkathole merged 2 commits intofeast-dev:masterfrom
ntkathole:service_monitor

Conversation

@ntkathole
Copy link
Copy Markdown
Member

@ntkathole ntkathole commented Mar 18, 2026

What this PR does / why we need it:

This implements ServiceMonitor auto-generation for the Feast Operator, enabling automatic Prometheus discovery for Kubernetes deployments when metrics: true is set on the online store. The operator detects the monitoring.coreos.com API group at startup. If the Prometheus Operator CRD is absent, ServiceMonitor operations are skipped.

Which issue(s) this PR fixes:

Fixes #6072


Open with Devin

@ntkathole ntkathole self-assigned this Mar 18, 2026
@ntkathole ntkathole requested a review from a team as a code owner March 18, 2026 14:22
@ntkathole
Copy link
Copy Markdown
Member Author

cc @astefanutti

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Open in Devin Review

@ntkathole ntkathole force-pushed the service_monitor branch 2 times, most recently from ab3e933 to 0671276 Compare March 18, 2026 14:51
devin-ai-integration[bot]

This comment was marked as resolved.

cr := feast.Handler.FeatureStore
objMeta := feast.GetObjectMetaType(OnlineFeastType)

return map[string]interface{}{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use the apply config from the Prometheus operator client?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is trade-off, it will make code cleaner but our use is of just one small builder. May be we can avoid coupling (new dependencies, version alignment) to the Prometheus operator's release cycle ?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Prometheus operator API now comes as a separate module https://github.com/prometheus-operator/prometheus-operator/tree/main/pkg/apis/monitoring to avoid coupling clients with the operator itself.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@ntkathole ntkathole force-pushed the service_monitor branch 2 times, most recently from ae685b6 to d4b817f Compare March 25, 2026 05:01
@ntkathole ntkathole requested a review from astefanutti March 25, 2026 07:27
Copy link
Copy Markdown

@astefanutti astefanutti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @ntkathole!

@ntkathole ntkathole merged commit 56e6d21 into feast-dev:master Mar 25, 2026
27 checks passed
jyejare pushed a commit to opendatahub-io/feast that referenced this pull request Mar 26, 2026
…ast-dev#6126)

* feat: Add ServiceMonitor auto-generation for Prometheus discovery

Signed-off-by: ntkathole <[email protected]>

* fix: Server-side apply

Signed-off-by: ntkathole <[email protected]>

---------

Signed-off-by: ntkathole <[email protected]>
yuan1j pushed a commit to yuan1j/feast that referenced this pull request Apr 2, 2026
…ast-dev#6126)

* feat: Add ServiceMonitor auto-generation for Prometheus discovery

Signed-off-by: ntkathole <[email protected]>

* fix: Server-side apply

Signed-off-by: ntkathole <[email protected]>

---------

Signed-off-by: ntkathole <[email protected]>
Signed-off-by: yuanjun220 <[email protected]>
franciscojavierarceo pushed a commit that referenced this pull request Apr 7, 2026
# [0.61.0](v0.60.0...v0.61.0) (2026-04-07)

### Bug Fixes

* Add grpcio dependency group to transformation server Dockerfile ([2c2150a](2c2150a))
* Add https readiness check for rest-registry tests ([ea85e63](ea85e63))
* Add website build check for PRs and fix blog frontmatter YAML error ([#6079](#6079)) ([30a3a43](30a3a43))
* Added missing jackc/pgx/v5 entries ([94ad0e7](94ad0e7))
* Added MLflow metric charts across feature selection ([#6080](#6080)) ([a403361](a403361))
* Check duplicate names for feature view across types ([#5999](#5999)) ([95b9af8](95b9af8))
* Fix integration tests ([#6046](#6046)) ([02d5548](02d5548))
* Fix missing error handling for resource_counts endpoint ([d9706ce](d9706ce))
* Fix non-specific label selector on metrics service ([a1a160d](a1a160d))
* fix path feature_definitions.py ([7d7df68](7d7df68))
* Fix regstry Rest API tests intermittent failure ([d53a339](d53a339))
* Fixed IntegrityError on SqlRegistry ([#6047](#6047)) ([325e148](325e148))
* Fixed intermittent failures in get_historical_features ([c335ec7](c335ec7))
* Fixed pre-commit check ([114b7db](114b7db))
* Fixed the intermittent FeatureViewNotFoundException ([661ecc7](661ecc7))
* Fixed uv cache permission error for docker build on mac ([ad807be](ad807be))
* Fixes a `PydanticDeprecatedSince20` warning for trino_offline_store ([#5991](#5991)) ([abfd18a](abfd18a))
* Handle existing RBAC role gracefully in namespace registry ([b46a62b](b46a62b))
* Ignore ipynb files during apply ([#6151](#6151)) ([4ea123d](4ea123d))
* Integration test failures ([#6040](#6040)) ([9165870](9165870))
* Mount TLS volumes for init container ([080a9b5](080a9b5))
* **postgres:** Use end_date in synthetic entity_df for non-entity retrieval ([#6110](#6110)) ([088a802](088a802)), closes [#6066](#6066)
* Ray offline store tests are duplicated across 3 workflows ([54f705a](54f705a))
* Reenable tests ([#6036](#6036)) ([82ee7f8](82ee7f8))
* SSL/TLS mode by default for postgres connection ([4844488](4844488))
* Use commitlint pre-commit hook instead of a separate action ([35a81e7](35a81e7))

### Features

* Add Claude Code agent skills for Feast ([#6081](#6081)) ([1e5b60f](1e5b60f)), closes [#5976](#5976) [#6007](#6007)
* Add complex type support (Map, JSON, Struct) with schema validation ([#5974](#5974)) ([1200dbf](1200dbf))
* Add decimal to supported feature types ([#6029](#6029)) ([#6226](#6226)) ([cff6fbf](cff6fbf))
* Add feast apply init container to automate registry population on pod start ([#6106](#6106)) ([6b31a43](6b31a43))
* Add feature view versioning support to PostgreSQL and MySQL online stores ([#6193](#6193)) ([940e0f0](940e0f0)), closes [#6168](#6168) [#6169](#6169) [#2728](#2728)
* Add materialization, feature freshness, request latency, and push metrics to feature server ([2c6be18](2c6be18))
* Add metadata statistics to registry api ([ef1d4fc](ef1d4fc))
* Add non-entity retrieval support for ClickHouse offline store ([4d08ddc](4d08ddc)), closes [#5835](#5835)
* Add OnlineStore for MongoDB ([#6025](#6025)) ([bf4e3fa](bf4e3fa)), closes [golang/go#74462](golang/go#74462)
* Add Oracle DB as Offline store in python sdk & operator ([#6017](#6017)) ([9d35368](9d35368))
* Add RBAC aggregation labels to FeatureStore ClusterRoles ([daf77c6](daf77c6))
* Add ServiceMonitor auto-generation for Prometheus discovery ([#6126](#6126)) ([56e6d21](56e6d21))
* Add typed_features field to grpc write request (([#6117](#6117)) ([#6118](#6118)) ([eeaa6db](eeaa6db)), closes [#6116](#6116)
* Add UUID and TIME_UUID as feature types ([#5885](#5885)) ([#5951](#5951)) ([5d6e311](5d6e311))
* Add version indicators to lineage graph nodes ([#6187](#6187)) ([73805d3](73805d3))
* Add version tracking to FeatureView ([#6101](#6101)) ([ed4a4f2](ed4a4f2))
* Added Agent skills for AI Agents ([#6007](#6007)) ([99008c8](99008c8))
* Added CodeQL SAST scanning and detect-secrets pre-commit hook ([547b516](547b516))
* Added odfv transformations metrics ([8b5a526](8b5a526))
* Adding optional name to Aggregation (feast-dev[#5994](#5994)) ([#6083](#6083)) ([56469f7](56469f7))
* Created DocEmbedder class ([#5973](#5973)) ([0719c06](0719c06))
* Extended OIDC support to extract groups & namespaces and token injection with multiple methods ([#6089](#6089)) ([7c04026](7c04026))
* Feature Server High-Availability on Kubernetes ([#6028](#6028)) ([9c07b4c](9c07b4c)), closes [Hi#Availability](https://github.com/Hi/issues/Availability) [Hi#Availability](https://github.com/Hi/issues/Availability)
* **go:** Implement metrics and tracing for http and grpc servers ([#5925](#5925)) ([2b4ec9a](2b4ec9a))
* Horizontal scaling support to the Feast operator ([#6000](#6000)) ([3ec13e6](3ec13e6))
* Making feature view source optional (feast-dev[#6074](#6074)) ([#6075](#6075)) ([76917b7](76917b7))
* Replace ORJSONResponse with Pydantic response models for faster JSON serialization ([65cf03c](65cf03c))
* Support arm docker build ([#6061](#6061)) ([1e1f5d9](1e1f5d9))
* Support distinct count aggregation [[#6116](#6116)] ([3639570](3639570))
* Support HTTP in MCP ([#6109](#6109)) ([e72b983](e72b983))
* Support nested collection types (Array/Set of Array/Set) ([#5947](#5947)) ([#6132](#6132)) ([ab61642](ab61642))
* Support podAnnotations on Deployment pod template ([1b3cdc1](1b3cdc1))
* Use orjson for faster JSON serialization in feature server ([6f5203a](6f5203a))
* Utilize date partition column in BigQuery ([#6076](#6076)) ([4ea9b32](4ea9b32))

### Performance Improvements

* Online feature response construction in a single pass over read rows ([113fb04](113fb04))
* Optimize protobuf parsing in Redis online store ([#6023](#6023)) ([59dfdb8](59dfdb8))
* Optimize timestamp conversion in _convert_rows_to_protobuf ([33a2e95](33a2e95))
* Parallelize DynamoDB batch reads in sync online_read ([#6024](#6024)) ([9699944](9699944))
* Remove redundant entity key serialization in online_read ([d87283f](d87283f))
franciscojavierarceo pushed a commit that referenced this pull request Apr 8, 2026
# [0.62.0](v0.61.0...v0.62.0) (2026-04-08)

### Bug Fixes

* Added missing jackc/pgx/v5 entries ([94ad0e7](94ad0e7))
* Fix missing error handling for resource_counts endpoint ([d9706ce](d9706ce))
* fix path feature_definitions.py ([7d7df68](7d7df68))
* Fix regstry Rest API tests intermittent failure ([d53a339](d53a339))
* Fixed intermittent failures in get_historical_features ([c335ec7](c335ec7))
* Fixed the intermittent FeatureViewNotFoundException ([661ecc7](661ecc7))
* Handle existing RBAC role gracefully in namespace registry ([b46a62b](b46a62b))
* Ignore ipynb files during apply ([#6151](#6151)) ([4ea123d](4ea123d))
* Mount TLS volumes for init container ([080a9b5](080a9b5))
* **postgres:** Use end_date in synthetic entity_df for non-entity retrieval ([#6110](#6110)) ([088a802](088a802)), closes [#6066](#6066)
* SSL/TLS mode by default for postgres connection ([4844488](4844488))
* Sync v0.61-branch so v0.61.0 tag is reachable from master ([af66878](af66878))

### Features

* Add Claude Code agent skills for Feast ([#6081](#6081)) ([1e5b60f](1e5b60f)), closes [#5976](#5976) [#6007](#6007)
* Add decimal to supported feature types ([#6029](#6029)) ([#6226](#6226)) ([cff6fbf](cff6fbf))
* Add feast apply init container to automate registry population on pod start ([#6106](#6106)) ([6b31a43](6b31a43))
* Add feature view versioning support to PostgreSQL and MySQL online stores ([#6193](#6193)) ([940e0f0](940e0f0)), closes [#6168](#6168) [#6169](#6169) [#2728](#2728)
* Add metadata statistics to registry api ([ef1d4fc](ef1d4fc))
* Add Oracle DB as Offline store in python sdk & operator ([#6017](#6017)) ([9d35368](9d35368))
* Add RBAC aggregation labels to FeatureStore ClusterRoles ([daf77c6](daf77c6))
* Add ServiceMonitor auto-generation for Prometheus discovery ([#6126](#6126)) ([56e6d21](56e6d21))
* Add typed_features field to grpc write request (([#6117](#6117)) ([#6118](#6118)) ([eeaa6db](eeaa6db)), closes [#6116](#6116)
* Add UUID and TIME_UUID as feature types ([#5885](#5885)) ([#5951](#5951)) ([5d6e311](5d6e311))
* Add version indicators to lineage graph nodes ([#6187](#6187)) ([73805d3](73805d3))
* Add version tracking to FeatureView ([#6101](#6101)) ([ed4a4f2](ed4a4f2))
* Added Agent skills for AI Agents ([#6007](#6007)) ([99008c8](99008c8))
* Added odfv transformations metrics ([8b5a526](8b5a526))
* Created DocEmbedder class ([#5973](#5973)) ([0719c06](0719c06))
* Extended OIDC support to extract groups & namespaces and token injection with multiple methods ([#6089](#6089)) ([7c04026](7c04026))
* Replace ORJSONResponse with Pydantic response models for faster JSON serialization ([65cf03c](65cf03c))
* Support distinct count aggregation [[#6116](#6116)] ([3639570](3639570))
* Support HTTP in MCP ([#6109](#6109)) ([e72b983](e72b983))
* Support nested collection types (Array/Set of Array/Set) ([#5947](#5947)) ([#6132](#6132)) ([ab61642](ab61642))
* Support podAnnotations on Deployment pod template ([1b3cdc1](1b3cdc1))
* Utilize date partition column in BigQuery ([#6076](#6076)) ([4ea9b32](4ea9b32))

### Performance Improvements

* Online feature response construction in a single pass over read rows ([113fb04](113fb04))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Operator should create a ServiceMonitor for FeatureStore metrics when metrics: true is enabled

3 participants