feat(go): Implement metrics and tracing for http and grpc servers#5925

Merged
shuchu merged 7 commits into feast-dev:master from luisazofracabify:feat/go-server-observability on Feb 19, 2026

Conversation

@luisazofracabify
Contributor

@luisazofracabify luisazofracabify commented Jan 30, 2026

Description

This PR improves the observability and reliability of the Feast Go Feature Server by adding Prometheus metrics, tracing improvements, and configuration options for both HTTP and gRPC modes. It addresses hardcoded ports, inconsistent metric exposure, and potential race conditions during shutdown.

Changes

1. Prometheus Metrics Instrumentation

  • Unified Metrics Server: Implemented a dedicated, configurable HTTP server (default port :9090) for serving metrics in both HTTP and gRPC modes. This unifies the observability strategy.
  • HTTP Server Instrumentation: Added metricsMiddleware in internal/feast/server/http_server.go to track:
    • Request duration (http_request_duration_seconds)
    • Request counts by method, path, and status (http_requests_total)
    • The /health endpoint is also instrumented, for readiness probe visibility.
  • gRPC Metrics: Integrated go-grpc-prometheus to fully expose standard gRPC server metrics.

2. Tracing Enhancements

  • Dynamic Service Name: Improved the existing OpenTelemetry integration by adding support for the OTEL_SERVICE_NAME environment variable (defaults to FeastGoFeatureServer). This allows proper service identification in distributed tracing systems without code changes.

3. Server Configuration & Reliability

  • Configurable Metrics Port: Added -metrics-port flag to main.go.
  • Graceful Shutdown: Implemented sync.WaitGroup in StartHttpServer to ensure clean shutdown of the metrics server and logging services, matching the robust behavior of StartGrpcServer.
  • Code Cleanup: Refactored ServerStarter interfaces and removed unused legacy code.

How Has This Been Tested?

Verification

  • Confirmed /metrics endpoint accessibility on custom ports.
  • Verified /health and request metrics counters.
  • Validated status 200 recording fix.
  • Tested that the OTEL_SERVICE_NAME env var is reflected in traces.
  • Verified clean shutdown logs (SIGINT/SIGTERM).

Checklist

  • My code follows the code style of this project.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have updated the dependencies (go.mod / go.sum).
  • All new and existing tests passed.

Why gotoprom

It reduces boilerplate code for histograms and ensures type safety for labels, preventing runtime panics caused by mismatched label cardinality. It wraps the official Prometheus client, so it is fully compatible with it.


@luisazofracabify luisazofracabify requested a review from a team as a code owner January 30, 2026 11:53
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional flags.

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from 82aca5b to b876dba Compare January 30, 2026 11:58
devin-ai-integration[bot]

This comment was marked as resolved.

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from b876dba to 8b5cd4b Compare January 30, 2026 12:30
devin-ai-integration[bot]

This comment was marked as resolved.

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from 8b5cd4b to 847393e Compare January 30, 2026 12:39
@luisazofracabify luisazofracabify changed the title feat(go): implement metrics and tracing for http and grpc servers feat(go): Implement metrics and tracing for http and grpc servers Feb 2, 2026
Contributor

I would remove this file. The test does not verify that the metrics actually record values.

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch 2 times, most recently from 655cd02 to 34de413 Compare February 6, 2026 09:01
devin-ai-integration[bot]

This comment was marked as resolved.

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from e7a22fc to 2e4c1b3 Compare February 9, 2026 08:21
devin-ai-integration[bot]

This comment was marked as resolved.

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from a5345c2 to f5449af Compare February 9, 2026 08:37
@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch 2 times, most recently from db00427 to b9f99a6 Compare February 11, 2026 08:30
@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from b9f99a6 to 7b552a9 Compare February 11, 2026 12:16
@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from 7b552a9 to 1bf7277 Compare February 11, 2026 12:21
@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from 5d6cc58 to 816423b Compare February 11, 2026 13:27
devin-ai-integration[bot]

This comment was marked as resolved.

Collaborator

@shuchu shuchu left a comment

lgtm

@shuchu shuchu merged commit 2b4ec9a into feast-dev:master Feb 19, 2026
23 checks passed
@PepeluDev
Contributor

@shuchu It seems like this PR removed some packages from the go.sum file, sorry about that. They should be added back.

franciscojavierarceo pushed a commit that referenced this pull request Mar 10, 2026
# [0.61.0](v0.60.0...v0.61.0) (2026-03-10)

### Bug Fixes

* Add grpcio dependency group to transformation server Dockerfile ([2c2150a](2c2150a))
* Add https readiness check for rest-registry tests ([ea85e63](ea85e63))
* Add website build check for PRs and fix blog frontmatter YAML error ([#6079](#6079)) ([30a3a43](30a3a43))
* Added MLflow metric charts across feature selection ([#6080](#6080)) ([a403361](a403361))
* Check duplicate names for feature view across types ([#5999](#5999)) ([95b9af8](95b9af8))
* Fix integration tests ([#6046](#6046)) ([02d5548](02d5548))
* Fix non-specific label selector on metrics service ([a1a160d](a1a160d))
* Fixed IntegrityError on SqlRegistry ([#6047](#6047)) ([325e148](325e148))
* Fixed pre-commit check ([114b7db](114b7db))
* Fixed uv cache permission error for docker build on mac ([ad807be](ad807be))
* Fixes a `PydanticDeprecatedSince20` warning for trino_offline_store ([#5991](#5991)) ([abfd18a](abfd18a))
* Integration test failures ([#6040](#6040)) ([9165870](9165870))
* Ray offline store tests are duplicated across 3 workflows ([54f705a](54f705a))
* Reenable tests ([#6036](#6036)) ([82ee7f8](82ee7f8))
* Use commitlint pre-commit hook instead of a separate action ([35a81e7](35a81e7))

### Features

* Add complex type support (Map, JSON, Struct) with schema validation ([#5974](#5974)) ([1200dbf](1200dbf))
* Add materialization, feature freshness, request latency, and push metrics to feature server ([2c6be18](2c6be18))
* Add non-entity retrieval support for ClickHouse offline store ([4d08ddc](4d08ddc)), closes [#5835](#5835)
* Add OnlineStore for MongoDB ([#6025](#6025)) ([bf4e3fa](bf4e3fa)), closes [golang/go#74462](golang/go#74462)
* Added CodeQL SAST scanning and detect-secrets pre-commit hook ([547b516](547b516))
* Adding optional name to Aggregation (feast-dev[#5994](#5994)) ([#6083](#6083)) ([56469f7](56469f7))
* Feature Server High-Availability on Kubernetes ([#6028](#6028)) ([9c07b4c](9c07b4c)), closes [Hi#Availability](https://github.com/Hi/issues/Availability) [Hi#Availability](https://github.com/Hi/issues/Availability)
* **go:** Implement metrics and tracing for http and grpc servers ([#5925](#5925)) ([2b4ec9a](2b4ec9a))
* Horizontal scaling support to the Feast operator ([#6000](#6000)) ([3ec13e6](3ec13e6))
* Making feature view source optional (feast-dev[#6074](#6074)) ([#6075](#6075)) ([76917b7](76917b7))
* Support arm docker build ([#6061](#6061)) ([1e1f5d9](1e1f5d9))
* Use orjson for faster JSON serialization in feature server ([6f5203a](6f5203a))

### Performance Improvements

* Optimize protobuf parsing in Redis online store ([#6023](#6023)) ([59dfdb8](59dfdb8))
* Optimize timestamp conversion in _convert_rows_to_protobuf ([33a2e95](33a2e95))
* Parallelize DynamoDB batch reads in sync online_read ([#6024](#6024)) ([9699944](9699944))
* Remove redundant entity key serialization in online_read ([d87283f](d87283f))
ntkathole pushed a commit to red-hat-data-services/feast that referenced this pull request Mar 16, 2026
…ast-dev#5925)

* feat(go): implement metrics and tracing for http and grpc servers

Signed-off-by: Luis Azofra Begara <[email protected]>

* fix(server): improve metrics, config, and shutdown logic

Signed-off-by: Luis Azofra Begara <[email protected]>

* chore: update go.sum after rebase

Signed-off-by: Luis Azofra Begara <[email protected]>

* docs: improve README instructions for metrics and tracing

Signed-off-by: Luis Azofra Begara <[email protected]>

* fix(server): resolve potential deadlock during shutdown

Signed-off-by: Luis Azofra Begara <[email protected]>

---------

Signed-off-by: Luis Azofra Begara <[email protected]>
ntkathole pushed a commit to red-hat-data-services/feast that referenced this pull request Mar 16, 2026
# [0.61.0](feast-dev/feast@v0.60.0...v0.61.0) (2026-03-10)

### Bug Fixes

* Add grpcio dependency group to transformation server Dockerfile ([2c2150a](feast-dev@2c2150a))
* Add https readiness check for rest-registry tests ([ea85e63](feast-dev@ea85e63))
* Add website build check for PRs and fix blog frontmatter YAML error ([feast-dev#6079](feast-dev#6079)) ([30a3a43](feast-dev@30a3a43))
* Added MLflow metric charts across feature selection ([feast-dev#6080](feast-dev#6080)) ([a403361](feast-dev@a403361))
* Check duplicate names for feature view across types ([feast-dev#5999](feast-dev#5999)) ([95b9af8](feast-dev@95b9af8))
* Fix integration tests ([feast-dev#6046](feast-dev#6046)) ([02d5548](feast-dev@02d5548))
* Fix non-specific label selector on metrics service ([a1a160d](feast-dev@a1a160d))
* Fixed IntegrityError on SqlRegistry ([feast-dev#6047](feast-dev#6047)) ([325e148](feast-dev@325e148))
* Fixed pre-commit check ([114b7db](feast-dev@114b7db))
* Fixed uv cache permission error for docker build on mac ([ad807be](feast-dev@ad807be))
* Fixes a `PydanticDeprecatedSince20` warning for trino_offline_store ([feast-dev#5991](feast-dev#5991)) ([abfd18a](feast-dev@abfd18a))
* Integration test failures ([feast-dev#6040](feast-dev#6040)) ([9165870](feast-dev@9165870))
* Ray offline store tests are duplicated across 3 workflows ([54f705a](feast-dev@54f705a))
* Reenable tests ([feast-dev#6036](feast-dev#6036)) ([82ee7f8](feast-dev@82ee7f8))
* Use commitlint pre-commit hook instead of a separate action ([35a81e7](feast-dev@35a81e7))

### Features

* Add complex type support (Map, JSON, Struct) with schema validation ([feast-dev#5974](feast-dev#5974)) ([1200dbf](feast-dev@1200dbf))
* Add materialization, feature freshness, request latency, and push metrics to feature server ([2c6be18](feast-dev@2c6be18))
* Add non-entity retrieval support for ClickHouse offline store ([4d08ddc](feast-dev@4d08ddc)), closes [feast-dev#5835](feast-dev#5835)
* Add OnlineStore for MongoDB ([feast-dev#6025](feast-dev#6025)) ([bf4e3fa](feast-dev@bf4e3fa)), closes [golang/go#74462](golang/go#74462)
* Added CodeQL SAST scanning and detect-secrets pre-commit hook ([547b516](feast-dev@547b516))
* Adding optional name to Aggregation (feast-dev[feast-dev#5994](feast-dev#5994)) ([feast-dev#6083](feast-dev#6083)) ([56469f7](feast-dev@56469f7))
* Feature Server High-Availability on Kubernetes ([feast-dev#6028](feast-dev#6028)) ([9c07b4c](feast-dev@9c07b4c)), closes [Hi#Availability](https://github.com/Hi/issues/Availability) [Hi#Availability](https://github.com/Hi/issues/Availability)
* **go:** Implement metrics and tracing for http and grpc servers ([feast-dev#5925](feast-dev#5925)) ([2b4ec9a](feast-dev@2b4ec9a))
* Horizontal scaling support to the Feast operator ([feast-dev#6000](feast-dev#6000)) ([3ec13e6](feast-dev@3ec13e6))
* Making feature view source optional (feast-dev[feast-dev#6074](feast-dev#6074)) ([feast-dev#6075](feast-dev#6075)) ([76917b7](feast-dev@76917b7))
* Support arm docker build ([feast-dev#6061](feast-dev#6061)) ([1e1f5d9](feast-dev@1e1f5d9))
* Use orjson for faster JSON serialization in feature server ([6f5203a](feast-dev@6f5203a))

### Performance Improvements

* Optimize protobuf parsing in Redis online store ([feast-dev#6023](feast-dev#6023)) ([59dfdb8](feast-dev@59dfdb8))
* Optimize timestamp conversion in _convert_rows_to_protobuf ([33a2e95](feast-dev@33a2e95))
* Parallelize DynamoDB batch reads in sync online_read ([feast-dev#6024](feast-dev#6024)) ([9699944](feast-dev@9699944))
* Remove redundant entity key serialization in online_read ([d87283f](feast-dev@d87283f))