Fix Task Completion Time Without Backpressure Grafana panel metric name #60481
Code Review
This pull request correctly fixes a broken Grafana panel by updating the metric name for 'Task Completion Time Without Backpressure'. The change replaces the old metric name ray_data_task_completion_time_without_backpressure with the new one, ray_data_task_completion_time_excl_backpressure_s, which aligns with the recent metric renaming in op_runtime_metrics.py. The fix is accurate and addresses the issue described. The code change is clean and no further issues were found.
Update the panel's Prometheus expr to use ray_data_task_completion_time_excl_backpressure_s instead of ray_data_task_completion_time_without_backpressure.
The metric was renamed in ray-project#57788 (op_runtime_metrics) but the data_dashboard_panels.py panel was not updated, causing the chart to show no data.
Signed-off-by: kriyanshii <[email protected]>
99dd985 to 9d5a78a
@kriyanshii did you get a chance to build the dashboard locally and verify that the chart displays correctly?
Yes. The metrics are there after the code change.
Nice. Just enabled merge.
…tric name (ray-project#60481)

## Description
Fixes the broken **"Task Completion Time Without Backpressure"** metrics chart in the Ray Data Grafana dashboard.

The panel was querying `ray_data_task_completion_time_without_backpressure`, which no longer exists. PR ray-project#57788 renamed the underlying metric to `task_completion_time_excl_backpressure_s` in `op_runtime_metrics.py`, but the Grafana panel in `data_dashboard_panels.py` was not updated. This PR updates the panel's Prometheus `expr` to use `ray_data_task_completion_time_excl_backpressure_s` so the chart displays data again.

**Change:** Single-line fix in `data_dashboard_panels.py`: replace the old metric name with the correct one in the panel's `expr`. The formula (average task completion time excluding backpressure over a 5-minute window) is unchanged.

## Related issues
Fixes the regression from ray-project#57788 (metric rename). Related to Ray Data monitoring / dashboard.

Closes: ray-project#60163

## Additional information
- **Metric flow:** `op_runtime_metrics.task_completion_time_excl_backpressure_s` → Stats uses `data_{name}` → Metrics agent adds `ray_` namespace → **`ray_data_task_completion_time_excl_backpressure_s`**
- **Manual verification:** Run a Ray Data job with Grafana + Prometheus (see [cluster metrics](https://docs.ray.io/en/latest/cluster/metrics.html)), then confirm the "Task Completion Time Without Backpressure" panel shows data.

Signed-off-by: kriyanshii <[email protected]>
Signed-off-by: jinbum-kim <[email protected]>

Description
Fixes the broken "Task Completion Time Without Backpressure" metrics chart in the Ray Data Grafana dashboard.
The panel was querying ray_data_task_completion_time_without_backpressure, which no longer exists. PR #57788 renamed the underlying metric to task_completion_time_excl_backpressure_s in op_runtime_metrics.py, but the Grafana panel in data_dashboard_panels.py was not updated. This PR updates the panel's Prometheus expr to use ray_data_task_completion_time_excl_backpressure_s so the chart displays data again.
Change: Single-line fix in data_dashboard_panels.py: replace the old metric name with the correct one in the panel's expr. The formula (average task completion time excluding backpressure over a 5-minute window) is unchanged.
Related issues
Fixes the regression from #57788 (metric rename). Related to Ray Data monitoring / dashboard.
Closes: #60163
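A small check could catch this kind of drift between metric definitions and dashboard expressions before it ships. The following is a minimal sketch, not code from this PR: `find_stale_metrics`, the `panel_exprs` mapping, and the example expression are all hypothetical, and the placeholder expression is not the one used in data_dashboard_panels.py.

```python
import re

# Matches exported Ray Data metric names inside a PromQL expression.
METRIC_RE = re.compile(r"\bray_data_[a-z0-9_]+")


def find_stale_metrics(panel_exprs, known_fields):
    """Return {panel title: [metric names not backed by a known field]}.

    panel_exprs: hypothetical mapping of panel title -> PromQL expression.
    known_fields: metric field names as defined on the runtime-metrics side,
    without the data_ and ray_ prefixes.
    """
    known = {f"ray_data_{name}" for name in known_fields}
    stale = {}
    for title, expr in panel_exprs.items():
        missing = sorted(set(METRIC_RE.findall(expr)) - known)
        if missing:
            stale[title] = missing
    return stale


if __name__ == "__main__":
    # Placeholder expression for illustration only.
    panels = {
        "Task Completion Time Without Backpressure":
            "sum(ray_data_task_completion_time_without_backpressure)",
    }
    fields = ["task_completion_time_excl_backpressure_s"]
    print(find_stale_metrics(panels, fields))
```

A real check would also need to account for suffixes that Prometheus adds to some metric types (for example `_sum`, `_count`, `_bucket` on histograms), which this sketch would report as stale.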
Additional information
Metric flow: op_runtime_metrics.task_completion_time_excl_backpressure_s → Stats uses data_{name} → Metrics agent adds ray_ namespace → ray_data_task_completion_time_excl_backpressure_s
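The naming chain above can be sketched as plain string composition. This is an illustration of the flow described in this PR, not Ray's actual implementation; `to_prometheus_name` is a hypothetical helper.

```python
def to_prometheus_name(field_name: str) -> str:
    """Compose the exported metric name from an op_runtime_metrics field name.

    Hypothetical helper mirroring the flow described above.
    """
    stats_name = f"data_{field_name}"  # Stats uses data_{name}
    return f"ray_{stats_name}"         # metrics agent adds the ray_ namespace


if __name__ == "__main__":
    print(to_prometheus_name("task_completion_time_excl_backpressure_s"))
```

Under this composition, renaming the field changes the name Prometheus sees, so any dashboard expression that still uses the old exported name returns no data.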