Add endpoint to fetch dataset metadata with token #4964

Open
HadiSDev wants to merge 14 commits into unslothai:main from HadiSDev:main

Conversation

@HadiSDev

@HadiSDev HadiSDev commented Apr 10, 2026

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a backend proxy for fetching HuggingFace dataset splits, enabling the use of server-side authentication tokens to access private or gated datasets. The changes include new Pydantic models for split metadata, a /splits API endpoint in the backend, and an update to the frontend useHfDatasetSplits hook to route requests through the backend. Feedback focuses on optimizing the backend logic to avoid performance issues caused by sequential network requests for datasets with many configurations and improving error logging within the split-fetching loop to avoid masking potential issues.

Comment thread studio/backend/routes/datasets.py Outdated
Comment thread studio/backend/routes/datasets.py Outdated
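
The performance feedback above concerns fetching split metadata for each configuration sequentially. A common remedy, sketched here with a stand-in fetch function rather than the PR's actual code, is to issue the per-config requests concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_config_splits(config: str) -> list[str]:
    # Stand-in for one network round-trip to the HF datasets-server;
    # in the real endpoint this would be an authenticated HTTP request.
    return [f"{config}/train", f"{config}/test"]

def fetch_all_splits(configs: list[str]) -> list[str]:
    # Issue the per-config requests concurrently instead of sequentially,
    # so a dataset with many configurations is not bottlenecked on the
    # sum of individual request latencies. pool.map preserves input order.
    with ThreadPoolExecutor(max_workers = 8) as pool:
        results = pool.map(fetch_config_splits, configs)
    return [split for splits in results for split in splits]
```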

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 926ba30b26

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread studio/backend/routes/datasets.py Outdated
danielhanchen and others added 4 commits April 12, 2026 18:34
- Remap upstream HF 401/403 to 422 so authFetch does not mistake
  HF auth failures for expired Studio sessions (avoids forced logout)
- Guard against malformed HF split entries (null splits key, missing
  fields) to prevent KeyError on non-standard responses
- Return generic error message in catch-all handler instead of raw
  str(e) which can leak internal network details
- Validate SplitsRequest.dataset is non-empty at the model level
- Strip whitespace from HF_TOKEN env var to match how user-supplied
  tokens are handled (trailing spaces in .env files break HF auth)
- Avoid leaking upstream URL in error detail when HF returns non-JSON
  responses (e.g. HTML 429 rate-limit pages); use a generic message
  instead of falling back to the raw requests library error string
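
The hardening steps in the commit list above can be sketched as small pure helpers (illustrative names and shapes, not the PR's actual code):

```python
def remap_upstream_status(code: int) -> int:
    # Remap HF auth failures (401/403) to 422 so the frontend's authFetch
    # does not mistake them for an expired Studio session and force a logout.
    return 422 if code in (401, 403) else code

def parse_split_entries(data):
    # Guard against malformed HF responses: a non-dict payload, a null
    # "splits" key, or entries missing required fields are skipped rather
    # than raising KeyError/AttributeError.
    if not isinstance(data, dict):
        return []
    splits = data.get("splits") or []
    return [
        {"dataset": s["dataset"], "config": s["config"], "split": s["split"]}
        for s in splits
        if isinstance(s, dict) and all(k in s for k in ("dataset", "config", "split"))
    ]
```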
@danielhanchen danielhanchen added the auto-approved (Auto-review passed, ready to merge) label and removed the auto-reviewing (PR is being auto-reviewed) label on Apr 12, 2026
@danielhanchen
Contributor

Auto-review verdict: Approved

Adds a backend proxy endpoint for HuggingFace dataset splits metadata, enabling server-side HF_TOKEN fallback so private/gated datasets load correctly when the user has not entered a personal HF token in the Studio UI. The implementation is correct, minimal, and introduces no regressions.

Reason: All real issues (auth 401 remap, env token stripping, error detail sanitization, malformed response handling) were fixed and verified across 25 simulation tests

@HadiSDev
Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a backend proxy for fetching HuggingFace dataset splits, enabling the use of server-side authentication tokens for private or gated datasets. The frontend has been updated to route requests through the new /api/datasets/splits endpoint and handles error responses accordingly. Feedback was provided regarding the backend's error handling, specifically suggesting that silent exception blocks should be replaced with debug logging to facilitate troubleshooting when upstream responses fail to parse.

Comment thread studio/backend/routes/datasets.py Outdated
- Check isinstance(error_data, dict) before calling .get("error") to
  handle non-dict JSON responses from HuggingFace API
- Replace silent except/pass with logger.debug for error response
  parsing failures

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@HadiSDev
Author

/gemini review

@HadiSDev
Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Can't wait for the next one!


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements a backend proxy for the HuggingFace dataset splits API, allowing the application to use server-side authentication for private datasets. It adds the necessary Pydantic models, a new /splits endpoint in the backend routes, and updates the frontend useHfDatasetSplits hook to communicate with the backend instead of the external API directly. The review feedback suggests improving the robustness of the response parsing by verifying the data type before iteration and broadening the exception handling to include all request-related errors.

Comment thread studio/backend/routes/datasets.py
Comment thread studio/backend/routes/datasets.py
HadiSDev and others added 2 commits April 13, 2026 15:51
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Author

@HadiSDev HadiSDev left a comment


/gemini review

Catch requests.RequestException instead of bare Exception for the
splits endpoint fallback handler, so only network-level errors are
caught while unexpected programming errors propagate normally.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
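
The narrowing described above works because requests.RequestException is the common base class of the library's network-level errors (ConnectionError, Timeout, HTTPError, and so on), so catching it still lets programming errors like KeyError propagate. A sketch, assuming the requests library and an illustrative function name:

```python
import requests

def fetch_splits_or_none(url: str):
    # Catch only network-level failures; anything else (a typo, a bad key
    # access) is a bug and should surface as a 500, not be swallowed here.
    try:
        resp = requests.get(url, timeout = 10)
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException:
        return None
```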
@HadiSDev
Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a backend proxy for fetching HuggingFace dataset splits, which allows the application to use server-side environment variables for authentication when accessing private or gated datasets. The changes include new Pydantic models for request and response validation, a new /splits endpoint in the backend, and an update to the frontend hook to route requests through this new endpoint. Review feedback focused on improving the robustness of the backend by adding explicit type checks for the external API's JSON response and its fields to prevent potential runtime errors or validation failures.

Comment on lines +352 to +354

```python
data = resp.json()

splits_data = data.get("splits")
```

Contributor

medium

The code assumes the response from Hugging Face is a dictionary. If the API returns a different JSON type (such as a list), data.get("splits") will raise an AttributeError. It is safer to validate the response type before accessing its keys.

Suggested change:

```python
data = resp.json()
if not isinstance(data, dict):
    logger.error(f"Unexpected response format from HuggingFace for {request.dataset!r}: {data}")
    raise HTTPException(status_code = 502, detail = "Invalid response format from HuggingFace")
splits_data = data.get("splits")
```

Comment on lines +358 to +363
entries = [
SplitEntry(dataset = s["dataset"], config = s["config"], split = s["split"])
for s in splits_data
if isinstance(s, dict)
and all(k in s for k in ("dataset", "config", "split"))
]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To prevent potential ValidationError from Pydantic when the upstream API returns unexpected data types (e.g., null for a required field), it is recommended to verify that the required fields are present and have the expected types before instantiating SplitEntry.

Suggested change
entries = [
SplitEntry(dataset = s["dataset"], config = s["config"], split = s["split"])
for s in splits_data
if isinstance(s, dict)
and all(k in s for k in ("dataset", "config", "split"))
]
entries = [
SplitEntry(dataset = s["dataset"], config = s["config"], split = s["split"])
for s in splits_data
if isinstance(s, dict)
and all(isinstance(s.get(k), str) for k in ("dataset", "config", "split"))
]

@HadiSDev
Author

@danielhanchen I am ready if you are


Labels

auto-approved (Auto-review passed, ready to merge)
