[data] Improve Ray Data reading from Hugging Face datasets documentation#58832
richardliaw merged 3 commits into ray-project:master from
Signed-off-by: Robert Nishihara <[email protected]>
Code Review
This pull request improves the documentation for reading Hugging Face datasets by adding instructions on authentication. The added information about installing huggingface_hub and setting the HF_TOKEN is helpful. However, the explanatory text could be clearer to avoid confusion. I've suggested a rephrasing to improve clarity.
Set your Hugging Face token to authenticate. While public datasets can be read without
a token, Hugging Face rate limits are more aggressive without a token. To read Hugging
Face datasets without a token, simply set the filesystem argument to ``HfFileSystem()``.
The current phrasing might be confusing. It implies that setting filesystem=HfFileSystem() is something you do specifically when you don't have a token. However, HfFileSystem is used for reading from the Hugging Face Hub regardless of whether a token is present. The huggingface_hub library, used by HfFileSystem, automatically handles authentication if the HF_TOKEN is set.
I suggest rephrasing this to clarify the benefits of using a token and what happens without one, without conflating it with the use of HfFileSystem.
Suggested change:
Original: Set your Hugging Face token to authenticate. While public datasets can be read without a token, Hugging Face rate limits are more aggressive without a token. To read Hugging Face datasets without a token, simply set the filesystem argument to ``HfFileSystem()``.
Suggested: Set your Hugging Face token to authenticate. This allows you to access private datasets and provides higher rate limits. Public datasets can be read without a token, but you may be subject to stricter rate limits.
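To illustrate the point that `HfFileSystem` is the same code path with or without a token, here is a minimal sketch. It assumes `ray[data]` and `huggingface_hub` are installed. The helpers `hf_dataset_uri`, `has_hf_token`, and `read_hf_parquet` are hypothetical names for illustration, not Ray or Hugging Face APIs; the only library calls are `ray.data.read_parquet(..., filesystem=...)` and `HfFileSystem()`. The `hf://datasets/<repo_id>[@<revision>]/<path>` layout follows the `huggingface_hub` file-system path convention.

```python
import os
from typing import Optional


def hf_dataset_uri(repo_id: str, subpath: str = "", revision: Optional[str] = None) -> str:
    """Hypothetical helper: build an hf:// URI for a dataset repo on the Hub.

    Follows the HfFileSystem layout hf://datasets/<repo_id>[@<revision>]/<path>.
    """
    repo = f"{repo_id}@{revision}" if revision else repo_id
    uri = f"hf://datasets/{repo}"
    # Append the in-repo path (if any) without doubling the separator.
    return f"{uri}/{subpath.lstrip('/')}" if subpath else uri


def has_hf_token() -> bool:
    """Hypothetical helper: report whether HF_TOKEN is set in the environment."""
    return bool(os.environ.get("HF_TOKEN"))


def read_hf_parquet(repo_id: str, subpath: str = ""):
    """Hypothetical wrapper: read Parquet files from a Hub dataset into Ray Data.

    The call is identical with or without a token. huggingface_hub picks up
    HF_TOKEN from the environment when it is set; otherwise, absent other saved
    credentials, requests are unauthenticated (public repos only, stricter
    rate limits).
    """
    # Imported lazily so the helpers above work without ray/huggingface_hub.
    import ray
    from huggingface_hub import HfFileSystem

    return ray.data.read_parquet(hf_dataset_uri(repo_id, subpath), filesystem=HfFileSystem())
```

For example, `read_hf_parquet("Salesforce/wikitext", "wikitext-2-raw-v1")` (an illustrative public repo) runs unchanged whether or not `HF_TOKEN` is exported; the token only changes which repos are accessible and which rate limits apply.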
Part 2 of #58492
NOTE: Still making edits