Release OS-Oracle-7B model and training dataset on Hugging Face

Hi @numbmelon 🤗

Niels here from the open-source team at Hugging Face. I discovered your work on arXiv and your GitHub repository for "OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models" (https://github.com/numbmelon/OS-Oracle). It's great to see that the OS-Critic Bench dataset is already available on the Hugging Face Hub!

I also noticed from your README's TODO list that you plan to "Release model checkpoints" for OS-Oracle-7B and "Release training datasets" (referring to the 310k critic samples mentioned in the paper). It would be fantastic to have these additional artifacts available on the 🤗 Hub as well, to improve their discoverability and visibility. We can add tags so that people can easily find them when filtering https://huggingface.co/models and https://huggingface.co/datasets.

## Uploading models

See here for a guide: https://huggingface.co/docs/hub/models-uploading.

In this case, we could leverage the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) class, which adds `from_pretrained` and `push_to_hub` to any custom `nn.Module`. Alternatively, one can use the [hf_hub_download](https://huggingface.co/docs/huggingface_hub/en/guides/download#download-a-single-file) one-liner to download a checkpoint from the Hub.

We encourage researchers to push each model checkpoint to a separate model repository, so that things like download stats also work. We can then also link the checkpoints to the paper page. For your OS-Oracle-7B model, a pipeline tag like `image-text-to-text` would be highly relevant for discoverability.
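For reference, the pipeline tag lives in the YAML metadata block at the top of the model repository's `README.md` (the model card). Below is a minimal, standard-library-only sketch of how that block could be generated. The helper name `build_card_front_matter` and the extra tags `gui` and `critic-model` are illustrative assumptions, not something taken from the OS-Oracle repository.

```python
def build_card_front_matter(pipeline_tag, tags=()):
    """Return a YAML front-matter block for a Hub README.md.

    `pipeline_tag` is the Hub metadata key used for task filtering.
    `tags` is an optional list of free-form tags (hypothetical here).
    """
    lines = ["---", f"pipeline_tag: {pipeline_tag}"]
    if tags:
        lines.append("tags:")
        lines.extend(f"- {tag}" for tag in tags)
    lines.append("---")
    return "\n".join(lines) + "\n"


# The tag values below are placeholders chosen for illustration only.
front_matter = build_card_front_matter(
    "image-text-to-text",
    tags=["gui", "critic-model"],
)
print(front_matter)
```

The resulting text would go at the very top of the model card, above the human-readable description.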

## Uploading dataset

It would be awesome to make the "310k critic samples" training dataset available on 🤗, so that people can do:

```python
from datasets import load_dataset

dataset = load_dataset("your-hf-org-or-username/your-dataset")
```
See here for a guide: https://huggingface.co/docs/datasets/loading. For your training dataset, a task category like `image-text-to-text` would be suitable.

Besides that, there's the [dataset viewer](https://huggingface.co/docs/hub/en/datasets-viewer) which allows people to quickly explore the first few rows of the data in the browser.
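As a rough illustration of preparing data in a format the Hub can read, the sketch below writes a couple of records to a JSON Lines file using only the standard library. Every field name (`screenshot`, `instruction`, `candidate_action`, `is_correct`) and every value is a made-up placeholder, since the actual schema of the training data is not described here.

```python
import json
import os
import tempfile

# Placeholder records; the real schema of the critic samples is unknown,
# so these field names and values are purely hypothetical.
records = [
    {"screenshot": "example_0.png", "instruction": "Open settings",
     "candidate_action": "click(10, 20)", "is_correct": True},
    {"screenshot": "example_1.png", "instruction": "Go back",
     "candidate_action": "scroll(down)", "is_correct": False},
]


def write_jsonl(rows, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


out_path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
write_jsonl(records, out_path)
round_trip = read_jsonl(out_path)
print(len(round_trip), "records written to", out_path)
```

A file like this can be loaded locally with `load_dataset("json", data_files="train.jsonl")` to check it before uploading.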

Let me know if you're interested/need any help regarding this!

Cheers,

Niels
ML Engineer @ HF 🤗
