Datasets for Instruction Tuning of Large Language Models
Updated Nov 30, 2023
Agentic data generation (under refactor)
Proxy server that automatically stores messages exchanged between any OpenAI-compatible frontend and backend as a ShareGPT dataset for training/fine-tuning.
Exports a chat as a ShareGPT dataset
Genshin Impact character chat models tuned with LoRA on an LLM
High-density RAG Semantic Search Engine & Quran Corpus (GEO/SEO Architecture)
AIWG training-complete framework: a corpus-to-dataset pipeline with a SKILL.md agentic surface and an optional Python runtime backend. Marketplace plugin for AIWG.
30 conversational LLM datasets (~7.7M rows) normalized to one unified schema and published as a single HuggingFace dataset with per-source configs.
Fork of GeoAnima's Claude.ai chat exporter userscript that improves the button UI and exports directly to ShareGPT-format JSON
Deepseek-Dataset-Generator creates conversational datasets for LLM fine-tuning via the DeepSeek API. It supports multiple formats (ChatML, ShareGPT, Alpaca, JSON, CSV), simple YAML configuration, and detailed logging. Ideal for quickly generating realistic, customized data.
Merge heterogeneous chat/text sources into a single LLM training format (JSONL)
A JSON viewer/editor for multi-line string values that renders and edits strings in plain mode (handling escaping/unescaping). Ideal for editing ShareGPT- or Alpaca-style LLM training examples.