SynthForge

🚀 Democratizing Synthetic Data for AI Builders

SynthForge is a privacy-first, lightweight tool that generates high-quality synthetic tabular data to overcome common AI bottlenecks like data scarcity, annotation challenges, and privacy concerns. Built as a Minimum Viable Product (MVP) in just 3 days, it's designed for indie developers, students, and teams with real-world constraints—making "infinite clean data" accessible without heavy compute or budgets.

Live Demo: synthforge.streamlit.app

Why SynthForge?

In AI development, 80% of time is often wasted on data prep (Gartner). SynthForge flips that by enabling quick, ethical data generation:

Solve Data Shortages: Create statistically similar datasets from small uploads.
Prioritize Privacy: Built-in PII detection and differential privacy.
Streamline Annotation: Auto-label with sentiment, binning, or clustering.
Vision: Empower underrepresented builders (e.g., from Jaipur, India) to innovate globally.

Inspired by trends like synthetic data (market: $1-2B by 2025) and tools like Faker/SDV, but focused on simplicity and accessibility.

Features

Upload & Generate: Supports CSV/Excel (auto-samples large files for efficiency).
Customization: Adjust variance for randomness, add differential privacy noise.
Privacy Tools: Scans for emails/phones; optional epsilon-based anonymization.
Auto-Labeling: Rule-based or LLM-enhanced (via free Groq tier) for sentiment on text, binning/clustering on numerics.
Outputs: Download synthetic CSV + HTML report with stats comparisons (means, stds, top values).
Lightweight: Runs smoothly on modest hardware—no GPUs required.

Installation & Setup

Clone the repo:

git clone https://github.com/yourusername/synthforge.git
cd synthforge

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install streamlit pandas numpy faker scikit-learn openpyxl requests

(Optional) For LLM-enhanced labeling: Sign up for a free Groq API key at groq.com and input it in the app sidebar.

Usage

Run the app locally:

streamlit run app.py

Open in your browser (defaults to http://localhost:8501).
Upload a file, tweak settings in the sidebar, and generate!
For production: Deploy to Streamlit Sharing or similar (as done for the live demo).

Example: Upload a CSV with names, emails, ages—watch it generate synthetics while masking PII.

Tech Stack

Frontend: Streamlit (simple, interactive UI)
Core Libraries: Pandas (data handling), NumPy (stats), Faker (heuristic generation), scikit-learn (clustering/imputation)
Privacy/Labeling: Regex for PII, Laplace noise for DP, Requests for optional Groq LLM
Deployment: Streamlit Cloud (free tier)

Roadmap

v1.1: Multi-modal support (text/images).
v1.2: API endpoints for integrations (e.g., Jupyter/HF).
v2.0: Federated privacy, compute optimization, bias auditing.
Long-term: Agentic workflows and a synthetic data marketplace.

We welcome contributions! See CONTRIBUTING.md (add if needed).

Contributing

Fork the repo, create a branch, and submit a PR. Focus areas: Bug fixes, new heuristics, modality expansions. Let's build together!

License

MIT License – See LICENSE for details.

Contact

Hanish (Founder): LinkedIn | [email protected]
Issues/PRs: GitHub Issues
Feedback: Test the app and drop thoughts on LinkedIn or X!

Built with ❤️ from Jaipur, India. Join the forge—let's crush AI bottlenecks! 🚀

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.devcontainer		.devcontainer
venv		venv
README.md		README.md
app.py		app.py
app_day1_backup.py		app_day1_backup.py
app_day2_backup.py		app_day2_backup.py
app_day3_backup.py		app_day3_backup.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SynthForge

Why SynthForge?

Features

Installation & Setup

Usage

Tech Stack

Roadmap

Contributing

License

Contact

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SynthForge

Why SynthForge?

Features

Installation & Setup

Usage

Tech Stack

Roadmap

Contributing

License

Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages