Takes a text prompt as input and generates an image using the runwayml/stable-diffusion-v1-5 model (a diffusion model).
This project is a Stable Diffusion-based text-to-image generator with a Streamlit frontend and an optimized backend using PyTorch and Hugging Face Diffusers.
- Generate images from text prompts using Stable Diffusion.
- Choose inference steps & guidance scale for better results.
- Supports both CPU and GPU execution (automatically selects based on hardware).
- Optimized for low VRAM GPUs (MX450, RTX 3050, etc.).
- Streamlit UI for easy interaction.
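The automatic CPU/GPU selection and the generation flow described above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the function names (`pick_device`, `generate_image`) are assumptions, and it assumes `torch` and `diffusers` are installed.

```python
import torch

def pick_device():
    # Prefer an NVIDIA GPU when CUDA is available, else fall back to CPU.
    return "cuda" if torch.cuda.is_available() else "cpu"

def generate_image(prompt, num_steps=20, guidance_scale=7.5, device=None):
    # Deferred import so the device helper above works even without Diffusers.
    from diffusers import StableDiffusionPipeline

    device = device or pick_device()
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5"
    ).to(device)
    result = pipe(
        prompt,
        num_inference_steps=num_steps,
        guidance_scale=guidance_scale,
    )
    return result.images[0]  # a PIL.Image
```

Lower `num_steps` trades quality for speed; `guidance_scale` controls how closely the image follows the prompt.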
```
git clone https://github.com/your-repo/text-to-image-generator.git
cd text-to-image-generator
pip install -r requirements.txt
```

Run the Streamlit frontend:

```
streamlit run frontend.py
```

or generate directly from the command line:

```
python img_gen.py
```

- Enter a text prompt in the Streamlit UI.
- Adjust the number of inference steps (default: 20, lower for speed, higher for quality).
- Adjust the guidance scale (default: 7.5, lower for diversity, higher for accuracy).
- Click 'Generate Image' and wait for processing.
- View & save the generated image.
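The UI steps above could look roughly like the following `frontend.py` sketch. The widget labels and the `clamp_steps` helper are illustrative assumptions, and `generate_image` is assumed to be the backend entry point in `img_gen.py`.

```python
def clamp_steps(n, lo=1, hi=50):
    # Keep the user-selected inference step count in a safe range.
    return max(lo, min(hi, n))

def main():
    import streamlit as st               # assumes streamlit is installed
    from img_gen import generate_image   # hypothetical backend entry point

    st.title("Text-to-Image Generator")
    prompt = st.text_input("Prompt")
    steps = clamp_steps(st.slider("Inference steps", 1, 50, 20))
    guidance = st.slider("Guidance scale", 1.0, 15.0, 7.5)

    if st.button("Generate Image") and prompt:
        with st.spinner("Generating..."):
            image = generate_image(prompt, num_steps=steps,
                                   guidance_scale=guidance)
        st.image(image, caption=prompt)

if __name__ == "__main__":
    main()
```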
- The script automatically detects GPU availability.
- To force CPU execution (if GPU is slow):

```python
generate_image(prompt, num_steps=15, device="cpu")
```
- Change the model to Stable Diffusion 1.4 (a lighter version):

```python
model_id = "CompVis/stable-diffusion-v1-4"
```
- Reduce `num_steps` to 10-15 for faster generation.
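The low-VRAM tips can be combined when loading the pipeline. A sketch, assuming `diffusers` and `torch` are installed; the function name and the half-precision choice are assumptions, not the project's actual code:

```python
import torch

def load_pipeline_low_vram(model_id="CompVis/stable-diffusion-v1-4"):
    # Deferred import so this helper can be defined without Diffusers present.
    from diffusers import StableDiffusionPipeline

    # fp16 weights roughly halve VRAM use; only sensible on CUDA devices.
    dtype = torch.float16 if torch.cuda.is_available() else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)

    # Trade a little speed for a much smaller attention memory footprint.
    pipe.enable_attention_slicing()
    return pipe.to("cuda" if torch.cuda.is_available() else "cpu")
```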
- Python 3.8+
- NVIDIA GPU with CUDA (Recommended)
- Minimum 8GB RAM (16GB Recommended for CPU mode)
- Libraries: see `requirements.txt`
| Issue | Solution |
|-------|----------|
| Slow generation (10+ min) | Reduce `num_steps` to 10-15, or switch to `CompVis/stable-diffusion-v1-4` |
| GPU crashes due to VRAM | Use CPU mode (`device='cpu'`) or enable `enable_attention_slicing()` |
| Black images generated | Increase `num_steps`, try different `guidance_scale` values |
| xformers error | Remove `xformers` from `requirements.txt` and install it manually |
This project is open-source under the Apache License.