# HuggingFace Inference Chat

A Streamlit-based chat interface powered by HuggingFace models with advanced features.
## Features

- **Interactive Chat Interface**
  - Real-time streaming responses
  - Code syntax highlighting
  - File upload support
  - Token usage tracking
- **Model Settings**
  - Adjustable temperature
  - Top-p sampling
  - Response length control
  - Repetition penalty
- **Advanced Features**
  - Redis-based caching
  - Conversation memory
  - File analysis
  - Modern UI
- Advanced language models from HuggingFace
- Real-time streaming responses
- Code syntax highlighting and formatting
- File upload and analysis
- Persistent conversation history
- Redis-based caching with fallback
- Modern, responsive UI
- Token usage tracking
## Prerequisites

- Python 3.10 or higher
- Redis server (optional; falls back to an in-memory cache)
- HuggingFace API token (generated from your HuggingFace account settings)
## Installation

### Windows

```bash
# Clone repository
git clone https://github.com/borderlessboy/huggingface-inference-chat.git
cd huggingface-inference-chat

# Create and activate virtual environment
python -m venv venv
.\venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Setup environment
copy .env.example .env
# Edit .env with your HuggingFace API token
```

### macOS / Linux

```bash
# Clone repository
git clone https://github.com/borderlessboy/huggingface-inference-chat.git
cd huggingface-inference-chat

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Setup environment
cp .env.example .env
# Edit .env with your HuggingFace API token
```
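The exact variable names are defined in `.env.example`. As a sketch only (the variable name below is a guess; check `.env.example` for the real one), the resulting `.env` might look like:

```ini
# Hypothetical contents -- confirm the exact variable name in .env.example
HUGGINGFACE_API_TOKEN=hf_your_token_here
```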
### Optional: Install Redis

```bash
# macOS (Homebrew)
brew install redis
brew services start redis

# Ubuntu/Debian
sudo apt update
sudo apt install redis-server
sudo systemctl start redis
```

## Docker

```bash
# Build the image
docker build -t huggingface-chat .

# Run the container
docker run -p 8501:8501 --env-file .env huggingface-chat
```

## API Usage

The `HuggingFaceAPI` class provides the main interface for interacting with HuggingFace's Inference API:
```python
from api.huggingface import HuggingFaceAPI

# Initialize client
hf_api = HuggingFaceAPI(
    model_name="Qwen/Qwen2.5-Coder-32B-Instruct",
    api_token="your_token",
)

# Generate response (streaming)
for chunk in hf_api.generate_stream("Your prompt here"):
    print(chunk, end="")

# Generate response (non-streaming)
response = hf_api.generate("Your prompt here")
```

### Generation Parameters

- `temperature` (0.0-1.0): Controls randomness in responses
- `top_p` (0.0-1.0): Controls diversity via nucleus sampling
- `max_tokens` (int): Maximum length of the generated response
- `repetition_penalty` (1.0-2.0): Discourages repetitive text
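These names map onto the parameters accepted by HuggingFace's public Inference API for text generation (which calls the length limit `max_new_tokens`). As a sketch, a request body could be assembled like this; the `build_payload` helper is hypothetical and not part of the project:

```python
def build_payload(prompt, temperature=0.7, top_p=0.9,
                  max_tokens=512, repetition_penalty=1.1):
    """Assemble a text-generation request body for the HF Inference API.

    Hypothetical helper: the project's HuggingFaceAPI class may do this
    differently. Parameter names follow the public Inference API docs.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be in [0.0, 1.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if not 1.0 <= repetition_penalty <= 2.0:
        raise ValueError("repetition_penalty must be in [1.0, 2.0]")
    return {
        "inputs": prompt,
        "parameters": {
            "temperature": temperature,
            "top_p": top_p,
            "max_new_tokens": max_tokens,
            "repetition_penalty": repetition_penalty,
        },
    }
```

Validating the ranges up front surfaces configuration mistakes before a network call is made.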
## Caching

The application supports two caching backends:

- Redis (recommended for production)
- In-memory cache (fallback option)
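The fallback behavior can be sketched roughly like this. This is a simplified stand-in, not the project's actual cache module: it tries to reach Redis and silently degrades to a TTL-aware in-memory dict.

```python
import time


class FallbackCache:
    """Minimal in-memory stand-in used when Redis is unreachable."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl=3600):
        # Store the value alongside an absolute expiry timestamp
        self._store[key] = (value, time.time() + ttl)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.time() > expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value


def make_cache(host="localhost", port=6379, db=0):
    """Return a live Redis client if possible, else the in-memory fallback."""
    try:
        import redis  # optional dependency
        client = redis.Redis(host=host, port=port, db=db)
        client.ping()  # raises an exception if the server is unreachable
        return client
    except Exception:
        return FallbackCache()
```

Probing with `ping()` at startup keeps the failure mode explicit: a dead Redis server costs one failed connection attempt, after which every call hits the in-memory store.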
```python
# Redis configuration
REDIS_CONFIG = {
    "host": "localhost",
    "port": 6379,
    "db": 0,
}
```

## Contributing

We welcome contributions! Here's how you can help:
1. Fork the repository
2. Create a feature branch:

   ```bash
   git checkout -b feature/your-feature-name
   ```

3. Make your changes and commit:

   ```bash
   git commit -m "Add your feature description"
   ```

4. Push to your fork:

   ```bash
   git push origin feature/your-feature-name
   ```

5. Create a Pull Request
- **Code Style**
  - Follow PEP 8 guidelines
  - Use type hints
  - Add docstrings for functions and classes
  - Use meaningful variable names
- **Testing**
  - Add tests for new features
  - Ensure all tests pass before submitting a PR
  - Update documentation if needed
- **Commit Messages**
  - Use clear, descriptive commit messages
  - Reference issues where applicable
- **Pull Requests**
  - Describe changes in detail
  - Include screenshots for UI changes
  - Update the README if needed
- Be respectful and inclusive
- Follow the project's coding standards
- Help others and provide constructive feedback
- Report bugs and issues
## Usage

1. Start the application:

   ```bash
   streamlit run src/main.py
   ```

2. Open your browser and navigate to `http://localhost:8501`
3. Start chatting with the assistant!
## Cleanup

To remove all cache and temporary files from the project:

```bash
python scripts/cleanup.py
```

This will remove:

- Python cache files (`__pycache__`, `.pyc`, etc.)
- IDE cache directories (`.idea`, `.vscode`)
- Test cache (`.pytest_cache`, `.coverage`)
- Project cache (`.streamlit`, `redis-data`)
- Temporary files (`.swp`, `.swo`, `.DS_Store`)
