🔗 Live App: https://quora-duplicate-detector-5.onrender.com/
A machine-learning-powered web application that detects whether two questions are duplicates (semantically similar) or different. Built with Streamlit and deployed on Render.
- Detects semantic similarity between two questions
- Machine Learning based prediction
- Web interface using Streamlit
- Cloud deployment on Render
- Proper dependency management using requirements.txt
- User enters two questions
- Text preprocessing is applied:
  - Lowercasing
  - Stopword removal
  - Tokenization
  - Cleaning
- Feature extraction is performed
- Similarity distance is calculated
- ML model predicts:
  - Duplicate ✅
  - Not Duplicate ❌
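
The preprocessing and distance steps above can be sketched as follows. This is a minimal illustration, not the code in helper.py: the function names `preprocess` and `jaccard_distance`, the tiny stopword set, and the choice of Jaccard distance over tokens are all assumptions made for the example. The actual app may use NLTK's stopword list, the `distance` package, and a different order of steps.

```python
import re

# Small illustrative stopword set (assumption); the app may use NLTK's English list.
STOPWORDS = {"a", "an", "the", "is", "are", "can", "do", "does",
             "from", "how", "i", "in", "of", "to", "what"}


def preprocess(question: str) -> list[str]:
    """Lowercase, clean, tokenize, and drop stopwords from one question."""
    text = question.lower()                      # lowercasing
    text = re.sub(r"[^a-z0-9\s]", " ", text)     # cleaning: replace punctuation/symbols with spaces
    tokens = text.split()                        # tokenization on whitespace
    return [t for t in tokens if t not in STOPWORDS]  # stopword removal


def jaccard_distance(tokens_a: list[str], tokens_b: list[str]) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| over the two token sets (0.0 means identical sets)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty questions are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


q1 = preprocess("Machine learning helps computers learn from data.")
q2 = preprocess("Computers can learn from data using machine learning.")

# The two questions share 5 of 7 distinct tokens, so the distance is 2/7 (about 0.29).
print(q1, q2, round(jaccard_distance(q1, q2), 2))
```

A low distance like this would be one input feature for the classifier; on its own it does not decide whether the pair is a duplicate.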
| Layer | Technology |
|---|---|
| Frontend | Streamlit |
| Backend | Python |
| ML Model | Scikit-learn |
| NLP | NLTK / distance |
| Deployment | Render |
| Version Control | Git & GitHub |
quora-duplicate-detector/
│
├── app.py # Main Streamlit app
├── helper.py # Helper functions
├── model.pkl # Trained ML model
├── vectorizer.pkl # Text vectorizer
├── requirements.txt # Dependencies
├── README.md # Project documentation
└── .gitignore
git clone https://github.com/your-username/quora-duplicate-detector.git
cd quora-duplicate-detector
pip install -r requirements.txt
streamlit run app.py

requirements.txt:
streamlit
scikit-learn
nltk
numpy
pandas
beautifulsoup4
bs4
distance
Deployed using Render Cloud Platform
Steps:
- Push project to GitHub
- Connect GitHub repo to Render
- Select Web Service
- Add build command: pip install -r requirements.txt
- Add start command: streamlit run app.py
- Deploy 🚀
Input:
- Q1: Machine learning helps computers learn from data.
- Q2: Computers can learn from data using machine learning.
This model is trained on limited data and will not always be correct. Predictions should be treated as probabilistic, not absolute.
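
One way to surface that uncertainty is to report the estimated probability alongside the label. The sketch below assumes the trained model can produce a duplicate probability (many scikit-learn classifiers expose `predict_proba` for this); the function name `describe_prediction` and the default threshold of 0.5 are hypothetical and not taken from app.py.

```python
def describe_prediction(duplicate_probability: float, threshold: float = 0.5) -> str:
    """Turn a duplicate probability into a label that also shows the estimate."""
    if not 0.0 <= duplicate_probability <= 1.0:
        raise ValueError("duplicate_probability must be between 0 and 1")
    # The threshold is an assumption; raising it trades missed duplicates for fewer false alarms.
    label = "Duplicate" if duplicate_probability >= threshold else "Not Duplicate"
    return f"{label} (estimated probability of duplicate: {duplicate_probability:.0%})"


print(describe_prediction(0.83))
print(describe_prediction(0.25))
```

Showing the percentage next to the label lets a user judge borderline cases instead of trusting a bare yes/no answer.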
Vivek Kumar, B.Tech Machine Learning Student, AI/ML Developer
This project is licensed under the MIT License — free to use, modify, and distribute.
⭐ If you like this project, give it a star on GitHub
