A comprehensive unemployment analysis of India (2019–2020) covering Covid-19 impact, state & zone-wise trends, rural vs urban breakdown, and a polynomial regression forecast — with 9 professional visualizations.
This project is Task 2 of the CodeAlpha Data Science Internship. It performs a full-scale analysis of unemployment data across India from May 2019 to October 2020, with a sharp focus on how the Covid-19 pandemic and the March 2020 national lockdown affected employment across all states, zones, and area types (rural/urban).
The analysis answers real policy questions: Which states were hit hardest? Did cities suffer more than villages? What does the recovery look like?
- ✅ Clean, preprocess, and merge two complementary unemployment datasets
- ✅ Perform thorough EDA with monthly, regional, and area-wise visualizations
- ✅ Quantify and visualize the Covid-19 impact on unemployment rates
- ✅ Analyze zone-wise and state-wise trends using heatmaps and comparisons
- ✅ Compare Rural vs Urban unemployment before and after lockdown
- ✅ Build a time series forecast model to project recovery trajectory
- ✅ Derive actionable policy insights from the data
| Dataset | Records | Coverage | Unique Features |
|---|---|---|---|
| Unemployment_in_India.csv | 740 | May 2019 – Jun 2020 | Rural/Urban split, 28 states |
| Unemployment_Rate_upto_11_2020.csv | 267 | Jan 2020 – Oct 2020 | Zone classification, GPS coordinates |
Features used across both datasets:
| Feature | Description |
|---|---|
| Region | Indian state name |
| Date | Month-end date |
| Unemployment_Rate (%) | Primary target variable |
| Estimated Employed | Total employed workforce |
| Labour Participation Rate (%) | % of working-age population active |
| Area | Rural / Urban (Dataset 1 only) |
| Zone | North/South/East/West/Northeast (Dataset 2 only) |
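As a minimal sketch of how the two files might be aligned on the shared features above: the column names (`Region`, `Date`, `Unemployment_Rate`, `Area`, `Zone`), the date format, and the tiny in-memory frames are illustrative assumptions, not the notebook's actual code or data. The real CSV headers may differ and would need the same kind of normalization.

```python
import pandas as pd

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Strip stray whitespace from headers and text cells, then parse dates."""
    out = df.copy()
    out.columns = out.columns.str.strip()
    out["Region"] = out["Region"].str.strip()
    # Assumed day-month-year format; adjust to match the real files.
    out["Date"] = pd.to_datetime(out["Date"].str.strip(), format="%d-%m-%Y")
    return out

# Hypothetical stand-ins for the two CSV files (values are made up).
d1 = pd.DataFrame({
    "Region": ["Kerala", "Kerala", "Bihar"],
    " Date": [" 31-05-2019", " 31-05-2019", " 31-05-2019"],
    "Unemployment_Rate": [6.0, 8.0, 10.0],
    "Area": ["Rural", "Urban", "Rural"],
})
d2 = pd.DataFrame({
    "Region": ["Kerala", "Bihar"],
    " Date": [" 31-01-2020", " 31-01-2020"],
    "Unemployment_Rate": [5.0, 11.0],
    "Zone": ["South", "East"],
})
d1, d2 = normalize(d1), normalize(d2)

# Dataset 1 has no Zone column, so borrow the Region -> Zone lookup from Dataset 2.
zone_map = d2.drop_duplicates("Region").set_index("Region")["Zone"]
d1["Zone"] = d1["Region"].map(zone_map)

# Collapse Rural/Urban rows to one state-level rate per month.
# This is an unweighted mean, which is a simplification.
d1_state = d1.groupby(["Region", "Date", "Zone"], as_index=False)["Unemployment_Rate"].mean()

# Stack both sources into one long table with a common schema.
cols = ["Region", "Date", "Zone", "Unemployment_Rate"]
combined = (
    pd.concat([d1_state[cols], d2[cols]], ignore_index=True)
    .sort_values(["Region", "Date"])
    .reset_index(drop=True)
)
print(combined)
```

Because the two files overlap in early 2020, a real merge would also need to decide which source wins for duplicated state-month pairs; the sketch sidesteps that by using non-overlapping dates.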
| Tool | Version | Purpose |
|---|---|---|
| Python | 3.10+ | Core language |
| Pandas | 2.0+ | Data loading, cleaning, transformation |
| NumPy | 1.24+ | Numerical operations |
| Matplotlib | 3.7+ | Line charts, bar charts, trend plots |
| Seaborn | 0.12+ | Heatmaps, KDE plots |
| Scikit-learn | 1.3+ | Polynomial regression forecasting |
| Jupyter Notebook | — | Development environment |
| Metric | Value |
|---|---|
| Pre-Covid Avg Unemployment | ~7–8% |
| Peak Unemployment (Apr–May 2020) | 📈 Sharply Higher |
| % Increase Post-Lockdown | Significant spike |
| Recovery Start | Jun 2020 onwards |
| Most Affected Zone | East / Northeast India |
| Urban vs Rural | Urban consistently higher |
💡 Exact numbers generated when you run the notebook — results printed in the final summary cell.
- Covid-19 caused the sharpest unemployment spike in the dataset: the March 2020 lockdown triggered a dramatic and near-instantaneous rise in unemployment across all states.
- Urban unemployment was consistently higher than rural: India's urban informal workforce (daily-wage, gig workers) bore a disproportionate burden.
- Labour Force Participation dropped sharply: many workers, especially women, exited the workforce entirely during the lockdown rather than remaining "unemployed."
- Recovery was underway by June 2020: the polynomial regression forecast points to a declining trend from the peak, though rates remained above pre-Covid levels.
- State-level variation is significant: some states maintained relatively low rates throughout, while others hit extreme highs, highlighting the need for targeted interventions.
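The before/after comparisons behind these findings can be sketched roughly as follows. The helper `covid_impact`, the column names, and the sample numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the notebook. The cut-off uses 25 March 2020, the date the national lockdown began.

```python
import pandas as pd

LOCKDOWN = pd.Timestamp("2020-03-25")  # start of the national lockdown

def covid_impact(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Mean rate before and after the cut-off per group, with the change."""
    period = df["Date"].ge(LOCKDOWN).map({False: "pre", True: "post"}).rename("period")
    table = df.groupby([by, period])["Unemployment_Rate"].mean().unstack("period")
    table["change_pts"] = table["post"] - table["pre"]             # percentage points
    table["change_pct"] = 100 * table["change_pts"] / table["pre"]  # relative change
    return table

# Made-up sample: two months before and two after the lockdown for each area.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-04-30", "2020-05-31"] * 2),
    "Area": ["Rural"] * 4 + ["Urban"] * 4,
    "Unemployment_Rate": [6.0, 8.0, 20.0, 22.0, 8.0, 10.0, 25.0, 29.0],
})

result = covid_impact(sample, by="Area")
print(result)
```

Passing `by="Region"` or `by="Zone"` instead would give the state-wise or zone-wise version of the same table, assuming those columns exist in the frame.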
| # | Plot | File | Description |
|---|---|---|---|
| 1 | 📈 National Trend | national_trend.png | Monthly unemployment with Covid shading |
| 2 | 📊 Distribution | distribution_analysis.png | Histogram, boxplot by year, Rural vs Urban KDE |
| 3 | 📅 Seasonal Patterns | seasonal_patterns.png | Monthly averages + Labour participation trend |
| 4 | 🦠 Covid Impact | covid_impact.png | 4-panel deep dive: before/after, workforce drop, state comparison |
| 5 | 🗺️ Zone Analysis | zone_analysis.png | Zone trends over time + average by zone bar chart |
| 6 | 🌡️ State Heatmap | state_heatmap.png | State × Month unemployment heatmap |
| 7 | 🏆 Best/Worst States | top_bottom_states.png | Top 5 highest and lowest unemployment states |
| 8 | 🏘️ Rural vs Urban | rural_urban_analysis.png | Rural/Urban trend + pre vs post Covid split |
| 9 | 📈 Forecast | forecast.png | Polynomial regression fit + 4-month forecast |
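The forecast in plot 9 is described as a polynomial regression fit with a 4-month horizon. A minimal sketch of that idea with scikit-learn is shown below; the function name `fit_and_forecast`, the degree of 2, and the synthetic input series are assumptions made for illustration, since the README does not state the degree or the exact features used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def fit_and_forecast(rates, degree=2, horizon=4):
    """Fit rate ~ polynomial(month index) and extrapolate `horizon` months ahead."""
    rates = np.asarray(rates, dtype=float)
    t = np.arange(len(rates)).reshape(-1, 1)  # 0, 1, 2, ... as the time feature
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False),  # adds t**2, ..., t**degree
        LinearRegression(),                              # supplies the intercept
    )
    model.fit(t, rates)
    future_t = np.arange(len(rates), len(rates) + horizon).reshape(-1, 1)
    return model, model.predict(future_t)

# Synthetic, exactly quadratic series so the behaviour is easy to verify.
t = np.arange(8)
observed = 0.5 * t**2 - 3 * t + 10
model, forecast = fit_and_forecast(observed, degree=2, horizon=4)
print(forecast)
```

Polynomial fits can diverge quickly outside the observed range, so a short horizon such as the four months used here is a sensible limit, and the output is better read as a trend indication than as a prediction.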
- Expand rural job guarantee schemes (like MGNREGS) automatically during national crises
- Portable benefits for urban informal workers — delink social security from employers
- State-specific interventions for consistently high-unemployment states
- Women's workforce re-entry programs to recover dropped Labour Force Participation
- Real-time monthly unemployment dashboards for faster policy responses
CodeAlpha_UnemploymentAnalysis/
│
├── 📓 unemployment_analysis.ipynb ← Main Jupyter Notebook
├── 📄 README.md ← This file
├── 📋 requirements.txt ← Python dependencies
├── 📂 Unemployment_in_India.csv ← Dataset 1
├── 📂 Unemployment_Rate_upto_11_2020.csv ← Dataset 2
│
└── 📊 Generated Plots/
    ├── national_trend.png
    ├── distribution_analysis.png
    ├── seasonal_patterns.png
    ├── covid_impact.png
    ├── zone_analysis.png
    ├── state_heatmap.png
    ├── top_bottom_states.png
    ├── rural_urban_analysis.png
    └── forecast.png
Important: Upload both CSV files when prompted in Colab (or place them in the same folder as the notebook).
# 1. Clone the repository
git clone https://github.com/MOHAMMED-ABUZAR317/CodeAlpha_UnemploymentAnalysis.git
cd CodeAlpha_UnemploymentAnalysis
# 2. Install dependencies
pip install -r requirements.txt
# 3. Launch notebook
jupyter notebook unemployment_analysis.ipynb

Contents of requirements.txt:

pandas>=2.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
matplotlib>=3.7.0
seaborn>=0.12.0
jupyter>=1.0.0

Install them with: pip install -r requirements.txt

- How to clean and merge multiple real-world datasets with different structures
- Visualizing time series data with event markers (lockdown dates, shaded periods)
- Quantifying the economic impact of a real-world event (Covid-19) with data
- Building polynomial regression for non-linear time series forecasting
- Translating data insights into actionable policy recommendations
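The event-marker technique mentioned in this list (a lockdown date line plus a shaded period) could look roughly like this in Matplotlib. The function name `plot_trend_with_lockdown`, the end of the shaded window, and the sample values are hypothetical; the notebook's own plotting code may differ.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

def plot_trend_with_lockdown(series: pd.Series, lockdown_start, shade_end):
    """Line chart of a monthly series with a lockdown marker and shaded window."""
    fig, ax = plt.subplots(figsize=(9, 4))
    ax.plot(series.index, series.values, marker="o", label="Unemployment rate (%)")
    ax.axvline(lockdown_start, color="red", linestyle="--", label="Lockdown starts")
    ax.axvspan(lockdown_start, shade_end, color="red", alpha=0.12, label="Shaded period")
    ax.set_xlabel("Month")
    ax.set_ylabel("Unemployment rate (%)")
    ax.legend()
    fig.autofmt_xdate()  # tilt date labels so they do not overlap
    return fig, ax

# Made-up monthly values purely for demonstration.
dates = pd.date_range("2020-01-31", periods=6, freq="M")
demo = pd.Series([7.0, 7.5, 9.0, 23.0, 21.0, 11.0], index=dates)

fig, ax = plot_trend_with_lockdown(
    demo,
    lockdown_start=pd.Timestamp("2020-03-25"),
    shade_end=pd.Timestamp("2020-05-31"),  # assumed end of the highlighted window
)
fig.savefig("national_trend_demo.png", dpi=100)
```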
| Platform | Link |
|---|---|
| Mohammed Abuzar | |
| 🐙 GitHub | MOHAMMED-ABUZAR317 |
| 🏢 Internship | CodeAlpha |
📉 Made with ❤️ during the CodeAlpha Data Science Internship
If you found this project helpful, please give it a ⭐ on GitHub!