atul-feyntech/bihar-school-optimization

Bihar School Optimization Project

A data-driven optimization system for strategic placement of higher secondary schools in Bihar, India. The project combines demographic modeling, geospatial analysis, and mixed-integer linear programming to generate actionable 5-year construction plans.

🎯 Problem Statement

Bihar faces a critical education infrastructure gap:

  • Population: 13+ crore people, with 5.18 crore children/adolescents under 18
  • Capacity Gap: Only ~2.13 crore school seats available for ~5 crore children
  • Completion Crisis: Only 9.19% of the population clears Class 12 (higher secondary)
  • Equity Challenge: Significant urban-rural disparities in access and quality

🚀 Solution Overview

This system provides:

  • Realistic synthetic data generation at district and pincode levels
  • Optimization modeling for school placement under budget and capacity constraints
  • Geospatial visualization with year-specific construction phasing
  • Interactive dashboards for policy makers and planners

📊 Key Outputs

  • 211 new higher secondary schools planned for 2025-2029
  • Static maps: 5-year overview and annual construction maps
  • Interactive Folium map with year-coded markers and detailed popups
  • Analysis charts: Construction timeline, investment analysis, demand distribution
  • Comprehensive data files: Demand projections, site attributes, distance matrices

🏗️ Architecture

├── data/
│   ├── raw/                 # Source datasets (shapefiles, CSVs)
│   ├── processed/           # Generated demand and site data
│   └── processed_real/      # Realistic synthetic datasets
├── scripts/
│   ├── data_prep.py         # Data preparation and demand synthesis
│   ├── gis_prep.py          # Geospatial preprocessing
│   ├── realistic_data_generator.py  # Realistic synthetic data pipeline
│   ├── school_optimizer.py  # MILP optimization engine
│   └── visualize.py         # Static and interactive visualization
├── docs/
│   └── real_data_catalog.md # Data sources and parameter documentation
├── outputs/
│   ├── *.png                # Static maps and charts
│   ├── *.html               # Interactive Folium map
│   └── *.csv                # Optimization results and plans
└── requirements.txt         # Python dependencies

🛠️ Installation & Setup

Prerequisites

  • Python 3.9+
  • uv package manager (recommended) or pip

Quick Start with uv

# Clone the repository
git clone <repository-url>
cd bihar-school-optimization

# Install dependencies
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -r requirements.txt

# Generate realistic data
uv run python scripts/realistic_data_generator.py

# Run optimization
uv run python scripts/school_optimizer.py

# Generate visualizations
NO_BASEMAP=1 uv run python scripts/visualize.py

Alternative: pip setup

pip install -r requirements.txt
python scripts/realistic_data_generator.py
python scripts/school_optimizer.py
python scripts/visualize.py

📈 Usage

1. Data Generation

# Generate realistic synthetic data (default: 682 pincodes)
python scripts/realistic_data_generator.py

# Custom seed or population
python scripts/realistic_data_generator.py --seed 42 --state-population 130725310
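
The generator's internals are not shown here, so the following is only a minimal sketch of how a seeded, weight-based disaggregation of a state total could work. The function names, the `urban_boost` factor, and the sample pincodes are assumptions made for illustration and are not taken from `realistic_data_generator.py`.

```python
import random

def synthetic_weights(pincodes, seed, urban=frozenset(), urban_boost=3.0):
    """Draw one reproducible weight per pincode; urban pincodes get a larger weight (assumed factor)."""
    rng = random.Random(seed)  # local RNG so the same seed always gives the same weights
    return {p: rng.uniform(0.5, 1.5) * (urban_boost if p in urban else 1.0)
            for p in pincodes}

def disaggregate(total, weights):
    """Split an integer total in proportion to weights using largest-remainder rounding."""
    weight_sum = sum(weights.values())
    exact = {k: total * w / weight_sum for k, w in weights.items()}
    alloc = {k: int(v) for k, v in exact.items()}  # floor of each share
    leftover = total - sum(alloc.values())         # units lost to flooring
    # Hand the leftover units to the keys with the largest fractional parts
    by_fraction = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_fraction[:leftover]:
        alloc[k] += 1
    return alloc

# Illustrative pincodes only; the real pipeline covers 682 by default
pincodes = ["800001", "823001", "842001", "854301"]
weights = synthetic_weights(pincodes, seed=42, urban={"800001"})
population = disaggregate(130_725_310, weights)
print(population)
```

Largest-remainder rounding keeps the pincode figures summing exactly to the state total, which simple per-pincode rounding does not guarantee.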

2. Optimization

# Run MILP optimization with default parameters
python scripts/school_optimizer.py

# The optimizer will:
# - Load demand projections and site attributes
# - Build MILP model with budget, capacity, and equity constraints
# - Generate 5-year construction plan
# - Apply fallback heuristic if needed

3. Visualization

# Generate all visualizations
python scripts/visualize.py

# Environment variables for automated runs:
export NO_BASEMAP=1    # Skip online basemap downloads
export DISPLAY_MAPS=0  # Don't show interactive plots
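
How `visualize.py` parses these variables is not documented here. As a hedged sketch, a script could read such switches with a small helper like the one below. The `env_flag` name, the accepted truthy strings, and the default of `True` for `DISPLAY_MAPS` are all assumptions.

```python
import os

def env_flag(name, default=False, environ=os.environ):
    """Interpret an environment variable as a boolean switch."""
    raw = environ.get(name)
    if raw is None:
        return default  # variable not set: fall back to the default
    return raw.strip().lower() in {"1", "true", "yes", "on"}

skip_basemap = env_flag("NO_BASEMAP")               # True when NO_BASEMAP=1
show_maps = env_flag("DISPLAY_MAPS", default=True)  # False when DISPLAY_MAPS=0
print(f"skip_basemap={skip_basemap}, show_maps={show_maps}")
```

Passing the environment in as a parameter keeps the helper easy to exercise in tests without touching the real process environment.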

📋 Key Features

Data Engine

  • District profiles: Population, literacy, dropout rates by category
  • Pincode disaggregation: Urban/rural weighted demand distribution
  • Distance matrices: Haversine-based travel cost calculations
  • Equity scoring: Composite metrics for underserved area prioritization
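
For readers unfamiliar with the Haversine formula mentioned above, here is a minimal standard-library sketch of a pairwise distance matrix. The function names and the sample coordinates (approximate city centres) are illustrative assumptions; the project's own implementation may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_matrix(points):
    """Symmetric matrix of pairwise distances for a list of (lat, lon) tuples."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(*points[i], *points[j])
            matrix[i][j] = matrix[j][i] = d  # distance is symmetric
    return matrix

# Approximate coordinates for Patna, Gaya, Muzaffarpur (illustration only)
centroids = [(25.5941, 85.1376), (24.7914, 85.0002), (26.1209, 85.3647)]
matrix = distance_matrix(centroids)
for row in matrix:
    print([round(d, 1) for d in row])
```

Haversine gives straight-line distance over the Earth's surface, so it understates actual road travel; that limitation matters when interpreting a maximum travel distance constraint.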

Optimization Engine

  • MILP formulation: PuLP-based mixed-integer programming
  • Multi-objective: Capital cost, operational cost, travel burden, equity gap
  • Constraints: Annual budget, school capacity, maximum travel distance
  • Fallback heuristic: Ensures feasible solution even if MILP fails
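
The fallback heuristic itself is not described in this README. One simple way such a fallback could work is a greedy rule: each year, keep building in the pincode with the highest unmet demand until the year's budget runs out. The sketch below assumes that rule; `greedy_plan` and the toy numbers are hypothetical and are not the logic in `school_optimizer.py`.

```python
def greedy_plan(demand, build_cost, capacity, annual_budget, years=5):
    """Return a list of (year, pincode) builds, highest unmet demand first.

    demand maps pincode -> projected students without a seat.
    """
    remaining = dict(demand)  # copy so the caller's data is untouched
    plan = []
    if not remaining:
        return plan
    for year in range(1, years + 1):
        budget = annual_budget
        while budget >= build_cost:
            pincode = max(remaining, key=remaining.get)
            if remaining[pincode] <= 0:
                break  # all demand is covered
            plan.append((year, pincode))
            remaining[pincode] -= capacity
            budget -= build_cost
    return plan

# Toy inputs: a budget that affords two schools per year
demand = {"800001": 400, "823001": 260, "842001": 90}
plan = greedy_plan(demand, build_cost=2_000_000, capacity=150,
                   annual_budget=4_000_000, years=2)
print(plan)
```

A greedy rule always returns a budget-feasible plan, but it ignores travel distance and equity weights, so it is best read as a safety net and not as a substitute for the MILP solution.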

Visualization Engine

  • Static maps: Matplotlib/Seaborn with year-specific color coding
  • Interactive maps: Folium with popups and legends
  • Analysis charts: Timeline, investment, and demand distribution
  • Geospatial handling: Point fallbacks for missing polygon boundaries

📊 Model Parameters

Key optimization parameters (tunable in scripts/school_optimizer.py):

params = {
    'C_build_cost': 2000000,        # ₹0.2 Crore per new school
    'O_op_cost_per_student': 15000, # ₹15,000 per student annually
    'Cap_school_capacity': 150,     # 150 students per school
    'B_annual_budget': 19482200000, # ₹1,948.22 Crore annual budget
    'max_distance': 50,             # 50 km maximum travel distance
    'coverage_target': 0.15,        # 15% minimum demand coverage
    'demand_scale': 0.02,           # Scale factor for synthetic demand
    # Policy weights
    'W_travel': 0.1,    # Travel distance importance
    'W_peb': 0.4,       # Private education burden
    'W_equity': 0.5,    # Equity gap importance
}
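
As a back-of-the-envelope check on these parameters, the sketch below works out what the 211-school plan from Key Outputs implies. It assumes every school fills to its 150-student capacity, which is a simplification; the variable names other than the parameter keys are illustrative.

```python
CRORE = 10_000_000  # 1 crore = 10 million

# Values copied from the parameter block above
params = {
    'C_build_cost': 2_000_000,
    'O_op_cost_per_student': 15_000,
    'Cap_school_capacity': 150,
    'B_annual_budget': 19_482_200_000,
}
schools_planned = 211  # total from Key Outputs

capital_total = schools_planned * params['C_build_cost']
seats_added = schools_planned * params['Cap_school_capacity']
# Annual operating cost if every new seat is filled (assumption)
op_cost_full = seats_added * params['O_op_cost_per_student']
# Upper bound on schools one year's budget could fund on capital alone
max_schools_per_year = params['B_annual_budget'] // params['C_build_cost']

print(f"Capital cost: ₹{capital_total / CRORE:.2f} crore")          # 42.20
print(f"Seats added: {seats_added:,}")                               # 31,650
print(f"Operating cost at full enrolment: ₹{op_cost_full / CRORE:.3f} crore/year")  # 47.475
print(f"Schools affordable per year on capital alone: {max_schools_per_year:,}")    # 9,741
```

On these numbers the capital cost of all 211 schools is a small fraction of a single year's budget, which suggests the plan size is shaped more by the coverage, distance, and demand settings than by the budget ceiling.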

📁 Output Files

Optimization Results

  • outputs/5_YEAR_PLAN_OUTPUT.csv - Detailed construction plan by pincode and year

Visualizations

  • outputs/Bihar_5_Year_School_Plan_Map.png - Main overview map
  • outputs/annual_plan_maps/Year_N_Construction_Map.png - Annual construction maps
  • outputs/analysis_charts/construction_timeline.png - Timeline chart
  • outputs/analysis_charts/investment_analysis.png - Investment analysis
  • outputs/Bihar_School_Plan_Interactive.html - Interactive Folium map

Data Files

  • data/processed_real/pincode_demand_T.csv - Demand projections by pincode
  • data/processed_real/pincode_sites.csv - Site attributes and centroids
  • data/processed_real/distance_matrix_km.csv - Travel distance matrix

🔧 Customization

Adding New Data Sources

  1. Update docs/real_data_catalog.md with source documentation
  2. Modify scripts/realistic_data_generator.py to incorporate new parameters
  3. Adjust optimization weights in scripts/school_optimizer.py

Changing Optimization Objectives

Edit the build_optimization_model() function in scripts/school_optimizer.py to:

  • Add new cost components
  • Modify constraint formulations
  • Adjust policy weights

Extending Visualizations

  • Add new chart types in create_analysis_charts()
  • Modify color schemes via YEAR_COLOR_MAP
  • Enhance interactive popups in create_interactive_map()

🧪 Testing

# Test data generation
python scripts/realistic_data_generator.py --dry-run

# Validate optimization output
python -c "
import pandas as pd
df = pd.read_csv('outputs/5_YEAR_PLAN_OUTPUT.csv')
print(f'Total schools: {len(df)}')
print(f'Years: {sorted(df[\"Year_to_Build\"].unique())}')
"

# Verify visualization outputs
ls outputs/*.png outputs/*.html
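
For a slightly stronger check than counting rows, the plan can be tallied per year and compared against the annual budget. This is a hedged sketch using only the standard library: it assumes one row per school and uses the `Year_to_Build` column from the command above, while the `Pincode` column name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

def schools_per_year(csv_file, year_column="Year_to_Build"):
    """Count plan rows per build year (assumes one row per school)."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row[year_column]) for row in reader)

def years_over_budget(counts, build_cost, annual_budget):
    """Return the years whose capital spend exceeds the annual budget."""
    return [year for year, n in sorted(counts.items())
            if n * build_cost > annual_budget]

# In-memory stand-in for outputs/5_YEAR_PLAN_OUTPUT.csv
sample = io.StringIO("Pincode,Year_to_Build\n800001,1\n823001,1\n842001,2\n")
counts = schools_per_year(sample)
over = years_over_budget(counts, build_cost=2_000_000,
                         annual_budget=19_482_200_000)
print(dict(counts), over)
```

To run it against a real plan, open the file with `open('outputs/5_YEAR_PLAN_OUTPUT.csv', newline='')` and pass the file object in place of `sample`.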

📈 Performance

  • Data generation: ~2-3 minutes for 682 pincodes
  • Optimization: ~1-2 minutes (MILP) + fallback if needed
  • Visualization: ~30-60 seconds for all outputs
  • Memory usage: <2GB RAM for full pipeline

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make changes with tests
  4. Commit with descriptive messages
  5. Push and create a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Bihar Economic Survey and caste survey data
  • UDISE+ education statistics
  • Census of India demographic data
  • Open-source geospatial and optimization libraries

📞 Contact

For questions or collaborations, please open an issue in the repository.


Note: This is a demonstration system using realistic synthetic data. For production use, replace synthetic inputs with actual administrative data and validate model assumptions with domain experts.

About

Bihar has 13 crore people but school seats for fewer than half of its children. This project maps every pincode’s demand and plans 211 new higher-secondary schools so students finish Class 12 closer to home.
