A data-driven optimization system for strategic placement of higher secondary schools in Bihar, India. The project combines demographic modeling, geospatial analysis, and mixed-integer linear programming to generate actionable 5-year construction plans.
Bihar faces a critical education infrastructure gap:
- Population: 13+ crore people, with 5.18 crore children/adolescents under 18
- Capacity Gap: Only ~2.13 crore school seats available for ~5 crore children
- Completion Crisis: Only 9.19% of the population clears Class 12 (higher secondary)
- Equity Challenge: Significant urban-rural disparities in access and quality
This system provides:
- Realistic synthetic data generation at district and pincode levels
- Optimization modeling for school placement under budget and capacity constraints
- Geospatial visualization with year-specific construction phasing
- Interactive dashboards for policy makers and planners
Current results:
- 211 new higher secondary schools planned for 2025-2029
- Static maps: 5-year overview and annual construction maps
- Interactive Folium map with year-coded markers and detailed popups
- Analysis charts: Construction timeline, investment analysis, demand distribution
- Comprehensive data files: Demand projections, site attributes, distance matrices
Project structure:

```text
bihar-school-optimization/
├── data/
│   ├── raw/                 # Source datasets (shapefiles, CSVs)
│   ├── processed/           # Generated demand and site data
│   └── processed_real/      # Realistic synthetic datasets
├── scripts/
│   ├── data_prep.py                 # Data preparation and demand synthesis
│   ├── gis_prep.py                  # Geospatial preprocessing
│   ├── realistic_data_generator.py  # Realistic synthetic data pipeline
│   ├── school_optimizer.py          # MILP optimization engine
│   └── visualize.py                 # Static and interactive visualization
├── docs/
│   └── real_data_catalog.md # Data sources and parameter documentation
├── outputs/
│   ├── *.png                # Static maps and charts
│   ├── *.html               # Interactive Folium map
│   └── *.csv                # Optimization results and plans
└── requirements.txt         # Python dependencies
```
Prerequisites:
- Python 3.9+
- `uv` package manager (recommended) or `pip`
Installation and quick start with `uv`:

```bash
# Clone the repository
git clone <repository-url>
cd bihar-school-optimization

# Install dependencies
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -r requirements.txt

# Generate realistic data
uv run python scripts/realistic_data_generator.py

# Run optimization
uv run python scripts/school_optimizer.py

# Generate visualizations
NO_BASEMAP=1 uv run python scripts/visualize.py
```

Alternatively, with `pip`:

```bash
pip install -r requirements.txt
python scripts/realistic_data_generator.py
python scripts/school_optimizer.py
python scripts/visualize.py
```

Data generation:

```bash
# Generate realistic synthetic data (default: 682 pincodes)
python scripts/realistic_data_generator.py

# Custom seed or state population
python scripts/realistic_data_generator.py --seed 42 --state-population 130725310
```

Optimization:

```bash
# Run MILP optimization with default parameters
python scripts/school_optimizer.py

# The optimizer will:
# - Load demand projections and site attributes
# - Build the MILP model with budget, capacity, and equity constraints
# - Generate a 5-year construction plan
# - Apply a fallback heuristic if needed
```

Visualization:

```bash
# Environment variables for automated runs (set before running the script):
export NO_BASEMAP=1    # Skip online basemap downloads
export DISPLAY_MAPS=0  # Don't show interactive plots

# Generate all visualizations
python scripts/visualize.py
```

Data generation features:
- District profiles: Population, literacy, dropout rates by category
- Pincode disaggregation: Urban/rural weighted demand distribution
- Distance matrices: Haversine-based travel cost calculations
- Equity scoring: Composite metrics for underserved area prioritization
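The distance matrix is described only as haversine-based. The following is a minimal sketch of that calculation, assuming pincode centroids are (latitude, longitude) pairs in decimal degrees and a mean Earth radius of 6,371 km; `haversine_km` and `distance_matrix_km` are hypothetical names, not functions taken from `realistic_data_generator.py`.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_matrix_km(demand_points, site_points):
    """Hypothetical helper: rows are demand pincodes, columns are candidate sites."""
    return [
        [haversine_km(dlat, dlon, slat, slon) for slat, slon in site_points]
        for dlat, dlon in demand_points
    ]
```

For example, `haversine_km(25.5941, 85.1376, 24.7914, 85.0002)` (approximate Patna and Gaya coordinates) should come out at roughly 90 km, beyond the default 50 km `max_distance`. Straight-line distance understates actual road travel, so it is only a proxy for travel cost.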
Optimization features:
- MILP formulation: PuLP-based mixed-integer linear programming
- Multi-objective: Capital cost, operational cost, travel burden, equity gap
- Constraints: Annual budget, school capacity, maximum travel distance
- Fallback heuristic: Ensures feasible solution even if MILP fails
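The fallback heuristic is only described as guaranteeing a feasible plan. One common choice is a greedy rule, and the sketch below assumes it: sites are opened in descending priority order, each year's capital spending stays within the annual budget after paying operating costs of schools opened in earlier years, and construction stops once `coverage_target` of demand is covered. `greedy_fallback`, the `site_scores` input, and the rule that operating costs begin the year after construction are all hypothetical; only the `params` keys come from this README.

```python
def greedy_fallback(site_scores, total_demand, params, years):
    """Greedy fallback plan (hypothetical); returns a list of (site_id, year).

    site_scores: {site_id: priority}, higher = more underserved (assumed input).
    Assumes a school's operating cost starts the year after it is built.
    """
    capacity = params["Cap_school_capacity"]
    build_cost = params["C_build_cost"]
    op_cost_per_school = capacity * params["O_op_cost_per_student"]
    seats_needed = params["coverage_target"] * total_demand

    # Highest-priority sites first; ties broken by site id for determinism.
    queue = sorted(site_scores, key=lambda s: (-site_scores[s], s))
    plan = []
    for year in years:
        # Budget left after running every school opened in earlier years.
        remaining = params["B_annual_budget"] - len(plan) * op_cost_per_school
        while queue and len(plan) * capacity < seats_needed and remaining >= build_cost:
            plan.append((queue.pop(0), year))
            remaining -= build_cost
    return plan
```

Under this assumption, with the default parameters each open school adds ₹22.5 lakh (150 × ₹15,000) of annual operating cost, so the budget left for new construction shrinks as the plan progresses.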
Visualization features:
- Static maps: Matplotlib/Seaborn with year-specific color coding
- Interactive maps: Folium with popups and legends
- Analysis charts: Timeline, investment, and demand distribution
- Geospatial handling: Point fallbacks for missing polygon boundaries
Key optimization parameters (tunable in `school_optimizer.py`):

```python
params = {
    'C_build_cost': 2000000,          # ₹0.2 Crore per new school
    'O_op_cost_per_student': 15000,   # ₹15,000 per student annually
    'Cap_school_capacity': 150,       # 150 students per school
    'B_annual_budget': 19482200000,   # ₹1,948.22 Crore annual budget
    'max_distance': 50,               # 50 km maximum travel distance
    'coverage_target': 0.15,          # 15% minimum demand coverage
    'demand_scale': 0.02,             # Scale factor for synthetic demand
    # Policy weights
    'W_travel': 0.1,                  # Travel distance importance
    'W_peb': 0.4,                     # Private education burden
    'W_equity': 0.5,                  # Equity gap importance
}
```

Output files:
- `outputs/5_YEAR_PLAN_OUTPUT.csv`: Detailed construction plan by pincode and year
- `outputs/Bihar_5_Year_School_Plan_Map.png`: Main overview map
- `outputs/annual_plan_maps/Year_N_Construction_Map.png`: Annual construction maps
- `outputs/analysis_charts/construction_timeline.png`: Timeline chart
- `outputs/analysis_charts/investment_analysis.png`: Investment analysis
- `outputs/Bihar_School_Plan_Interactive.html`: Interactive Folium map

Data files:
- `data/processed_real/pincode_demand_T.csv`: Demand projections by pincode
- `data/processed_real/pincode_sites.csv`: Site attributes and centroids
- `data/processed_real/distance_matrix_km.csv`: Travel distance matrix
To add new data sources:
- Update `docs/real_data_catalog.md` with source documentation
- Modify `scripts/realistic_data_generator.py` to incorporate new parameters
- Adjust optimization weights in `scripts/school_optimizer.py`
Edit the `build_optimization_model()` function in `school_optimizer.py` to:
- Add new cost components
- Modify constraint formulations
- Adjust policy weights
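The exact objective inside `build_optimization_model()` is not reproduced in this README. As a hypothetical illustration of how `W_travel`, `W_peb` and `W_equity` could trade off, the sketch scores each candidate site as a weighted sum of min-max normalized need indicators. The site keys (`current_travel_km`, `peb`, `equity_gap`), the normalization, and the function names are assumptions, not the repository's formulation.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def site_priority(sites, params):
    """Weighted need score per candidate site (hypothetical formulation).

    Each site is a dict with assumed keys:
      'id', 'current_travel_km' (distance to the nearest existing school),
      'peb' (private education burden), 'equity_gap'.
    Higher score = more urgent; with weights summing to 1, scores lie in [0, 1].
    """
    travel = min_max([s["current_travel_km"] for s in sites])
    peb = min_max([s["peb"] for s in sites])
    gap = min_max([s["equity_gap"] for s in sites])
    return {
        s["id"]: params["W_travel"] * t + params["W_peb"] * p + params["W_equity"] * g
        for s, t, p, g in zip(sites, travel, peb, gap)
    }
```

In a MILP, scores like these would typically enter the objective as coefficients on binary build variables (for example through `pulp.lpSum`), so changing the weights re-ranks sites without touching the budget, capacity, or distance constraints.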
To customize visualizations:
- Add new chart types in `create_analysis_charts()`
- Modify color schemes via `YEAR_COLOR_MAP`
- Enhance interactive popups in `create_interactive_map()`
Testing:

```bash
# Test data generation
python scripts/realistic_data_generator.py --dry-run

# Validate optimization output
python -c "
import pandas as pd
df = pd.read_csv('outputs/5_YEAR_PLAN_OUTPUT.csv')
print(f'Total schools: {len(df)}')
print(f'Years: {sorted(df[\"Year_to_Build\"].unique())}')
"

# Verify visualization outputs
ls outputs/*.png outputs/*.html
```

Performance:
- Data generation: ~2-3 minutes for 682 pincodes
- Optimization: ~1-2 minutes (MILP) + fallback if needed
- Visualization: ~30-60 seconds for all outputs
- Memory usage: <2GB RAM for full pipeline
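The testing commands above only print counts. A slightly stricter check could be scripted; the sketch below assumes the plan CSV has the `Year_to_Build` column used in those commands and that valid years are 2025-2029. `validate_plan` is a hypothetical helper, and the optional `max_per_year` cap is supplied by the caller rather than read from the repository.

```python
import csv
from collections import Counter


def validate_plan(path, first_year=2025, last_year=2029, max_per_year=None):
    """Return a list of problems in a plan CSV; an empty list means it passed.

    Assumes a 'Year_to_Build' column; max_per_year is an optional,
    caller-supplied cap (e.g. derived from the budget), not a repo setting.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if "Year_to_Build" not in (reader.fieldnames or []):
            return ["missing column Year_to_Build"]
        # float() first so values written as "2025.0" are also accepted
        years = Counter(int(float(row["Year_to_Build"])) for row in reader)

    if not years:
        return ["plan is empty"]
    problems = []
    for year, count in sorted(years.items()):
        if not first_year <= year <= last_year:
            problems.append(f"year {year} outside {first_year}-{last_year}")
        if max_per_year is not None and count > max_per_year:
            problems.append(f"{count} schools in {year} exceeds {max_per_year}")
    return problems
```

For example, `validate_plan('outputs/5_YEAR_PLAN_OUTPUT.csv', max_per_year=19482200000 // 2000000)` applies a capital-only cap of 9,741 schools per year from the default `B_annual_budget` and `C_build_cost`; this cap is loose because it ignores operating costs.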
Contributing:
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make changes with tests
- Commit with descriptive messages
- Push and create a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgements:
- Bihar Economic Survey and caste survey data
- UDISE+ education statistics
- Census of India demographic data
- Open-source geospatial and optimization libraries
For questions or collaborations, please open an issue in the repository.
Note: This is a demonstration system using realistic synthetic data. For production use, replace synthetic inputs with actual administrative data and validate model assumptions with domain experts.