A machine learning pipeline to predict NIRF (National Institutional Ranking Framework) scores/ranks of Indian institutions using historical data (2017–2024), OCR-extracted parameters from images, fuzzy matching, and XGBoost models. Includes SHAP and permutation importance for explainability.
- End-to-End Pipeline: Automated data scraping, cleaning, fuzzy matching, and multi-year merging for consistent analytics.
- Predictive Modeling: XGBoost regression trained on NIRF parameters to estimate scores and ranks with high accuracy.
- Explainability: SHAP & permutation importance visualizations to interpret feature contributions to rankings.
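The score-to-rank step can be sketched as follows. This is a minimal illustration with made-up scores and a hypothetical helper; the repo's actual logic lives in `scripts/nirf_rank_prediction.py`:

```python
def scores_to_ranks(scores):
    """Convert predicted NIRF scores to ranks: the highest score gets rank 1.

    `scores` maps institution name -> predicted score (illustrative data).
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {name: rank for rank, (name, _) in enumerate(ordered, start=1)}

# Made-up predicted scores, not real NIRF output.
predicted = {"Inst A": 82.1, "Inst B": 90.4, "Inst C": 75.0}
print(scores_to_ranks(predicted))  # {'Inst B': 1, 'Inst A': 2, 'Inst C': 3}
```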
- Scrape NIRF HTML tables and parse ranking tables (`scripts/scraper.py` + `scripts/parser.py`)
- Download parameter images and extract sub-parameters via OCR (`scripts/img_download.py` + `scripts/image_data_extract.py`)
- Merge OCR-derived parameter scores into yearly CSVs (`scripts/merge_parameter_scores.py`)
- Consolidate multi-year data and create a combined CSV (`scripts/main.py`)
- Train an XGBoost model and predict 2024 scores/ranks (`scripts/nirf_rank_prediction.py`)
- Explain model predictions using SHAP and permutation importance (`scripts/nirf_shap_and_permutation.py`)
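The fuzzy-matching step aligns institution names that differ across years and OCR output. The pipeline uses FuzzyWuzzy; the dependency-free sketch below uses the standard library's `difflib` instead, with an illustrative cutoff:

```python
from difflib import get_close_matches

def match_institution(name, canonical_names, cutoff=0.85):
    """Map a scraped/OCR'd institution name to its canonical form.

    The repo uses FuzzyWuzzy for this; difflib stands in here so the
    sketch has no third-party dependency. `cutoff` is an assumed threshold.
    """
    hits = get_close_matches(name, canonical_names, n=1, cutoff=cutoff)
    return hits[0] if hits else None

canon = ["Indian Institute of Technology Madras", "Indian Institute of Science"]
print(match_institution("Indian Inst. of Technology Madras", canon))
```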
Important execution order:

```bash
python scripts/img_download.py               # download images, extract with image_data_extract
python scripts/main.py                       # scrape/merge/generate combined CSV (calls merge_parameter_scores)
python scripts/nirf_rank_prediction.py       # train & predict
python scripts/nirf_shap_and_permutation.py  # SHAP & permutation analysis
```
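A small driver script (hypothetical, not part of the repo) could chain these stages in order and stop at the first failure:

```python
import subprocess
import sys

# Pipeline stages, in the order they must run.
STAGES = [
    "scripts/img_download.py",
    "scripts/main.py",
    "scripts/nirf_rank_prediction.py",
    "scripts/nirf_shap_and_permutation.py",
]

def run_pipeline(stages=STAGES):
    """Run each stage with the current interpreter; raise on the first failure."""
    completed = []
    for script in stages:
        print(f"running {script}")
        subprocess.run([sys.executable, script], check=True)
        completed.append(script)
    return completed

# Inside the repo you would call:
# run_pipeline()
```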
```text
nirf-rank-predictor/
├── README.md
├── requirements.txt
├── LICENSE
├── .gitignore
├── data/        # small sample dataset (NOT full/raw data)
├── scripts/     # all Python scripts (scraper, parser, OCR, models)
├── outputs/     # generated outputs (predictions, SHAP figures)
└── notebooks/   # optional analysis notebooks
```
```bash
# 1) Clone the repo (or use your local folder)
git clone https://github.com/Amang2711/Nirf-Rank-Predictor.git
cd Nirf-Rank-Predictor

# 2) Create a virtual environment & install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# 3) System dependencies (for OCR)
# Ubuntu / Debian:
sudo apt-get install tesseract-ocr libtesseract-dev
# macOS (Homebrew):
brew install tesseract

# 4) Run the pipeline (order matters):
python scripts/img_download.py
python scripts/main.py
python scripts/nirf_rank_prediction.py
python scripts/nirf_shap_and_permutation.py
```

- Languages: Python
- Libraries: Pandas, XGBoost, SHAP, scikit-learn, Matplotlib
- Data Processing: OCR (pytesseract), FuzzyWuzzy for name matching
- Data Source: Official NIRF website (2017–2024)
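The merge of OCR-derived parameter scores into the yearly ranking tables (the job of `scripts/merge_parameter_scores.py`) can be sketched with pandas. The column names below are hypothetical; the CSVs the scripts actually produce may differ:

```python
import pandas as pd

# Illustrative stand-ins for a yearly ranking CSV and its OCR parameter scores.
rankings = pd.DataFrame({
    "institution": ["Inst A", "Inst B"],
    "year": [2023, 2023],
    "score": [82.1, 75.0],
})
ocr_params = pd.DataFrame({
    "institution": ["Inst A", "Inst B"],
    "year": [2023, 2023],
    "TLR": [70.2, 61.5],   # Teaching, Learning & Resources sub-score
})

# Left-join so every ranked institution is kept even if OCR missed it.
merged = rankings.merge(ocr_params, on=["institution", "year"], how="left")
print(merged)
```

A left join (rather than inner) keeps rows whose parameter images failed OCR, leaving their sub-scores as NaN for later inspection.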
- Add only sample data to the repo. Do not push large CSVs, downloaded images, or outputs — .gitignore already handles this.
- For pushing to GitHub via HTTPS you'll need a Personal Access Token (PAT) instead of a password; or configure SSH keys.
- Tweak `scripts/nirf_rank_prediction.py` to change model hyperparameters or training years.
- `scripts/image_data_extract.py` contains OCR heuristics for edge cases; keep them if they work for your dataset.
- For future runs, update the target year in each script: the current pipeline only covers data through 2024.
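Since the year currently has to be edited in every file, one option is to centralize it. The names and path convention below are illustrative, not taken from the repo:

```python
# Hypothetical refactor: keep the year range in one place instead of
# editing each script separately.
TRAIN_YEARS = range(2017, 2024)   # years used for training (2017-2023)
PREDICT_YEAR = 2024               # year whose scores/ranks are predicted

def yearly_csv_path(year):
    """Assumed naming convention for the per-year CSVs."""
    return f"data/nirf_{year}.csv"

print([yearly_csv_path(y) for y in TRAIN_YEARS])
```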
This project is licensed under the MIT License — see the LICENSE file for details.