Who Connects Global Aid? The Hidden Geometry of 10 Million Transactions
Repository: https://github.com/chirindaopensource/who_connects_global_aid
Owner: 2025 Craig Chirinda (Open Source Projects)
This repository contains an independent, professional-grade Python implementation of the research methodology from the 2025 paper entitled "Who Connects Global Aid? The Hidden Geometry of 10 Million Transactions" by:
- Paul X. McCarthy (League of Scholars, Sydney; UNSW)
- Xian Gong (University of Technology Sydney)
- Marian-Andrei Rizoiu (University of Technology Sydney)
- Paolo Boldi (Università degli Studi di Milano)
The project provides a complete, end-to-end computational framework for replicating the paper's findings. It delivers a modular, auditable, and extensible pipeline that executes the entire research workflow: from the ingestion and cleansing of massive IATI transaction logs to the rigorous construction of bipartite networks, high-dimensional structural embedding via Node2Vec, and the identification of critical "knowledge brokers" through centrality analysis.
- Introduction
- Theoretical Background
- Features
- Methodology Implemented
- Core Components (Notebook Structure)
- Key Callable:
run_global_aid_pipeline - Prerequisites
- Installation
- Input Data Structure
- Usage
- Output Structure
- Project Structure
- Customization
- Contributing
- Recommended Extensions
- License
- Citation
- Acknowledgments
This project provides a Python implementation of the analytical framework presented in McCarthy et al. (2025). The core of this repository is the iPython Notebook who_connects_global_aid_draft.ipynb, which contains a comprehensive suite of functions to replicate the paper's findings. The pipeline is designed to map the topological structure of the global aid ecosystem, moving beyond aggregate volume flows to reveal the hidden geometry of influence.
The paper addresses the structural complexity of the modern aid system, characterized by a "triple revolution" of new goals, instruments, and actors. This codebase operationalizes the paper's framework, allowing users to:
- Rigorously validate and manage the entire experimental configuration via a single
config.yamlfile. - Cleanse and normalize over 10 million transaction records from the International Aid Transparency Initiative (IATI).
- Construct a bipartite Provider-Receiver network and project it into a Provider-Provider co-investment graph.
- Learn high-dimensional structural embeddings using Node2Vec to capture functional roles.
- Reveal functional clusters (Humanitarian vs. Development) via UMAP dimensionality reduction.
- Identify the "Solar System" of central actors using Hub Scores (HITS) and Betweenness Centrality.
- Validate findings externally by correlating offline structural influence with online web authority (PageRank).
The implemented methods combine techniques from Network Science, Graph Theory, and Machine Learning.
1. Bipartite Network Construction:
The system is modeled as a bipartite graph
2. One-Mode Projection:
To analyze donor relationships, the bipartite graph is projected into a Provider-Provider co-occurrence network
3. Structural Embedding (Node2Vec):
The topology is encoded into a low-dimensional vector space
4. Centrality and Brokerage: Influence is quantified using HITS Hub Scores (for the "Solar System" ranking) and Betweenness Centrality (for identifying brokers): $$ C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} $$ This metric highlights actors like J-PAL and the Hewlett Foundation that bridge structural holes between disparate clusters.
Below is a diagram which summarizes the proposed approach:
The provided iPython Notebook (who_connects_global_aid_draft.ipynb) implements the full research pipeline, including:
- Modular, Multi-Task Architecture: The pipeline is decomposed into 29 distinct, modular tasks, each with its own orchestrator function.
- Configuration-Driven Design: All study parameters (ER thresholds, Node2Vec hyperparameters, UMAP settings) are managed in an external
config.yamlfile. - Rigorous Data Validation: A multi-stage validation process checks schema integrity, type coercion feasibility, and referential integrity.
- Deterministic Entity Resolution: Implements a robust TF-IDF blocking and Cosine Similarity pipeline to resolve organisation names to canonical identities.
- High-Performance Computing: Utilizes Numba-accelerated random walk generation and sparse matrix algebra for scalability.
- Reproducible Artifacts: Generates structured dictionaries, serializable outputs, and cryptographic manifests for every intermediate result.
The core analytical steps directly implement the methodology from the paper:
- Validation & Cleansing (Tasks 1-8): Ingests raw IATI data, validates schemas, enforces temporal scope (1967-2025), and normalizes multi-valued fields.
- Integration & ER (Tasks 9-10): Joins transactions to activity contexts and performs entity resolution to construct a canonical organisation map.
- Descriptive Analysis (Tasks 11-12): Computes geographic transaction density and longitudinal instrument evolution.
- Network Construction (Tasks 13-15): Builds the bipartite graph and projects it into a provider co-occurrence network.
- Topology & Embeddings (Tasks 16-17): Generates Node2Vec embeddings and projects them to 2D using UMAP to reveal functional clusters.
- Centrality & Ranking (Tasks 18-20): Computes HITS and Betweenness centrality to construct the "Solar System" ranking.
- Analysis & Validation (Tasks 21-26): Analyzes subgroups (Universities/Foundations), characterizes broker networks (Hewlett), and correlates findings with web PageRank.
- Orchestration & Provenance (Tasks 27-29): Manages the end-to-end execution and packages reproducible outputs.
The who_connects_global_aid_draft.ipynb notebook is structured as a logical pipeline with modular orchestrator functions for each of the 29 major tasks. All functions are self-contained, fully documented with type hints and docstrings, and designed for professional-grade execution.
The project is designed around a single, top-level user-facing interface function:
run_global_aid_pipeline: This master orchestrator function runs the entire automated research pipeline from end-to-end. A single call to this function reproduces the entire computational portion of the project, managing data flow between cleansing, graph construction, and analysis modules.
- Python 3.9+
- Core dependencies:
pandas,numpy,scipy,networkx,gensim,scikit-learn,umap-learn,numba,pyyaml.
-
Clone the repository:
git clone https://github.com/chirindaopensource/who_connects_global_aid.git cd who_connects_global_aid -
Create and activate a virtual environment (recommended):
python -m venv venv source venv/bin/activate # On Windows, use `venv\Scripts\activate`
-
Install Python dependencies:
pip install pandas numpy scipy networkx gensim scikit-learn umap-learn numba pyyaml
The pipeline requires four primary DataFrames:
df_transactions_raw: IATI transaction elements (Value, Date, Provider, Receiver).df_activities_raw: Activity metadata (Sectors, Countries, Instruments).df_organisations_raw: Master organisation list for entity resolution.df_web_links_raw: Web crawl data for external validation.
The who_connects_global_aid_draft.ipynb notebook provides a complete, step-by-step guide. The primary workflow is to execute the final cell of the notebook, which demonstrates how to use the top-level execute_master_workflow orchestrator:
# Final cell of the notebook
# This block serves as the main entry point for the entire project.
if __name__ == '__main__':
# 1. Load the master configuration from the YAML file.
import yaml
with open('config.yaml', 'r') as f:
config = yaml.safe_load(f)
# 2. Load raw datasets (Example using synthetic generator provided in the notebook)
# In production, load from CSV/Parquet: pd.read_csv(...)
data_inputs = generate_synthetic_data()
# 3. Execute the entire replication study.
results = execute_master_workflow(
df_transactions_raw=data_inputs["df_transactions_raw"],
df_activities_raw=data_inputs["df_activities_raw"],
df_organisations_raw=data_inputs["df_organisations_raw"],
df_web_links_raw=data_inputs["df_web_links_raw"],
config=config,
output_root="./global_aid_study_output"
)
# 4. Access results
if results.success:
print(f"Validation Correlation: {results.baseline_results.artifacts['correlation'].pearson_r}")The pipeline returns a MasterWorkflowResults object containing:
baseline_results: APipelineResultsobject with all artifacts from the primary run (Graph, Embeddings, Centrality Tables).robustness_results: ARobustnessArtifactcontaining sensitivity analysis metrics.provenance_artifact: AProvenanceArtifactwith cryptographic hashes and metadata summaries.
who_connects_global_aid/
│
├── who_connects_global_aid_draft.ipynb # Main implementation notebook
├── config.yaml # Master configuration file
├── requirements.txt # Python package dependencies
│
├── LICENSE # MIT Project License File
└── README.md # This file
The pipeline is highly customizable via the config.yaml file. Users can modify study parameters such as:
- Scope:
start_year,end_year. - Entity Resolution:
fuzzy_matchingthreshold. - Node2Vec:
p,q,walk_length,num_walks. - UMAP:
n_neighbors,min_dist. - Projection: Context definitions (Country vs. Sector).
Contributions are welcome. Please fork the repository, create a feature branch, and submit a pull request with a clear description of your changes. Adherence to PEP 8, type hinting, and comprehensive docstrings is required.
Future extensions could include:
- Temporal Dynamics: Analyzing the evolution of the network topology over sliding windows.
- Multiplex Networks: Modeling different financial instruments (Grants vs. Loans) as distinct layers.
- Impact Analysis: Correlating network centrality with aid effectiveness outcomes.
This project is licensed under the MIT License. See the LICENSE file for details.
If you use this code or the methodology in your research, please cite the original paper:
@article{mccarthy2025who,
title={Who Connects Global Aid? The Hidden Geometry of 10 Million Transactions},
author={McCarthy, Paul X. and Gong, Xian and Rizoiu, Marian-Andrei and Boldi, Paolo},
journal={arXiv preprint arXiv:2512.17243},
year={2025}
}For the implementation itself, you may cite this repository:
Chirinda, C. (2025). Global Aid Network Analysis Pipeline: An Open Source Implementation.
GitHub repository: https://github.com/chirindaopensource/who_connects_global_aid
- Credit to Paul X. McCarthy, Xian Gong, Marian-Andrei Rizoiu, and Paolo Boldi for the foundational research that forms the entire basis for this computational replication.
- This project is built upon the exceptional tools provided by the open-source community. Sincere thanks to the developers of the scientific Python ecosystem, including Pandas, NumPy, SciPy, NetworkX, Gensim, and UMAP.
--
This README was generated based on the structure and content of the who_connects_global_aid_draft.ipynb notebook and follows best practices for research software documentation.
