This repository contains tools for analyzing institutional collaborations using the OpenAlex API. The tools help visualize collaboration networks, trends, and patterns between academic institutions.
- Institution Data Fetching: Retrieve and save data about academic institutions worldwide
- Collaboration Analysis: Analyze collaboration patterns between institutions
- Visualizations:
  - Interactive network graphs of institutional collaborations
  - Global collaboration distribution maps
  - Time-series analysis of collaboration trends
  - Summary dashboards
  - Top collaborating institutions networks
First, clone the repository and install the required packages:
git clone https://github.com/Ahanmr/OpenAlex-KG-Analyzer.git
cd OpenAlex-KG-Analyzer
pip install -r requirements.txt

Required packages:
- pandas>=1.5.0
- networkx>=3.0
- plotly>=5.13.0
- requests>=2.28.0
- country_converter>=1.0.0
- numpy>=1.23.0
First, fetch and save institution data from OpenAlex:
python fetch_institutions.py

This creates a directory institution_data_TIMESTAMP containing:
- institutions_full.csv: Complete dataset with all institution details
- institutions_simple.csv: Simplified version with key fields
- top_institutions.md: Markdown file listing the top 100 institutions
- summary.txt: Summary statistics about the institutions
To analyze collaborations for a specific institution:
python run_analysis.py

Before running, update these parameters in run_analysis.py:
institution_id = "I2802101240" # Your target institution ID
email = "your_email@example.com" # Your email for API access

This creates a directory results_INSTITUTION-ID_INSTITUTION-NAME_TIMESTAMP containing:
- Raw Data
  - collaboration_data.csv: Raw collaboration data
- Interactive Visualizations
  - collaborations_over_time.html: Time series of collaboration counts
  - collaboration_map.html: Global distribution of collaborations
  - collaboration_trends.html: Trends with top collaborating countries
  - institution_network_top20.html: Network of top 20 collaborating institutions
  - institution_network_top50.html: Network of top 50 collaborating institutions
  - collaboration_summary.html: Summary dashboard
- Network Data
  - collaboration_network.gexf: Network graph file (can be opened with Gephi)

Institution network graphs (institution_network_top20.html, institution_network_top50.html):
- Shows collaboration relationships between institutions
- Node size indicates collaboration frequency
- Interactive hovering shows collaboration details
- Available in two versions: top 20 and top 50 institutions
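
The top-N selection and frequency-based node sizing described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's actual implementation: the helper names `top_collaborators` and `node_sizes`, the size range, and the sample partner names are all hypothetical.

```python
from collections import Counter


def top_collaborators(partner_names, top_n=20):
    """Count how often each partner appears and keep the top_n most frequent."""
    return Counter(partner_names).most_common(top_n)


def node_sizes(counts, min_size=10.0, max_size=50.0):
    """Linearly scale collaboration counts into [min_size, max_size]."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for name, count in counts.items():
        if span == 0:
            # All partners collaborate equally often: use one size for all.
            sizes[name] = max_size
        else:
            sizes[name] = min_size + (count - lo) / span * (max_size - min_size)
    return sizes


# Hypothetical partner names, one entry per shared publication.
partners = ["Univ A", "Univ B", "Univ A", "Univ C", "Univ A", "Univ B"]
top = top_collaborators(partners, top_n=2)
sizes = node_sizes(dict(top))
print(top)
print(sizes)
```

Calling `top_collaborators` with `top_n=20` or `top_n=50` would yield the inputs for the two network versions.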

Global collaboration map (collaboration_map.html):
- Choropleth map showing global collaboration distribution
- Color intensity indicates collaboration frequency
- Uses country_converter for accurate country naming
- Interactive tooltips with collaboration counts

Collaboration trends (collaboration_trends.html):
- Time series visualization of collaboration patterns
- Shows trends for top collaborating countries
- Interactive legend for filtering countries
- Hover information for detailed counts
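
One way to derive per-country yearly series for such a trend plot is sketched below. It assumes the input is a sequence of (year, country code) pairs, one per collaboration record; the function name `trends_for_top_countries` and the sample records are hypothetical and not taken from the repository.

```python
from collections import Counter, defaultdict


def trends_for_top_countries(records, top_k=5):
    """Return {country: {year: count}} for the top_k most frequent countries."""
    records = list(records)
    totals = Counter(country for _, country in records)
    top = [country for country, _ in totals.most_common(top_k)]
    series = {country: defaultdict(int) for country in top}
    for year, country in records:
        if country in series:
            series[country][year] += 1
    # Sort each series by year so it can be plotted directly as a line.
    return {c: dict(sorted(by_year.items())) for c, by_year in series.items()}


# Hypothetical (year, country_code) pairs.
sample = [
    (2020, "US"), (2020, "GB"), (2021, "US"),
    (2021, "US"), (2021, "DE"), (2022, "GB"),
]
trends = trends_for_top_countries(sample, top_k=2)
print(trends)
```

Each inner dictionary maps a year to a count and could be passed to a plotting library as one line per country.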

Summary dashboard (collaboration_summary.html):
- Combined view of collaboration metrics
- Top collaborating institutions bar chart
- Yearly collaboration trend line
- Overall collaboration statistics
- Run fetch_institutions.py
- Check generated files:
  - institutions_simple.csv
  - top_institutions.md
- Base URL: https://api.openalex.org/institutions
- Search endpoint: https://api.openalex.org/institutions?search=INSTITUTION_NAME
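
As a minimal sketch of how the search endpoint might be queried, the snippet below builds the request URL with the standard library. The helper name `build_search_url` is hypothetical. Passing the email as a `mailto` query parameter follows OpenAlex's convention for identifying callers; whether the repository's scripts send the email this way is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org/institutions"


def build_search_url(name, email=None):
    """Return a search URL for institutions matching `name` (hypothetical helper)."""
    params = {"search": name}
    if email:
        # Assumption: the contact email is passed as the `mailto` parameter.
        params["mailto"] = email
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("Stanford University")
print(url)
```

Fetching this URL (for example with `requests.get(url).json()`) should return a JSON object whose `results` list holds the matching institutions, each with an `id` and a `display_name`. The trailing part of `id` is the value to use as the institution ID.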
# NOTE: three entries below share the ID I127803157, so they cannot all be correct.
# Verify each ID with the search endpoint before use.
INSTITUTION_IDS = {
    'Harvard University': 'I127803138',
    'Stanford University': 'I127803157',
    'MIT': 'I127803157',
    'University of Oxford': 'I127803157',
    'University of Cambridge': 'I127803160'
}

max_pages = 50 # Number of pages to fetch
per_page = 200 # Results per page

start_year = 2020 # Start year for analysis
end_year = 2023 # End year for analysis
top_n = 20 # Number of top institutions to show

Institution data fields:
- openalex_id: Unique identifier
- display_name: Institution name
- country_code: ISO country code
- type: Institution type
- works_count: Number of publications
- cited_by_count: Citation count

Collaboration data fields:
- year: Publication year
- collaborating_institution: Partner institution name
- country: Collaborator's country code
- work_id: Publication identifier
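
To illustrate how the collaboration fields fit together, here is a small sketch that counts distinct publications per partner institution. It assumes the CSV header uses exactly the field names listed above; the sample rows and the function name `works_per_institution` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the assumed layout of collaboration_data.csv.
SAMPLE_CSV = """year,collaborating_institution,country,work_id
2021,Example University,US,W1
2021,Example University,US,W1
2022,Example University,US,W2
2022,Sample Institute,DE,W2
"""


def works_per_institution(csv_file):
    """Count distinct work_id values for each collaborating institution."""
    works = defaultdict(set)
    for row in csv.DictReader(csv_file):
        works[row["collaborating_institution"]].add(row["work_id"])
    return {name: len(ids) for name, ids in works.items()}


counts = works_per_institution(io.StringIO(SAMPLE_CSV))
print(counts)
```

Collecting `work_id` values in a set keeps a publication from being counted twice when the same partner appears on it more than once. To run this on real output, pass an open file handle for collaboration_data.csv instead of the `io.StringIO` sample.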
To construct and analyze co-authorship networks using graph.py:
python graph.py

Before running, update these parameters in the main() function of graph.py:
email = "your_email@example.com" # Your email for API access
institution_id = "I97018004" # Target institution ID
start_year = 2022 # Start year for analysis
end_year = 2023 # End year for analysis
max_papers = 500 # Maximum number of papers to analyze

This creates a directory knowledge_graph_TIMESTAMP containing:
- Network Data
  - author_nodes.csv: Node-level data with author metrics
  - coauthorship_edges.csv: Edge data with collaboration strengths
  - coauthorship_network.gexf: Network file for Gephi visualization
- Interactive Visualizations
  - coauthorship_network_top_10.html: Network of top 10 authors
  - coauthorship_network_top_20.html: Network of top 20 authors
  - coauthorship_network_top_50.html: Network of top 50 authors
- Analysis Files
  - network_metadata.txt: Network statistics and summary
  - graph_generation.log: Processing log with details
The generated network visualizations include:
- Node size based on publication count
- Edge weight based on collaboration frequency
- Interactive hover information showing:
  - Author name and institution
  - Publication count
  - ORCID (when available)
  - Collaboration details
- Color scaling based on publication metrics
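
The node and edge quantities listed above can be derived from per-paper author lists roughly as follows. This is a sketch under stated assumptions, not the logic in graph.py: the function name `build_coauthorship` and the author names are hypothetical, and authors are identified by name here purely for brevity.

```python
from collections import Counter
from itertools import combinations


def build_coauthorship(papers):
    """Return (publication counts per author, co-authorship counts per pair)."""
    pub_counts = Counter()
    edge_weights = Counter()
    for authors in papers:
        # Deduplicate and sort so each pair has one canonical orientation.
        unique = sorted(set(authors))
        pub_counts.update(unique)
        edge_weights.update(combinations(unique, 2))
    return pub_counts, edge_weights


# Hypothetical author lists, one per paper.
papers = [["Ada", "Ben", "Cy"], ["Ada", "Ben"], ["Cy"]]
nodes, edges = build_coauthorship(papers)
print(nodes)
print(edges)
```

The publication counts would drive node size and the pair counts would serve as edge weights, for example via `G.add_edge(a, b, weight=w)` in NetworkX.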
- OpenAlex for the comprehensive research data API
- NetworkX for network analysis tools
- country_converter for country code handling



