Ahanmr/OpenAlex-KG-Analyzer

OpenAlex Collaboration Analysis Tools

[Figure: Example of Institution Collaboration Network]

This repository contains tools for analyzing institutional collaborations using the OpenAlex API. The tools help visualize collaboration networks, trends, and patterns between academic institutions.

📊 Example Visualizations

Global Collaboration Map

[Figure: Collaboration Map]

Collaboration Trends

[Figure: Collaboration Trends]

Analysis Summary

[Figure: Analysis Summary]

🚀 Features

  • Institution Data Fetching: Retrieve and save data about academic institutions worldwide
  • Collaboration Analysis: Analyze collaboration patterns between institutions
  • Visualizations:
    • Interactive network graphs of institutional collaborations
    • Global collaboration distribution maps
    • Time-series analysis of collaboration trends
    • Summary dashboards
    • Top collaborating institutions networks

📋 Prerequisites

First, clone the repository and install the required packages:

git clone https://github.com/Ahanmr/OpenAlex-KG-Analyzer.git
cd OpenAlex-KG-Analyzer
pip install -r requirements.txt

Required packages:

  • pandas>=1.5.0
  • networkx>=3.0
  • plotly>=5.13.0
  • requests>=2.28.0
  • country_converter>=1.0.0
  • numpy>=1.23.0

🛠️ Usage

1. Fetch Institution Data

First, fetch and save institution data from OpenAlex:

python fetch_institutions.py

This creates a directory institution_data_TIMESTAMP containing:

  • institutions_full.csv: Complete dataset with all institution details
  • institutions_simple.csv: Simplified version with key fields
  • top_institutions.md: Markdown file listing top 100 institutions
  • summary.txt: Summary statistics about the institutions

2. Run Collaboration Analysis

To analyze collaborations for a specific institution:

python run_analysis.py

Before running, update these parameters in run_analysis.py:

institution_id = "I2802101240"  # Your target institution ID
email = "you@example.com"  # Your email for API access

This creates a directory results_INSTITUTION-ID_INSTITUTION-NAME_TIMESTAMP containing:

Generated Files

  1. Raw Data

    • collaboration_data.csv: Raw collaboration data
  2. Interactive Visualizations

    • collaborations_over_time.html: Time series of collaboration counts
    • collaboration_map.html: Global distribution of collaborations
    • collaboration_trends.html: Trends with top collaborating countries
    • institution_network_top20.html: Network of top 20 collaborating institutions
    • institution_network_top50.html: Network of top 50 collaborating institutions
    • collaboration_summary.html: Summary dashboard
  3. Network Data

    • collaboration_network.gexf: Network graph file (can be opened with Gephi)

📊 Visualization Types

1. Institution Network

  • Shows collaboration relationships between institutions
  • Node size indicates collaboration frequency
  • Interactive hovering shows collaboration details
  • Available in two versions: top 20 and top 50 institutions
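A minimal sketch of how such a network could be assembled with networkx is shown here. It assumes a star layout around the target institution, with node size and edge weight both set to the collaboration count; the function name `build_star_network` and the `size`/`weight` attribute names are illustrative and not taken from the repository.

```python
from collections import Counter

import networkx as nx


def build_star_network(target, collaborators, top_n=20):
    """Link the target institution to its top_n most frequent collaborators.

    `collaborators` holds one institution name per collaboration record
    (hypothetical input shape).
    """
    counts = Counter(name for name in collaborators if name != target)
    graph = nx.Graph()
    # The target node's size is the total number of collaboration records.
    graph.add_node(target, size=sum(counts.values()))
    for name, count in counts.most_common(top_n):
        graph.add_node(name, size=count)             # node size = frequency
        graph.add_edge(target, name, weight=count)   # edge weight = frequency
    return graph
```

Calling it with `top_n=20` or `top_n=50` would correspond to the two versions listed above.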

2. Global Collaboration Map

  • Choropleth map showing global collaboration distribution
  • Color intensity indicates collaboration frequency
  • Uses country_converter for accurate country naming
  • Interactive tooltips with collaboration counts

3. Collaboration Trends

  • Time series visualization of collaboration patterns
  • Shows trends for top collaborating countries
  • Interactive legend for filtering countries
  • Hover information for detailed counts

4. Summary Dashboard

  • Combined view of collaboration metrics
  • Top collaborating institutions bar chart
  • Yearly collaboration trend line
  • Overall collaboration statistics

🔍 Finding Institution IDs

Method 1: Using fetch_institutions.py

  1. Run fetch_institutions.py
  2. Check generated files:
    • institutions_simple.csv
    • top_institutions.md

Method 2: Direct API Access

  • Base URL: https://api.openalex.org/institutions
  • Search endpoint: https://api.openalex.org/institutions?search=INSTITUTION_NAME

Common Institution IDs

INSTITUTION_IDS = {
    'Harvard University': 'I136199984',
    'Stanford University': 'I97018004',
    'MIT': 'I63966007',
    'University of Oxford': 'I40120149',
    'University of Cambridge': 'I241749'
}

⚙️ Configuration Options

fetch_institutions.py

max_pages = 50  # Number of pages to fetch
per_page = 200  # Results per page
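With these defaults the script would request at most 50 × 200 = 10,000 institutions. The loop below is a sketch of how the two settings might interact, not the repository's actual code: `fetch_page` is a hypothetical callable that wraps one HTTP request, and the `page` and `per-page` names follow OpenAlex's paging parameters.

```python
def fetch_all(fetch_page, max_pages=50, per_page=200):
    """Collect results page by page until a page comes back short or empty.

    `fetch_page(params)` is a hypothetical callable returning the list of
    results for one request; in practice it would wrap an HTTP GET.
    """
    records = []
    for page in range(1, max_pages + 1):
        batch = fetch_page({"page": page, "per-page": per_page})
        records.extend(batch)
        if len(batch) < per_page:  # a short page means there is nothing left
            break
    return records
```

Passing the request function in as an argument keeps the paging logic testable without network access.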

run_analysis.py

start_year = 2020  # Start year for analysis
end_year = 2023    # End year for analysis
top_n = 20        # Number of top institutions to show
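One plausible way to turn the year range into an OpenAlex works query is sketched below. The helper name is hypothetical, and whether `run_analysis.py` builds its filter this way is an assumption; the sketch relies on the `authorships.institutions.id`, `from_publication_date`, and `to_publication_date` filters of the works endpoint.

```python
def build_works_filter(institution_id, start_year, end_year):
    """Return a comma-separated filter string for the /works endpoint."""
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    return ",".join([
        f"authorships.institutions.id:{institution_id}",
        f"from_publication_date:{start_year}-01-01",  # inclusive lower bound
        f"to_publication_date:{end_year}-12-31",      # inclusive upper bound
    ])
```

The returned string would be passed as the `filter` query parameter, for example `https://api.openalex.org/works?filter=...`.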

📝 Data Fields

Institution Data

  • openalex_id: Unique identifier
  • display_name: Institution name
  • country_code: ISO country code
  • type: Institution type
  • works_count: Number of publications
  • cited_by_count: Citation count

Collaboration Data

  • year: Publication year
  • collaborating_institution: Partner institution name
  • country: Collaborator's country code
  • work_id: Publication identifier
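The collaboration fields above lend themselves to simple aggregations with pandas. The sketch below assumes one row per (work, collaborating institution) pair, which is a guess about the layout of `collaboration_data.csv`; the function name `summarize` is illustrative.

```python
import pandas as pd


def summarize(df, top_n=20):
    """Return (works per year, top_n collaborators by distinct works)."""
    # Count each publication once per year, even if it has several partners.
    yearly = df.groupby("year")["work_id"].nunique()
    top = (
        df.groupby("collaborating_institution")["work_id"]
        .nunique()
        .sort_values(ascending=False)
        .head(top_n)
    )
    return yearly, top
```

A typical call would be `yearly, top = summarize(pd.read_csv("collaboration_data.csv"))`, with the file path adjusted to the generated results directory.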

🔬 Knowledge Graph Construction

Building Co-authorship Networks

To construct and analyze co-authorship networks using graph.py:

python graph.py

Before running, update these parameters in the main() function of graph.py:

email = "you@example.com"  # Your email for API access
institution_id = "I97018004"      # Target institution ID
start_year = 2022                 # Start year for analysis
end_year = 2023                   # End year for analysis
max_papers = 500                  # Maximum number of papers to analyze

This creates a directory knowledge_graph_TIMESTAMP containing:

Generated Files

  1. Network Data

    • author_nodes.csv: Node-level data with author metrics
    • coauthorship_edges.csv: Edge data with collaboration strengths
    • coauthorship_network.gexf: Network file for Gephi visualization
  2. Interactive Visualizations

    • coauthorship_network_top_10.html: Network of top 10 authors
    • coauthorship_network_top_20.html: Network of top 20 authors
    • coauthorship_network_top_50.html: Network of top 50 authors
  3. Analysis Files

    • network_metadata.txt: Network statistics and summary
    • graph_generation.log: Processing log with details

Network Visualization Options

The generated network visualizations include:

  • Node size based on publication count
  • Edge weight based on collaboration frequency
  • Interactive hover information showing:
    • Author name and institution
    • Publication count
    • ORCID (when available)
    • Collaboration details
  • Color scaling based on publication metrics
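The node and edge semantics above can be illustrated with a small networkx sketch. It is not the code in `graph.py`: the input shape (one list of author names per paper) and the attribute names `publications` and `weight` are assumptions made for the example.

```python
from itertools import combinations

import networkx as nx


def build_coauthorship_graph(papers):
    """Build a weighted co-authorship graph.

    `papers` is an iterable of author-name lists, one list per paper
    (hypothetical input shape).
    """
    graph = nx.Graph()
    for authors in papers:
        unique = sorted(set(authors))  # ignore duplicate names on one paper
        for author in unique:
            if author in graph:
                graph.nodes[author]["publications"] += 1
            else:
                graph.add_node(author, publications=1)
        # Every pair of co-authors on the paper shares one more collaboration.
        for a, b in combinations(unique, 2):
            if graph.has_edge(a, b):
                graph[a][b]["weight"] += 1
            else:
                graph.add_edge(a, b, weight=1)
    return graph
```

A graph built this way can be exported for Gephi with `nx.write_gexf(graph, "coauthorship_network.gexf")`.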

🙏 Acknowledgments
