🔬 Scientific Article Recommender

A hybrid recommender for scientific papers that combines semantic embeddings, a knowledge graph, and user behavior analysis to produce personalized recommendations.

🌟 Features

Hybrid Recommendation Engine: Combines content-based filtering using SciBERT embeddings with collaborative filtering
Knowledge Graph Integration: Uses Neo4j to store and query relationships between papers, concepts, and authors
Semantic Search: Leverages SciBERT embeddings for semantic similarity between research papers
Ontology-based Recommendations: Integrates scientific concept hierarchies for better recommendations
Modern Web Interface: Clean, responsive UI built with Flask and TailwindCSS
Real-time Recommendations: Instant paper suggestions based on topics and user preferences

🏗️ Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Flask Web App │────│  Recommendation │────│   Neo4j Graph   │
│                 │    │     Engine      │    │    Database     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │                       │                       │
    ┌─────────┐          ┌──────────────┐        ┌──────────────┐
    │   UI    │          │   SciBERT    │        │   OpenAlex   │
    │Templates│          │  Embeddings  │        │     API      │
    └─────────┘          └──────────────┘        └──────────────┘

🚀 Complete Setup Guide

Prerequisites

Python 3.8+
Neo4j Database (Desktop or Cloud)
Git
8GB+ RAM (for SciBERT embedding generation)

1. Clone and Setup Repository

git clone https://github.com/anVSS1/Scientific-Article-Recommender.git
cd Scientific-Article-Recommender

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Setup Neo4j Database

Download Neo4j Desktop: https://neo4j.com/download/
Create a new database
Set a password (you will need it for NEO4J_PASSWORD in step 3)
Start the database

3. Configure Environment Variables

# Copy environment template
cp .env.example .env

# Edit .env file with your settings:
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_actual_password
FLASK_DEBUG=True
FLASK_PORT=5050
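
As a rough illustration of how these variables could be consumed, the sketch below reads them from the process environment and validates them. It assumes the values have already been exported or loaded from `.env` (for example by a dotenv loader, which is an assumption about the project's dependencies). The `Settings` class and `load_settings` function are hypothetical names, not part of the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values defined in .env."""
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    flask_debug: bool
    flask_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the .env variables from a mapping, falling back to local defaults."""
    password = env.get("NEO4J_PASSWORD", "")
    # Fail early if the template placeholder was never replaced.
    if not password or password == "your_actual_password":
        raise ValueError("Set NEO4J_PASSWORD in .env before starting the app")
    debug_raw = env.get("FLASK_DEBUG", "False")
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=password,
        # Environment values are strings, so "True" must be parsed explicitly.
        flask_debug=debug_raw.strip().lower() in {"1", "true", "yes"},
        flask_port=int(env.get("FLASK_PORT", "5050")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to try without touching your real environment.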

📊 Data Setup (IMPORTANT!)

⚠️ This repository contains NO data files - you need to fetch and generate your own data.

Step 1: Fetch Scientific Articles

Edit the notebook: existing_scripts/openalec-fetcher.ipynb

Add your email (required by OpenAlex API):

EMAIL_FOR_OPENALEX = "you@example.com"  # Replace this!

Configure domains you want to fetch:

DOMAINS = [
    ("Computer Science", "https://openalex.org/C41008148", 1400),
    ("Artificial Intelligence", "https://openalex.org/C154945302", 400),
    ("Physics", "https://openalex.org/C121332964", 200),
    # Add more domains as needed
]

Run the notebook to fetch articles and concepts
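
For orientation, here is a minimal sketch of how a `DOMAINS` entry and your email might translate into an OpenAlex request URL. The notebook's actual request code is not shown in this README, so the helper names below are hypothetical; the `filter`, `per-page`, `cursor`, and `mailto` query parameters are those of the public OpenAlex works endpoint.

```python
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org/works"


def concept_short_id(concept_url: str) -> str:
    """Turn 'https://openalex.org/C41008148' into 'C41008148'."""
    return concept_url.rstrip("/").rsplit("/", 1)[-1]


def build_works_url(concept_url: str, email: str,
                    per_page: int = 200, cursor: str = "*") -> str:
    """Build a cursor-paged works query for one concept (hypothetical helper)."""
    params = {
        "filter": f"concepts.id:{concept_short_id(concept_url)}",
        "per-page": per_page,   # 200 is the largest page size OpenAlex allows
        "cursor": cursor,       # "*" requests the first page
        "mailto": email,        # identifies you to the API
    }
    # Keep ':' and '*' readable; everything else is percent-encoded.
    return f"{API_BASE}?{urlencode(params, safe=':*')}"


def pages_needed(target_count: int, per_page: int = 200) -> int:
    """Number of requests needed to reach target_count articles (ceiling division)."""
    return -(-target_count // per_page)
```

With the counts in the example configuration, 1400 Computer Science articles would take 7 requests at 200 results per page.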

Step 2: Update File Paths in Scripts

All scripts have hardcoded paths that you MUST change to match your setup:

`existing_scripts/generate_embeddings.py`

# Change these paths to match your data location:
articles_file = "data/cleaned data/processed_articles.json"  # Update path
concepts_file = "data/cleaned data/processed_concepts.json"  # Update path
output_dir = "data/embeddings/"  # Update path

`existing_scripts/import_data_to_neo4j.py`

# Update these paths:
articles_file = "data/cleaned data/processed_articles.json"
concepts_file = "data/cleaned data/processed_concepts.json"

`existing_scripts/fake_user_generator.py`

# Update these paths:
articles_file = "data/cleaned data/processed_articles.json"
concepts_file = "data/cleaned data/processed_concepts.json"
output_file = "data/fake_user_logs.csv"

`Website/app.py`

# Update paths in the Flask app:
embeddings_file = "data/embeddings/embeddings_articles.csv"
concepts_embeddings_file = "data/embeddings/embeddings_concepts.csv"

Step 3: Generate Embeddings

# Generate SciBERT embeddings for your articles
python existing_scripts/generate_embeddings.py

Note: This process can take several hours depending on your dataset size and hardware.

Step 4: Import Data to Neo4j

# Import articles and concepts to Neo4j
python existing_scripts/import_data_to_neo4j.py

# Load embeddings to Neo4j
python existing_scripts/load_embeddings_to_neo4j.py

# Generate fake user data for testing
python existing_scripts/fake_user_generator.py

# Create user profiles in Neo4j
python existing_scripts/populate_user_profiles_neo4j.py
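
Steps 3 and 4 must run in this order, so it can be convenient to chain them. The following is a small sketch of such a runner, not a script shipped with the repository; `run_pipeline` is a hypothetical name, and it assumes you launch it from the repository root.

```python
import subprocess
import sys

# Order taken from Steps 3 and 4 of this guide.
PIPELINE = [
    "existing_scripts/generate_embeddings.py",
    "existing_scripts/import_data_to_neo4j.py",
    "existing_scripts/load_embeddings_to_neo4j.py",
    "existing_scripts/fake_user_generator.py",
    "existing_scripts/populate_user_profiles_neo4j.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script with the current interpreter; stop at the first failure."""
    completed = []
    for script in scripts:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run against incomplete data.
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Call `run_pipeline()` from a Python session in the repository root, or pass a shorter list to rerun only the later steps.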

Step 5: Run the Application

cd Website
python app.py

Navigate to http://localhost:5050

📁 Project Structure

Scientific-Article-Recommender/
├── Website/                          # Flask web application
│   ├── app.py                       # Main Flask app
│   ├── backend/reco.py              # Recommendation engine
│   └── templates/                   # HTML templates
├── existing_scripts/                # Data processing scripts
│   ├── openalec-fetcher.ipynb      # Fetch data from OpenAlex
│   ├── generate_embeddings.py       # Generate SciBERT embeddings
│   ├── import_data_to_neo4j.py     # Import to Neo4j
│   ├── load_embeddings_to_neo4j.py # Load embeddings
│   ├── fake_user_generator.py      # Generate test users
│   └── populate_user_profiles_neo4j.py # Setup user profiles
├── data/                           # YOUR DATA GOES HERE
│   ├── fetched data/               # Raw OpenAlex data
│   ├── cleaned data/               # Processed articles/concepts
│   └── embeddings/                 # Generated embeddings
├── requirements.txt                # Python dependencies
├── .env.example                    # Environment template
└── README.md                      # This file

🔧 Customization Options

Change Research Domains

Edit openalec-fetcher.ipynb to fetch different scientific domains:

DOMAINS = [
    # Template: (domain name, OpenAlex concept ID, target number of articles)
    # ("Your Domain", "OpenAlex_Concept_ID", target_count),
    ("Biology", "https://openalex.org/C86803240", 500),
    ("Medicine", "https://openalex.org/C71924100", 300),
]

Modify Embedding Model

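Change the SciBERT model in generate_embeddings.py. After switching models, regenerate the embeddings and reload them into Neo4j, since vectors produced by different models are not comparable: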

self.tokenizer = AutoTokenizer.from_pretrained("your-preferred-model")
self.model = AutoModel.from_pretrained("your-preferred-model")

Adjust Recommendation Weights

Modify the hybrid recommendation weights in Website/backend/reco.py:

content_weight = 0.6  # Content-based filtering weight
ontology_weight = 0.4  # Ontology-based weight
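
To make the effect of these two weights concrete, here is a minimal sketch of a weighted combination of two similarity scores. The actual scoring code in reco.py is not reproduced in this README, so treat `hybrid_score` and `rank_papers` as hypothetical illustrations of the idea rather than the project's implementation.

```python
def hybrid_score(content_sim, ontology_sim,
                 content_weight=0.6, ontology_weight=0.4):
    """Weighted average of two similarity scores, normalized by the weight sum."""
    if content_weight < 0 or ontology_weight < 0:
        raise ValueError("weights must be non-negative")
    total = content_weight + ontology_weight
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return (content_weight * content_sim + ontology_weight * ontology_sim) / total


def rank_papers(candidates, content_weight=0.6, ontology_weight=0.4):
    """Sort paper IDs by hybrid score, best first.

    candidates maps a paper ID to a (content_sim, ontology_sim) pair.
    """
    return sorted(
        candidates,
        key=lambda pid: hybrid_score(*candidates[pid],
                                     content_weight, ontology_weight),
        reverse=True,
    )
```

Dividing by the weight sum means the weights do not have to add up to 1; raising `content_weight` favors papers whose text is close to the query, while raising `ontology_weight` favors papers that share concepts with it.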

🧪 Usage Examples

Topic-based Search

Navigate to the main page
Select "Topic Search"
Enter a topic such as "Neural Networks" or "Machine Learning"
Get semantically similar papers
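
"Semantically similar" here means that the embedding of the query is close to the embedding of a paper. A common way to measure that is cosine similarity; the sketch below uses plain Python lists and tiny toy vectors, and it assumes (without the source confirming it) that the application ranks papers in a comparable way. `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec, paper_vecs, k=5):
    """Return the k (paper_id, similarity) pairs closest to the query."""
    scored = [(pid, cosine_similarity(query_vec, vec))
              for pid, vec in paper_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Real SciBERT embeddings have hundreds of dimensions, but the ranking logic is the same as for these two-dimensional examples.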

Personalized Recommendations

Select "Personalized" mode
Choose a user profile (generated by fake_user_generator.py)
Enter a search query
Receive recommendations based on user history
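
One simple way to base recommendations on user history is to average the embeddings of the papers a user interacted with and blend that profile with the query embedding. This is a sketch of that general idea only; how reco.py and populate_user_profiles_neo4j.py actually build profiles is not described here, and `mean_vector`, `blend`, and `profile_weight` are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty collection of equal-length vectors."""
    vectors = list(vectors)
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def blend(profile_vec, query_vec, profile_weight=0.5):
    """Mix the user profile with the current query.

    profile_weight=0 ignores history; profile_weight=1 ignores the query.
    """
    if not 0.0 <= profile_weight <= 1.0:
        raise ValueError("profile_weight must be between 0 and 1")
    if len(profile_vec) != len(query_vec):
        raise ValueError("vectors must have the same dimension")
    return [profile_weight * p + (1.0 - profile_weight) * q
            for p, q in zip(profile_vec, query_vec)]
```

The blended vector can then be compared against paper embeddings in the same way as a plain topic query.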

Ontology Explorer

Click "Explore Ontology" to browse concept hierarchies
Search for concepts and explore relationships

🔍 Troubleshooting

Common Issues

"No module named 'neo4j'"

pip install neo4j

"Connection refused" to Neo4j

Make sure Neo4j is running
Check your .env file credentials
Verify the URI (usually bolt://localhost:7687)
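
Before digging into credentials, it can help to confirm that anything is listening on the Bolt port at all. The sketch below uses only the standard library and does not speak the Bolt protocol; it just checks that a TCP connection can be opened. `bolt_endpoint` and `port_is_open` are hypothetical helper names.

```python
import socket
from urllib.parse import urlparse


def bolt_endpoint(uri, default_port=7687):
    """Extract (host, port) from a URI such as bolt://localhost:7687."""
    parsed = urlparse(uri)
    if not parsed.hostname:
        raise ValueError(f"Not a valid Neo4j URI: {uri!r}")
    return parsed.hostname, parsed.port or default_port


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

If `port_is_open(*bolt_endpoint("bolt://localhost:7687"))` returns False, the database is most likely not started or is listening on a different port; if it returns True, look at the username and password next.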

"File not found" errors

Update all file paths in the scripts to match your setup
Make sure you've run the data fetching steps

SciBERT model download fails

# Install transformers properly
pip install transformers torch

Embedding generation is slow

Use GPU if available (install torch with CUDA)
Reduce the batch size in generate_embeddings.py if you run out of memory (a smaller batch lowers memory use, not run time)
Process smaller datasets first

Performance Tips

Use GPU: Install PyTorch with CUDA for faster embedding generation
Increase RAM: 8GB+ recommended for large datasets
SSD Storage: Faster I/O for large data processing
Batch Processing: Adjust batch sizes based on your hardware
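
Batch size is the main knob for trading memory against throughput: larger batches keep the hardware busier, smaller ones fit into less RAM or GPU memory. As a sketch of what batching means in practice (the loop inside generate_embeddings.py is not shown in this README, and `batches` is a hypothetical helper), a list of texts can be split like this:

```python
def batches(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each yielded slice would be tokenized and passed through the model in one forward pass. If you hit out-of-memory errors, halve the batch size and try again.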

📊 Data Sources & APIs

OpenAlex Integration

Website: https://openalex.org/
API Docs: https://docs.openalex.org/
Rate Limits: Be respectful; include your email in requests so they are routed to OpenAlex's polite pool
Data License: CC0 (public domain)

Scientific Domains Available

Computer Science
Artificial Intelligence
Physics
Biology
Medicine
Mathematics
Engineering
And many more...

🤝 Contributing

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Update file paths in your modifications
Test with your own data
Commit changes (git commit -m 'Add AmazingFeature')
Push to branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

OpenAlex for providing open scientific data
SciBERT for scientific text embeddings
Neo4j for graph database technology
Flask for the web framework

📧 Contact

Developer 1: anVSS1
Email: [email protected]
LinkedIn: LinkedIn Profile
GitHub: @anVSS1

Developer 2: KAN
LinkedIn: LinkedIn Profile

Project Link: https://github.com/anVSS1/Scientific-Article-Recommender

⚠️ IMPORTANT NOTES

No Data Included: This repository contains only code. You must fetch and generate your own data.
Update All Paths: Every script has hardcoded file paths that need to be updated for your system.
OpenAlex Email Required: You must add your email to the OpenAlex fetcher script.
Hardware Requirements: Embedding generation requires significant computational resources.
Neo4j Setup: You must install and configure Neo4j before running the application.

⭐ Star this repo if you find it helpful!
