Papers with Code Rebuilt

⚠️ IMPORTANT WARNING ⚠️

This application is rebuilt from the discontinued Papers with Code website. The data has not been updated since the website shut down and should be treated as a historical snapshot rather than current research information.

What this means:

Data is frozen as of when Papers with Code was discontinued

No new papers, methods, or datasets are being added

Performance metrics and leaderboards are not current

Use this for historical research and reference purposes only

A modern web application that provides a comprehensive interface for exploring academic papers, code repositories, datasets, methods, and leaderboards from the Papers with Code platform. This project rebuilds the core functionality of Papers with Code with a focus on performance, user experience, and modern web technologies.

🎯 Purpose

This application serves as a research tool for:

Researchers browsing papers in their field (as of the final Papers with Code snapshot)
Developers seeking code implementations of research papers
Students exploring datasets and methods for their projects
Anyone interested in AI/ML research published before Papers with Code shut down

🚀 Features

📄 Papers Browser: Search and browse academic papers with abstracts
💻 Code Repositories: Find official and community code implementations
🏆 Leaderboards: View performance evaluations across different datasets and tasks
📊 Datasets: Explore datasets used in research
🔬 Methods: Discover research methods and approaches
🔍 Advanced Search: Search across paper titles and abstracts
📱 Responsive Design: Works seamlessly on desktop and mobile devices

🛠️ Technology Stack

Frontend

React 18 with TypeScript
Vite for fast development and building
Tailwind CSS for styling
React Query for data fetching and caching
React Router for navigation
Lucide React for icons

Backend

Node.js with Express.js
SQLite for data storage and efficient querying
Database-driven architecture for fast data access

Data Sources

Papers with Code API data (papers, code links, evaluations, methods, datasets)
SQLite database with optimized schema and indexes

📦 Installation

Prerequisites

Node.js (v16 or higher)
npm or yarn
Python 3.7+ (for database building and cleaning)
wget (for downloading data files)
gunzip (for extracting compressed files)

Setup Instructions

Clone the repository

```
git clone <repository-url>
cd paperswithcode-rebuilt
```

Install dependencies
```
npm install
```
Build the database (if not already built)
```
cd data
python build_database.py
```
This will:
- Create the SQLite database from JSON files
- Set up optimized schema with indexes
- Import all data efficiently
Clean the database (recommended for optimal performance)
```
python clean_methods_database.py
```
This will:
- Remove spam entries and irrelevant content
- Clean up customer service spam, phone numbers, and commercial content
- Ensure only legitimate academic content remains
Start the development server from the project root
```
cd ..  # return to the project root from data/
npm run dev:full
```
This starts both the backend server (port 3001) and the frontend development server (port 5173).

🏃‍♂️ Running the Application

Development Mode

```
# Start both frontend and backend
npm run dev:full

# Or start them separately
npm run server    # Backend only (port 3001)
npm run dev       # Frontend only (port 5173)
```

Production Build

```
# Build the frontend
npm run build

# Start the production server
npm run server
```

🏗️ Project Structure

```
paperswithcode-rebuilt/
├── src/                          # Frontend source code
│   ├── components/               # React components
│   │   ├── Header.tsx           # Navigation and search header
│   │   ├── PaperCard.tsx        # Individual paper display
│   │   ├── DatasetCard.tsx      # Dataset information display
│   │   ├── MethodCard.tsx       # Method information display
│   │   ├── LeaderboardTable.tsx # Performance leaderboard
│   │   ├── LeaderboardChart.tsx # Chart visualization for leaderboards
│   │   ├── PerformanceChart.tsx # Performance metrics visualization
│   │   ├── ContentRenderer.tsx  # Content rendering utilities
│   │   ├── MathRenderer.tsx     # Mathematical expression rendering
│   │   └── LoadingSpinner.tsx   # Loading indicator
│   ├── hooks/                   # Custom React hooks
│   │   └── useData.ts          # Data fetching and caching hooks
│   ├── services/                # API service functions
│   ├── types/                   # TypeScript type definitions
│   ├── pages/                   # Page components
│   │   ├── PapersPage.tsx      # Papers listing page
│   │   ├── DatasetsPage.tsx    # Datasets listing page
│   │   ├── MethodsPage.tsx     # Methods listing page
│   │   └── LeaderboardsPage.tsx # Leaderboards page
│   ├── utils/                   # Utility functions
│   │   └── dateUtils.ts        # Date handling utilities
│   ├── App.tsx                  # Main application component
│   └── main.tsx                 # Application entry point
├── data/                        # Data files and database
│   ├── papers_with_code.db     # SQLite database with all data
│   ├── build_database.py       # Database builder script
│   ├── clean_methods_database.py # Database cleaning script
│   ├── clean_dataset_database.py # Dataset cleaning script
│   └── README.md               # Database documentation
├── server.js                   # Express.js backend server
└── package.json                # Project dependencies and scripts
```

🔧 Components Overview

Frontend Components

Header: Navigation bar with search functionality and tab switching
PaperCard: Displays paper information including title, authors, abstract, and code links
DatasetCard: Shows dataset details and usage statistics
MethodCard: Presents research methods and their applications
LeaderboardTable: Displays performance rankings for different tasks and datasets
LeaderboardChart: Interactive charts for visualizing performance data
PerformanceChart: Performance metrics visualization
ContentRenderer: Utilities for rendering various content types
MathRenderer: Mathematical expression rendering with LaTeX support
LoadingSpinner: Visual feedback during data loading

Backend Services

Database API: Fast SQLite-based data access with optimized queries
Search API: Provides fast search across papers and abstracts using database indexes
Pagination Service: Efficient pagination for large datasets
Leaderboard Service: Performance rankings and evaluations queried from the database
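The pagination service is only described at a high level. As a minimal sketch of offset-based pagination over SQLite, assuming a hypothetical `papers(id, title)` table and a hypothetical `paginate` helper (the real backend is Express.js and may page differently), the query pattern looks like this:

```python
import math
import sqlite3


def paginate(conn: sqlite3.Connection, page: int, page_size: int = 20) -> dict:
    """Return one page of papers plus paging metadata (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    total = conn.execute("SELECT COUNT(*) FROM papers").fetchone()[0]
    offset = (page - 1) * page_size
    # ORDER BY gives a stable order so consecutive pages neither overlap nor skip rows.
    rows = conn.execute(
        "SELECT id, title FROM papers ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    return {
        "items": rows,
        "page": page,
        "page_size": page_size,
        "total": total,
        "total_pages": math.ceil(total / page_size),
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO papers (title) VALUES (?)",
        [(f"Paper {i}",) for i in range(1, 46)],
    )
    result = paginate(conn, page=3, page_size=20)
    print(result["total_pages"], len(result["items"]))  # 3 5
```

Large OFFSET values still make SQLite step over the skipped rows, so very deep pages get slower; keyset pagination (`WHERE id > ?`) is a common alternative when that matters.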

📊 Data Sources

The application uses data from Papers with Code:

Papers: Academic papers with abstracts and metadata
Code Links: Connections between papers and their code implementations
Evaluations: Performance metrics and leaderboards
Methods: Research methods and approaches
Datasets: Dataset information and usage statistics

🗄️ Database Architecture

The application now uses a SQLite database instead of JSON streaming for improved performance:

Benefits:

Faster queries with database indexes
Reduced memory usage (no need to load large JSON files)
Faster search, because SQL LIKE queries run inside SQLite rather than over streamed JSON
Efficient pagination for large datasets
Relational data structure with proper foreign keys
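A substring search over titles and abstracts with `LIKE` can be sketched as below. The `papers(id, title, abstract)` columns and the `search_papers` / `escape_like` helpers are assumptions for illustration. User input is bound as a parameter, and `%`, `_`, and the escape character itself are escaped so they match literally. A pattern with a leading `%` generally cannot use an ordinary index, so SQLite's FTS5 extension is the usual option if indexed full-text search is needed.

```python
import sqlite3


def escape_like(term: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    # Escape the escape character first, then the two wildcards.
    return (
        term.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def search_papers(conn: sqlite3.Connection, query: str, limit: int = 10) -> list:
    """Case-insensitive (ASCII) substring search over titles and abstracts (hypothetical helper)."""
    pattern = f"%{escape_like(query)}%"
    return conn.execute(
        """
        SELECT id, title FROM papers
        WHERE title LIKE ? ESCAPE '\\' OR abstract LIKE ? ESCAPE '\\'
        ORDER BY id
        LIMIT ?
        """,
        (pattern, pattern, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT, abstract TEXT)")
    conn.executemany(
        "INSERT INTO papers (title, abstract) VALUES (?, ?)",
        [
            ("Attention Is All You Need", "We propose the Transformer, based solely on attention mechanisms."),
            ("Deep Residual Learning for Image Recognition", "We present a residual learning framework."),
            ("Top_1 accuracy study", "Reports 90% top-1 accuracy."),
        ],
    )
    print(search_papers(conn, "transformer"))  # [(1, 'Attention Is All You Need')]
```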

Database Schema:

Core tables: papers, authors, tasks, methods, datasets, evaluations, code_links
Relationship tables: paper_authors, paper_tasks, paper_methods, evaluation_categories_rel
Optimized indexes on frequently queried fields
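To show how a core table and a relationship table fit together, here is a pared-down sketch using only `papers`, `authors`, and `paper_authors`. The column names, index names, and the `papers_by_author` helper are assumptions for illustration; `data/build_database.py` defines the real schema.

```python
import sqlite3

# Hypothetical subset of the schema: two core tables and one relationship table.
SCHEMA = """
CREATE TABLE papers (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    abstract TEXT,
    published TEXT  -- ISO-8601 date string
);
CREATE TABLE authors (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE paper_authors (
    paper_id INTEGER NOT NULL REFERENCES papers(id),
    author_id INTEGER NOT NULL REFERENCES authors(id),
    PRIMARY KEY (paper_id, author_id)
);
-- Index on a frequently queried field (hypothetical choice).
CREATE INDEX idx_papers_published ON papers(published);
-- Lets "papers by author" lookups use an index instead of scanning the link table.
CREATE INDEX idx_paper_authors_author ON paper_authors(author_id);
"""


def papers_by_author(conn: sqlite3.Connection, name: str) -> list:
    """Return titles of papers linked to an author through paper_authors."""
    rows = conn.execute(
        """
        SELECT p.title
        FROM papers p
        JOIN paper_authors pa ON pa.paper_id = p.id
        JOIN authors a ON a.id = pa.author_id
        WHERE a.name = ?
        ORDER BY p.published
        """,
        (name,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO papers VALUES (?, ?, ?, ?)",
        [(1, "Paper A", None, "2019-01-01"), (2, "Paper B", None, "2020-06-01")],
    )
    conn.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])
    conn.executemany("INSERT INTO paper_authors VALUES (?, ?)", [(1, 1), (2, 1), (2, 2)])
    print(papers_by_author(conn, "Ada"))  # ['Paper A', 'Paper B']
```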

Current Database Statistics:

Total papers: ~2.4M academic papers with abstracts
Methods: 1,940 legitimate research methods (cleaned from 8,725 total)
Datasets: Cleaned and validated research datasets
Code links: Connections between papers and implementations
Evaluations: Performance metrics and leaderboards

Database Cleaning:

Comprehensive spam removal from methods database
Removed 6,706 spam entries (76.8% of original methods were spam)
Cleaned categories: Customer service spam, phone numbers, travel/airline spam, commercial advertising
Preserved legitimate content: All academic methods and research content maintained
Multi-language support: Handles spam in English and Spanish

Migration:

The old JSON streaming approach has been replaced with database queries
All data is now stored in data/papers_with_code.db
Large JSON files have been removed to save disk space (~2.7GB freed)
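Moving from JSON streaming to a database means the JSON only has to be read once, at build time. A minimal sketch of that one-off import, assuming a gzipped JSON array of objects with `title` and `abstract` fields and a hypothetical `import_papers` helper (this is not the code of `build_database.py`), inserts all rows in a single transaction and builds the index afterwards:

```python
import gzip
import json
import os
import sqlite3
import tempfile


def import_papers(conn: sqlite3.Connection, path: str) -> int:
    """Load a gzipped JSON array of paper objects into a papers table (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        records = json.load(fh)  # reads the whole array; acceptable for a one-off build step
    conn.execute(
        "CREATE TABLE IF NOT EXISTS papers ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, abstract TEXT)"
    )
    # One transaction for the whole batch is much faster than committing per row.
    with conn:
        conn.executemany(
            "INSERT INTO papers (title, abstract) VALUES (?, ?)",
            ((r["title"], r.get("abstract")) for r in records if r.get("title")),
        )
    # Building the index after the bulk insert avoids updating it row by row.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_papers_title ON papers(title)")
    return conn.execute("SELECT COUNT(*) FROM papers").fetchone()[0]


if __name__ == "__main__":
    sample = [
        {"title": "Paper A", "abstract": "..."},
        {"title": "", "abstract": "entry without a title is skipped"},
        {"title": "Paper B"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "papers.json.gz")
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            json.dump(sample, fh)
        conn = sqlite3.connect(":memory:")
        print(import_papers(conn, path))  # 2
```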

🔍 Search Functionality

Full-text search across paper titles and abstracts
Real-time results with debounced input
Pagination for large result sets
Filtering by different data types
Mathematical expression rendering with LaTeX support

🎨 UI/UX Features

Modern Design: Clean, responsive interface using Tailwind CSS
Dark/Light Mode: Automatic theme detection
Loading States: Smooth loading indicators
Error Handling: Graceful error messages and recovery
Mobile Responsive: Optimized for all screen sizes
Interactive Charts: Performance visualization with charts
Mathematical Rendering: Beautiful LaTeX math expression display

🚀 Performance Optimizations

SQLite database with optimized schema and indexes for fast queries
React Query caching for API responses
Lazy loading of components and data
Efficient pagination for large datasets
Database-driven search with full-text search capabilities
Cleaned database with only legitimate academic content for faster queries

🧹 Database Maintenance

Cleaning Scripts

The project includes automated cleaning scripts to maintain data quality:

clean_methods_database.py: Removes spam from methods database
- Detects customer service spam, phone numbers, travel content
- Removes commercial advertising and irrelevant content
- Preserves legitimate research methods
- Multi-language spam detection (English/Spanish)
clean_dataset_database.py: Cleans dataset database
- Removes invalid homepages and spam entries
- Ensures dataset quality and relevance
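The detection rules inside `clean_methods_database.py` are not documented here. As a rough illustration only, a keyword-and-pattern filter of the kind described above (phone numbers plus customer service and travel phrases in English and Spanish) might look like this; the regex, the keyword list, and `looks_like_spam` are all hypothetical.

```python
import re

# Hypothetical heuristics; the real script's rules may differ.
# A digit, at least 8 digits/spaces/brackets/dots/dashes, then a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{8,}\d")
SPAM_KEYWORDS = (
    # English
    "customer service", "customer care", "helpline", "toll free", "flight booking",
    # Spanish
    "servicio al cliente", "atención al cliente", "número de teléfono", "vuelos",
)


def looks_like_spam(name: str, description: str = "") -> bool:
    """Flag entries that contain a phone-number-like string or a known spam phrase."""
    text = f"{name} {description}".lower()
    if PHONE_RE.search(text):
        return True
    return any(keyword in text for keyword in SPAM_KEYWORDS)


if __name__ == "__main__":
    entries = [
        ("Dropout", "Randomly zeroes activations during training."),
        ("Airline Customer Service", "Call +1 (800) 555-0199 now"),
        ("Atención al cliente", "Llame ya"),
    ]
    print([name for name, desc in entries if not looks_like_spam(name, desc)])  # ['Dropout']
```

Simple heuristics like these can misfire (for example, a run of years such as "2017 2018 2019" matches the phone pattern), so flagged entries are worth reviewing before deletion.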

Running Maintenance

```
cd data
# Clean methods database
python clean_methods_database.py

# Clean datasets database
python clean_dataset_database.py
```

🤝 Contributing

Fork the repository
Create a feature branch (`git checkout -b feature/amazing-feature`)
Commit your changes (`git commit -m 'Add amazing feature'`)
Push to the branch (`git push origin feature/amazing-feature`)
Open a Pull Request

Development Guidelines

Follow TypeScript best practices
Use Tailwind CSS for styling
Ensure responsive design for mobile devices
Add tests for new functionality
Update documentation for new features

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Papers with Code for providing the data
The open-source community for the amazing tools and libraries used in this project
Contributors who helped clean and maintain the database quality

📞 Support

If you encounter any issues or have questions:

Check the existing issues in the repository
Create a new issue with detailed information about your problem
Include system information and error messages
For database-related issues, check the data/ directory documentation

🔄 Data Updates

To rebuild the database from the archived Papers with Code data dumps:

Download the data files from the official sources
Rebuild the database using `python build_database.py` (run from data/)
Clean the database using the cleaning scripts
Restart the application to use the rebuilt data

Happy researching! 🎓

Papers with code datasets

You can download the full dataset behind paperswithcode.com here:

Download links for the data dumps are:

The last JSON file is in the sota-extractor format; the sota-extractor code can be used to load it into a set of Python classes.

While Papers with Code was active, the data was regenerated daily.

Part of the data is coming from the sources listed in the sota-extractor README.

Licence

All data is licenced under CC-BY-SA.
