IMPORTANT WARNING: This application is rebuilt from the discontinued Papers with Code website. The data has not been updated since the website shut down and should be treated as a historical snapshot, not as current research information.
What this means:
- Data is frozen as of when Papers with Code was discontinued
- No new papers, methods, or datasets are being added
- Performance metrics and leaderboards are not current
- Use this for historical research and reference purposes only
A modern web application that provides a comprehensive interface for exploring academic papers, code repositories, datasets, methods, and leaderboards from the Papers with Code platform. This project rebuilds the core functionality of Papers with Code with a focus on performance, user experience, and modern web technologies.
This application serves as a research tool for:
- Researchers looking up papers in their field as of the snapshot date
- Developers seeking code implementations of research papers
- Students exploring datasets and methods for their projects
- Anyone interested in the state of AI/ML research at the time the data was frozen
- Papers Browser: Search and browse academic papers with abstracts
- Code Repositories: Find official and community code implementations
- Leaderboards: View performance evaluations across different datasets and tasks
- Datasets: Explore datasets used in research
- Methods: Discover research methods and approaches
- Advanced Search: Search across paper titles and abstracts
- Responsive Design: Works seamlessly on desktop and mobile devices
- React 18 with TypeScript
- Vite for fast development and building
- Tailwind CSS for styling
- React Query for data fetching and caching
- React Router for navigation
- Lucide React for icons
- Node.js with Express.js
- SQLite for data storage and efficient querying
- Database-driven architecture for fast data access
- Papers with Code API data (papers, code links, evaluations, methods, datasets)
- SQLite database with optimized schema and indexes
- Node.js (v16 or higher)
- npm or yarn
- Python 3.7+ (for database building and cleaning)
- wget (for downloading data files)
- gunzip (for extracting compressed files)
- Clone the repository
    git clone <repository-url>
    cd paperswithcode-rebuilt
- Install dependencies
    npm install
- Build the database (if not already built)
    cd data
    python build_database.py
  This will:
  - Create the SQLite database from JSON files
  - Set up optimized schema with indexes
  - Import all data efficiently
- Clean the database (recommended for optimal performance)
    python clean_methods_database.py
  This will:
  - Remove spam entries and irrelevant content
  - Clean up customer service spam, phone numbers, and commercial content
  - Ensure only legitimate academic content remains
- Start the development server
    npm run dev:full
  This starts both the backend server (port 3001) and frontend development server (port 5173)
    # Start both frontend and backend
    npm run dev:full
    # Or start them separately
    npm run server   # Backend only (port 3001)
    npm run dev      # Frontend only (port 5173)

    # Build the frontend
    npm run build
    # Start the production server
    npm run server

paperswithcode-rebuilt/
├── src/                              # Frontend source code
│   ├── components/                   # React components
│   │   ├── Header.tsx                # Navigation and search header
│   │   ├── PaperCard.tsx             # Individual paper display
│   │   ├── DatasetCard.tsx           # Dataset information display
│   │   ├── MethodCard.tsx            # Method information display
│   │   ├── LeaderboardTable.tsx      # Performance leaderboard
│   │   ├── LeaderboardChart.tsx      # Chart visualization for leaderboards
│   │   ├── PerformanceChart.tsx      # Performance metrics visualization
│   │   ├── ContentRenderer.tsx       # Content rendering utilities
│   │   ├── MathRenderer.tsx          # Mathematical expression rendering
│   │   └── LoadingSpinner.tsx        # Loading indicator
│   ├── hooks/                        # Custom React hooks
│   │   └── useData.ts                # Data fetching and caching hooks
│   ├── services/                     # API service functions
│   ├── types/                        # TypeScript type definitions
│   ├── pages/                        # Page components
│   │   ├── PapersPage.tsx            # Papers listing page
│   │   ├── DatasetsPage.tsx          # Datasets listing page
│   │   ├── MethodsPage.tsx           # Methods listing page
│   │   └── LeaderboardsPage.tsx      # Leaderboards page
│   ├── utils/                        # Utility functions
│   │   └── dateUtils.ts              # Date handling utilities
│   ├── App.tsx                       # Main application component
│   └── main.tsx                      # Application entry point
├── data/                             # Data files and database
│   ├── papers_with_code.db           # SQLite database with all data
│   ├── build_database.py             # Database builder script
│   ├── clean_methods_database.py     # Database cleaning script
│   ├── clean_dataset_database.py     # Dataset cleaning script
│   └── README.md                     # Database documentation
├── server.js                         # Express.js backend server
└── package.json                      # Project dependencies and scripts
- Header: Navigation bar with search functionality and tab switching
- PaperCard: Displays paper information including title, authors, abstract, and code links
- DatasetCard: Shows dataset details and usage statistics
- MethodCard: Presents research methods and their applications
- LeaderboardTable: Displays performance rankings for different tasks and datasets
- LeaderboardChart: Interactive charts for visualizing performance data
- PerformanceChart: Performance metrics visualization
- ContentRenderer: Utilities for rendering various content types
- MathRenderer: Mathematical expression rendering with LaTeX support
- LoadingSpinner: Visual feedback during data loading
- Database API: Fast SQLite-based data access with optimized queries
- Search API: Provides fast search across papers and abstracts using database indexes
- Pagination Service: Efficient pagination for large datasets
- Leaderboard Service: Real-time performance rankings and evaluations
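As a rough illustration of what a leaderboard lookup like the one listed above might involve, here is a minimal Python sketch. The `evaluations` column names (`task`, `dataset`, `metric`, `model`, `value`), the helper names, and the demo rows are all assumptions made for this example; the real schema and server code may differ. It also assumes that a higher metric value is better.

```python
import sqlite3


def top_results(conn, task, dataset, metric, limit=10):
    """Rank models for one (task, dataset, metric), best score first.

    Assumes higher values are better; for error-style metrics the
    ORDER BY direction would need to be reversed.
    """
    rows = conn.execute(
        """
        SELECT model, value
        FROM evaluations
        WHERE task = ? AND dataset = ? AND metric = ?
        ORDER BY value DESC, model ASC
        LIMIT ?
        """,
        (task, dataset, metric, limit),
    ).fetchall()
    # Attach a 1-based rank to each (model, value) pair.
    return [(rank, model, value) for rank, (model, value) in enumerate(rows, start=1)]


def make_demo_db():
    """Build an in-memory table with made-up placeholder rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE evaluations "
        "(task TEXT, dataset TEXT, metric TEXT, model TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO evaluations VALUES (?, ?, ?, ?, ?)",
        [
            ("Image Classification", "ImageNet", "Top-1 Accuracy", "Model A", 76.1),
            ("Image Classification", "ImageNet", "Top-1 Accuracy", "Model B", 84.3),
            ("Image Classification", "ImageNet", "Top-1 Accuracy", "Model C", 80.9),
            ("Image Classification", "CIFAR-10", "Top-1 Accuracy", "Model A", 95.0),
        ],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    for rank, model, value in top_results(
        demo, "Image Classification", "ImageNet", "Top-1 Accuracy"
    ):
        print(rank, model, value)
```

Ranking in Python with `enumerate` avoids relying on SQL window functions, which older SQLite builds do not support.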
The application uses data from Papers with Code:
- Papers: Academic papers with abstracts and metadata
- Code Links: Connections between papers and their code implementations
- Evaluations: Performance metrics and leaderboards
- Methods: Research methods and approaches
- Datasets: Dataset information and usage statistics
The application now uses a SQLite database instead of JSON streaming for improved performance:
- Faster queries with database indexes
- Reduced memory usage (no need to load large JSON files)
- Better search performance with SQL LIKE queries
- Efficient pagination for large datasets
- Relational data structure with proper foreign keys
- Core tables: papers, authors, tasks, methods, datasets, evaluations, code_links
- Relationship tables: paper_authors, paper_tasks, paper_methods, evaluation_categories_rel
- Optimized indexes on frequently queried fields
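To make the core-table and relationship-table layout above more concrete, here is a minimal Python sketch of a small slice of such a schema. Only the table names `papers`, `authors`, and `paper_authors` come from the list above; every column name, the index name, and the helper functions are assumptions for illustration and are not taken from `build_database.py`.

```python
import sqlite3

# Hypothetical column layout; the real schema may differ.
SCHEMA = """
CREATE TABLE papers (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    abstract TEXT
);
CREATE TABLE authors (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE paper_authors (
    paper_id INTEGER NOT NULL REFERENCES papers(id),
    author_id INTEGER NOT NULL REFERENCES authors(id),
    PRIMARY KEY (paper_id, author_id)
);
CREATE INDEX idx_paper_authors_author ON paper_authors(author_id);
"""


def create_schema(conn):
    """Create the tables and index; foreign keys are enabled per connection."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


def add_paper(conn, title, abstract, author_names):
    """Insert a paper and link it to its authors, reusing existing author rows."""
    cur = conn.execute(
        "INSERT INTO papers (title, abstract) VALUES (?, ?)", (title, abstract)
    )
    paper_id = cur.lastrowid
    for name in author_names:
        conn.execute("INSERT OR IGNORE INTO authors (name) VALUES (?)", (name,))
        (author_id,) = conn.execute(
            "SELECT id FROM authors WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO paper_authors (paper_id, author_id) VALUES (?, ?)",
            (paper_id, author_id),
        )
    return paper_id


def papers_by_author(conn, name):
    """Return paper titles for one author by joining through the relationship table."""
    rows = conn.execute(
        """
        SELECT p.title
        FROM papers AS p
        JOIN paper_authors AS pa ON pa.paper_id = p.id
        JOIN authors AS a ON a.id = pa.author_id
        WHERE a.name = ?
        ORDER BY p.id
        """,
        (name,),
    ).fetchall()
    return [title for (title,) in rows]
```

The relationship table keeps the many-to-many link between papers and authors out of both core tables, and the extra index on `author_id` supports lookups that start from the author side.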
- Total papers: ~2.4M academic papers with abstracts
- Methods: 1,940 legitimate research methods (cleaned from 8,725 total)
- Datasets: Cleaned and validated research datasets
- Code links: Connections between papers and implementations
- Evaluations: Performance metrics and leaderboards
- Comprehensive spam removal from methods database
- Removed 6,706 spam entries (76.8% of original methods were spam)
- Spam categories removed: customer service spam, phone numbers, travel/airline spam, commercial advertising
- Preserved legitimate content: All academic methods and research content maintained
- Multi-language support: Handles spam in English and Spanish
- The old JSON streaming approach has been replaced with database queries
- All data is now stored in data/papers_with_code.db
- Large JSON files have been removed to save disk space (~2.7GB freed)
- Full-text search across paper titles and abstracts
- Real-time results with debounced input
- Pagination for large result sets
- Filtering by different data types
- Mathematical expression rendering with LaTeX support
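The search and pagination features above could be backed by a query along the following lines. This is a minimal Python sketch that uses SQL `LIKE` matching over a hypothetical `papers(id, title, abstract)` table; the function name, column names, and page-size default are assumptions, and the actual server may query the database differently.

```python
import sqlite3


def search_papers(conn, term, page=1, page_size=20):
    """Return (rows, total) for papers whose title or abstract contains `term`.

    `rows` holds (id, title) tuples for the requested page; `total` is the
    number of matches across all pages.
    """
    # Escape LIKE wildcards so user input such as "100%" matches literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    where = "title LIKE ? ESCAPE '\\' OR abstract LIKE ? ESCAPE '\\'"

    (total,) = conn.execute(
        f"SELECT COUNT(*) FROM papers WHERE {where}", (pattern, pattern)
    ).fetchone()

    # LIMIT/OFFSET pagination with a stable ORDER BY so pages do not overlap.
    rows = conn.execute(
        f"SELECT id, title FROM papers WHERE {where} ORDER BY id LIMIT ? OFFSET ?",
        (pattern, pattern, page_size, (page - 1) * page_size),
    ).fetchall()
    return rows, total
```

A caller would request `search_papers(conn, "attention", page=2)` to fetch the second page and use `total` to compute the page count. Parameter binding keeps user input out of the SQL text; only the fixed `where` fragment is interpolated.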
- Modern Design: Clean, responsive interface using Tailwind CSS
- Dark/Light Mode: Automatic theme detection
- Loading States: Smooth loading indicators
- Error Handling: Graceful error messages and recovery
- Mobile Responsive: Optimized for all screen sizes
- Interactive Charts: Performance visualization with charts
- Mathematical Rendering: Beautiful LaTeX math expression display
- SQLite database with optimized schema and indexes for fast queries
- React Query caching for API responses
- Lazy loading of components and data
- Efficient pagination for large datasets
- Database-driven search with full-text search capabilities
- Cleaned database with only legitimate academic content for faster queries
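One way to check that an index is actually used by a query is SQLite's `EXPLAIN QUERY PLAN`. The Python sketch below compares the plan for the same query before and after creating an index. The `year` column, the index name `idx_papers_year`, and the helper names are assumptions for illustration only; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


def plans_before_and_after_index():
    """Show how the plan for one query changes once an index exists."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout used only for this demonstration.
    conn.execute(
        "CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
    )
    sql = "SELECT title FROM papers WHERE year = ?"

    before = query_plan(conn, sql, (2020,))  # full table scan expected
    conn.execute("CREATE INDEX idx_papers_year ON papers(year)")
    after = query_plan(conn, sql, (2020,))  # index lookup expected
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = plans_before_and_after_index()
    print("before:", before)
    print("after: ", after)
```

Running the same check against the real database file would show which of the frequently used queries benefit from the existing indexes.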
The project includes automated cleaning scripts to maintain data quality:
- clean_methods_database.py: Removes spam from methods database
  - Detects customer service spam, phone numbers, travel content
  - Removes commercial advertising and irrelevant content
  - Preserves legitimate research methods
  - Multi-language spam detection (English/Spanish)
- clean_dataset_database.py: Cleans dataset database
  - Removes invalid homepages and spam entries
  - Ensures dataset quality and relevance
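The kind of checks described above can be sketched with a few simple heuristics. The Python example below is an illustration only and is not taken from `clean_methods_database.py`: the phone-number pattern, the keyword list, and the function names are all assumptions, and a production cleaner would need more careful rules to avoid false positives (for example, long digit sequences in legitimate descriptions).

```python
import re

# Ten or more digits, optionally separated by spaces, dots, dashes or brackets.
PHONE_RE = re.compile(r"(?:\+?\d[\s().-]*){10,}")

# Hypothetical keyword list covering English and Spanish spam phrases.
SPAM_KEYWORDS = (
    "customer service", "customer support", "helpline", "toll free",
    "flight booking", "airline",
    "servicio al cliente", "atención al cliente", "aerolínea",
)


def looks_like_spam(name, description=""):
    """Flag an entry that contains a phone-like number or a spam keyword."""
    text = f"{name} {description}".lower()
    if PHONE_RE.search(text):
        return True
    return any(keyword in text for keyword in SPAM_KEYWORDS)


def split_methods(methods):
    """Split (name, description) pairs into kept and removed name lists."""
    kept, removed = [], []
    for name, description in methods:
        (removed if looks_like_spam(name, description) else kept).append(name)
    return kept, removed
```

Returning both lists makes it possible to review what would be removed before deleting anything from the database.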
    cd data
    # Clean methods database
    python clean_methods_database.py
    # Clean datasets database
    python clean_dataset_database.py

- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow TypeScript best practices
- Use Tailwind CSS for styling
- Ensure responsive design for mobile devices
- Add tests for new functionality
- Update documentation for new features
This project is licensed under the MIT License - see the LICENSE file for details.
- Papers with Code for providing the data
- The open-source community for the amazing tools and libraries used in this project
- Contributors who helped clean and maintain the database quality
If you encounter any issues or have questions:
- Check the existing issues in the repository
- Create a new issue with detailed information about your problem
- Include system information and error messages
- For database-related issues, check the data/ directory documentation
If newer data dumps from Papers with Code become available, the application can be updated with them:
- Download new data from the official sources
- Rebuild the database using python build_database.py
- Clean the database using the cleaning scripts
- Restart the application to use the updated data
Happy researching!
You can download the full dataset behind paperswithcode.com here:
Download links for the data dumps are:
The last JSON is in the sota-extractor format, and the code from there can be used to load the JSON into a set of Python classes.
At the moment, data is regenerated daily.
Part of the data is coming from the sources listed in the sota-extractor README.
All data is licenced under CC-BY-SA.