This project is a self-learning AI system that scrapes educational content from open educational resources, summarizes the content, evaluates the quality of summaries, and continuously improves its summarization capabilities through training.
- Web Scraping: Fetches articles from educational websites like MIT OpenCourseWare, Open UMN, and OER Commons
- Content Processing: Handles both PDF and HTML content
- Text Summarization: Creates concise summaries using extractive techniques
- Quality Evaluation: Assesses summary quality based on multiple metrics
- Self-Improvement: Learns from high-quality summaries to improve future results
- Visualization: Tracks improvement metrics over time
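
As a rough illustration of the extractive approach mentioned above, the sketch below scores each sentence by the average frequency of its non-stopword terms and keeps the top-scoring sentences in their original order. This is a generic frequency-based technique, not necessarily what the project's Summarizer does; the function names, the stopword list, and the `max_sentences` parameter are all assumptions made for the example.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "it", "that",
    "for", "on", "with", "as", "are", "this", "by", "be",
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(text: str) -> list[str]:
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, max_sentences: int = 2) -> str:
    """Hypothetical extractive summarizer based on word frequency."""
    sentences = split_sentences(text)
    if len(sentences) <= max_sentences:
        return " ".join(sentences)

    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        # Average term frequency, so long sentences are not favored automatically.
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)
```

Averaging the score per token is one design choice among several; summing instead would bias the selection toward longer sentences.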
- Scraper: Fetches and extracts content from educational websites
- Cleaner: Preprocesses and cleans raw text
- Summarizer: Generates concise summaries from text
- Evaluator: Assesses the quality of generated summaries
- Trainer: Learns from high-quality examples to improve future summaries
- Pipeline: Orchestrates the entire process
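
To make the division of labor between these components concrete, here is a minimal sketch of how a pipeline might chain a cleaner, summarizer, evaluator, and trainer. Every name, the scoring formula, and the `threshold` value are hypothetical stand-ins chosen for illustration; the actual modules in this repository may be structured quite differently, and scraping is left out so the example runs offline.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


def clean(raw: str) -> str:
    """Cleaner stand-in: strip HTML-like tags and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()


def lead_summarizer(text: str, max_sentences: int = 1) -> str:
    """Summarizer stand-in: keep the leading sentence(s)."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:max_sentences])


def evaluate(text: str, summary: str) -> float:
    """Evaluator stand-in: mean of vocabulary coverage and brevity, in [0, 1]."""
    text_words = set(re.findall(r"[a-z]+", text.lower()))
    summary_words = set(re.findall(r"[a-z]+", summary.lower()))
    if not text_words or not summary_words:
        return 0.0
    coverage = len(summary_words & text_words) / len(text_words)
    brevity = 1.0 - min(len(summary) / len(text), 1.0)
    return (coverage + brevity) / 2


@dataclass
class Trainer:
    """Trainer stand-in: collects (text, summary) pairs that score well."""
    threshold: float = 0.4  # assumed cutoff for a "high-quality" summary
    pairs: list[tuple[str, str]] = field(default_factory=list)

    def observe(self, text: str, summary: str, score: float) -> bool:
        if score >= self.threshold:
            self.pairs.append((text, summary))
            return True
        return False


def run_pipeline(
    raw_documents: list[str],
    trainer: Trainer,
    summarizer: Callable[[str], str] = lead_summarizer,
) -> list[float]:
    """Clean, summarize, evaluate, and feed good pairs to the trainer."""
    scores = []
    for raw in raw_documents:
        text = clean(raw)
        summary = summarizer(text)
        score = evaluate(text, summary)
        trainer.observe(text, summary, score)
        scores.append(score)
    return scores
```

Passing the summarizer in as a callable keeps the orchestration independent of any one summarization strategy, which is one way a retrained summarizer could be swapped in later.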
- Clone this repository
- Run the setup script to create the Python environment and install dependencies:
    .\setup.ps1
- Start the Next.js development server:
    npm run dev
- Access the web interface at http://localhost:3000
- View AI improvement metrics at http://localhost:3000/ai-improvement
- Use the API endpoints for programmatic access:
  - /api/summarize: Generate a summary for provided text
  - /api/train: Submit a training pair (text and summary)
  - /api/metrics: Get system performance metrics
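
For programmatic access, a small client along the following lines could work. This is only a sketch: the README does not specify HTTP methods or payload shapes, so the use of POST with a JSON body for `/api/summarize` and `/api/train`, GET for `/api/metrics`, the `text` and `summary` field names, and a JSON response are all assumptions that should be checked against the actual route handlers.

```python
import json
import urllib.request
from typing import Any, Optional

# Address of the local Next.js development server.
BASE_URL = "http://localhost:3000"


def build_request(endpoint: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET request when no payload is given, otherwise a JSON POST (assumed)."""
    url = f"{BASE_URL}{endpoint}"
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(endpoint: str, payload: Optional[dict] = None, timeout: float = 10.0) -> Any:
    """Send the request and decode the response, assuming it is JSON."""
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the development server running, `call("/api/summarize", {"text": "..."})` would request a summary, `call("/api/train", {"text": "...", "summary": "..."})` would submit a training pair, and `call("/api/metrics")` would fetch metrics, provided the assumed request shapes match the server.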
- /self_learning_ai: Core Python modules for the AI system
- /app: Next.js web application
- /data: Storage for articles, summaries, and training data
  - /raw: Raw scraped content
  - /summaries: Generated summaries
  - /fine_tune: Training pairs
  - /reports: Performance visualizations
License: MIT