In today’s AI landscape, new frameworks, services, breakthroughs, and research are emerging at an overwhelming pace. Keeping up with these rapid developments is essential for anyone working in this field.
As an open-ended assignment for my Deploying AI class at the University of Toronto, Data Sciences Institute, I decided to create AIRADA, a Proof of Concept (POC) AI system designed to bridge the gap by quickly capturing and summarizing the latest AI-related information available on the market. It also provides precise source links to the content used to generate its summaries.
AIRADA includes three main services:
File: tools_github.py
Tool: search_github_repos
Description:
This service retrieves trending GitHub repositories related to AI topics such as:
ai-agentsagentic-aillmragllm-inferencevector-databaseprompt-engineeringlangchainopenaihuggingfacemachine-learning
It helps users quickly discover hot and emerging repositories in the AI ecosystem.
Example questions this service can answer:
- "Show me the top 5 AI/LLM GitHub projects"
- "Give me repos with the topic agentic-ai"
- "Trending Python LLM frameworks on GitHub"
- "Most starred RAG libraries"
File: tools_papers.py
Tool: search_ai_papers
Dataset: https://huggingface.co/datasets/J0nasW/paperswithcode
The original dataset contains approximately 56,000 research papers, categorized into 3,000 tasks and 16 research areas.
For this project:
-
The dataset was filtered to retain only relevant AI-related topics.
-
Only the following fields were kept to optimize storage:
titleabstracturl_absurl_pdfarxiv_id
After trimming, approximately 8,500 papers were saved in:
data/papers_agents_llm_subset.csv
Description:
This service supports questions related to research papers, academic studies, model comparisons, research trends, and literature-based explanations using the curated database of ~8,500 AI-related papers.
Example questions this service can answer:
- "Compare GPT-3 and PaLM papers"
- "Summarize top LLM research trends"
- "What do papers say about chain-of-thought prompting?"
- "Find research on multi-agent debate"
File: tools_news.py
Tool: get_ai_news
RSS Feed sources:
- VentureBeat AI: https://venturebeat.com/category/ai/feed/
- MIT Tech Review: https://www.technologyreview.com/feed/
- The Decoder: https://the-decoder.com/feed/
Description:
This service aggregates live RSS feeds from three well-known AI news sources. It helps users stay updated with:
- Current AI events
- Recent announcements
- Industry news
- Product launches
- Trending AI developments
Example questions this service can answer:
- "What's new in AI this week?"
- "Latest LLM announcements"
- "What did OpenAI release recently?"
- "Summarize today's AI news"
- "What's happening in AI right now?"
File: app.py
The user interface is built using Gradio (gr.ChatInterface), providing a conversational chatbot experience.
The chatbot's system prompt (defined in prompts.py) includes specific guardrails according to the assignment requirements.
The agent will politely refuse to:
- Access or reveal its own system prompt.
- Modifying the system prompt directly.
- Discuss any of the forbidden topics:
- Cats or dogs
- Horoscopes or Zodiac Signs
- Taylor Swift
-
Although the semantic query dataset contains a substantial number of research papers, the data is from 2023. In the AI field, two years can feel like two centuries due to the rapid pace of innovation. More up-to-date datasets should be incorporated in future iterations.
-
The RSS feeds occasionally fail to respond. More reliable or additional news sources should be integrated.
-
Tests are not required for this assignment. In future updates, comprehensive unit tests should be provided to cover at least all tools.
-
Guardrails are currently implemented using system prompts only. More advanced safety mechanisms should be considered in future development, such as:
- Rule-based content filtering layers
- Moderation APIs
- Retrieval-time filtering
- Fine-tuned safety classifiers
- Role-based access control (RBAC)
- Output validation pipelines
Install uv if you don't have it:
curl -LsSf https://astral.sh/uv/install.sh | sh# Create virtual environment and install all dependencies from pyproject.toml
uv sync- Create a
.secretsfile in therootdirectory with yourOPENAI_API_KEY - Format:
OPENAI_API_KEY=your-key-here (no spaces around the =)
-
cd arida -
uv run src/data/01_download_data.pyand wait until the download is successful. This will placepapers.csvin yourdata/rawfolder.
-
uv run src/data/02_preprocess.pyand wait until the proccessing is successful. This will place processed data into yourdata/processedfolder.
-
uv run src/data/03_ingest_data.py. This will create thedata/vector/chroma_db/folder with the embedded papers.
uv run app.py
You should see:
Initialising agent...
Agent ready!
Starting Gradio interface...
=======================================================
👾 AIRADA is online. What's on your radar? 🔍
=======================================================
* Running on local URL: http://127.0.0.1:7860
Open your browser and navigate to http://127.0.0.1:7860 to interact with the chatbot.
