Inspiration

Large Language Models are increasingly used to answer questions about the world, yet their built-in knowledge stops at a training cutoff, so they rely heavily on web scraping to access fresh information. This creates two major problems: unreliable answers for users, and lost revenue for journalists and news organizations whose content is consumed without attribution or compensation.

Recent industry reports and analyses estimate that news media outlets are losing billions of dollars in advertising revenue as AI systems summarize and surface their content directly, reducing traffic to original sources. Some estimates place these losses at over $2B, highlighting a growing economic imbalance between AI platforms and journalism.

MCPress was inspired by this tension between AI systems and news media. We wanted to explore a better alternative, one where LLMs can access up-to-date, trusted news legally while journalists and publishers are fairly compensated.


What MCPress Does

MCPress is a licensed, queryable API that gives LLMs access to up-to-date, journalist-written news without scraping.

Journalists or news organizations can publish their articles to MCPress, where the content is:

  • cleaned and structured,
  • summarized and categorized,
  • embedded for semantic search,
  • and made accessible to AI agents through a single API, served via the Model Context Protocol (MCP).
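The four processing steps above can be pictured as a single enrichment function. The minimal Python sketch below shows one way an ingested article might be represented and enriched. The `Article` schema, the `ingest` function, and the injected `summarize`, `tag`, and `embed` callables are all hypothetical stand-ins. In MCPress those steps are LLM and embedding calls in the backend, and their exact interfaces are not described here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Article:
    """Structured record for one ingested article (hypothetical schema)."""
    url: str
    title: str
    author: str
    published: str  # date string as extracted, e.g. "2024-05-01"
    body: str
    summary: str = ""
    keywords: list[str] = field(default_factory=list)
    categories: list[str] = field(default_factory=list)
    embedding: list[float] = field(default_factory=list)


def ingest(
    url: str,
    fields: dict[str, str],
    summarize: Callable[[str], str],
    tag: Callable[[str], tuple[list[str], list[str]]],
    embed: Callable[[str], list[float]],
) -> Article:
    """Turn extracted fields into an enriched, searchable Article.

    `summarize`, `tag`, and `embed` are hypothetical placeholders for
    LLM / embedding-model calls; `tag` returns (keywords, categories).
    """
    # Collapse stray whitespace and line breaks left over from extraction.
    body = " ".join(fields["body"].split())
    article = Article(
        url=url,
        title=fields["title"].strip(),
        author=fields.get("author", "").strip(),
        published=fields.get("date", "").strip(),
        body=body,
    )
    article.summary = summarize(body)
    article.keywords, article.categories = tag(body)
    # Assumption: embed title + summary; embedding the full body also works.
    article.embedding = embed(f"{article.title}\n{article.summary}")
    return article
```

Passing the model calls in as plain functions keeps the pipeline testable with stubs such as `summarize=lambda text: text[:200]` before a real LLM or embedding provider is wired in.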

LLMs and AI agents can then query MCPress to retrieve fresh, trusted information, while usage is tracked so that publishers can be remunerated per access.
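Per-access tracking can be as simple as an append-only log of retrieval events, aggregated per publisher. The sketch below assumes a flat, hypothetical `rate_per_access` and keeps events in memory. `AccessEvent`, `UsageLedger`, and the payout rule are illustrative assumptions, not MCPress's actual billing model. A deployed version would persist the events in the database, for example in a Supabase table.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class AccessEvent:
    """One retrieval of one article by one agent (hypothetical schema)."""
    agent_id: str
    article_id: str
    publisher_id: str
    at: datetime


class UsageLedger:
    """In-memory, append-only log of article accesses (illustrative only)."""

    def __init__(self, rate_per_access: Decimal):
        self.rate = rate_per_access
        self.events: list[AccessEvent] = []

    def record(self, agent_id: str, article_id: str, publisher_id: str) -> None:
        """Log a retrieval with a UTC timestamp."""
        self.events.append(
            AccessEvent(agent_id, article_id, publisher_id, datetime.now(timezone.utc))
        )

    def payouts(self) -> dict[str, Decimal]:
        """Amount owed to each publisher: access count times the flat rate."""
        counts: dict[str, int] = defaultdict(int)
        for event in self.events:
            counts[event.publisher_id] += 1
        return {publisher: self.rate * n for publisher, n in counts.items()}
```

`Decimal` is used instead of `float` so that many small per-access amounts add up without binary rounding error.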


How We Built It

The project is intentionally minimal and focused on core functionality:

  • Frontend: A lightweight Next.js interface where journalists submit article URLs and review extracted content.
  • Backend: A FastAPI service that:
    • fetches articles using Jina Reader,
    • extracts structured fields (title, author, date, body),
    • generates summaries, keywords, and categories using LLMs,
    • stores content and embeddings in Supabase.
  • Database: Supabase with pgvector for efficient semantic search.
  • MCP Server: An MCP-compatible API that allows AI agents to search and retrieve articles programmatically.
  • Demo: A simple AI chat demo (separate app) showing how an agent can query live news via MCPress.
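The semantic search step ranks stored article embeddings by their similarity to the embedding of an agent's query. The pure-Python sketch below shows that ranking by cosine similarity over a hypothetical in-memory `index`, under the assumption that cosine is the metric used. In Postgres with pgvector, the equivalent query orders by the `<=>` cosine-distance operator (`ORDER BY embedding <=> query_embedding LIMIT k`). Because cosine distance is 1 minus cosine similarity, that query yields the same order. The function names here are illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query: list[float], index: dict[str, list[float]], k: int = 5
) -> list[tuple[str, float]]:
    """Return up to k (article_id, similarity) pairs, most similar first."""
    scored = [(article_id, cosine_similarity(query, vec)) for article_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan like this is fine for a demo. Letting pgvector do the ordering, optionally with one of its approximate indexes, keeps the search inside the database as the article count grows.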

Challenges We Faced

  • Noisy web content: News pages contain a lot of irrelevant information (navigation, ads, related links). Extracting only the meaningful article body required careful filtering.
  • Keeping the scope small: Building ingestion, search, and agent access in a short hackathon required strict prioritization.
  • Balancing simplicity and realism: We focused on a realistic pipeline while keeping the UI and architecture intentionally minimal.
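One way to approach the noisy-content problem is a line-level heuristic filter applied to the fetched text. This sketch assumes the reader output is Markdown-style text. It drops three kinds of lines:

- image lines,
- lines dominated by links, which is typical of navigation menus and "related" blocks,
- lines containing a hypothetical list of boilerplate phrases.

The threshold and the `BOILERPLATE` list are illustrative and would need tuning against real pages. This is not MCPress's actual filter.

```python
import re

# Markdown link "[text](url)"; group 1 is the visible link text.
LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")
# A line consisting only of a Markdown image "![alt](src)".
IMAGE = re.compile(r"^!\[[^\]]*\]\([^)]*\)$")
# Hypothetical phrases that often mark non-article blocks.
BOILERPLATE = ("subscribe", "advertisement", "related articles", "sign up")


def link_ratio(line: str) -> float:
    """Fraction of the line's visible text that sits inside links."""
    visible = LINK.sub(lambda m: m.group(1), line)
    if not visible.strip():
        return 1.0  # nothing but empty links: treat as pure navigation
    linked = sum(len(m.group(1)) for m in LINK.finditer(line))
    return linked / len(visible)


def clean_markdown(text: str, max_link_ratio: float = 0.5) -> str:
    """Keep lines that look like article prose; join them as paragraphs."""
    kept = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or IMAGE.match(line):
            continue
        if any(phrase in line.lower() for phrase in BOILERPLATE):
            continue
        if link_ratio(line) > max_link_ratio:
            continue
        kept.append(line)
    return "\n\n".join(kept)
```

Phrase matching is blunt. An article that genuinely discusses newsletter subscriptions would lose those lines, so a rule list like this works best as a final pass after the structured body has been extracted.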

Why It Matters

MCPress proposes a shift away from scraping toward a licensed, trust-based information layer for AI. It allows LLMs to stay current and reliable, while giving journalists and newsrooms a new, sustainable way to distribute their work in the age of AI.
