
# 📡 Newscrux

AI-powered news aggregator with structured multilingual summaries and push notifications



## What It Does

Newscrux monitors 13 AI/ML RSS feeds, filters articles for relevance with AI, extracts full article content when needed, generates structured summaries in your chosen language, and delivers them as rich push notifications to your phone via Pushover, or prints them to the terminal with `--no-push`.

Every notification tells you what happened, why it matters, and one key detail — in English, Turkish, German, French, or Spanish.

You can choose between OpenRouter (cloud-based AI) or Ollama (local AI) for all AI operations, ensuring privacy and cost control when using local models.


## Notification Examples

**English (`--lang=en`):**

```
Title: OpenAI announces enterprise agent toolkit

📰 TechCrunch AI

What happened: OpenAI released a new suite of tools for building
enterprise-grade autonomous agents, including improved function
calling, a persistent memory API, and a new orchestration layer.

Why it matters: This could significantly accelerate agent-based
automation in large organizations by reducing integration complexity.

💡 Initial access is being rolled out to select enterprise customers.
```

**Turkish (`--lang=tr`):**

```
Title: AGI'ye doğru ilerlemeyi ölçmek: Bilişsel bir çerçeve

📰 Google DeepMind

Ne oldu: Google DeepMind, yapay genel zeka (AGI) yolunda ilerlemeyi
değerlendirmek için bilişsel bilim temelli bir çerçeve yayınladı.
10 temel bilişsel yeteneği tanımlıyor ve AI sistemlerinin yeteneklerini
sınıflandırmaya yönelik bir taksonomi sunuyor.

Neden önemli: Bu çerçeve, AI sistemlerinin genel zeka yeteneklerini
bilişsel perspektiften değerlendirmek için ortak bir temel sağlayabilir.

💡 200.000 dolar ödüllü Kaggle hackathonu başlatıldı.
```

## Features

- 🤖 **Flexible AI providers** — Use OpenRouter (cloud) or Ollama (local) for AI operations
- 🧠 **Structured summaries** — What happened + Why it matters + Key detail, generated by AI
- 📰 **13 RSS sources** — OpenAI, Google AI, DeepMind, TechCrunch, arXiv, and more
- 🔍 **AI relevance filtering** — Only delivers news that matters; irrelevant articles are dropped before summarization
- 📄 **Hybrid content extraction** — RSS snippet first, full-text scraping (via cheerio) when the snippet is too short
- **Article state pipeline** — `discovered → enriched → summarized → sent`, with persistence
- 🔒 **No data loss** — Atomic queue writes, retries on transient failures, articles survive restarts
- 📊 **Operational metrics** — Per-cycle stats logged (discovered, enriched, sent, failed, truncated)
- 🏷️ **Feed typing** — Official blogs (`official_blog`) bypass the relevance filter automatically
- 🔁 **Cross-source deduplication** — Title-similarity check prevents the same story from arriving via multiple sources
- 🖥️ **Terminal-only mode** — Use `--no-push` to skip notifications and display results in the console
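The article state pipeline and atomic queue writes can be sketched like this (a minimal illustration; the `Article` shape, the `NEXT` transition table, and the field names are hypothetical, not the project's actual schema):

```typescript
import { writeFileSync, renameSync } from "fs";

// Hypothetical article shape; the real queue schema may differ.
type ArticleState = "discovered" | "enriched" | "summarized" | "sent" | "failed";

interface Article {
  id: string;
  title: string;
  state: ArticleState;
}

// Only forward transitions (plus "failed") are legal.
const NEXT: Record<ArticleState, ArticleState[]> = {
  discovered: ["enriched", "failed"],
  enriched: ["summarized", "failed"],
  summarized: ["sent", "failed"],
  sent: [],
  failed: ["enriched", "summarized", "sent"], // retried on the next cycle
};

function transition(article: Article, to: ArticleState): Article {
  if (!NEXT[article.state].includes(to)) {
    throw new Error(`illegal transition ${article.state} -> ${to}`);
  }
  return { ...article, state: to };
}

// Atomic write: write to a temp file, then rename over the target,
// so a crash mid-write never leaves a half-written queue file.
function saveQueue(path: string, queue: Article[]): void {
  const tmp = `${path}.tmp`;
  writeFileSync(tmp, JSON.stringify(queue, null, 2));
  renameSync(tmp, path);
}
```

The write-then-rename pattern is what makes "articles survive restarts" hold: readers only ever see the old complete file or the new complete file.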

## Quick Start

```bash
# Clone the repository
git clone https://github.com/alicankiraz1/newscrux.git
cd newscrux
npm install
cp .env.example .env        # Edit with your API keys (optional for --no-push)
npm run build

# Option 1: Run with OpenRouter (requires API key)
npm start -- --lang=en      # or: tr, de, fr, es

# Option 2: Run with Ollama (local, requires Ollama installed)
ollama pull deepseek-qwen-8b:latest
ollama serve
npm start -- -p ollama --lang=tr

# Option 3: Run once, show in terminal (no Pushover needed)
npm start -- --no-push -p ollama --lang=tr
```

**Prerequisites:** Node.js and npm; an OpenRouter API key (or a local Ollama install for `-p ollama`); a Pushover user key and app token unless running with `--no-push`.


## Architecture

```
RSS Feeds (13 sources)
        │
        ▼
  Fetch + Parse
        │
        ▼
  Cross-source Dedup (title similarity)
        │
        ▼
  Discover → Queue (persistent JSON)
        │
        ├─ high priority (official_blog) ────────────────────┐
        │                                                     │
        ▼                                                     │
  Relevance Filter (AI scores 1-10)                          │
  Drop below threshold                                        │
        │                                                     │
        └─────────────────────────────────────────────────── ▼
                                                   Enrich (snippet or scrape)
                                                             │
                                                             ▼
                                                   Summarize (DeepSeek JSON)
                                                             │
                                                             ▼
                                                   Render Notification
                                                   (HTML, smart truncation)
                                                             │
                                                             ▼
                                                   Send via Pushover
                                                             │
                                                             ▼
                                                   Mark Sent in Queue
```
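The render stage's smart truncation targets Pushover's 1,024-character message limit. A minimal word-boundary sketch (illustrative only; the project's actual renderer may trim whole sections before cutting words):

```typescript
// Pushover messages are limited to 1,024 characters. This illustrative
// helper truncates at the last word boundary and appends an ellipsis.
const PUSHOVER_LIMIT = 1024;

function smartTruncate(text: string, limit = PUSHOVER_LIMIT): string {
  if (text.length <= limit) return text;
  const slice = text.slice(0, limit - 1); // leave room for the ellipsis
  const lastSpace = slice.lastIndexOf(" ");
  const cut = lastSpace > 0 ? slice.slice(0, lastSpace) : slice;
  return cut + "…";
}
```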

## Supported Languages

| Code | Language | Notification labels |
|------|----------|---------------------|
| `en` | English | "What happened:" / "Why it matters:" / "Read More" |
| `tr` | Turkish | "Ne oldu:" / "Neden önemli:" / "Devamını Oku" |
| `de` | German | "Was passiert ist:" / "Warum es wichtig ist:" / "Weiterlesen" |
| `fr` | French | "Ce qui s'est passé :" / "Pourquoi c'est important :" / "Lire la suite" |
| `es` | Spanish | "Qué pasó:" / "Por qué importa:" / "Leer más" |

Each language pack includes a full AI system prompt in that language, feed kind labels, and all notification UI strings. The AI model produces translated_title, what_happened, why_it_matters, and key_detail in the selected language.
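A language pack can be pictured as a typed structure along these lines (the interface and field names here are hypothetical illustrations, not the codebase's actual shape):

```typescript
// Hypothetical shape of a language pack; actual keys may differ.
interface LanguagePack {
  code: "en" | "tr" | "de" | "fr" | "es";
  systemPrompt: string;   // full AI system prompt in the target language
  labels: {
    whatHappened: string; // section label, e.g. "Ne oldu:" for Turkish
    whyItMatters: string;
    readMore: string;     // link label in the notification
  };
}

const turkish: LanguagePack = {
  code: "tr",
  systemPrompt: "...", // elided for brevity
  labels: {
    whatHappened: "Ne oldu:",
    whyItMatters: "Neden önemli:",
    readMore: "Devamını Oku",
  },
};
```

Because the system prompt itself is part of the pack, the model is instructed in the target language rather than asked to translate English output afterwards.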


## Configuration

### CLI Options

| Flag | Description | Default |
|------|-------------|---------|
| `--lang <code>`, `-l <code>` | Summary language: `en`, `tr`, `de`, `fr`, `es` | `en` |
| `--provider <type>`, `-p <type>` | AI provider: `openrouter`, `ollama` | `openrouter` |
| `--no-push` | Skip Pushover notifications, show results in terminal only | — |
| `--help`, `-h` | Show help message and exit | — |
| `--version`, `-v` | Show version number and exit | — |

```bash
newscrux --lang=tr      # Start with Turkish summaries
newscrux -l de          # Start with German summaries
newscrux -p ollama      # Use local Ollama model
newscrux --no-push      # Run once, show results in terminal only
newscrux                # Start with English summaries (default)
```


### Environment Variables (`.env`)

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `OPENROUTER_API_KEY` | Yes | — | OpenRouter API key |
| `PUSHOVER_USER_KEY` | Yes | — | Pushover user key |
| `AI_PROVIDER` | No | `openrouter` | AI provider: `openrouter`, `ollama` |
| `OLLAMA_BASE_URL` | No | `http://localhost:11434/v1` | Ollama API URL |
| `OLLAMA_MODEL` | No | `deepseek-qwen-8b:latest` | Ollama model name |
| `OLLAMA_TEMPERATURE` | No | `0.2` | Ollama sampling temperature |
| `OLLAMA_THINK` | No | `false` | Enable/disable reasoning-heavy output |
| `OLLAMA_SUMMARY_MAX_TOKENS` | No | `260` | Max tokens for summary generation |
| `OLLAMA_RELEVANCE_MAX_TOKENS` | No | `1200` | Max tokens for relevance scoring output |
| `OLLAMA_TIMEOUT_MS` | No | `45000` | Ollama request timeout in milliseconds |
| `OLLAMA_MAX_RETRIES` | No | `2` | Retry count for failed/empty Ollama responses |
| `PUSHOVER_APP_TOKEN` | Yes | — | Pushover app token |
| `OPENROUTER_MODEL` | No | `deepseek/deepseek-v3.2-speciale` | AI model for summarization |
| `POLL_INTERVAL_MINUTES` | No | `15` | Minutes between feed polls |
| `MAX_ARTICLES_PER_POLL` | No | `10` | Max regular articles processed per cycle |
| `ARXIV_MAX_PER_POLL` | No | `15` | Max arXiv papers processed per cycle |
| `ENRICH_CONCURRENCY` | No | `4` | Parallel workers for enrichment stage |
| `SUMMARIZE_CONCURRENCY` | No | `2` | Parallel workers for summarization stage |
| `SEND_CONCURRENCY` | No | `3` | Parallel workers for send stage |
| `SUMMARIZE_DELAY_MS` | No | `0` | Delay after each summary (ms) |
| `SEND_DELAY_MS` | No | `0` | Delay after each send (ms) |
| `SNIPPET_MIN_LENGTH` | No | `300` | Skip scraping when snippet length is at least this many chars |
| `ENRICHED_CONTENT_MAX_LENGTH` | No | `3000` | Max content chars passed to summarizer |
| `SCRAPING_ENABLED` | No | `true` | Enable/disable full-page scraping fallback |
| `SCRAPING_TIMEOUT_MS` | No | `10000` | Scraping request timeout in ms |
| `SCRAPING_DOMAIN_DELAY_MS` | No | `2000` | Delay between requests to same domain in ms |
| `RELEVANCE_THRESHOLD` | No | `6` | Minimum AI relevance score (1–10) |
| `RELEVANCE_BATCH_SIZE` | No | `100` | Max discovered entries scored by relevance per cycle |
| `LOG_LEVEL` | No | `info` | Log verbosity: `debug`, `info`, `warn`, `error` |
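A minimal sketch of reading a few of these variables with their documented defaults (illustrative only; `src/config.ts` is the authoritative loader and may work differently):

```typescript
// Parse an integer env var, falling back to the documented default
// when unset or non-numeric.
function intEnv(name: string, fallback: number): number {
  const raw = process.env[name];
  const n = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(n) ? n : fallback;
}

const config = {
  pollIntervalMinutes: intEnv("POLL_INTERVAL_MINUTES", 15),
  relevanceThreshold: intEnv("RELEVANCE_THRESHOLD", 6),
  snippetMinLength: intEnv("SNIPPET_MIN_LENGTH", 300),
  // Anything other than the literal string "false" keeps scraping on.
  scrapingEnabled: (process.env.SCRAPING_ENABLED ?? "true") !== "false",
};
```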

---

## RSS Sources

| Source | Type | Priority |
|--------|------|----------|
| OpenAI News | `official_blog` | high (bypasses filter) |
| Google AI Blog | `official_blog` | high (bypasses filter) |
| Google DeepMind | `official_blog` | high (bypasses filter) |
| Hugging Face Blog | `official_blog` | normal |
| TechCrunch AI | `media` | normal |
| MIT Technology Review AI | `media` | normal |
| The Verge AI | `media` | normal |
| Ars Technica | `media` | normal |
| arXiv cs.CL | `research` | normal |
| arXiv cs.LG | `research` | normal |
| arXiv cs.AI | `research` | normal |
| Import AI | `newsletter` | normal |
| Ahead of AI | `newsletter` | normal |

To add or remove feeds, edit the `feeds` array in `src/config.ts`.

---

## Deployment

### Raspberry Pi / Linux server (systemd)

```bash
# 1. Clone and build
git clone https://github.com/alicankiraz1/newscrux.git ~/newscrux
cd ~/newscrux
npm install
cp .env.example .env
nano .env                                       # fill in your API keys
npm run build

# 2. Install and configure service
cp newscrux.service ~/.config/systemd/user/
nano ~/.config/systemd/user/newscrux.service    # adjust --lang flag if needed

# 3. Enable and start (user-level systemd)
systemctl --user daemon-reload
systemctl --user enable newscrux
systemctl --user start newscrux

# 4. View live logs
journalctl --user -u newscrux -f
```

**Note:** The service file uses `%h` (the systemd home-directory specifier), so paths resolve to your home directory automatically. No root access is needed.
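For orientation, a user-level unit matching the commands above might look like this (illustrative sketch; the repository ships its own `newscrux.service`, and the entry-point path `dist/index.js` here is an assumption):

```ini
[Unit]
Description=Newscrux AI news aggregator
After=network-online.target

[Service]
# %h expands to the user's home directory
WorkingDirectory=%h/newscrux
ExecStart=/usr/bin/node %h/newscrux/dist/index.js --lang=en
Restart=on-failure

[Install]
WantedBy=default.target
```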


## How It Works

1. **Fetch** — Polls all 13 RSS feeds every 15 minutes (configurable) using `rss-parser`
2. **Deduplicate** — A cross-source title-similarity check prevents the same story from appearing twice
3. **Discover** — New articles are added to a persistent JSON queue (`data/article-queue.json`) with state `discovered`
4. **Filter** — AI scores each article's relevance 1–10; articles below the threshold are dropped before any summarization cost is incurred. High-priority (`official_blog`) sources bypass this step entirely.
5. **Enrich** — Checks the RSS snippet length; if it is shorter than 300 characters, scrapes the full article using `cheerio`. Content is capped at 3,000 characters for the summarizer.
6. **Summarize** — Sends article content to the configured AI provider (OpenRouter or Ollama) with a structured JSON prompt in the selected language. Output: `translated_title`, `what_happened`, `why_it_matters`, `key_detail`, `source_type`.
7. **Render** — Unless `--no-push` is set, builds the Pushover notification with HTML formatting and smart truncation to stay within the 1,024-character limit. Otherwise, displays the results in the terminal.
8. **Send** — POSTs the notification to the Pushover API (or skips it when `--no-push` is set). The article is marked `sent` only after a confirmed successful delivery (or after terminal display).
9. **Retry** — Articles that fail enrichment, summarization, or sending remain in the queue as `failed` and are retried on the next cycle.
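The title-similarity check in step 2 could look like the following sketch, using Jaccard similarity over word tokens (one common approach; the project's actual metric and threshold may differ):

```typescript
// Split a title into lowercase word tokens (letters and digits only).
function tokenize(title: string): Set<string> {
  return new Set(title.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? []);
}

// Jaccard index: |intersection| / |union| of the two token sets.
function titleSimilarity(a: string, b: string): number {
  const ta = tokenize(a);
  const tb = tokenize(b);
  if (ta.size === 0 || tb.size === 0) return 0;
  let overlap = 0;
  ta.forEach((t) => { if (tb.has(t)) overlap++; });
  return overlap / (ta.size + tb.size - overlap);
}

// The 0.8 threshold is an illustrative choice, not the project's value.
function isDuplicate(a: string, b: string, threshold = 0.8): boolean {
  return titleSimilarity(a, b) >= threshold;
}
```

Token-set similarity is robust to small wording differences (reordered words, added punctuation) while still separating genuinely different stories.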

## Contributing

See CONTRIBUTING.md for how to add languages, submit fixes, or suggest features.


## Author

Alican Kiraz



## License

MIT — see LICENSE
