Reality Web Intelligence (RWI)

A Browser-Mediated, On-Device Intelligence Layer for the Open Web

Author: Lakshya Gupta · Reality Play

Open-source project — contributions welcome. See Contributing below.


Abstract

Modern web applications increasingly incorporate large language models (LLMs) for summarization, search, reasoning, personalization, and content generation. Today, most AI integration in web applications is server-dependent: user data is transmitted off-device for inference, developers pay per-request API costs, and latency and availability depend on external services. Meanwhile, alternative on-device approaches each leave a critical gap — per-origin model downloads, vendor lock-in, or no web-origin-scoped permission surface.

Reality Web Intelligence (RWI) is a browser-mediated architecture that exposes on-device LLM capabilities to websites through an explicit, origin-scoped permission model and a shared model runtime. This repository contains two working implementations:

  1. Rewebin — a custom macOS browser built on WebKit that integrates RWI natively
  2. RWI macOS App + Safari Web Extension — an extension-mediated implementation for Safari

Both expose the same JavaScript SDK (window.rwi) and demonstrate zero-cloud AI for web applications while preserving privacy, improving reliability, and lowering barriers to LLM-powered functionality.



Introduction

Web applications increasingly rely on LLMs and AI services to perform tasks such as text summarization, classification, retrieval assistance, and generation. This capability has accelerated product development, but the prevailing architecture is server-dependent: inference runs on remote servers and web applications communicate via third-party APIs.

Meanwhile, modern consumer devices — especially Apple Silicon Macs — can now run mid-scale language models (on the order of a few gigabytes) locally. This creates an opportunity: what if the intelligence a website needs were already present on the user's device, and the browser could mediate access to it the same way it mediates access to the camera or location?


Landscape of Existing Approaches and Their Gaps

Several approaches to web AI exist today, spanning cloud-hosted services, in-browser inference, browser-vendor APIs, local runtimes, and platform-level intelligence. Each addresses part of the problem; none addresses all of it.

Cloud-Based Web AI

Most contemporary web applications integrate AI through cloud-hosted APIs. While this enables rapid deployment, it introduces four recurring limitations:

  1. Privacy and data ownership. Sensitive user content is transmitted to external servers.
  2. Cost and accessibility. API-based inference introduces recurring costs, rate limits, and barriers for small developers, students, and offline users.
  3. Latency and reliability. Quality of experience depends on network conditions and service uptime.
  4. Fragmented intelligence. Each website reimplements similar AI logic, leading to duplicated computation and inconsistent behavior.

In-Browser LLMs via WebGPU (WebLLM)

An alternative is to execute LLM inference directly inside the page using WebGPU (e.g., WebLLM). This avoids cloud dependency and keeps data on-device, but shifts model download, caching, sandboxed compute constraints, and performance variability to each site individually. Critically, Google's own web.dev documentation explicitly states that WebLLM's model cache "cannot be shared across origins, so another web app may have to download the same model again." If a user visits five websites that all want the same model, the model is downloaded and cached five separate times.

Browser-Provided AI Surfaces (Chrome Prompt API)

Browser vendors have begun exploring web APIs that expose on-device AI capabilities behind permissions and origin trials. Chrome's built-in Prompt API runs inference on-device using Gemini Nano and is now stable for Chrome extensions, though still in origin trial for the broader web platform as of early 2026. While this approach keeps data on-device, it introduces three significant constraints:

  1. Single-vendor, single-model lock-in. The API ships only a fixed vendor model (Gemini Nano) and is available only in Chrome and Edge — meaning developers targeting Safari, Firefox, or other browsers cannot use it.
  2. Model instability under vendor control. Because the browser vendor controls the model, it can be silently updated, swapped, or deprecated across browser versions. A website developer who has prompt-engineered for one model version may find their application behavior changes unexpectedly after a browser update.
  3. Browser-vendor participation required. The approach requires integration by each browser vendor; it cannot be adopted independently by developers or extended to other browsers.

By contrast, in RWI the model is packaged with the runtime and does not change once deployed, giving developers a stable, predictable inference target.

Local LLM Runtimes (Ollama, llamafile)

General-purpose local runtimes such as Ollama and llamafile simplify running models on a user's machine. However, they expose no web-origin-scoped permission surface for arbitrary websites: any local application (or any website that can reach the local HTTP endpoint) can access them without per-origin consent or browser-enforced isolation.

Platform On-Device Intelligence (Apple Intelligence)

Platform-level assistants and OS-provided intelligence features can deliver user-facing functionality but are not programmable as a web API for arbitrary websites. A web developer cannot call Apple Intelligence from their site; it is a system feature, not a web platform primitive.

Web AI and Low-Level ML APIs (WebNN)

Emerging web platform efforts such as WebNN expose low-level machine learning primitives and hardware acceleration to web developers. These approaches require applications to manage models, execution graphs, and lifecycle concerns directly, placing substantial engineering burden on each site.

The Collective Gap

Taken together, these approaches leave a clear gap. No existing system simultaneously provides:

  • A single model instance shared across origins (eliminating redundant downloads)
  • Explicit, per-origin, user-controlled permissions analogous to camera and location access
  • Zero cloud dependency by design
  • A deployment path that does not require browser-vendor participation

This is the gap that Reality Web Intelligence (RWI) addresses.

Structured Comparison

| Dimension | WebLLM (WebGPU) | Chrome Prompt API | Ollama / llamafile | RWI (this work) |
|---|---|---|---|---|
| Inference location | In-page (sandboxed tab) | Browser process (in-process) | Local daemon (separate process) | Browser/ext. mediator (in-process in Rewebin) |
| Model shared across origins? | No — each origin downloads & loads independently | Yes — browser manages one shared model | No — app-level, not web-origin-scoped | Yes — single model shared across permitted origins |
| Developer installs model? | Yes — each site manages download & caching | No — vendor ships model (Gemini Nano only) | User installs separately | No — prepackaged in Rewebin / companion app |
| Model choice | Open (developer selects) | Fixed (Gemini Nano; vendor-controlled) | Open (user selects) | Open (prepackaged; extensible) |
| Origin-scoped permissions? | No — runs inside untrusted page context | Yes — Chrome permission prompt per origin | No — all local apps have access | Yes — explicit per-origin consent, browser-enforced |
| Browser vendor required? | No — pure web APIs (WebGPU) | Yes — Chrome/Edge only, origin trial | No | No for extension; yes for native (Rewebin) |
| Works in Safari? | Partial (WebGPU in tech preview) | No | N/A (native app) | Yes — extension targets Safari |
| Cross-origin prompt isolation? | Partial (tab sandboxing) | Not yet specified | N/A | Yes — per-(origin, sessionId) session store |
| Offline capable? | Yes (after first download) | Yes (after model download) | Yes | Yes (model prepackaged in Rewebin) |
| Backend runtime | WebGPU + WASM | Proprietary (Chrome-internal) | llama.cpp / various | llama.cpp (Metal-accelerated) |

What RWI Enables for Web Developers

With RWI, a developer adds a few lines of JavaScript to their website and gains access to on-device LLM inference — without bundling a model, without an API key, without per-request cloud costs, and without slowing down page load. The intelligence is already on the user's device; the browser mediates access.

// Check availability
const available = await rwi.isAvailable();

// Request permission (user sees a browser-native prompt)
const { granted } = await rwi.requestPermission();

// Generate text with streaming
const result = await rwi.generate({
  prompt: "What's RWI?",
  onToken: (token) => updateUI(token)
});

The website itself loads as fast as any normal site — no multi-gigabyte model download on first visit. Every site that uses RWI shares the same model already present on the device. The user grants permission once per origin (like camera or location access), and all inference happens locally: no data leaves the device, no cloud bill accumulates, and the feature works offline.

As on-device models improve and consumer hardware becomes more capable, this architecture becomes more powerful without any changes to the websites that use it.


Design Goals

Given the gaps identified above, RWI is designed around five core requirements:

  1. On-device by default: inference runs locally, without sending user content to the network.
  2. Shared across websites: the intelligence layer is installed once and reused, not duplicated per site.
  3. Browser-mediated access: websites cannot access the model directly; the browser enforces isolation.
  4. Explicit permissions: access is origin-scoped and user-controlled.
  5. No new browser required: the approach must fit within existing browser extension and OS app ecosystems (demonstrated via the Safari extension), while also showing what native integration looks like (Rewebin).

System Overview

Concept

RWI can be realized through different integration points in the browser stack. This project describes two approaches:

  1. Extension-mediated RWI (Safari): a macOS container app hosts the on-device LLM runtime, while a Safari Web Extension injects the web SDK and securely mediates origin-scoped requests to the app.
  2. Native RWI (Rewebin browser): a custom WebKit-based browser integrates the RWI runtime directly into the browser process and exposes the same origin-scoped web API without requiring an extension bridge or external companion process.

The two designs share the same core idea — websites request intelligence through a permissioned API — but differ in where trust and execution live: in the Safari implementation, the trusted runtime is a separate app; in Rewebin, the trusted runtime is the browser itself.

Websites do not run AI models — the browser does. Websites request intelligence — the user grants access.

Components

  1. Rewebin browser (WebKit, native RWI): a custom macOS browser built with a SwiftUI shell and a WebKit rendering layer (WKWebView). Rewebin injects the RWI web SDK (window.rwi) at document start using WKUserScript and mediates calls via WKScriptMessageHandler. LLM inference runs in-process via a native Swift service layer that integrates llama.cpp as an XCFramework.

  2. RWI macOS app + Safari Web Extension (extension-mediated RWI): the container app hosts and manages the local LLM runtime (integrating llama.cpp via an XCFramework), while the Safari Web Extension injects the RWI SDK, bridges communication, and manages per-origin permissions. The app must be running for inference requests to succeed.

  3. Web JavaScript SDK (rwiSDK.js): a developer-facing API exposed as window.rwi; the same surface is implemented in both Rewebin and the Safari extension.

  4. Demo web application (Next.js): a chat-style UI showcasing streaming inference and connectivity monitoring. The demo requires no server-side component for inference — it functions fully offline once the page is loaded and the local runtime is available. Reference testing was conducted on an Apple M1 MacBook Pro (8 GB unified memory) using the LFM2-2.6B model (Liquid AI) quantized to Q4_K_M GGUF format (~1.5 GB memory footprint), with inference accelerated via llama.cpp's Metal backend.

  5. RWI Analyzer (Rewebin): a browser-native feature that uses the on-device LLM to infer a website's intent — what the page is trying to do — from structural and behavioral signals, rather than relying on visible text. See the RWI Analyzer section below for details.


Architecture

High-Level Structure

RWI separates concerns into a web-facing SDK and a mediator that hosts the on-device runtime. The mediator acts as a secure intermediary between untrusted websites and the trusted inference runtime.

Communication and Execution Flow

RWI can mediate requests through an extension bridge (Safari) or directly inside the browser process (Rewebin). Both present the same web-facing API but differ in execution and message routing.

The distinction between Rewebin and the Safari extension is architectural, not primarily about inference speed. In both implementations, the same underlying llama.cpp runtime and LFM2-2.6B model are used. The key difference is the mediation path: in Safari, the extension bridges the web page to a companion app via inter-process communication, while in Rewebin, the browser is the trusted runtime and calls are handled in-process via WKScriptMessageHandler. This eliminates IPC overhead and simplifies the trust boundary, but the dominant cost in both systems is token generation itself, not the messaging layer.

Safari Extension-Mediated Flow

Website  ──►  Content.js  ──►  Background.js  ──►  Extension Handler  ──►  RWI App (LLM)
   ◄────────────────────────────────────────────────────────────────────────────┘
                                (tokens streamed back)
  1. A website calls rwi.generate(...) via the injected SDK.
  2. The page communicates with the content script using postMessage.
  3. The content script forwards the request to the background service worker.
  4. The background forwards to the native extension handler.
  5. The native handler relays to the RWI macOS app via DistributedNotificationCenter.
  6. The app performs on-device inference and streams tokens back through the chain.
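Each hop in steps 2–4 is asynchronous and fire-and-forget, so the SDK must correlate every outgoing request with the response that eventually travels back up the chain. The pattern can be sketched as a pending-request map keyed by a generated ID; the channel below is simulated, and all names are illustrative rather than the extension's actual message types:

```javascript
// Minimal request/response correlation over a postMessage-style channel.
// In the real extension, `channel` would be window.postMessage on the page
// side with the content script answering from the other side.
function createBridge(channel) {
  const pending = new Map(); // requestId -> { resolve, reject }
  let nextId = 0;

  // Handle responses coming back from the far side of the channel.
  channel.onResponse((msg) => {
    const entry = pending.get(msg.requestId);
    if (!entry) return; // unknown or already-settled request
    pending.delete(msg.requestId);
    msg.error ? entry.reject(new Error(msg.error)) : entry.resolve(msg.result);
  });

  // Send a request and await the correlated response.
  return function call(method, params) {
    const requestId = ++nextId;
    return new Promise((resolve, reject) => {
      pending.set(requestId, { resolve, reject });
      channel.send({ requestId, method, params });
    });
  };
}

// Stand-in channel: answers every request asynchronously with a canned result.
function makeLoopbackChannel() {
  let handler = null;
  return {
    onResponse(fn) { handler = fn; },
    send(msg) {
      setTimeout(() => handler({ requestId: msg.requestId, result: `ok:${msg.method}` }), 0);
    },
  };
}
```

The same correlation logic applies at the content-script/background and background/native boundaries; only the transport changes.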

Rewebin Native (In-Process) Flow

Website  ──►  WKScriptMessageHandler  ──►  LlamaService (in-process)
   ◄────────────────────────────────────────────┘
              (tokens streamed back)
  1. At document start, the browser injects the RWI SDK (window.rwi) via WKUserScript.
  2. A website calls window.rwi.generate(...).
  3. The SDK forwards the request via webkit.messageHandlers.rwi.postMessage(...).
  4. The RWIScriptMessageHandler validates the caller's origin and checks permissions.
  5. If allowed, the handler calls LlamaService which runs generation via llama.cpp (Metal-accelerated).
  6. Tokens are streamed back to the page via polling or JavaScript evaluation.
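The SDK side of steps 2–3 and 6 can be sketched as follows. The native bridge here is a synchronous stand-in for `webkit.messageHandlers.rwi.postMessage`, and the `__rwiDeliverToken` callback name is illustrative, not the actual Rewebin internal:

```javascript
// Per-session bookkeeping for streaming generation.
const sessions = new Map(); // sessionId -> { onToken, resolve, text }
let nextSession = 0;

function generate({ prompt, onToken }) {
  const sessionId = `s${++nextSession}`;
  return new Promise((resolve) => {
    sessions.set(sessionId, { onToken, resolve, text: '' });
    // In Rewebin this would be:
    //   webkit.messageHandlers.rwi.postMessage({ type: 'generate', sessionId, prompt });
    nativeBridge.postMessage({ type: 'generate', sessionId, prompt });
  });
}

// Called by the native side (via browser-initiated JS evaluation) per token.
function __rwiDeliverToken(sessionId, token, done) {
  const s = sessions.get(sessionId);
  if (!s) return; // stale or cancelled session
  if (token !== null) { s.text += token; s.onToken?.(token); }
  if (done) { sessions.delete(sessionId); s.resolve({ text: s.text }); }
}

// Stand-in native bridge that "generates" two tokens and finishes.
const nativeBridge = {
  postMessage(msg) {
    __rwiDeliverToken(msg.sessionId, 'Hello, ', false);
    __rwiDeliverToken(msg.sessionId, 'world', false);
    __rwiDeliverToken(msg.sessionId, null, true);
  },
};
```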

Implementation Comparison

| Aspect | Rewebin (WebKit Browser) | RWI Safari Extension |
|---|---|---|
| Type | Custom browser app | Safari extension + container app |
| Engine | WebKit (WKWebView) | Safari (WebKit) |
| SDK Injection | Native WKUserScript at document start | Content script injection |
| Message Handling | Direct WKScriptMessageHandler | Multi-hop: Content.js → Background.js → Native Handler → IPC |
| LLM Execution | In-process (no IPC overhead) | Cross-process via DistributedNotificationCenter |
| External Dependencies | None — self-contained | Requires RWI App to be running |

Web API

Developer-Facing Interface

RWI exposes a JavaScript API as window.rwi. The current implementation supports:

  • Availability checks to detect whether the extension/runtime is installed
  • Permission prompts for explicit user approval
  • Text generation with streaming token callbacks

isAvailable(): Promise<boolean>

if (await rwi.isAvailable()) {
  console.log('RWI is available!');
}

getStatus(): Promise<RWIStatus>

const status = await rwi.getStatus();
// { available: true, modelLoaded: true, modelLoading: false, version: "1.0.0" }

requestPermission(options?): Promise<{granted: boolean}>

const { granted } = await rwi.requestPermission({
  task: 'summarize articles'
});

generate(options): Promise<GenerateResult>

const result = await rwi.generate({
  prompt: 'Explain quantum computing in simple terms',
  maxTokens: 512,
  temperature: 0.7,
  onToken: (token) => { /* streaming callback */ }
});

cancel(): Promise<void>

await rwi.cancel();

getModelInfo(): Promise<ModelInfo>

const info = await rwi.getModelInfo();
// { modelName: "LFM2-2.6B", quantization: "Q4_K_M", info: "..." }

API Design Considerations

Beyond the current API surface, a production-grade API must address practical concerns:

  • Capability negotiation: request a capability class (e.g., generate, summarize, classify) and receive a clear supported/unsupported response.
  • Model selection: allow requesting a user-approved profile (e.g., "small/fast" vs. "larger/accurate") rather than exposing raw model files to pages.
  • Structured errors: return machine-readable error codes (permission denied, runtime unavailable, quota exceeded, input too large).
  • Quotas and rate limiting: enforce per-origin budgets and expose backpressure signals.
  • Streaming and cancellation: define time-to-first-token expectations, max token limits, and explicit cancellation.
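To make the structured-errors point concrete, here is a sketch of what machine-readable failures could look like on the caller side. None of these names or codes exist in the current SDK; they are an assumed shape for a future API:

```javascript
// Hypothetical structured error type and codes (not part of the current SDK).
class RWIError extends Error {
  constructor(code, message) {
    super(message);
    this.code = code;
  }
}

const ErrorCode = {
  PERMISSION_DENIED: 'permission_denied',
  RUNTIME_UNAVAILABLE: 'runtime_unavailable',
  QUOTA_EXCEEDED: 'quota_exceeded',
  INPUT_TOO_LARGE: 'input_too_large',
};

// Callers can branch on codes instead of parsing message strings.
async function safeGenerate(rwi, options) {
  try {
    return await rwi.generate(options);
  } catch (err) {
    if (err instanceof RWIError && err.code === ErrorCode.QUOTA_EXCEEDED) {
      return { text: '', retryable: true }; // back off and retry later
    }
    throw err; // everything else is surfaced to the caller
  }
}
```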

Security and Privacy Considerations

Threat Model

RWI assumes websites are untrusted and may attempt to:

  • Exfiltrate sensitive user data
  • Prompt the model to reveal private information from previous interactions
  • Abuse compute resources by issuing high-volume requests
  • Impersonate other origins

Mitigations by Design

  1. No direct model access: websites never receive handles to model weights, files, or raw runtime interfaces.
  2. Origin isolation: permissions are scoped per origin; requests are labeled and validated.
  3. Origin-scoped context: inference sessions and prompt context are scoped per requesting origin, preventing cross-site prompt leakage by construction.
  4. Explicit consent: all compute is user-visible and permission-gated.
  5. Local-only processing: content remains on-device by architecture (no cloud inference).

Security Analysis: Required Mechanisms

  • Cross-origin prompt leakage and injection. "Origin-scoped context" is enforced by maintaining a per-origin session store keyed by (origin, sessionId), with explicit session creation/deletion and a safe default of no retention unless the user opts in.
  • Timing and resource side-channels. Mitigations include per-origin queues, strict rate limits, coarse-grained scheduling, and optional jitter insertion.
  • DistributedNotificationCenter broadcast risk. A production design should use authenticated point-to-point IPC (e.g., XPC) with code-signing checks.
  • Origin authentication. The mediator derives origin from trusted browser state and does not accept caller-provided origin strings.
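The per-(origin, sessionId) session store described above can be sketched in a few lines. The compound key guarantees that a session created for one origin is simply unreachable from another; the class and method names are illustrative:

```javascript
// Session store keyed by (origin, sessionId); no retention unless the
// mediator explicitly keeps a session alive.
class SessionStore {
  constructor() {
    this.sessions = new Map();
  }
  key(origin, sessionId) {
    return `${origin}\u0000${sessionId}`; // NUL separator avoids key collisions
  }
  create(origin, sessionId) {
    this.sessions.set(this.key(origin, sessionId), { origin, messages: [] });
  }
  append(origin, sessionId, message) {
    const s = this.sessions.get(this.key(origin, sessionId));
    if (!s) throw new Error('unknown session'); // cross-origin access lands here
    s.messages.push(message);
  }
  get(origin, sessionId) {
    return this.sessions.get(this.key(origin, sessionId));
  }
  destroy(origin, sessionId) {
    this.sessions.delete(this.key(origin, sessionId));
  }
}
```

Because the origin half of the key comes from trusted browser state (never from the page), cross-site prompt leakage is prevented by construction rather than by filtering.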

Open Issues

A production-grade implementation requires careful policy design, including rate limiting, robust identity/origin verification, prompt/data retention controls, and transparency controls (e.g., a local audit log of which origins requested which tasks).
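Of the policies listed above, per-origin rate limiting is the most mechanical. A token-bucket limiter keyed by origin is one plausible shape; the capacity and refill numbers below are illustrative defaults, not tuned values:

```javascript
// Per-origin token bucket: each origin gets its own budget that refills
// over time, so one noisy site cannot starve the others.
class OriginRateLimiter {
  constructor(capacity = 10, refillPerSec = 1) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.buckets = new Map(); // origin -> { tokens, last }
  }
  allow(origin, now = Date.now()) {
    let b = this.buckets.get(origin);
    if (!b) {
      b = { tokens: this.capacity, last: now };
      this.buckets.set(origin, b);
    }
    // Refill proportionally to elapsed time, capped at capacity.
    b.tokens = Math.min(this.capacity, b.tokens + ((now - b.last) / 1000) * this.refillPerSec);
    b.last = now;
    if (b.tokens < 1) return false; // budget exhausted: reject or queue
    b.tokens -= 1;
    return true;
  }
}
```

A denied request would map naturally onto a quota-exceeded error surfaced to the page, giving sites an explicit backpressure signal.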


Implementation Details

Rewebin Browser (Native RWI)

Rewebin is a custom macOS browser that implements RWI natively. It is built as a single, self-contained application: a SwiftUI user interface shell hosts a WebKit rendering layer (WKWebView), and the on-device LLM runtime is integrated in-process.

Note: Rewebin is not designed to be a production browser. It is an intentionally minimal browser developed to demonstrate how browsers can incorporate RWI natively.

Browser integration surface:

  • SDK injection at document start: the browser injects the RWI SDK into each page using WKUserScript, ensuring window.rwi is available before application scripts run.
  • Privileged call mediation: JavaScript calls are routed to native code via WKScriptMessageHandler. This creates a browser-controlled choke point where policy can be enforced.
  • Origin-scoped permissions: the handler computes the requesting origin from the page context and consults a persistent permission store.
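The permission check at this choke point reduces to an origin-keyed store plus a three-way decision: prompt the user, allow, or deny. A minimal sketch (the real store persists via UserDefaults; these names are illustrative):

```javascript
// Origin-keyed permission store; `undefined` means the user was never asked.
class PermissionStore {
  constructor() {
    this.grants = new Map(); // origin -> true (granted) / false (denied)
  }
  decide(origin, granted) {
    this.grants.set(origin, granted);
  }
  check(origin) {
    return this.grants.get(origin);
  }
}

// Gate an incoming request from a given origin.
function gate(store, origin) {
  const status = store.check(origin);
  if (status === undefined) return { action: 'prompt' }; // show browser-native prompt
  return status ? { action: 'allow' } : { action: 'deny' };
}
```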

Inference runtime placement:

  • A browser-level service (LlamaService) owns model lifecycle (load/unload), request scheduling, and cancellation.
  • The service delegates to an isolated inference context (a Swift actor wrapping a llama.cpp context) to keep generation thread-safe.
  • The native inference backend is integrated as a llama.cpp XCFramework, enabling Metal acceleration.

Streaming design:

  1. Create session: a generation request allocates a session identifier scoped to the requesting tab and origin.
  2. Generate and buffer: as tokens are produced, they are appended to a per-session buffer.
  3. Deliver incrementally: the page polls for new tokens or receives callbacks via browser-initiated JavaScript evaluation.
  4. Finalize and clean up: on completion or cancellation, the browser marks the session complete and releases buffers.
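The buffer-and-poll delivery in steps 2–3 can be sketched with a cursor-based poll: the page remembers how many tokens it has seen and asks for everything newer. Class and field names are illustrative, not the actual Rewebin internals:

```javascript
// Per-session token buffer; the page polls with a cursor and receives
// only the tokens produced since its last poll.
class TokenBuffer {
  constructor() {
    this.tokens = [];
    this.done = false;
  }
  push(token) {
    this.tokens.push(token); // called from the generation loop
  }
  finish() {
    this.done = true; // completion or cancellation
  }
  poll(cursor) {
    return {
      tokens: this.tokens.slice(cursor), // everything since the cursor
      cursor: this.tokens.length,        // the page passes this back next time
      done: this.done,
    };
  }
}
```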

App-Hosted Runtime (Safari Extension)

The on-device runtime is hosted by a companion RWI macOS container application. The app loads and manages a local LLM using llama.cpp via an XCFramework for performance and portability.

The runtime runs a quantized mid-scale model (LFM2-2.6B in Q4_K_M GGUF format). The Safari extension does not run inference itself; requests are relayed from the extension's native handler to the app via DistributedNotificationCenter, and tokens are streamed back through the extension messaging pipeline.

Safari Extension

The Safari Web Extension is implemented using Manifest V3 concepts. The extension:

  • Injects the RWI SDK into pages
  • Bridges page-to-extension messaging
  • Enforces per-origin permissions and policy
  • Routes generation requests to the running RWI app via the native handler and cross-process IPC

RWI Analyzer

RWI Analyzer is a browser-native intelligence feature in Rewebin that uses the on-device LLM to understand and explain the purpose of any website — even when the site contains little or no readable text.

Instead of asking "What does this page say?", it asks "What is this page trying to do?"

How It Works

RWI Analyzer runs a multi-stage pipeline entirely on-device:

  1. Signal Extraction — JavaScript is injected into the page to collect structural, behavioral, and metadata signals including page structure (forms, inputs, navigation landmarks), metadata (title, Open Graph), client-side behavior (presence of fetch, WebSocket, OAuth, analytics, payment patterns), ARIA roles, and external script sources.

  2. Prompt Construction — The extracted signals are structured into a compact prompt (~3000 characters) that provides the LLM with a behavioral fingerprint of the page.

  3. LLM Analysis — The prompt is sent to the on-device LLM, which generates a concise analysis describing the website's purpose and notable behavioral patterns. Tokens are streamed in real-time to the UI.

  4. Caching — Results are cached per-URL with a 24-hour TTL, so revisiting a site loads the analysis instantly.
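Steps 1–2 amount to folding the extracted signals into a prompt that stays near the ~3000-character budget. A sketch of that construction, with an assumed signal shape (the actual extraction payload is richer):

```javascript
// Turn extracted page signals into a compact behavioral-fingerprint prompt,
// truncated to the character budget. The signal fields are illustrative.
function buildAnalyzerPrompt(signals, maxChars = 3000) {
  const lines = [
    `Title: ${signals.title}`,
    `Forms: ${signals.formCount}, inputs: ${signals.inputCount}`,
    `ARIA roles: ${signals.ariaRoles.join(', ')}`,
    `Network patterns: ${signals.networkPatterns.join(', ')}`,
    `External scripts: ${signals.scriptHosts.join(', ')}`,
    'Based on these structural and behavioral signals, describe what this page is trying to do.',
  ];
  return lines.join('\n').slice(0, maxChars);
}
```

Because the prompt is built from structure rather than body text, it works even on pages with little readable content, which is exactly the case the Analyzer targets.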

Potential Applications

The same approach of extracting behavioral signals and reasoning over them with an on-device LLM could power:

  • Phishing & scam detection — identifying pages that mimic login forms of legitimate services
  • Accessibility auditing — assessing ARIA roles, semantic structure, and form labels
  • Privacy & tracking transparency — detecting analytics scripts and summarizing data collection
  • Browser-level site summaries — AI-generated descriptions alongside search results or tab tooltips
  • Parental controls — instant descriptions of what a website does

Validation

The following capabilities have been verified end-to-end:

  • End-to-end text generation: the RWI JavaScript SDK (window.rwi.generate()) successfully invoked llama.cpp inference and streamed tokens to the Next.js demo application in both Rewebin (in-process) and the Safari extension (IPC-mediated) configurations.
  • Origin-scoped permission enforcement: access requests from unpermitted origins were rejected before reaching the inference layer.
  • Consumer hardware verification: verified on Apple M1 MacBook Pro, 8 GB unified memory, running the LFM2-2.6B model at Q4_K_M quantization (~1.5 GB memory footprint).
  • Offline operation: both implementations produced responses with the network interface disabled, confirming zero-cloud architecture.
  • RWI Analyzer: the browser-native site analysis feature successfully extracted DOM/ARIA signals, constructed a compact prompt, and streamed an intent inference result.

Limitations and Scope

The current implementation intentionally limits scope:

  • macOS-only (initially): The current implementation targets macOS exclusively because building a custom browser on Chromium requires substantial compute resources that were not available during this phase. WebKit (WKWebView) on macOS offered a pragmatic path to a functional custom browser runtime. Cross-browser portability is a future goal, not a fundamental architectural limitation.
  • Safari-only for the extension implementation (initially)
  • Mid-scale models only (not frontier models)
  • Explicit user consent required
  • Performance varies by hardware

These constraints are design choices aligned with feasibility and privacy goals.


Future Work

Key directions include:

  • Generalizing the permission model to richer "task" primitives (summarize, classify, extract, etc.)
  • Adding policy controls (rate limiting, quotas, and background execution constraints)
  • Improving isolation boundaries and verifiable origin attestation
  • Cross-browser portability (where extension ecosystems permit)
  • Standardized APIs that could evolve into a web platform feature

Conclusion

Reality Web Intelligence proposes a new architectural role for the browser: not merely a renderer of content, but a trusted mediator for local intelligence. By moving inference onto the user's device and mediating access through explicit origin-scoped permissions, RWI reduces privacy risks, removes recurring API costs, improves reliability, and enables offline-capable AI web applications.


🚀 Getting Started

Prerequisites

  • macOS 14.0+ (Sonoma or later)
  • Xcode 15.0+
  • Node.js 18+ and npm (for the demo web app)
  • Safari 17.0+ (only for the Safari extension)

Step 1: Clone the Repository

git clone https://github.com/la-dev05/Reality-Web-Intelligence.git
cd Reality-Web-Intelligence

Step 2: Download the LLM Model

The LFM2-2.6B model file is too large for Git. Download it separately:

  1. Download LFM2-2.6B-Q4_K_M.gguf (or similar GGUF model)
  2. Place it in:
    • For Rewebin: Rewebin/Rewebin/LLM/LFM2-2.6B-Q4_K_M.gguf
    • For RWI Extension: RWI/RWI/LLM/LFM2-2.6B-Q4_K_M.gguf

Note: The model should be named exactly LFM2-2.6B-Q4_K_M.gguf or you'll need to update the code.

Option A: Run Rewebin Browser (Recommended)

  1. Open Rewebin/Rewebin.xcodeproj in Xcode
  2. Select the Rewebin scheme and your Mac as the target
  3. Press ⌘ + R to build and run

The LLM model auto-loads on startup (~10-30 seconds). Navigate to any website that uses window.rwi — no extensions or separate apps needed.

Option B: Run RWI Safari Extension

  1. Open RWI/RWI.xcodeproj in Xcode
  2. Build and run (⌘ + R)
  3. Enable the extension in Safari → Settings → Extensions → "Reality Web Intelligence"
  4. Grant "Allow on All Websites" permission
  5. Keep the RWI App running (it hosts the LLM)

Step 3: Try the Demo

Navigate to the demo (in Rewebin or Safari with the extension enabled):

🌐 https://rwi-web-test.vercel.app

Or run it locally:

cd RWI-Web-Test
npm install
npm run dev
# Open http://localhost:3000 in Rewebin or Safari

📁 Project Structure

RWI/
├── README.md                    # This file (research paper + getting started)
├── LICENSE                      # MIT License
├── .gitignore
│
├── Rewebin/                     # Custom WebKit Browser (Primary Implementation)
│   ├── Rewebin.xcodeproj/
│   ├── llama.xcframework/       # llama.cpp compiled framework
│   └── Rewebin/
│       ├── RewebinApp.swift     # App entry, menu commands, BrowserState
│       ├── Browser/             # Browser UI (SwiftUI)
│       │   ├── BrowserWindow.swift, Tab.swift, StartPageView.swift
│       │   ├── BookmarksListView.swift, HistoryListView.swift, DownloadsListView.swift
│       ├── RWI/                 # RWI Web API Integration
│       │   ├── RWIScriptMessageHandler.swift   # WKScriptMessageHandler for window.rwi
│       │   ├── RWIUserScript.swift             # SDK injection (embedded JS)
│       │   ├── RWIAnalyzerService.swift        # Signal extraction + LLM analysis
│       │   ├── RWIAnalysisView.swift           # Analysis UI
│       │   └── RWIAnalysisCache.swift          # 24h TTL cache
│       ├── Services/            # LLM Runtime
│       │   ├── LlamaService.swift              # High-level LLM service
│       │   └── LibLlama.swift                  # llama.cpp Swift bindings
│       ├── Permissions/
│       │   └── PermissionManager.swift
│       ├── Data/                # BookmarkManager, HistoryManager, DownloadManager
│       └── LLM/
│           └── LFM2-2.6B-Q4_K_M.gguf          # Quantized model (not in Git)
│
├── RWI/                         # Safari Extension Implementation
│   ├── RWI.xcodeproj/
│   ├── RWI/                     # Container App (LLM Host)
│   │   ├── RWIApp.swift, Services/, LLM/, Views/
│   │   └── llama.xcframework/
│   └── RWI Extension/           # Safari Web Extension
│       ├── SafariWebExtensionHandler.swift
│       └── Resources/           # manifest.json, background.js, content.js, rwiSDK.js
│
├── RWI-Web-Test/                # Demo Next.js Web App
│   ├── app/                     # page.tsx (Chat UI), layout.tsx, globals.css
│   └── package.json
│
├── RWI Analyzer.md              # RWI Analyzer documentation
└── Research paper.tex           # Full research paper (LaTeX)

Model Specifications

| Property | Value |
|---|---|
| Model | LFM2-2.6B (Liquid Foundation Model) |
| Model Size | ~1.5 GB |
| Parameters | 2.6B |
| Quantization | Q4_K_M |
| Context Length | 2048 tokens |
| Inference | CPU + Metal (Apple Silicon) |

Technology Stack

| Component | Technology |
|---|---|
| Rewebin Browser | SwiftUI, AppKit, WebKit (WKWebView) |
| SDK Injection | WKUserScript (JavaScript) |
| Message Bridge | WKScriptMessageHandler |
| LLM Inference | llama.cpp via XCFramework, Metal |
| Data Persistence | UserDefaults (JSON-encoded) |
| Demo Web App | Next.js, TypeScript, React |

🛠️ Development

Building from Source

# Clone the repository
git clone https://github.com/la-dev05/Reality-Web-Intelligence.git

# Rewebin Browser:
open Rewebin/Rewebin.xcodeproj
# Build and run (⌘ + R)

# RWI Safari Extension:
open RWI/RWI.xcodeproj
# Build and run (⌘ + R)

Debugging

Rewebin: View logs in Xcode console. Filter by [RWI Handler], [RWI UserScript], [LlamaService], [Rewebin].

Safari Extension: Safari → Develop → Web Extension Background Pages → RWI. Content script debugging via Web Inspector.

Common Issues

| Issue | Solution |
|---|---|
| Model not loading | Verify LFM2-2.6B-Q4_K_M.gguf exists in the LLM/ directory |
| "RWI not available" | Ensure the model has finished loading (check toolbar indicator in Rewebin) |
| Slow first response | Normal — model loads on first launch (~10-30s depending on hardware) |
| Safari extension not visible | Restart Safari, check Extensions in Settings |
| Safari: "RWI not available" | Ensure RWI App is running + extension has "Allow on All Websites" |

🎥 Demo

A demo video is included in the repository: RWI Prototype Demo.mov


🤝 Contributing

Contributions are welcome! This is an open-source project and we encourage community involvement.

  • Bug reports — open an issue describing the problem
  • Feature requests — open an issue with a description of the proposed feature
  • Pull requests — fork the repo, make your changes, and submit a PR

Acknowledgments

This work describes an implementation built as part of the Reality Web Intelligence (RWI) open-source project at Reality Play.

References

  1. W3C, "Permissions," W3C Recommendation. https://www.w3.org/TR/permissions/
  2. G. Gerganov et al., "llama.cpp," GitHub. https://github.com/ggerganov/llama.cpp
  3. Apple Inc., "Safari Web Extensions," Apple Developer Documentation. https://developer.apple.com/documentation/safariservices/safari_web_extensions
  4. Google, "Manifest V3," Chrome Extensions Documentation. https://developer.chrome.com/docs/extensions/develop/migrate/what-is-mv3
  5. W3C, "Web Neural Network API," W3C Working Draft. https://www.w3.org/TR/webnn/
  6. W3C, "WebGPU," W3C Working Draft. https://www.w3.org/TR/webgpu/
  7. Apple Inc., "WKWebView," Apple Developer Documentation. https://developer.apple.com/documentation/webkit/wkwebview
  8. Ollama, "Ollama," GitHub. https://github.com/ollama/ollama
  9. Mozilla, "llamafile," GitHub. https://github.com/Mozilla-Ocho/llamafile
  10. MLC AI, "WebLLM," GitHub. https://github.com/mlc-ai/web-llm
  11. web.dev, "Build a local and offline-capable chatbot with WebLLM," Google, Jan. 2025. https://web.dev/articles/ai-chatbot-webllm
  12. MLC AI, "WebLLM: A High-Performance In-Browser LLM Inference Engine," arXiv:2412.15803, Dec. 2024.
  13. Google Chrome for Developers, "Built-in AI / Prompt API," 2025. https://developer.chrome.com/docs/ai/built-in
  14. Liquid AI, "LFM2: Liquid Foundation Models," 2025. https://www.liquid.ai

📜 License

MIT License — see LICENSE for details.

© 2026 Lakshya Gupta. Reality Play.

Reality Web Intelligence — An Open-Source Project by Reality Play
