Author: Lakshya Gupta · Reality Play
Open-source project — contributions welcome. See Contributing below.
Modern web applications increasingly incorporate large language models (LLMs) for summarization, search, reasoning, personalization, and content generation. Today, most AI integration in web applications is server-dependent: user data is transmitted off-device for inference, developers pay per-request API costs, and latency and availability depend on external services. Meanwhile, alternative on-device approaches each leave a critical gap — per-origin model downloads, vendor lock-in, or no web-origin-scoped permission surface.
Reality Web Intelligence (RWI) is a browser-mediated architecture that exposes on-device LLM capabilities to websites through an explicit, origin-scoped permission model and a shared model runtime. This repository contains two working implementations:
- Rewebin — a custom macOS browser built on WebKit that integrates RWI natively
- RWI macOS App + Safari Web Extension — an extension-mediated implementation for Safari
Both expose the same JavaScript SDK (window.rwi) and demonstrate zero-cloud AI for web applications while preserving privacy, improving reliability, and lowering barriers to LLM-powered functionality.
- Introduction
- Landscape of Existing Approaches
- What RWI Enables for Web Developers
- Design Goals
- System Overview
- Architecture
- Web API
- Security and Privacy
- Implementation Details
- RWI Analyzer
- Validation
- Limitations and Scope
- Future Work
- Getting Started
- Project Structure
- Contributing
- License
Web applications increasingly rely on LLMs and AI services to perform tasks such as text summarization, classification, retrieval assistance, and generation. This capability has accelerated product development, but the prevailing architecture is server-dependent: inference runs on remote servers and web applications communicate via third-party APIs.
Meanwhile, modern consumer devices — especially Apple Silicon Macs — can now run mid-scale language models (on the order of a few gigabytes) locally. This creates an opportunity: what if the intelligence a website needs were already present on the user's device, and the browser could mediate access to it the same way it mediates access to the camera or location?
Several approaches to web AI exist today, spanning cloud-hosted services, in-browser inference, browser-vendor APIs, local runtimes, and platform-level intelligence. Each addresses part of the problem; none addresses all of it.
Most contemporary web applications integrate AI through cloud-hosted APIs. While this enables rapid deployment, it introduces four recurring limitations:
- Privacy and data ownership. Sensitive user content is transmitted to external servers.
- Cost and accessibility. API-based inference introduces recurring costs, rate limits, and barriers for small developers, students, and offline users.
- Latency and reliability. Quality of experience depends on network conditions and service uptime.
- Fragmented intelligence. Each website reimplements similar AI logic, leading to duplicated computation and inconsistent behavior.
An alternative is to execute LLM inference directly inside the page using WebGPU (e.g., WebLLM). This avoids cloud dependency and keeps data on-device, but shifts model download, caching, sandboxed compute constraints, and performance variability to each site individually. Critically, Google's own web.dev documentation explicitly states that WebLLM's model cache "cannot be shared across origins, so another web app may have to download the same model again." If a user visits five websites that all want the same model, the model is downloaded and cached five separate times.
Browser vendors have begun exploring web APIs that expose on-device AI capabilities behind permissions and origin trials. Chrome's built-in Prompt API runs inference on-device using Gemini Nano and is now stable for Chrome extensions, though still in origin trial for the broader web platform as of early 2026. While this approach keeps data on-device, it introduces three significant constraints:
- Single-vendor, single-model lock-in. The API ships only a fixed vendor model (Gemini Nano) and is available only in Chrome and Edge — meaning developers targeting Safari, Firefox, or other browsers cannot use it.
- Model instability under vendor control. Because the browser vendor controls the model, it can be silently updated, swapped, or deprecated across browser versions. A website developer who has prompt-engineered for one model version may find their application behavior changes unexpectedly after a browser update.
- Browser-vendor participation required. The approach requires integration by each browser vendor; it cannot be adopted independently by developers or extended to other browsers.
By contrast, in RWI the model is packaged with the runtime and does not change once deployed, giving developers a stable, predictable inference target.
General-purpose local runtimes such as Ollama and llamafile simplify running models on a user's machine. However, they expose no web-origin-scoped permission surface for arbitrary websites: any local application (or any website that can reach the local HTTP endpoint) can access them without per-origin consent or browser-enforced isolation.
Platform-level assistants and OS-provided intelligence features can deliver user-facing functionality but are not programmable as a web API for arbitrary websites. A web developer cannot call Apple Intelligence from their site; it is a system feature, not a web platform primitive.
Emerging web platform efforts such as WebNN expose low-level machine learning primitives and hardware acceleration to web developers. These approaches require applications to manage models, execution graphs, and lifecycle concerns directly, placing substantial engineering burden on each site.
Taken together, these approaches leave a clear gap. No existing system simultaneously provides:
- A single model instance shared across origins (eliminating redundant downloads)
- Explicit, per-origin, user-controlled permissions analogous to camera and location access
- Zero cloud dependency by design
- A deployment path that does not require browser-vendor participation
This is the gap that Reality Web Intelligence (RWI) addresses.
| Dimension | WebLLM (WebGPU) | Chrome Prompt API | Ollama / llamafile | RWI (this work) |
|---|---|---|---|---|
| Inference location | In-page (sandboxed tab) | Browser process (in-process) | Local daemon (separate process) | Browser/ext. mediator (in-process in Rewebin) |
| Model shared across origins? | No — each origin downloads & loads independently | Yes — browser manages one shared model | No — app-level, not web-origin-scoped | Yes — single model shared across permitted origins |
| Developer installs model? | Yes — each site manages download & caching | No — vendor ships model (Gemini Nano only) | User installs separately | No — prepackaged in Rewebin / companion app |
| Model choice | Open (developer selects) | Fixed (Gemini Nano; vendor-controlled) | Open (user selects) | Open (prepackaged; extensible) |
| Origin-scoped permissions? | No — runs inside untrusted page context | Yes — Chrome permission prompt per origin | No — all local apps have access | Yes — explicit per-origin consent, browser-enforced |
| Browser vendor required? | No — pure web APIs (WebGPU) | Yes — Chrome/Edge only, origin trial | No | No for extension; Yes for native (Rewebin) |
| Works in Safari? | Partial (WebGPU in tech preview) | No | N/A (native app) | Yes — ext. targets Safari |
| Cross-origin prompt isolation? | Partial (tab sandboxing) | Not yet specified | N/A | Yes — per-(origin, sessionId) session store |
| Offline capable? | Yes (after first download) | Yes (after model download) | Yes | Yes (model prepackaged in Rewebin) |
| Backend runtime | WebGPU + WASM | Proprietary (Chrome-internal) | llama.cpp / various | llama.cpp (Metal-accelerated) |
With RWI, a developer adds a few lines of JavaScript to their website and gains access to on-device LLM inference — without bundling a model, without an API key, without per-request cloud costs, and without slowing down page load. The intelligence is already on the user's device; the browser mediates access.
```javascript
// Check availability
const available = await rwi.isAvailable();

// Request permission (user sees a browser-native prompt)
const { granted } = await rwi.requestPermission();

// Generate text with streaming
const result = await rwi.generate({
  prompt: "What's RWI?",
  onToken: (token) => updateUI(token)
});
```

The website itself loads as fast as any normal site — no multi-gigabyte model download on first visit. Every site that uses RWI shares the same model already present on the device. The user grants permission once per origin (like camera or location access), and all inference happens locally: no data leaves the device, no cloud bill accumulates, and the feature works offline.
As on-device models improve and consumer hardware becomes more capable, this architecture becomes more powerful without any changes to the websites that use it.
Given the gaps identified above, RWI is designed around five core requirements:
- On-device by default: inference runs locally, without sending user content to the network.
- Shared across websites: the intelligence layer is installed once and reused, not duplicated per site.
- Browser-mediated access: websites cannot access the model directly; the browser enforces isolation.
- Explicit permissions: access is origin-scoped and user-controlled.
- No new browser required: the approach must fit within existing browser extension and OS app ecosystems (demonstrated via the Safari extension), while also showing what native integration looks like (Rewebin).
RWI can be realized through different integration points in the browser stack. This project describes two approaches:
- Extension-mediated RWI (Safari): a macOS container app hosts the on-device LLM runtime, while a Safari Web Extension injects the web SDK and securely mediates origin-scoped requests to the app.
- Native RWI (Rewebin browser): a custom WebKit-based browser integrates the RWI runtime directly into the browser process and exposes the same origin-scoped web API without requiring an extension bridge or external companion process.
The two designs share the same core idea — websites request intelligence through a permissioned API — but differ in where trust and execution live: in the Safari implementation, the trusted runtime is a separate app; in Rewebin, the trusted runtime is the browser itself.
Websites do not run AI models — the browser does. Websites request intelligence — the user grants access.
- Rewebin browser (WebKit, native RWI): a custom macOS browser built with a SwiftUI shell and a WebKit rendering layer (`WKWebView`). Rewebin injects the RWI web SDK (`window.rwi`) at document start using `WKUserScript` and mediates calls via `WKScriptMessageHandler`. LLM inference runs in-process via a native Swift service layer that integrates `llama.cpp` as an XCFramework.
- RWI macOS app + Safari Web Extension (extension-mediated RWI): the container app hosts and manages the local LLM runtime (integrating `llama.cpp` via an XCFramework), while the Safari Web Extension injects the RWI SDK, bridges communication, and manages per-origin permissions. The app must be running for inference requests to succeed.
- Web JavaScript SDK (`rwiSDK.js`): a developer-facing API exposed as `window.rwi`; the same surface is implemented in both Rewebin and the Safari extension.
- Demo web application (Next.js): a chat-style UI showcasing streaming inference and connectivity monitoring. The demo requires no server-side component for inference — it functions fully offline once the page is loaded and the local runtime is available. Reference testing was conducted on an Apple M1 MacBook Pro (8 GB unified memory) using the LFM2-2.6B model (Liquid AI) quantized to Q4_K_M GGUF format (~1.5 GB memory footprint), with inference accelerated via `llama.cpp`'s Metal backend.
- RWI Analyzer (Rewebin): a browser-native feature that uses the on-device LLM to infer a website's intent — what the page is trying to do — from structural and behavioral signals, rather than relying on visible text. See the RWI Analyzer section below for details.
RWI separates concerns into a web-facing SDK and a mediator that hosts the on-device runtime. The mediator acts as a secure intermediary between untrusted websites and the trusted inference runtime.
RWI can mediate requests through an extension bridge (Safari) or directly inside the browser process (Rewebin). Both present the same web-facing API but differ in execution and message routing.
The distinction between Rewebin and the Safari extension is architectural, not primarily about inference speed. In both implementations, the same underlying llama.cpp runtime and LFM2-2.6B model are used. The key difference is the mediation path: in Safari, the extension bridges the web page to a companion app via inter-process communication, while in Rewebin, the browser is the trusted runtime and calls are handled in-process via WKScriptMessageHandler. This eliminates IPC overhead and simplifies the trust boundary, but the dominant cost in both systems is token generation itself, not the messaging layer.
```
Website ──► Content.js ──► Background.js ──► Extension Handler ──► RWI App (LLM)
   ◄────────────────────────────────────────────────────────────────────────────┘
                            (tokens streamed back)
```
1. A website calls `rwi.generate(...)` via the injected SDK.
2. The page communicates with the content script using `postMessage`.
3. The content script forwards the request to the background service worker.
4. The background service worker forwards the request to the native extension handler.
5. The native handler relays the request to the RWI macOS app via `DistributedNotificationCenter`.
6. The app performs on-device inference and streams tokens back through the chain.
```
Website ──► WKScriptMessageHandler ──► LlamaService (in-process)
   ◄────────────────────────────────────────────┘
              (tokens streamed back)
```
1. At document start, the browser injects the RWI SDK (`window.rwi`) via `WKUserScript`.
2. A website calls `window.rwi.generate(...)`.
3. The SDK forwards the request via `webkit.messageHandlers.rwi.postMessage(...)`.
4. The `RWIScriptMessageHandler` validates the caller's origin and checks permissions.
5. If allowed, the handler calls `LlamaService`, which runs generation via `llama.cpp` (Metal-accelerated).
6. Tokens are streamed back to the page via polling or browser-initiated JavaScript evaluation.
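The SDK side of this flow amounts to a pending-request registry: calls go out with an id, and the native side later delivers tokens for that id. The names and message shapes below are illustrative (the real rwiSDK.js may differ), and the transport is parameterized so the sketch is self-contained rather than hard-coding `webkit.messageHandlers`:

```javascript
// Pending requests, keyed by request id, waiting for native-side tokens.
const pending = new Map();
let nextId = 0;

// Forward a generation request through a transport function `post`
// (in Rewebin this would wrap webkit.messageHandlers.rwi.postMessage).
function generate({ prompt, maxTokens = 512, onToken }, post) {
  const id = String(nextId++);
  return new Promise((resolve) => {
    pending.set(id, { resolve, onToken, text: '' });
    post({ type: 'generate', id, prompt, maxTokens });
  });
}

// Called by the native side (e.g., via evaluateJavaScript) as tokens arrive.
function deliverToken(id, token, done) {
  const req = pending.get(id);
  if (!req) return;
  if (token) {
    req.text += token;
    if (req.onToken) req.onToken(token);
  }
  if (done) {
    pending.delete(id);
    req.resolve({ text: req.text });
  }
}
```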
| Aspect | Rewebin (WebKit Browser) | RWI Safari Extension |
|---|---|---|
| Type | Custom browser app | Safari extension + container app |
| Engine | WebKit (`WKWebView`) | Safari (WebKit) |
| SDK Injection | Native `WKUserScript` at document start | Content script injection |
| Message Handling | Direct `WKScriptMessageHandler` | Multi-hop: Content.js → Background.js → Native Handler → IPC |
| LLM Execution | In-process (no IPC overhead) | Cross-process via `DistributedNotificationCenter` |
| External Dependencies | None — self-contained | Requires the RWI app to be running |
RWI exposes a JavaScript API as window.rwi. The current implementation supports:
- Availability checks to detect whether the extension/runtime is installed
- Permission prompts for explicit user approval
- Text generation with streaming token callbacks
```javascript
if (await rwi.isAvailable()) {
  console.log('RWI is available!');
}
```

```javascript
const status = await rwi.getStatus();
// { available: true, modelLoaded: true, modelLoading: false, version: "1.0.0" }
```

```javascript
const { granted } = await rwi.requestPermission({
  task: 'summarize articles'
});
```

```javascript
const result = await rwi.generate({
  prompt: 'Explain quantum computing in simple terms',
  maxTokens: 512,
  temperature: 0.7,
  onToken: (token) => { /* streaming callback */ }
});
```

```javascript
await rwi.cancel();
```

```javascript
const info = await rwi.getModelInfo();
// { modelName: "LFM2-2.6B", quantization: "Q4_K_M", info: "..." }
```

Beyond the current API surface, a production-grade API must address practical concerns:
- Capability negotiation: request a capability class (e.g., `generate`, `summarize`, `classify`) and receive a clear supported/unsupported response.
- Model selection: allow requesting a user-approved profile (e.g., "small/fast" vs. "larger/accurate") rather than exposing raw model files to pages.
- Structured errors: return machine-readable error codes (permission denied, runtime unavailable, quota exceeded, input too large).
- Quotas and rate limiting: enforce per-origin budgets and expose backpressure signals.
- Streaming and cancellation: define time-to-first-token expectations, max token limits, and explicit cancellation.
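As one illustration of the error and quota points, a mediator could pair machine-readable error codes with a per-origin token bucket. The codes, capacities, and class names here are assumptions for the sketch, not part of the current RWI API:

```javascript
// Hypothetical machine-readable error codes (illustrative, not the RWI API).
const RWIError = {
  PERMISSION_DENIED: 'permission_denied',
  RUNTIME_UNAVAILABLE: 'runtime_unavailable',
  QUOTA_EXCEEDED: 'quota_exceeded',
  INPUT_TOO_LARGE: 'input_too_large',
};

// Simple per-origin token bucket: `capacity` requests burst, refilled at
// `refillPerSec` tokens per second. Limits chosen for illustration only.
class OriginQuota {
  constructor({ capacity = 10, refillPerSec = 0.5 } = {}) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.buckets = new Map(); // origin -> { tokens, last }
  }

  // Returns null if the request may proceed, or an error code otherwise.
  check(origin, now = Date.now()) {
    let b = this.buckets.get(origin);
    if (!b) {
      b = { tokens: this.capacity, last: now };
      this.buckets.set(origin, b);
    }
    // Refill proportionally to elapsed time, capped at capacity.
    b.tokens = Math.min(
      this.capacity,
      b.tokens + ((now - b.last) / 1000) * this.refillPerSec
    );
    b.last = now;
    if (b.tokens < 1) return RWIError.QUOTA_EXCEEDED;
    b.tokens -= 1;
    return null;
  }
}
```

Keeping the buckets keyed by origin means one abusive site exhausts only its own budget, which is exactly the backpressure signal a page can surface to its user.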
RWI assumes websites are untrusted and may attempt to:
- Exfiltrate sensitive user data
- Prompt the model to reveal private information from previous interactions
- Abuse compute resources by issuing high-volume requests
- Impersonate other origins
- No direct model access: websites never receive handles to model weights, files, or raw runtime interfaces.
- Origin isolation: permissions are scoped per origin; requests are labeled and validated.
- Origin-scoped context: inference sessions and prompt context are scoped per requesting origin, preventing cross-site prompt leakage by construction.
- Explicit consent: all compute is user-visible and permission-gated.
- Local-only processing: content remains on-device by architecture (no cloud inference).
- Cross-origin prompt leakage and injection. "Origin-scoped context" is enforced by maintaining a per-origin session store keyed by (origin, sessionId), with explicit session creation/deletion and a safe default of no retention unless the user opts in.
- Timing and resource side-channels. Mitigations include per-origin queues, strict rate limits, coarse-grained scheduling, and optional jitter insertion.
- `DistributedNotificationCenter` broadcast risk. A production design should use authenticated point-to-point IPC (e.g., XPC) with code-signing checks.
- Origin authentication. The mediator derives origin from trusted browser state and does not accept caller-provided origin strings.
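The first mitigation can be sketched as a store keyed by (origin, sessionId). This is an illustrative model of the isolation invariant — an origin can never address another origin's sessions — not the actual native implementation:

```javascript
class SessionStore {
  constructor() {
    // Keyed by origin + NUL + sessionId, so keys never collide across origins.
    this.sessions = new Map();
  }

  key(origin, sessionId) {
    return origin + '\u0000' + sessionId;
  }

  // Explicit creation; safe default is no retention unless the user opts in.
  create(origin, sessionId, { retain = false } = {}) {
    this.sessions.set(this.key(origin, sessionId), { context: [], retain });
  }

  append(origin, sessionId, message) {
    const s = this.sessions.get(this.key(origin, sessionId));
    if (!s) throw new Error('unknown session for this origin');
    s.context.push(message);
  }

  get(origin, sessionId) {
    const s = this.sessions.get(this.key(origin, sessionId));
    return s ? s.context : null;
  }

  // Drop an origin's non-retained sessions (page unload, permission revoked).
  endAll(origin) {
    for (const [k, s] of [...this.sessions]) {
      if (k.startsWith(origin + '\u0000') && !s.retain) this.sessions.delete(k);
    }
  }
}
```

Because the origin component of the key comes from trusted browser state (never from the page), cross-site prompt leakage is prevented by construction rather than by filtering.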
A production-grade implementation requires careful policy design, including rate limiting, robust identity/origin verification, prompt/data retention controls, and transparency controls (e.g., a local audit log of which origins requested which tasks).
Rewebin is a custom macOS browser that implements RWI natively. It is built as a single, self-contained application: a SwiftUI user interface shell hosts a WebKit rendering layer (WKWebView), and the on-device LLM runtime is integrated in-process.
Note: Rewebin is not designed to be a production browser. It is an intentionally minimal browser developed to demonstrate how browsers can incorporate RWI natively.
Browser integration surface:
- SDK injection at document start: the browser injects the RWI SDK into each page using `WKUserScript`, ensuring `window.rwi` is available before application scripts run.
- Privileged call mediation: JavaScript calls are routed to native code via `WKScriptMessageHandler`. This creates a browser-controlled choke point where policy can be enforced.
- Origin-scoped permissions: the handler computes the requesting origin from the page context and consults a persistent permission store.
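The persistent permission store can be modeled as an origin → decision map. The real component is Swift (`PermissionManager`), so this JavaScript sketch is illustrative only, with persistence abstracted behind load/save callbacks:

```javascript
// Illustrative origin-scoped permission store. `load` and `save` stand in
// for whatever persistence the host uses (e.g., UserDefaults on macOS).
class PermissionStore {
  constructor(load = () => ({}), save = () => {}) {
    this.save = save;
    this.decisions = load(); // origin -> 'granted' | 'denied'
  }

  // Unknown origins default to 'prompt': ask the user before first use.
  status(origin) {
    return this.decisions[origin] || 'prompt';
  }

  // Record the user's decision and persist it for future visits.
  record(origin, granted) {
    this.decisions[origin] = granted ? 'granted' : 'denied';
    this.save(this.decisions);
  }
}
```

The three-state result (`granted` / `denied` / `prompt`) mirrors how browsers model camera and location permissions: only an explicit user decision moves an origin out of the prompt state.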
Inference runtime placement:
- A browser-level service (`LlamaService`) owns model lifecycle (load/unload), request scheduling, and cancellation.
- The service delegates to an isolated inference context (a Swift actor wrapping a `llama.cpp` context) to keep generation thread-safe.
- The native inference backend is integrated as a `llama.cpp` XCFramework, enabling Metal acceleration.
Streaming design:
- Create session: a generation request allocates a session identifier scoped to the requesting tab and origin.
- Generate and buffer: as tokens are produced, they are appended to a per-session buffer.
- Deliver incrementally: the page polls for new tokens or receives callbacks via browser-initiated JavaScript evaluation.
- Finalize and clean up: on completion or cancellation, the browser marks the session complete and releases buffers.
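The polling variant of the delivery step can be sketched as a drain loop. Here `pollFn` stands in for the privileged round trip that drains the per-session token buffer; it and the interval default are assumptions of this sketch:

```javascript
// Poll the per-session buffer until generation completes, invoking
// `onToken` for each drained token and returning the accumulated text.
async function streamTokens(sessionId, pollFn, onToken, intervalMs = 50) {
  let text = '';
  for (;;) {
    // Each poll drains whatever tokens accumulated since the last one.
    const { tokens, done } = await pollFn(sessionId);
    for (const t of tokens) {
      text += t;
      onToken(t);
    }
    if (done) return text;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Buffering on the native side and draining in batches keeps the page responsive even when the model emits tokens faster than the polling interval.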
The on-device runtime is hosted by a companion RWI macOS container application. The app loads and manages a local LLM using llama.cpp via an XCFramework for performance and portability.
The runtime runs a quantized mid-scale model (LFM2-2.6B in Q4_K_M GGUF format). The Safari extension does not run inference itself; requests are relayed from the extension's native handler to the app via DistributedNotificationCenter, and tokens are streamed back through the extension messaging pipeline.
The Safari Web Extension is implemented using Manifest V3 concepts. The extension:
- Injects the RWI SDK into pages
- Bridges page-to-extension messaging
- Enforces per-origin permissions and policy
- Routes generation requests to the running RWI app via the native handler and cross-process IPC
RWI Analyzer is a browser-native intelligence feature in Rewebin that uses the on-device LLM to understand and explain the purpose of any website — even when the site contains little or no readable text.
Instead of asking "What does this page say?", it asks "What is this page trying to do?"
RWI Analyzer runs a multi-stage pipeline entirely on-device:
1. Signal Extraction — JavaScript is injected into the page to collect structural, behavioral, and metadata signals including page structure (forms, inputs, navigation landmarks), metadata (title, Open Graph), client-side behavior (presence of `fetch`, `WebSocket`, OAuth, analytics, and payment patterns), ARIA roles, and external script sources.
2. Prompt Construction — The extracted signals are structured into a compact prompt (~3000 characters) that provides the LLM with a behavioral fingerprint of the page.
3. LLM Analysis — The prompt is sent to the on-device LLM, which generates a concise analysis describing the website's purpose and notable behavioral patterns. Tokens are streamed in real time to the UI.
4. Caching — Results are cached per-URL with a 24-hour TTL, so revisiting a site loads the analysis instantly.
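The prompt-construction stage can be sketched as a pure function over extracted signals. The field names, wording, and budget handling below are illustrative assumptions, not the exact prompt used by RWIAnalyzerService:

```javascript
// Turn extracted page signals into a compact behavioral-fingerprint prompt,
// capped at a character budget so it fits the model's context window.
function buildAnalyzerPrompt(signals, maxChars = 3000) {
  const lines = [
    'Infer what this web page is trying to do from these signals:',
    `Title: ${signals.title || '(none)'}`,
    `Forms: ${signals.formCount}, password inputs: ${signals.passwordInputs}`,
    `ARIA roles: ${signals.ariaRoles.join(', ') || '(none)'}`,
    `External scripts: ${signals.externalScripts.join(', ') || '(none)'}`,
    'Describe the page purpose and notable behavioral patterns concisely.',
  ];
  return lines.join('\n').slice(0, maxChars);
}
```

Because the prompt is built from structure and behavior rather than visible text, the same analysis works on pages with little or no readable content.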
The same approach of extracting behavioral signals and reasoning over them with an on-device LLM could power:
- Phishing & scam detection — identifying pages that mimic login forms of legitimate services
- Accessibility auditing — assessing ARIA roles, semantic structure, and form labels
- Privacy & tracking transparency — detecting analytics scripts and summarizing data collection
- Browser-level site summaries — AI-generated descriptions alongside search results or tab tooltips
- Parental controls — instant descriptions of what a website does
The following capabilities have been verified end-to-end:
- End-to-end text generation: the RWI JavaScript SDK (`window.rwi.generate()`) successfully invoked `llama.cpp` inference and streamed tokens to the Next.js demo application in both the Rewebin (in-process) and Safari extension (IPC-mediated) configurations.
- Origin-scoped permission enforcement: access requests from unpermitted origins were rejected before reaching the inference layer.
- Consumer hardware verification: verified on Apple M1 MacBook Pro, 8 GB unified memory, running the LFM2-2.6B model at Q4_K_M quantization (~1.5 GB memory footprint).
- Offline operation: both implementations produced responses with the network interface disabled, confirming zero-cloud architecture.
- RWI Analyzer: the browser-native site analysis feature successfully extracted DOM/ARIA signals, constructed a compact prompt, and streamed an intent inference result.
The current implementation intentionally limits scope:
- macOS-only (initially): The current implementation targets macOS exclusively because building a custom browser on Chromium requires substantial compute resources that were not available during this phase. WebKit (`WKWebView`) on macOS offered a pragmatic path to a functional custom browser runtime. Cross-browser portability is a future goal, not a fundamental architectural limitation.
- Safari-only for the extension implementation (initially)
- Mid-scale models only (not frontier models)
- Explicit user consent required
- Performance varies by hardware
These constraints are design choices aligned with feasibility and privacy goals.
Key directions include:
- Generalizing the permission model to richer "task" primitives (summarize, classify, extract, etc.)
- Adding policy controls (rate limiting, quotas, and background execution constraints)
- Improving isolation boundaries and verifiable origin attestation
- Cross-browser portability (where extension ecosystems permit)
- Standardized APIs that could evolve into a web platform feature
Reality Web Intelligence proposes a new architectural role for the browser: not merely a renderer of content, but a trusted mediator for local intelligence. By moving inference onto the user's device and mediating access through explicit origin-scoped permissions, RWI reduces privacy risks, removes recurring API costs, improves reliability, and enables offline-capable AI web applications.
- macOS 14.0+ (Sonoma or later)
- Xcode 15.0+
- Node.js 18+ and npm (for the demo web app)
- Safari 17.0+ (only for the Safari extension)
```shell
git clone https://github.com/la-dev05/Reality-Web-Intelligence.git
cd Reality-Web-Intelligence
```

The LFM2-2.6B model file is too large for Git. Download it separately:

- Download LFM2-2.6B-Q4_K_M.gguf (or a similar GGUF model)
- Place it in:
  - For Rewebin: `Rewebin/Rewebin/LLM/LFM2-2.6B-Q4_K_M.gguf`
  - For the RWI Extension: `RWI/RWI/LLM/LFM2-2.6B-Q4_K_M.gguf`

Note: The model should be named exactly `LFM2-2.6B-Q4_K_M.gguf`, or you'll need to update the code.
- Open `Rewebin/Rewebin.xcodeproj` in Xcode
- Select the Rewebin scheme and your Mac as the target
- Press ⌘ + R to build and run
The model auto-loads on startup (~10-30 seconds). Navigate to any website that uses `window.rwi` — no extensions or separate apps needed.
- Open `RWI/RWI.xcodeproj` in Xcode
- Build and run (⌘ + R)
- Enable the extension in Safari → Settings → Extensions → "Reality Web Intelligence"
- Grant "Allow on All Websites" permission
- Keep the RWI App running (it hosts the LLM)
Navigate to the demo (in Rewebin or Safari with the extension enabled):
🌐 https://rwi-web-test.vercel.app
Or run it locally:
```shell
cd RWI-Web-Test
npm install
npm run dev
# Open http://localhost:3000 in Rewebin or Safari
```

```
RWI/
├── README.md                        # This file (research paper + getting started)
├── LICENSE                          # MIT License
├── .gitignore
│
├── Rewebin/                         # Custom WebKit Browser (Primary Implementation)
│   ├── Rewebin.xcodeproj/
│   ├── llama.xcframework/           # llama.cpp compiled framework
│   └── Rewebin/
│       ├── RewebinApp.swift         # App entry, menu commands, BrowserState
│       ├── Browser/                 # Browser UI (SwiftUI)
│       │   ├── BrowserWindow.swift, Tab.swift, StartPageView.swift
│       │   └── BookmarksListView.swift, HistoryListView.swift, DownloadsListView.swift
│       ├── RWI/                     # RWI Web API Integration
│       │   ├── RWIScriptMessageHandler.swift   # WKScriptMessageHandler for window.rwi
│       │   ├── RWIUserScript.swift             # SDK injection (embedded JS)
│       │   ├── RWIAnalyzerService.swift        # Signal extraction + LLM analysis
│       │   ├── RWIAnalysisView.swift           # Analysis UI
│       │   └── RWIAnalysisCache.swift          # 24h TTL cache
│       ├── Services/                # LLM Runtime
│       │   ├── LlamaService.swift   # High-level LLM service
│       │   └── LibLlama.swift       # llama.cpp Swift bindings
│       ├── Permissions/
│       │   └── PermissionManager.swift
│       ├── Data/                    # BookmarkManager, HistoryManager, DownloadManager
│       └── LLM/
│           └── LFM2-2.6B-Q4_K_M.gguf   # Quantized model (not in Git)
│
├── RWI/                             # Safari Extension Implementation
│   ├── RWI.xcodeproj/
│   ├── RWI/                         # Container App (LLM Host)
│   │   ├── RWIApp.swift, Services/, LLM/, Views/
│   │   └── llama.xcframework/
│   └── RWI Extension/               # Safari Web Extension
│       ├── SafariWebExtensionHandler.swift
│       └── Resources/               # manifest.json, background.js, content.js, rwiSDK.js
│
├── RWI-Web-Test/                    # Demo Next.js Web App
│   ├── app/                         # page.tsx (Chat UI), layout.tsx, globals.css
│   └── package.json
│
├── RWI Analyzer.md                  # RWI Analyzer documentation
└── Research paper.tex               # Full research paper (LaTeX)
```
| Property | Value |
|---|---|
| Model | LFM2-2.6B (Liquid Foundation Model) |
| Model Size | ~1.5 GB |
| Parameters | 2.6B |
| Quantization | Q4_K_M |
| Context Length | 2048 tokens |
| Inference | CPU + Metal (Apple Silicon) |
| Component | Technology |
|---|---|
| Rewebin Browser | SwiftUI, AppKit, WebKit (WKWebView) |
| SDK Injection | WKUserScript (JavaScript) |
| Message Bridge | WKScriptMessageHandler |
| LLM Inference | llama.cpp via XCFramework, Metal |
| Data Persistence | UserDefaults (JSON-encoded) |
| Demo Web App | Next.js, TypeScript, React |
```shell
# Clone the repository
git clone https://github.com/la-dev05/Reality-Web-Intelligence.git

# Rewebin Browser:
open Rewebin/Rewebin.xcodeproj
# Build and run (⌘ + R)

# RWI Safari Extension:
open RWI/RWI.xcodeproj
# Build and run (⌘ + R)
```

Rewebin: View logs in the Xcode console. Filter by `[RWI Handler]`, `[RWI UserScript]`, `[LlamaService]`, `[Rewebin]`.
Safari Extension: Safari → Develop → Web Extension Background Pages → RWI. Content script debugging via Web Inspector.
| Issue | Solution |
|---|---|
| Model not loading | Verify LFM2-2.6B-Q4_K_M.gguf exists in the LLM/ directory |
| "RWI not available" | Ensure the model has finished loading (check toolbar indicator in Rewebin) |
| Slow first response | Normal — model loads on first launch (~10-30s depending on hardware) |
| Safari extension not visible | Restart Safari, check Extensions in Settings |
| Safari: "RWI not available" | Ensure RWI App is running + extension has "Allow on All Websites" |
A demo video is included in the repository: RWI Prototype Demo.mov
Contributions are welcome! This is an open-source project and we encourage community involvement.
- Bug reports — open an issue describing the problem
- Feature requests — open an issue with a description of the proposed feature
- Pull requests — fork the repo, make your changes, and submit a PR
This work describes an implementation built as part of the Reality Web Intelligence (RWI) open-source project at Reality Play.
- W3C, "Permissions," W3C Recommendation. https://www.w3.org/TR/permissions/
- G. Gerganov et al., "llama.cpp," GitHub. https://github.com/ggerganov/llama.cpp
- Apple Inc., "Safari Web Extensions," Apple Developer Documentation. https://developer.apple.com/documentation/safariservices/safari_web_extensions
- Google, "Manifest V3," Chrome Extensions Documentation. https://developer.chrome.com/docs/extensions/develop/migrate/what-is-mv3
- W3C, "Web Neural Network API," W3C Working Draft. https://www.w3.org/TR/webnn/
- W3C, "WebGPU," W3C Working Draft. https://www.w3.org/TR/webgpu/
- Apple Inc., "WKWebView," Apple Developer Documentation. https://developer.apple.com/documentation/webkit/wkwebview
- Ollama, "Ollama," GitHub. https://github.com/ollama/ollama
- Mozilla, "llamafile," GitHub. https://github.com/Mozilla-Ocho/llamafile
- MLC AI, "WebLLM," GitHub. https://github.com/mlc-ai/web-llm
- web.dev, "Build a local and offline-capable chatbot with WebLLM," Google, Jan. 2025. https://web.dev/articles/ai-chatbot-webllm
- MLC AI, "WebLLM: A High-Performance In-Browser LLM Inference Engine," arXiv:2412.15803, Dec. 2024.
- Google Chrome for Developers, "Built-in AI / Prompt API," 2025. https://developer.chrome.com/docs/ai/built-in
- Liquid AI, "LFM2: Liquid Foundation Models," 2025. https://www.liquid.ai
MIT License — see LICENSE for details.
© 2026 Lakshya Gupta. Reality Play.
Reality Web Intelligence — An Open-Source Project by Reality Play