
@simulatte/doppler

Inference and training on raw WebGPU. Pure JS + WGSL.

Try the live demo | npm | docs

Phase-latency comparison on one workload across models

Quick start

import { doppler } from '@simulatte/doppler';

// Stream tokens
const model = await doppler.load('gemma3-270m');
for await (const token of model.generate('Describe WebGPU briefly')) {
  process.stdout.write(token);
}

// One-shot
const text = await model.generateText('Explain WebGPU in one sentence');

// LoRA hot-swap
await model.loadLoRA('https://oneshift-twoshift-redshift-blueshift.com/manifest.json');

Registry IDs resolve to hosted RDRR artifacts from Clocksmith/rdrr by default. Tokens stream from a native AsyncGenerator. See the canonical Root API guide.

Why Doppler

JS → WGSL → WebGPU. Direct JavaScript orchestration into native WebGPU kernels, avoiding ONNX runtimes, WASM blobs, and bridge layers.

for await streaming. Generation uses a native AsyncGenerator that fits normal app control flow.

LoRA hot-swap. Swap adapters at runtime without reloading the base model.

Independent model instances. Run multiple models concurrently. Each owns its pipeline, buffers, and KV cache.

Supported models

All models below are verified with deterministic greedy decoding on WebGPU-capable hardware. Registry IDs resolve to hosted RDRR artifacts automatically.

| Model | Registry ID | Quant | Params |
| --- | --- | --- | --- |
| Gemma 3 270M IT | gemma3-270m | Q4K | 270M |
| Gemma 3 1B IT | gemma3-1b | Q4K | 1B |
| Gemma 3 1B IT (F16) | gemma-3-1b-it-f16-af32 | F16 | 1B |
| TranslateGemma 4B IT | translategemma-4b-it-q4k-ehf16-af32 | Q4K | 4B |
| TranslateGemma 4B 1B EN-ES | translategemma-4b-1b-enes-q4k-ehf16-af32 | Q4K | 1B |
| EmbeddingGemma 300M | google-embeddinggemma-300m-q4k-ehf16-af32 | Q4K | 300M |
| Qwen 3.5 0.8B | qwen-3-5-0-8b-q4k-ehaf16 | Q4K | 0.8B |
| Qwen 3.5 2B | qwen-3-5-2b-q4k-ehaf16 | Q4K | 2B |
| LFM2.5 1.2B Instruct | lfm2-5-1-2b-instruct-q4k-ehf16-af32 | Q4K | 1.2B |

Additional model families (Llama 3, DeepSeek, Gemma 4 MoE, Mixtral, and others) have conversion configs ready but are not yet cataloged. See the full model support matrix for details.

Under the hood

  • Sharded weight loading via OPFS moves multi-GB weights into VRAM without blocking the main thread.
  • Quantized inference (Q4K, F16) runs practical model sizes on consumer GPUs.
  • Kernel hot-swap between prefill and decode paths with zero graph recompilation.
  • Config-driven runtime with explicit profiles, kernel-path selection, and sampling.
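To make the config-driven point concrete, here is a hypothetical load-options sketch. The option names below (profile, kernels, sampling) are illustrative assumptions, not the documented option surface; consult the Root API guide for the real shape.

```javascript
import { doppler } from '@simulatte/doppler';

// Hypothetical config sketch -- every key below is an assumed name,
// shown only to illustrate what "explicit profiles, kernel-path
// selection, and sampling" could look like at the call site.
const model = await doppler.load('gemma3-1b', {
  profile: 'low-vram',                      // assumed: explicit runtime profile
  kernels: { decode: 'q4k-fast' },          // assumed: kernel-path selection
  sampling: { temperature: 0.7, topK: 40 }, // assumed: sampling config
});
```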

Documentation

Environment requirements

  • WebGPU is required.
  • Supported runtimes: WebGPU-capable browsers, or Node with a WebGPU provider.
  • Chrome / Edge 113+ supported.
  • Firefox support varies (typically behind a flag).
  • Safari support is evolving.
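Given the requirements above, a standard WebGPU capability check avoids failing deep inside model setup. This uses only the plain WebGPU API (navigator.gpu and requestAdapter), not anything doppler-specific.

```javascript
// Standard WebGPU feature detection; works in browsers and in Node
// runtimes with or without a WebGPU provider.
async function hasWebGPU() {
  if (typeof navigator === 'undefined' || !('gpu' in navigator)) return false;
  // An adapter can still be null (e.g. blocklisted GPU), so check it too.
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}

const ok = await hasWebGPU();
console.log(ok
  ? 'WebGPU available'
  : 'WebGPU unavailable: use Chrome/Edge 113+ or a Node WebGPU provider');
```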

License

Apache License 2.0 (Apache-2.0). See LICENSE and NOTICE.
