Skip to content

ruthwikj/LightShield

Repository files navigation

LightShield

LightShield is a lightweight Python middleware that reduces prompt injection risk by tagging system and user content and enforcing a strict instruction hierarchy before prompts are sent to an LLM. It does not modify model weights or call a separate classifier—only prompt construction and response sanitization.

Install

pip install ollama   # required for the Ollama shield

Clone or install the LightShield package so you can import it.

Use (Ollama)

  1. Import and create a shield

    Choose the engine (e.g. "ollama"). The shield wraps that backend so every chat call is tagged and responses are sanitized.

    from lightshieldai import Shield
    
    shield = Shield()
  2. Call chat

    Pass the same model and messages you would use with Ollama. Use standard role and content keys. LightShield uses system and user messages only (RAG/retrieved layers are for a later release).

    response = shield.chat(
        model="qwen2.5",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is 2+2?"},
        ],
    )
  3. Read the response

    The returned object has the same shape as Ollama’s response. The message content is sanitized so internal LightShield tags are not exposed.

    print(response["message"]["content"])
  4. Other Ollama features

    Other calls are passed through to the underlying backend (e.g. shield.list(), shield.pull("qwen2.5")). Only chat() is wrapped and sanitized. Streaming is disabled so that responses can be sanitized reliably.

What LightShield does

  • Before the call: Builds a system message that includes an authority/hierarchy paragraph and wraps your system and user text in unique tags so the model sees clear boundaries and priorities (system over user).
  • After the call: Strips those tags from the model’s reply so they never reach your application.

Building blocks (advanced)

If you want to plug LightShield into another API or build your own flow, you can use the lower-level pieces:

  • LayerPrompt — Creates one tag per layer (system, user, retrieved) and returns authority_text() plus a tags dict. Use tags["system"].wrap(...) and tags["user"].wrap(...) to build the strings you send.
  • Tag — Single tag with short_id and wrap(content) for <LS_id>content</LS_id>.

You would then call your own LLM with the wrapped prompts and implement sanitization (strip <LS_*> and </LS_*>) using the same tag ids from that LayerPrompt instance.

About

IrvineHacks2026

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors