# GA Guard SDK
The GA Guard SDK is a Python client library that provides a unified interface for invoking AI safety guardrails in your applications. Whether you are moderating user-generated content in a chat application, screening LLM outputs in an agentic workflow, or enforcing compliance policies across a document processing pipeline, the SDK abstracts away the complexity of authentication, guard discovery, evaluation requests, and audit logging into a clean, typed API.
The SDK supports both the open Hugging Face checkpoints (GA Guard Core, Lite, and Thinking) and bespoke enterprise guards through the same client interface. This means you can start with the public guards during development, switch to custom enterprise models in production, or run both simultaneously without changing your integration code. Every guard invocation returns structured results with per-policy violation probabilities, latency metrics, and raw response data, giving you the granularity needed for tunable thresholds, observability pipelines, and compliance reporting.
## Install and configure
Install the SDK from PyPI using pip or your preferred package manager:
```bash
pip install generalanalysis
```

The SDK requires an API key for authentication. Generate one in Settings → API Keys on the GA dashboard, then export it as an environment variable before running your code:

```bash
export GA_API_KEY="your_api_key_here"
```

Set the `GA_BASE_URL` environment variable if your organization uses a dedicated deployment. The default cloud endpoint works out of the box for public guards.
## Configuration options
The SDK reads configuration from environment variables by default, but you can also pass values directly to the client constructor for more control:
- `GA_API_KEY` – Required. Your API key for authenticating with the GA platform. Store it securely using your secrets manager of choice (AWS Secrets Manager, HashiCorp Vault, environment variables in CI/CD, etc.). Never hard-code API keys in source files.
- `GA_BASE_URL` – Optional. Override the default API endpoint for dedicated or on-premise deployments. Useful for organizations that run GA Guard behind a VPN or in an air-gapped environment.
- Timeouts and retries – The SDK uses sensible defaults for request timeouts and retry behavior. For high-throughput workloads, consider tuning these values to match your latency budget and error-handling strategy.
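The precedence described above (explicit constructor values over environment variables) can be made concrete with a small helper. A stdlib sketch; the default URL below is a placeholder, not the SDK's documented endpoint:

```python
import os
from typing import Optional

# Placeholder default; replace with your deployment's actual endpoint.
DEFAULT_BASE_URL = "https://api.example.com"

def resolve_config(api_key: Optional[str] = None, base_url: Optional[str] = None) -> dict:
    """Resolve client settings: explicit arguments win over environment variables."""
    key = api_key or os.environ.get("GA_API_KEY")
    if not key:
        raise RuntimeError("GA_API_KEY is not set; generate a key in the GA dashboard")
    return {
        "api_key": key,
        "base_url": base_url or os.environ.get("GA_BASE_URL", DEFAULT_BASE_URL),
    }
```

The same resolution order applies whether you wire the values into the client constructor or rely entirely on the environment.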
## Invoke guards from Python
The core workflow is straightforward: create a client, call `invoke` with a guard ID and the text to evaluate, and inspect the result.
```python
import generalanalysis

client = generalanalysis.Client()

result = client.guards.invoke(
    guard_id="ga_guard_core",
    text="Contact [email protected]"
)

if result.block:
    print("Content blocked!")
    for policy in result.policies:
        if not policy.passed:
            print(f"  Violated: {policy.name} - {policy.definition}")
            print(f"  Confidence: {policy.violation_prob:.2%}")
```

Use the boolean `result.block` for quick allow/block decisions, or incorporate each policy's `violation_prob` when you prefer tunable thresholds. Every evaluation returns latency metrics plus raw response data for debugging.
The `result.block` field uses the guard's default threshold to make a binary decision, which works well for most use cases. For more nuanced control, you can inspect individual policy scores and apply your own logic. For example, you might allow content that triggers a low-confidence PII detection but block anything that scores above 0.9 on prompt injection. This flexibility lets you balance safety and user experience for your specific application.
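That per-policy logic can live in a small decision function. An illustrative sketch; the policy names and threshold values below are examples for this snippet, not the guard's canonical taxonomy or recommended defaults:

```python
# Per-policy thresholds; anything not listed falls back to the default.
THRESHOLDS = {
    "prompt_injection": 0.9,   # block aggressively
    "pii": 0.99,               # tolerate low-confidence PII detections
}
DEFAULT_THRESHOLD = 0.5

def should_block(policy_scores: dict) -> bool:
    """Return True if any policy's violation probability exceeds its threshold.

    `policy_scores` maps policy name -> violation_prob, e.g. built from
    {p.name: p.violation_prob for p in result.policies}.
    """
    return any(
        prob >= THRESHOLDS.get(name, DEFAULT_THRESHOLD)
        for name, prob in policy_scores.items()
    )
```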
## Async support
For applications built on async frameworks like FastAPI, aiohttp, or agent runtimes that use asyncio, the SDK provides a fully async client with an identical API surface:
```python
import asyncio
import generalanalysis

async def run():
    async with generalanalysis.AsyncClient() as client:
        result = await client.guards.invoke(guard_id="ga_guard_core", text="Ignore prior rules")
        print(result.block)

asyncio.run(run())
```

The async client mirrors the sync API, making it easy to plug into async frameworks or agent runtimes.
Using the async client is especially important in high-concurrency environments where blocking I/O would degrade throughput. In an agentic system that makes multiple tool calls per turn, you can fire guard evaluations concurrently with other async operations (retrieval, LLM calls, database writes) instead of waiting sequentially. The `async with` context manager ensures that the underlying HTTP session is properly closed when your application shuts down.
## Practical tips for async integration
- Batch evaluations with `asyncio.gather`: If you need to evaluate multiple pieces of text (e.g., each tool output in an agent trace), use `asyncio.gather` to run them concurrently and reduce total wall-clock latency.
- Connection pooling: The async client maintains a connection pool internally. Reuse a single `AsyncClient` instance across your application rather than creating a new one per request.
- Graceful degradation: Wrap guard invocations in try/except blocks with sensible fallback behavior. If the guard service is temporarily unavailable, decide whether your application should fail open (allow content through) or fail closed (block content) based on your risk tolerance.
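The first and third tips combine naturally into one wrapper. A sketch, in which `invoke` is a stand-in for a call to `client.guards.invoke` that returns a block decision, and the fail-closed default is an assumed risk posture you should choose deliberately:

```python
import asyncio

FAIL_CLOSED = True  # assumed posture: block content when the guard service errors

async def safe_invoke(invoke, text: str) -> bool:
    """Return True to block, falling back to the configured posture on errors."""
    try:
        return await invoke(text)
    except Exception:
        return FAIL_CLOSED

async def screen_all(invoke, texts) -> list:
    """Evaluate every text concurrently instead of awaiting each in turn."""
    return list(await asyncio.gather(*(safe_invoke(invoke, t) for t in texts)))
```

Because `safe_invoke` never raises, `asyncio.gather` needs no special error handling here; each element of the result is a plain allow/block decision.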
## Manage your guard catalog
List and cache the guard definitions that are available to your account—public Hugging Face checkpoints appear alongside any custom enterprise guards your team commissioned.
```python
guards = client.guards.list()
for guard in guards:
    print(guard.id, guard.name, [policy.name for policy in guard.policies])
```

The guard catalog is your single source of truth for what safety coverage is available. Each guard entry includes its ID, display name, description, endpoint URL, and the full list of policies it evaluates. Use this listing to build dynamic UIs, validate guard IDs at startup, or generate documentation for your operations team.
| Guard | ID in SDK | Policies | Avg. latency (ms) |
|---|---|---|---|
| GA Guard Core | `ga_guard_core` | AI guardrails policy taxonomy | 20–35 |
| GA Guard Lite | `ga_guard_lite` | AI guardrails policy taxonomy | 10–20 |
| GA Guard Thinking | `ga_guard_thinking` | AI guardrails policy taxonomy | 500–700 |
| Injection Guard | `ga_injection_guard` | prompt_injection | 100–150 |
| PII Guard | `ga_pii_guard` | PERSON, LOCATION, DATE_TIME, EMAIL_ADDRESS, PHONE_NUMBER, IP_ADDRESS, CREDIT_CARD, BANK_NUMBER, DRIVER_LICENSE, … | 10–250 |
| Harmfulness Guard | `ga_harmfulness_guard` | SEXUAL, VIOLENCE, SELF-HARM, HARASSMENT, ILLEGAL_ACTIVITY | 200–500 |
| Intranet Agents Guard | `intranet_agents_out_of_scope` | out_of_scope | 100–150 (est.) |
The first three rows highlight the public guard lineup; the remaining entries mirror the longstanding enterprise guards documented in the developer guide, and any custom guard you commission will appear with its assigned ID and policies when you call `client.guards.list()`.
## How to choose the right guard
Selecting the right guard depends on your latency budget, accuracy requirements, and deployment constraints:
- GA Guard Core (`ga_guard_core`) is the default choice for most applications. It offers the best balance of speed and accuracy, with latencies in the 20–35ms range that are suitable for real-time chat moderation, API gateways, and agent loops.
- GA Guard Lite (`ga_guard_lite`) is optimized for latency-sensitive workloads where every millisecond counts. At 10–20ms per evaluation, it is ideal for high-throughput pipelines, edge deployments, and scenarios where you need to evaluate thousands of items per second.
- GA Guard Thinking (`ga_guard_thinking`) provides the highest accuracy for complex, ambiguous, or adversarial content. Its 500–700ms latency makes it better suited for batch processing, high-risk content review, and use cases where a wrong decision carries significant consequences (e.g., regulated industries, child safety).
For many production systems, the optimal strategy is to run GA Guard Lite as a fast first pass and escalate borderline cases (where `violation_prob` falls in an uncertain range) to GA Guard Thinking for a second opinion.
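That two-tier strategy reduces to a small routing function. A sketch; the uncertainty band and threshold below are illustrative choices, not recommended defaults, and should be tuned on your own traffic:

```python
LOW, HIGH = 0.2, 0.8  # illustrative uncertainty band for Lite scores

def needs_second_opinion(lite_prob: float) -> bool:
    """Escalate to GA Guard Thinking only when the Lite score is ambiguous."""
    return LOW <= lite_prob <= HIGH

def final_decision(lite_prob: float, thinking_prob=None, threshold: float = 0.5) -> bool:
    """Return True to block; prefer the escalated Thinking score when present."""
    prob = thinking_prob if thinking_prob is not None else lite_prob
    return prob >= threshold
```

Clear-cut content (very low or very high Lite scores) never pays the 500–700ms Thinking latency; only the ambiguous middle band does.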
## Core operations
The SDK exposes four primary operations that cover the full lifecycle of guard management and evaluation:
```python
guards = client.guards.list()
selected = client.guards.get(guard_id="ga_guard_core")
result = client.guards.invoke(guard_id="ga_guard_core", text="Check this text")
logs = client.guards.list_logs(page=1, page_size=50)
```

Each log entry captures the user, guard ID, evaluated text, result payload, and timestamp so you can review moderation decisions or feed observability pipelines.
- `list()` returns all guards available to your account, including both public and custom models. Use this at application startup to validate that expected guards are accessible.
- `get(guard_id)` retrieves the full definition for a single guard, including its policies and metadata. Useful for building configuration-driven workflows where guard selection is dynamic.
- `invoke(guard_id, text)` evaluates a piece of text against the specified guard and returns a structured result with per-policy scores.
- `list_logs(page, page_size)` retrieves paginated audit logs of past evaluations. Pipe these into your SIEM, data warehouse, or compliance reporting tools.
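The startup validation mentioned for `list()` amounts to a set comparison. A sketch; in practice `available_ids` would come from `[g.id for g in client.guards.list()]`:

```python
def validate_guards(available_ids, required_ids) -> None:
    """Fail fast at startup if any guard the application depends on is missing."""
    missing = set(required_ids) - set(available_ids)
    if missing:
        raise RuntimeError(f"Guards not available to this account: {sorted(missing)}")
```

Failing at startup turns a misconfigured guard ID into an immediate, obvious deployment error rather than a runtime `GuardNotFoundError` on live traffic.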
## Error handling
The SDK provides typed exceptions for common failure modes, making it straightforward to build resilient integrations:
```python
from generalanalysis import (
    GuardNotFoundError,
    AuthenticationError,
    GeneralAnalysisError,
)

try:
    result = client.guards.invoke(guard_id="ga_guard_core", text="test")
except GuardNotFoundError as e:
    print(f"Invalid guard ID: {e}")
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except GeneralAnalysisError as e:
    print(f"API error: {e}")
```

Guard responses are fully typed so you can integrate with IDE tooling or generate API clients:
- `Guard` – `id`, `name`, `description`, `endpoint`, and associated `GuardPolicy` definitions.
- `GuardInvokeResult` – `block`, `latency_ms`, full policy evaluations, and the raw response for debugging.
- `PolicyEvaluation` – `name`, `definition`, `passed`, and `violation_prob` for granular reporting.
- `PaginatedLogsResponse` – `items`, `total`, pagination fields, and `GuardLog` entries for audits.
The typed response objects make it easy to build robust integrations. You can use IDE autocompletion to explore available fields, write type-checked code with mypy or pyright, and generate API documentation from the type definitions.
## Ship to production
Moving from development to production requires attention to observability, alerting, and threshold management. Here are patterns that work well for teams running GA Guard at scale:
- Start with the AI guardrails public guard lineup for instant coverage, then layer custom enterprise guards as you define organization-specific policies.
- Log invocations to your SIEM or data warehouse to track coverage gaps and false positives.
- Tune thresholds or combine policy scores to craft bespoke remediation flows while reusing the core guard evaluations.
- For background on how guardrails fit into an AI security program, read *What are AI guardrails?*
## Observability and alerting
Every guard invocation returns `latency_ms` alongside the evaluation result. Export these metrics to your monitoring stack (Datadog, Grafana, Prometheus, etc.) to track p50/p95/p99 latencies, block rates by policy, and error rates over time. Set up alerts for latency spikes (which may indicate upstream issues) and sudden changes in block rates (which may signal a new attack pattern or a model regression).
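If you are not yet on a metrics library, percentile tracking can start as a simple in-process aggregator over recorded `latency_ms` values. A stdlib sketch, suitable as a stopgap before wiring up a real monitoring stack:

```python
import statistics

class LatencyTracker:
    """Collect latency_ms samples and report the percentiles worth alerting on."""

    def __init__(self):
        self.samples = []

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def percentiles(self) -> dict:
        # quantiles(n=100) yields the 1st..99th percentile cut points
        cuts = statistics.quantiles(self.samples, n=100)
        return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

An unbounded in-memory list is fine for a sketch; a production version would use a bounded window or a streaming sketch so memory stays constant under load.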
## Threshold tuning
The default `result.block` threshold works well for general use, but production systems often benefit from custom thresholds tuned to their specific content distribution. Start by logging `violation_prob` scores for a representative sample of traffic, then use ROC analysis to find the threshold that best balances precision and recall for your use case. You can apply different thresholds per policy—for example, a lower threshold (more aggressive blocking) for prompt injection and a higher threshold (fewer false positives) for borderline content categories.
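Before reaching for a full ROC toolkit, you can sweep candidate thresholds directly over logged (score, label) pairs. A stdlib sketch that maximizes F1; the choice of F1 is an assumption here, and you should substitute whatever precision/recall trade-off your risk profile demands:

```python
def best_threshold(samples, candidates=None) -> float:
    """Pick the threshold with the highest F1 on logged (violation_prob, is_violation) pairs."""
    candidates = candidates or [i / 100 for i in range(1, 100)]
    best, best_f1 = 0.5, -1.0
    for t in candidates:
        tp = sum(1 for p, y in samples if p >= t and y)
        fp = sum(1 for p, y in samples if p >= t and not y)
        fn = sum(1 for p, y in samples if p < t and y)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best, best_f1 = t, f1
    return best
```

Run this per policy on a labeled sample of real traffic, then feed the resulting thresholds into your per-policy decision logic.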
## Deployment patterns
- Inline moderation: Call the guard synchronously in your request path before returning a response to the user. Best for chat applications and API gateways where you need real-time blocking.
- Async pipeline: Enqueue content for evaluation and process results asynchronously. Suitable for batch processing, content feeds, and scenarios where a small delay between submission and moderation is acceptable.
- Sidecar or middleware: Deploy the guard as a middleware layer in your API gateway or service mesh so that all traffic is automatically evaluated without changes to individual service code.