This blueprint deploys OpenClaw AI agents with sandbox isolation using Kata containers on Amazon EKS, integrated with LiteLLM proxy for Claude Opus 4.6 access via AWS Bedrock. The solution provides secure, isolated agent environments for Slack and Feishu integrations with enterprise-grade security, scalability, and observability.
Key Features:
- VM-level Isolation: Kata containers provide lightweight VM isolation for each sandbox
- Autoscaling: Karpenter automatically provisions bare-metal instances on-demand
- Observability: Prometheus and Grafana for metrics, with pre-built LiteLLM dashboard
- Sandbox Lifecycle: CRD-based sandbox management with warm pool and template support
- Secure AI Access: EKS Pod Identity for AWS Bedrock authentication (no static credentials)
- Multi-channel: Support for Slack and Feishu integrations
- Persistent Workspaces: EBS-backed storage for agent state and data
- EKS Cluster: Managed Kubernetes cluster (v1.31) with core node group and Karpenter autoscaling
- Kata Containers: Lightweight VM isolation using QEMU hypervisor on bare-metal instances
- LiteLLM Proxy: OpenAI-compatible API gateway to AWS Bedrock with PostgreSQL backend
- OpenClaw Sandboxes: CRD-managed isolated agent environments with persistent storage
- Storage: EBS CSI driver with gp3 volumes for workspace persistence
- Networking: VPC with public/private subnets across 3 AZs
- Security: EKS Pod Identity for AWS Bedrock access (no long-lived credentials)
- Observability: Prometheus (50Gi, 15d retention) and Grafana with LiteLLM metrics
Ensure that you have installed the following tools on your machine: AWS CLI, kubectl, Terraform, and Helm.
Clone the repository:
```bash
git clone https://github.com/hitsub2/openclaw-on-eks
cd openclaw-on-eks
export OPENCLAW_HOME=$(pwd)
```
If `OPENCLAW_HOME` is ever unset, set it again with `export OPENCLAW_HOME=$(pwd)` from your `openclaw-on-eks` directory.
Run the installation script:
```bash
chmod +x install.sh
./install.sh
```
Deploy to a different region:
```bash
# Deploy to ap-southeast-1
./install.sh --region ap-southeast-1

# Deploy to us-east-1 with custom cluster name
./install.sh --region us-east-1 --cluster-name my-openclaw

# Or use environment variables
AWS_REGION=ap-southeast-1 ./install.sh
```
Available options:
- `--region REGION` - AWS region (default: us-west-2)
- `--cluster-name NAME` - EKS cluster name (default: openclaw-kata-eks)
- `--help` - Show help message
The script will:
- Check prerequisites (aws cli, kubectl, terraform, helm)
- Initialize Terraform
- Plan and apply infrastructure
- Configure kubectl
- Wait for cluster to be ready
- Display next steps
After successful deployment, Terraform will display:
```
Outputs:

cluster_endpoint = "https://XXXXX.gr7.us-west-2.eks.amazonaws.com"
cluster_name = "openclaw-kata-eks"
configure_kubectl = "aws eks --region us-west-2 update-kubeconfig --name openclaw-kata-eks"
grafana_access = "kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80"
grafana_admin_password = <sensitive>
grafana_service_name = "kube-prometheus-stack-grafana"
kata_namespace = "kata-system"
litellm_db_admin_password = <sensitive>
litellm_db_password = <sensitive>
openclaw_namespace = "openclaw"
```
This deployment creates:
Infrastructure:
- EKS cluster (v1.31) with core node group (m5.xlarge, 1-3 nodes)
- VPC with 3 availability zones, public/private subnets, NAT gateway
- EBS CSI driver with IRSA role and default StorageClass (gp3, 100Gi root volume)
Compute & Runtime:
- Kata containers runtime (QEMU hypervisor) on bare-metal instances
- Karpenter NodePool for autoscaling (m5.metal, m5d.metal, c5.metal, c5d.metal)
- Bare-metal nodes with RAID0 NVMe setup (200Gi root volume)
- Agent sandbox controller with CRDs
AI & Proxy:
- LiteLLM proxy with standalone PostgreSQL (50Gi EBS storage)
- Claude Opus 4.6 model via AWS Bedrock cross-region inference (us-east-1)
- EKS Pod Identity for secure Bedrock access
Monitoring:
- Prometheus with 50Gi EBS storage (15 day retention)
- Grafana with 10Gi EBS storage (ClusterIP service)
- ServiceMonitor for LiteLLM metrics collection
Configure kubectl:
```bash
aws eks --region us-west-2 update-kubeconfig --name openclaw-kata-eks
```
Verify the deployment:
```bash
# Check cluster
kubectl get nodes

# Check Kata runtime
kubectl get runtimeclass

# Check LiteLLM
kubectl get pods -n litellm

# Check sandboxes
kubectl get sandbox -A
```
LiteLLM is deployed as an internal service accessible within the cluster:
- Service: `litellm.litellm.svc.cluster.local:4000`
- Database: Standalone PostgreSQL with `STORE_MODEL_IN_DB=True`
- UI: Port-forward to access the admin UI:

```bash
kubectl port-forward -n litellm svc/litellm 4000:4000
```
Retrieve the master key:
```bash
kubectl get secret litellm-masterkey -n litellm -o jsonpath='{.data.masterkey}' | base64 -d
```
1. Port-forward the LiteLLM service:
   ```bash
   kubectl port-forward -n litellm svc/litellm 4000:4000
   ```
2. Open http://localhost:4000/ui in your browser
3. Login with the master key (retrieved above)
4. Navigate to Models → Add Model and fill in:
   - Model Name: e.g. `Qwen/Qwen2.5-72B-Instruct`
   - Provider: `openai` (for any OpenAI-compatible API)
   - API Base: your provider's endpoint, e.g. `https://api.siliconflow.cn/v1`
   - API Key: your provider's API key
5. Click Save
Models can also be configured directly in litellm.tf:
```hcl
set {
  name  = "proxy_config.model_list[0].model_name"
  value = "Qwen/Qwen2.5-72B-Instruct"
}
set {
  name  = "proxy_config.model_list[0].litellm_params.model"
  value = "openai/Qwen/Qwen2.5-72B-Instruct"
}
set {
  name  = "proxy_config.model_list[0].litellm_params.api_base"
  value = "https://api.siliconflow.cn/v1"
}
set {
  name  = "proxy_config.model_list[0].litellm_params.api_key"
  value = "<your-api-key>"
}
```
Then run `terraform apply` to update.
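For readability, the same model entry can live in a Helm values file instead of repeated `set` blocks. A sketch of the equivalent fragment, with the structure inferred from the `set` keys above (verify against your chart's values schema):

```yaml
# Hypothetical values.yaml equivalent of the set blocks above
proxy_config:
  model_list:
    - model_name: Qwen/Qwen2.5-72B-Instruct
      litellm_params:
        model: openai/Qwen/Qwen2.5-72B-Instruct
        api_base: https://api.siliconflow.cn/v1
        api_key: <your-api-key>
```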
Generate a dedicated API key for OpenClaw sandboxes:
```bash
MASTER_KEY=$(kubectl get secret litellm-masterkey -n litellm -o jsonpath='{.data.masterkey}' | base64 -d)
LITELLM_API_KEY=$(kubectl run -n litellm gen-key --rm -i --restart=Never \
  --image=public.ecr.aws/docker/library/busybox:1.33.1 -- \
  wget -qO- --post-data='{"models": ["Qwen/Qwen2.5-72B-Instruct"], "duration": "30d"}' \
  --header="Authorization: Bearer $MASTER_KEY" \
  --header="Content-Type: application/json" \
  http://litellm:4000/key/generate | grep -o '"key":"[^"]*"' | cut -d'"' -f4)
echo "LITELLM_API_KEY: $LITELLM_API_KEY"
```
Use the returned key value as the `apiKey` in OpenClaw's LiteLLM provider configuration.
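The grep/cut pipeline above works but is brittle if the response formatting changes. If python3 is available locally, parsing the JSON is safer; the `RESPONSE` value here is a stand-in for the real `/key/generate` output:

```shell
# Parse the generated key out of a /key/generate-style JSON response.
# RESPONSE is a sample of the assumed response shape, not real output.
RESPONSE='{"key":"sk-abc123","expires":"2026-01-01T00:00:00Z"}'
LITELLM_API_KEY=$(printf '%s' "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['key'])")
echo "$LITELLM_API_KEY"   # -> sk-abc123
```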
Test a chat completion through the proxy:
```bash
MASTER_KEY=$(kubectl get secret litellm-masterkey -n litellm -o jsonpath='{.data.masterkey}' | base64 -d)
kubectl run -n litellm test --rm -i --restart=Never \
  --image=public.ecr.aws/docker/library/busybox:1.33.1 -- \
  wget -qO- --post-data='{"model": "Qwen/Qwen2.5-72B-Instruct", "messages": [{"role": "user", "content": "Hi"}], "max_tokens": 20}' \
  --header="Authorization: Bearer $MASTER_KEY" \
  --header="Content-Type: application/json" \
  http://litellm:4000/v1/chat/completions
```
Create a Slack app at https://api.slack.com/apps:
- Enable Socket Mode
- Add Bot Token Scopes: `chat:write`, `im:write`, `im:history`, `channels:history`, `groups:history`, `mpim:history`
- Install app to workspace
- Copy Bot Token (`xoxb-...`) and App Token (`xapp-...`)
Update examples/openclaw-slack-sandbox.yaml with your credentials:
```bash
cd ${OPENCLAW_HOME}/examples

# Set Slack tokens
export SLACK_BOT_TOKEN="xoxb-..."
export SLACK_APP_TOKEN="xapp-..."

# Replace placeholders (LITELLM_API_KEY already set from previous step)
sed -i.bak \
  -e "s/YOUR_LITELLM_API_KEY/${LITELLM_API_KEY}/g" \
  -e "s/YOUR_BOT_TOKEN/${SLACK_BOT_TOKEN}/g" \
  -e "s/YOUR_APP_TOKEN/${SLACK_APP_TOKEN}/g" \
  openclaw-slack-sandbox.yaml

kubectl apply -f openclaw-slack-sandbox.yaml
```
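One caveat with `sed` substitutions like those above: a value containing `/` breaks the `s/…/…/` form. Using another delimiter such as `|` avoids this; the token below is a made-up example:

```shell
# sed with "|" as the delimiter tolerates "/" inside the substituted value
TOKEN='xoxb-abc/def'   # hypothetical token containing a slash
echo 'token: YOUR_BOT_TOKEN' | sed "s|YOUR_BOT_TOKEN|${TOKEN}|g"
# -> token: xoxb-abc/def
```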
```bash
# Monitor deployment
kubectl get sandbox openclaw-slack-sandbox
kubectl get pods -l sandbox=openclaw-slack-sandbox -w

# Check logs
kubectl logs -f openclaw-slack-sandbox
```
The sandbox will:
- Create a Kata VM-isolated pod on bare-metal nodes
- Mount a 2Gi EBS volume for workspace persistence
- Connect to LiteLLM proxy for Claude Opus 4.6 access
- Connect to Slack via Socket Mode
Once deployed, test the integration:
- Open your Slack workspace
- Find the bot in the Apps section or direct message it
- Send a message like "Hello" or "What can you do?"
- The bot should respond using Claude Opus 4.6 via LiteLLM
Troubleshooting:
- If no response, check pod logs: `kubectl logs -f openclaw-slack-sandbox`
- Verify Socket Mode is enabled in Slack app settings
- Ensure bot has correct permissions and is installed to workspace
Create a Feishu app at https://open.feishu.cn/:
- Get App ID and App Secret
- Configure event subscriptions
- Add required permissions
Update examples/openclaw-feishu-sandbox.yaml with your credentials:
```bash
cd ${OPENCLAW_HOME}/examples

# Set Feishu credentials
export FEISHU_APP_ID="cli_..."
export FEISHU_APP_SECRET="..."

# Replace placeholders (LITELLM_API_KEY already set from previous step)
sed -i.bak \
  -e "s/YOUR_LITELLM_API_KEY/${LITELLM_API_KEY}/g" \
  -e "s/YOUR_FEISHU_APP_ID/${FEISHU_APP_ID}/g" \
  -e "s/YOUR_FEISHU_APP_SECRET/${FEISHU_APP_SECRET}/g" \
  openclaw-feishu-sandbox.yaml

kubectl apply -f openclaw-feishu-sandbox.yaml

# Monitor deployment
kubectl get sandbox openclaw-feishu-sandbox
kubectl get pods -l sandbox=openclaw-feishu-sandbox -w

# Check logs
kubectl logs -f openclaw-feishu-sandbox
```
The sandbox will:
- Create a Kata VM-isolated pod on bare-metal nodes
- Mount a 2Gi EBS volume for workspace persistence
- Connect to LiteLLM proxy for Claude Opus 4.6 access
- Connect to Feishu via webhook
Once deployed, test the integration:
- Open your Feishu app
- Find the bot in the app list or direct message it
- Send a message like "你好" or "What can you do?"
- The bot should respond using Claude Opus 4.6 via LiteLLM
Troubleshooting:
- If no response, check pod logs: `kubectl logs -f openclaw-feishu-sandbox`
- Verify webhook URL is correctly configured in Feishu app settings
- Ensure app has correct permissions and event subscriptions
- Check that App ID and App Secret are correct
Hermes Agent is an open-source AI agent by Nous Research with persistent memory, self-improving skills, and multi-platform messaging support. This section deploys Hermes Agent as a Kata-isolated sandbox with Feishu integration, using LiteLLM for model access.
- LiteLLM is deployed and accessible (see LiteLLM Configuration)
- A Feishu app created at open.feishu.cn with:
- Bot capability enabled
- Permissions: `im:message`, `im:message:send_as_bot`
- Event subscription: `im.message.receive_v1`
- App published and approved
Generate a LiteLLM API key for Hermes:
```bash
MASTER_KEY=$(kubectl get secret litellm-masterkey -n litellm -o jsonpath='{.data.masterkey}' | base64 -d)
kubectl port-forward svc/litellm -n litellm 4000:4000 &
PF_PID=$!
sleep 3
HERMES_API_KEY=$(curl -s -X POST "http://localhost:4000/key/generate" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"key_alias": "hermes-agent"}' | python3 -c "import sys,json;print(json.load(sys.stdin)['key'])")
kill $PF_PID 2>/dev/null
echo "HERMES_API_KEY: $HERMES_API_KEY"
```
Create the namespace and secrets:
```bash
kubectl create ns hermes

# LiteLLM API key
kubectl create secret generic hermes-litellm-key -n hermes \
  --from-literal=api-key="${HERMES_API_KEY}"

# Feishu credentials
kubectl create secret generic hermes-feishu -n hermes \
  --from-literal=app-id='YOUR_FEISHU_APP_ID' \
  --from-literal=app-secret='YOUR_FEISHU_APP_SECRET'
```
Create the Hermes config:
```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: hermes-config
  namespace: hermes
data:
  config.yaml: |
    model:
      default: "claude-opus-4-6"
      provider: "custom"
      base_url: "http://litellm.litellm.svc.cluster.local:4000/v1"
    terminal:
      backend: "local"
      cwd: "/mnt/workspace"
      timeout: 180
    agent:
      max_turns: 60
    memory:
      memory_enabled: true
      user_profile_enabled: true
    display:
      streaming: true
EOF
```
Adjust `model.default` to match a model name configured in your LiteLLM instance. List available models:
```bash
MASTER_KEY=$(kubectl get secret litellm-masterkey -n litellm -o jsonpath='{.data.masterkey}' | base64 -d)
kubectl run -n litellm list-models --rm -i --restart=Never \
  --image=curlimages/curl -- \
  curl -s http://litellm:4000/v1/models -H "Authorization: Bearer $MASTER_KEY" | python3 -c "import sys,json;[print(m['id']) for m in json.load(sys.stdin)['data']]"
```
Save the following as examples/hermes-feishu-sandbox.yaml:
```yaml
apiVersion: agents.x-k8s.io/v1alpha1
kind: Sandbox
metadata:
  name: hermes-feishu-sandbox
  namespace: hermes
spec:
  podTemplate:
    metadata:
      labels:
        sandbox: hermes-feishu-sandbox
    spec:
      runtimeClassName: kata-qemu
      automountServiceAccountToken: true
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
      nodeSelector:
        katacontainers.io/kata-runtime: "true"
      tolerations:
        - key: kata
          operator: Equal
          value: "true"
          effect: NoSchedule
      containers:
        - name: hermes
          image: nousresearch/hermes-agent:latest
          imagePullPolicy: IfNotPresent
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: false
            runAsNonRoot: true
            capabilities:
              drop:
                - ALL
          command:
            - sh
            - -c
            - |
              mkdir -p /mnt/workspace/.hermes
              cp /config/config.yaml /mnt/workspace/.hermes/config.yaml
              exec /opt/hermes/.venv/bin/hermes gateway
          env:
            - name: HERMES_HOME
              value: "/mnt/workspace/.hermes"
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: hermes-litellm-key
                  key: api-key
            - name: FEISHU_APP_ID
              valueFrom:
                secretKeyRef:
                  name: hermes-feishu
                  key: app-id
            - name: FEISHU_APP_SECRET
              valueFrom:
                secretKeyRef:
                  name: hermes-feishu
                  key: app-secret
            - name: FEISHU_DOMAIN
              value: "feishu"
            - name: FEISHU_CONNECTION_MODE
              value: "websocket"
            - name: GATEWAY_ALLOW_ALL_USERS
              value: "true"
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          volumeMounts:
            - mountPath: /mnt/workspace
              name: workspaces-pvc
            - mountPath: /config
              name: config
      volumes:
        - name: config
          configMap:
            name: hermes-config
  volumeClaimTemplates:
    - metadata:
        name: workspaces-pvc
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
```
Deploy:
```bash
kubectl apply -f examples/hermes-feishu-sandbox.yaml

# Check sandbox status
kubectl get sandbox hermes-feishu-sandbox -n hermes

# Watch pod come up
kubectl get pods -n hermes -l sandbox=hermes-feishu-sandbox -w

# Check logs for Feishu WebSocket connection
kubectl logs -n hermes -l sandbox=hermes-feishu-sandbox --tail=20
```
You should see:
```
⚕ Hermes Gateway Starting...
[Lark] connected to wss://msg-frontier.feishu.cn/ws/v2?...
```
- Open Feishu and find the bot in your app list
- Send a direct message like "你好" or "Hello"
- Hermes should respond via Claude through LiteLLM
| Feature | OpenClaw | Hermes Agent |
|---|---|---|
| Config format | JSON in command | ConfigMap + K8s Secrets |
| Credentials | Inline in pod spec | K8s Secrets via `secretKeyRef` |
| Feishu connection | Webhook | WebSocket (no public URL needed) |
| Memory | Stateless | Persistent memory + skills |
| Gateway command | `node dist/index.js gateway` | `hermes gateway` |
| Model config | `openclaw.json` providers | `config.yaml` with `provider: custom` |
- Feishu App Secret and LiteLLM API key are stored as K8s Secrets, not inline
- Set `GATEWAY_ALLOW_ALL_USERS=false` and use `FEISHU_ALLOWED_USERS=ou_xxx,ou_yyy` for production
- Kata VM isolation provides kernel-level separation from the host
- All Linux capabilities are dropped; privilege escalation is disabled
- Hermes WebSocket mode requires only outbound connectivity — no ingress needed
| Problem | Solution |
|---|---|
| No response in Feishu | Check the `im.message.receive_v1` event subscription is enabled in the Feishu console |
| All unauthorized users are denied | Set `GATEWAY_ALLOW_ALL_USERS=true` or configure `FEISHU_ALLOWED_USERS` |
| Invalid model name | Run the list-models command above and update `model.default` in the ConfigMap |
| `hermes: executable file not found` | The binary is at `/opt/hermes/.venv/bin/hermes`, not in PATH |
| Pod stuck in Pending | Check that Karpenter is provisioning bare-metal nodes: `kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter` |
| PVC Multi-Attach error | Scale the deployment to 0 first, wait for the old pod to terminate, then scale back up |
Get the admin password:
```bash
# Via Terraform
terraform output -raw grafana_admin_password

# Or via kubectl
kubectl get secret -n monitoring kube-prometheus-stack-grafana -o jsonpath='{.data.admin-password}' | base64 -d && echo
```
Access Grafana via port-forward:
```bash
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
```
Open http://localhost:3000 in your browser:
- Username: `admin`
- Password: Use the password from the command above
A pre-configured Grafana dashboard for LiteLLM is provided in examples/grafana/grafana_dashboard.json.
To import:
- Access Grafana (see above)
- Navigate to Dashboards → Import
- Upload the dashboard file: `examples/grafana/grafana_dashboard.json`
- Select the Prometheus data source
- Click Import
The dashboard includes:
- Proxy Level Metrics: Total requests, failed requests, request rates
- Token Metrics: Input/output tokens, total tokens per model
- Latency Metrics: Request latency, LLM API latency, time to first token
- Deployment Metrics: Success/failure rates per deployment
- Budget Metrics: Spend tracking per team, user, and API key
- Rate Limit Metrics: Remaining requests and tokens
Check that Prometheus is scraping LiteLLM metrics:
```bash
# Check ServiceMonitor
kubectl get servicemonitor -n litellm

# Check Prometheus targets (via port-forward)
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
# Open http://localhost:9090/targets and look for litellm
```
Test the metrics endpoint directly:
```bash
kubectl run -n litellm test-metrics --rm -i --restart=Never --image=curlimages/curl -- \
  curl -s http://litellm:4000/metrics | head -20
```
Inspect sandboxes:
```bash
# List all sandboxes
kubectl get sandbox -A

# Describe sandbox
kubectl describe sandbox openclaw-slack-sandbox

# Check pod status
kubectl get pods -l sandbox=openclaw-slack-sandbox

# View logs
kubectl logs openclaw-slack-sandbox --tail=50

# Check events
kubectl get events --sort-by='.lastTimestamp' | grep openclaw
```
Check LiteLLM:
```bash
# Check pods
kubectl get pods -n litellm

# View LiteLLM logs
kubectl logs -n litellm deployment/litellm --tail=50

# View PostgreSQL logs
kubectl logs -n litellm litellm-postgresql-0 --tail=50

# Test health endpoint
kubectl run -n litellm test --rm -i --restart=Never --image=curlimages/curl -- \
  curl -s http://litellm:4000/health/readiness
```
Check the Kata runtime and Karpenter:
```bash
# Check Kata nodes
kubectl get nodes -l katacontainers.io/kata-runtime=true

# Check nodepool
kubectl get nodepool kata-bare-metal -o yaml

# Check Karpenter logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=50
```
Symptom: Sandbox pod remains in Pending state
Solution: Check if Karpenter provisioned bare-metal nodes
```bash
kubectl get nodes
kubectl describe nodepool kata-bare-metal
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter | grep kata
```
Symptom: Pod crashes with "Invalid config" errors
Solution: Check OpenClaw logs for specific field errors
```bash
kubectl logs openclaw-slack-sandbox | grep -A 5 "Invalid"
```
Common fixes:
- Ensure `apiKey` (not `auth`) is used for LiteLLM
- Use the `openai-completions` API type
- Verify the API key is valid and not expired
Symptom: "Authentication Error" or "Invalid API key"
Solution: Verify API key and regenerate if needed
```bash
# Get the master key
MASTER_KEY=$(kubectl get secret litellm-masterkey -n litellm -o jsonpath='{.data.masterkey}' | base64 -d)

# List existing keys
kubectl run -n litellm list-keys --rm -i --restart=Never --image=curlimages/curl -- \
  curl -s http://litellm:4000/key/info \
  -H "Authorization: Bearer $MASTER_KEY"

# Generate new key if needed
kubectl run -n litellm gen-key --rm -i --restart=Never --image=curlimages/curl -- \
  curl -s -X POST http://litellm:4000/key/generate \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"models": ["claude-opus-4-6"], "duration": "30d"}'
```
Symptom: "AccessDenied" errors when calling Bedrock
Solution: Verify Pod Identity association
```bash
# Check Pod Identity associations
aws eks list-pod-identity-associations \
  --cluster-name openclaw-kata-eks \
  --region us-west-2

# Check IAM role (IAM is global; no region flag needed)
aws iam get-role \
  --role-name openclaw-kata-eks-litellm-pod-identity

# Verify role has Bedrock permissions
aws iam list-attached-role-policies \
  --role-name openclaw-kata-eks-litellm-pod-identity
```
Symptom: Config changes not taking effect
Solution: Delete PVC to force config recreation
```bash
# Delete sandbox and PVC
kubectl delete sandbox openclaw-slack-sandbox
kubectl delete pvc workspaces-pvc-openclaw-slack-sandbox

# Recreate
kubectl apply -f examples/openclaw-slack-sandbox.yaml
```
Remove all resources:
```bash
# Delete sandboxes first
kubectl delete sandbox --all

# Wait for pods to terminate
kubectl get pods -A | grep openclaw

# Destroy infrastructure
terraform destroy
```
Note: `terraform destroy` will remove all resources, including EBS volumes and data.
- Bare-metal instances (m5.metal, c5.metal) are expensive (~$4-6/hour)
- Only provisioned when sandboxes are created
- Karpenter automatically scales down after 30 seconds of inactivity
- Core node group runs continuously (~$0.04/hour per t3.medium)
- EBS volumes charged per GB-month (~$0.08/GB-month for gp3)
- LiteLLM PostgreSQL uses 8Gi EBS volume
- Bedrock charges per token (input/output)
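As a rough illustration of the EBS line item, using the ~$0.08/GB-month figure and the volume sizes quoted above (the sandbox count is a hypothetical assumption):

```shell
# Back-of-envelope gp3 cost: 50Gi Prometheus + 10Gi Grafana + 8Gi PostgreSQL
# + 2Gi per sandbox, at ~$0.08/GB-month. SANDBOXES=3 is an assumed count.
SANDBOXES=3
python3 -c "print(f'~\${(50 + 10 + 8 + 2*${SANDBOXES}) * 0.08:.2f}/month')"
# -> ~$5.92/month
```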
Recommendations:
- Use Spot instances for non-production workloads
- Set appropriate Karpenter consolidation policies
- Monitor Bedrock usage via CloudWatch
- Delete unused sandboxes and PVCs
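The consolidation behavior mentioned above is controlled on the Karpenter NodePool. A sketch of the relevant fields, assuming the Karpenter v1 `NodePool` API (verify the field names against your installed version):

```yaml
# Illustrative NodePool disruption settings matching the ~30s scale-down above
spec:
  disruption:
    consolidationPolicy: WhenEmpty   # only reclaim nodes with no sandbox pods
    consolidateAfter: 30s
```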
- Secrets Management: Master key and API keys stored in Kubernetes secrets
- IAM: EKS Pod Identity for Bedrock access (no long-lived credentials)
- Isolation: Sandboxes run in Kata VMs with separate kernel
- Network: Consider adding NetworkPolicies for additional isolation
- Tokens: Rotate Slack/Feishu tokens regularly
- API Keys: Set expiration on LiteLLM API keys (default 30 days)
- Audit: Enable EKS audit logging for compliance
- OpenClaw Documentation
- LiteLLM Documentation
- Kata Containers
- Karpenter
- EKS Best Practices
- AWS Bedrock
Each OpenClaw sandbox runs with:
- Runtime: `kata-qemu` - Lightweight VM isolation using the QEMU hypervisor
- Node Selector: `katacontainers.io/kata-runtime: "true"` - Scheduled only on bare-metal instances
- Tolerations: `kata=true:NoSchedule` - Ensures placement on dedicated Kata nodes
- Service Account: `openclaw-sandbox` - No IAM permissions (LiteLLM handles Bedrock access)
- Security Context:
- Runs as non-root user (UID 1000, GID 1000)
- All Linux capabilities dropped
- Privilege escalation disabled
- Read-only root filesystem disabled (required for OpenClaw)
- Volume Type: EBS gp3 via CSI driver
- Size: 2Gi per sandbox
- Mount Path: `/home/node/.openclaw`
- Access Mode: ReadWriteOnce
- Lifecycle: Persists across pod restarts, deleted with sandbox
- Config Behavior:
- Created on first run if not exists
- Preserved on pod restart
- Manual edits persist
OpenClaw sandboxes connect to LiteLLM proxy with:
```json
{
  "models": {
    "providers": {
      "litellm": {
        "baseUrl": "http://litellm.litellm.svc.cluster.local:4000",
        "apiKey": "<generated-key>",
        "api": "openai-completions",
        "models": [{
          "id": "claude-opus-4-6",
          "name": "Claude Opus 4.6 (LiteLLM)",
          "reasoning": true,
          "input": ["text", "image"],
          "contextWindow": 200000,
          "maxTokens": 8192
        }]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "litellm/claude-opus-4-6" }
    }
  }
}
```
Key Points:
- Uses cluster-internal service DNS (no external network required)
- API key generated via LiteLLM master key
- OpenAI-compatible API format
- LiteLLM handles AWS Bedrock authentication via Pod Identity
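Before baking a config like the one above into a sandbox, it is worth validating the JSON locally. A small sketch using python3's `json.tool` (the file path and fragment are illustrative):

```shell
# Write a minimal provider fragment and validate it; json.tool exits non-zero
# on malformed JSON, so this doubles as a quick pre-deploy check.
cat > /tmp/openclaw-provider.json <<'EOF'
{"models": {"providers": {"litellm": {"baseUrl": "http://litellm.litellm.svc.cluster.local:4000", "api": "openai-completions"}}}}
EOF
python3 -m json.tool /tmp/openclaw-provider.json >/dev/null && echo "valid JSON"
# -> valid JSON
```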
Per Sandbox Pod:
- CPU: No limits (burstable)
- Memory: No limits (burstable)
- Storage: 2Gi EBS volume
- Network: Cluster network access
Bare-metal Node (when provisioned):
- Instance Types: m5.metal, m5d.metal, c5.metal, c5d.metal
- CPU: 96 cores (m5.metal), 48 cores (c5.metal)
- Memory: 384GB (m5.metal), 192GB (c5.metal)
- NVMe: RAID0 configured for containerd devicemapper
- Root Volume: 200Gi gp3
Slack:
- Connection: Socket Mode (WebSocket)
- Authentication: Bot Token + App Token
- Permissions: chat:write, im:write, im:history, channels:history
- DM Policy: Open (accepts DMs from all users)
Feishu:
- Connection: Webhook
- Authentication: App ID + App Secret
- Permissions: Configured in Feishu app settings
- DM Policy: Open (accepts messages from all users)
- Pod Network: Uses VPC CNI (10.1.0.0/16)
- Service Discovery: Kubernetes DNS
- LiteLLM Access: ClusterIP service in the `litellm` namespace
- External Access: None (sandboxes are internal only)
- Channel Access: Outbound HTTPS to Slack/Feishu APIs
