The Agent’s Blindfold: Architecting a Zero-Trust Sandbox for Local AI

In my last post, The Backup That Refused to Be Scheduled, we established a hard truth: Schedulers run payloads, not intent. We moved from “Hope-based Security” (Vibes) to “Architecture-based Security” (Parameters).

But a scheduled backup is just a deterministic script. What happens when your payload is a fully autonomous LLM agent with the ability to write code, call APIs, and execute system commands?

You run into the Agentic Security Paradox: You need the AI to have enough access to your network to be useful. But if you give it the keys to your infrastructure, a single prompt-injection attack (or a hallucinated logic loop) can nuke your entire homelab.

So, what are the options? Most developers stop at creating a .env file and tossing the agent into a basic Docker bridge network. A “.env” file is a gift to an attacker. A VPN only protects the tunnel, not the endpoints. And Docker networking is a software-level abstraction that a kernel-level vulnerability can bypass.

To safely run OpenClaw, we stopped relying on software promises and built a Physicalized Zero-Trust Model. This is a work in progress (our “Mark I” architecture), but if you are building an AI homelab and want to sleep at night, here is the blueprint of our “very peculiar” setup, the services that power it, and the compromises we made.

The Architecture: A Tiered Trust Model

We divided the environment into strictly segregated zones, connected only by a Zero-Trust Mesh (Tailscale integrated with Keycloak OIDC). Even if a device is physically plugged into our switch, it must authenticate against the central identity provider and be explicitly authorized to join the mesh.

Here is exactly how the network is sliced, service by service.

Tier 1: The Infrastructure Backbone (VLAN 10)

Hosted on our Management Server (a dedicated Proxmox hypervisor). This is the control plane. No AI inference happens here. We isolate these services so that if an LLM goes rogue, the house doesn’t stop functioning.

  • NetBootstrap (DNS/DHCP): The source of truth for the network. By keeping this in VLAN 10, we ensure that an AI agent hallucinating a million requests a second cannot exhaust the DNS resolver that keeps our NAS and security cameras online.
  • Keycloak (OIDC/Identity Realms): The bouncer. It manages who is who. Tailscale relies on this to authorize nodes. OpenClaw does not get to dictate its own identity; Keycloak issues the tokens.
  • APISix (The API Gateway): The gauntlet. All traffic from the agent to local services must pass through here. It enforces strict rate-limiting and logging. If OpenClaw tries to DDoS a third-party service or an internal app, APISix drops the packets. It also provides a “paper trail” for every “thought” the agent tries to turn into a network action.
  • LiteLLM (The LLM Proxy / LLMOps): The universal translator. OpenClaw never talks to an LLM directly. It talks to APISix and then to LiteLLM, which routes the request either to our Ollama service on its local GPU or to external providers (Gemini, Bedrock, OpenAI) based on our predefined matrix. This abstraction means OpenClaw never possesses the API keys for the cloud providers, but it also introduces a single point of failure for inference (see below for our backup in case LiteLLM fails).
  • VaultBridge (The Secret Proxy): The most critical piece of the “Vibe-less” security. OpenClaw never handles database passwords or API tokens. When a Tool needs a secret, it sends a generic tool-call request to VaultBridge. The bridge intercepts it, retrieves the actual secret from our Secret Manager, executes the payload, and returns only the safe result to the Tool the agent uses. The secret is never known by the AI, and because it has a TTL, it is forgotten by the tool that called it and never written to a file or log. To the AI, the tool just works and the secret is a black box. Cryptographic Secret Blindness.
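The secret-blindness pattern can be sketched in a few lines of Python. This is a minimal illustration of the idea, not VaultBridge’s actual code: the `resolve` and `action` callables are hypothetical stand-ins for the Secret Manager client and the tool’s payload.

```python
class SecretBlindBridge:
    """Sketch of the VaultBridge pattern: the agent passes a secret
    *reference* (a name), never a secret *value*. The bridge resolves
    the reference, runs the payload on the agent's behalf, and returns
    only the sanitized result."""

    def __init__(self, resolve):
        # resolve: callable that fetches a short-TTL secret by name
        # (stand-in for the real Secret Manager client)
        self._resolve = resolve

    def call(self, action, secret_ref: str, **kwargs):
        secret = self._resolve(secret_ref)  # value exists only bridge-side
        try:
            # The payload executes here, inside the bridge's trust zone.
            return action(secret=secret, **kwargs)
        finally:
            del secret  # never logged, never returned to the agent


# Hypothetical usage: the agent only ever sees "db/readonly" and the rows.
bridge = SecretBlindBridge(resolve=lambda ref: "s3cr3t-" + ref)
rows = bridge.call(
    action=lambda secret, query: f"2 rows for [{query}]",  # fake DB call
    secret_ref="db/readonly",
    query="SELECT 1",
)
```

The design choice to highlight: the agent’s tool call names a reference (`db/readonly`), and everything secret-shaped lives and dies inside the bridge.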

Tier 2: The Compute Heavyweight (Inference Computer)

This is a dedicated workstation (Ryzen 9, RTX 3090 Ti) tasked solely with running local LLM inference via Ollama.

The logic: We physically decoupled the “Brain” (Inference) from the “Control Plane” (Management Server). AI models are memory hogs. When an LLM overflows its VRAM and spills into system RAM, it can cause kernel panics or complete system locks. By physically separating this machine, a VRAM Out-Of-Memory (OOM) collapse on the Inference Computer does not take down our DNS, our API Gateway, or our identity provider.

Tier 3: The Red-Zone (VLAN 30)

This is the “Clean Room.” The OpenClaw Client Machine (running on its own hardware) lives here. It is physically and logically isolated.

The logic: We treat the agent as an untrusted guest. It has no lateral visibility into our primary LAN or our IoT devices (also segregated in their own network). If a malicious actor pulls off a prompt-injection attack that takes full control of the OpenClaw agent, they find themselves trapped on a firewalled machine with no physical route to the NAS or any other valuable service inside the homelab (with the exception of the services in the agent’s toolbox).

The Ephemeral Docker Sandbox: Agents write code. Sometimes they write brilliant testing frameworks; sometimes they write “rm -rf /” by accident. To mitigate this, OpenClaw does not execute code on its own host OS. We implemented a Dockerized sandbox inside VLAN 30. When OpenClaw generates a script to test a theory, it pushes it into an isolated, ephemeral Docker container, executes it, reads the output, and destroys the container. We surgically limit the blast radius of the AI’s “imagination.” We recognize it’s not an air-gapped solution, but it will do in the meantime.
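A minimal sketch of that throwaway-container loop is below. The image name, memory, and CPU caps are illustrative choices for this post, not OpenClaw internals; the key part is the flag set on `docker run`.

```python
import subprocess

def sandbox_cmd(script: str) -> list[str]:
    """Build a docker run invocation for one ephemeral, locked-down execution."""
    return [
        "docker", "run",
        "--rm",                  # destroy the container after the run
        "--network", "none",     # no network from inside the sandbox
        "--memory", "256m",      # cap RAM so a runaway loop can't starve the host
        "--cpus", "0.5",         # cap CPU for the same reason
        "--read-only",           # immutable container filesystem
        "python:3.12-slim",      # hypothetical runtime image
        "python", "-c", script,
    ]

def run_in_sandbox(script: str, timeout: int = 30) -> str:
    """Execute agent-generated code, read stdout, and let --rm clean up."""
    result = subprocess.run(sandbox_cmd(script), capture_output=True,
                            text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```

The agent only ever sees the returned stdout; the container that produced it is already gone.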

The Out-of-Band Kill Switch & Route Fallbacks

AI agents hallucinate. Sometimes, that hallucination turns into a high-speed logic loop that burns through thousands of cloud API tokens per minute. If you rely on logging into your management UI to stop it, you are too slow.

To solve this, we implemented an Out-of-Band (OOB) Kill Switch using n8n automation and a 3rd party SMS service (perhaps we can later build our own mobile app with a big red “stop” button).

If the primary local inference route (the Inference Computer) fails repeatedly over a set retry period, APISix triggers a fallback route to keep the agent alive. However, this fallback, and any other external routing, is strictly governed by the n8n automation. If the agent goes rogue or token expenditure spikes, we don’t need a laptop. One of the team members sends a specific SMS text message. The 3rd party service triggers the n8n webhook, which instantly reconfigures APISix to sever the LiteLLM and fallback routes or kill the OpenClaw connection entirely.

A software “stop” button on a dashboard is cool too, but a pervasive SMS text that severs the proxy route out-of-band is a better safety net for an engineer who may be caught away from his desk or even his own phone. Obviously, the opposite is also possible (restore the entire route mapping with an SMS and the proper OTP).
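To make the kill switch concrete, here is a hedged sketch of the two pieces our n8n webhook relies on: verifying that the inbound SMS callback is genuine, and telling APISix to disable a route. The admin address, key, secret, and route ID are placeholders; APISix routes do expose a `status` field on the Admin API (1 = enabled, 0 = disabled).

```python
import hmac, hashlib, json
import urllib.request

APISIX_ADMIN = "http://127.0.0.1:9180/apisix/admin/routes"  # default admin port
ADMIN_KEY = "replace-me"               # placeholder admin API key
SMS_SECRET = b"shared-webhook-secret"  # placeholder, shared with the SMS provider

def verify_sms(body: str, signature: str) -> bool:
    """Reject webhook calls that don't carry a valid HMAC from the SMS service."""
    expected = hmac.new(SMS_SECRET, body.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

def kill_route(route_id: str) -> urllib.request.Request:
    """Build the APISix Admin API call that severs a route (status 0 = disabled)."""
    return urllib.request.Request(
        f"{APISIX_ADMIN}/{route_id}",
        data=json.dumps({"status": 0}).encode(),
        method="PATCH",
        headers={"X-API-KEY": ADMIN_KEY, "Content-Type": "application/json"},
    )
```

Restoring the route is the same call with `{"status": 1}`, gated behind the OTP check.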

The Homelabber’s Reality Check: The Cost of Paranoia

This setup is architecturally sound, but it comes with severe friction.

1. The “Rube Goldberg” Latency: A single request from the agent travels from VLAN 30 -> APISix Gateway -> LiteLLM Proxy -> Ollama on the Inference Computer -> back through the chain. Every single hop adds milliseconds. For text generation, it’s manageable. For real-time agentic loops, it’s the network equivalent of going through TSA airport security just to use the bathroom in your own house. Paraphrasing Steve Gibson: “Security and convenience are opposed extremes.”

2. The Identity Crisis Debugging Loop: When something fails, it fails silently and securely. With Keycloak, Tailscale, APISix, and LiteLLM all having their own strict “opinions” on who the user is and what they can access, debugging a simple 403 Forbidden error occasionally turns into a weekend-long archaeological dig through four different, decentralized log files (we are actively working on an observability implementation as remediation).

Next Steps: The Paranoia Continues

As I mentioned, this is a work in progress. While the architecture enforces strict boundaries, there are still gaps in the parameters that we are engineering away in the next iterations:

  • From Delegated Trust to True Zero-Trust: We currently use Tailscale. While the data plane is peer-to-peer, the control plane is a SaaS product. We are placing a root of trust in an external entity. The next architectural evolution may involve migrating to Headscale to achieve true, air-gapped homelab sovereignty, though that also imposes more IT management burden.
  • Hardening the Sandbox Illusion: A Docker container is only as safe as its daemon. An LLM explicitly instructed to map networks could potentially write a container-escape exploit. Our next step is enforcing rootless Docker mode, perhaps applying strict AppArmor profiles, and mounting read-only filesystems to turn the software sandbox into a true digital concrete bunker.
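As an illustration of where the sandbox hardening is heading, the extra flags might look like the list below. The AppArmor profile name is hypothetical, and the capability and PID limits would still need tuning for real workloads.

```python
# Hardening flags we plan to append to every sandbox docker run.
# The profile name "openclaw-sandbox" is a hypothetical placeholder.
HARDENED_FLAGS = [
    "--security-opt", "no-new-privileges",          # block privilege escalation
    "--security-opt", "apparmor=openclaw-sandbox",  # hypothetical AppArmor profile
    "--cap-drop", "ALL",                            # drop every Linux capability
    "--pids-limit", "64",                           # cap fork bombs
    "--user", "1000:1000",                          # never run as root in-container
    "--read-only",                                  # immutable filesystem
]
```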

The Skeptic’s Counterpoint

During this endeavor, I’ve heard objections: “This is absurdly over-engineered. Why not just run a local model on your MacBook and be done with it?”, “Why invest so much time in this? Let’s start with the agentic work already!”, “It has too many moving parts”, “Why are you spending that much time in the lab?” (that last one is my wife :P)

For some of us, the future of technology isn’t just chatting with a code assistant. It is delegating autonomous tasks to agentic workflows, more capable AI systems, and even robots. If you treat an autonomous agent as a “privileged user” on your primary machine, you are relying on potentially “very costly vibes”. A rogue agent or compromised LLM can cost from hundreds to potentially hundreds of thousands of dollars.

Additionally, I do not want an agent (or swarm of agents) that “acts like me” (or worse: “as me”). Our aim is to have agents that work with us as other collaborators do: having their own environments and their own scope of work, able to collaborate and take on delegated work with synergy, within their own boundaries.

My team and I are not ready to let the AI have the keys to the castle (perhaps some day in the future). Agentic AI is still a moving target, and while the dust settles we’ll keep evolving our infrastructure, networking setup, agentic capabilities, and compute power. For now, my recommendation is this:

Do the work! Build the pipeline, enforce the hardware boundaries, and give your AI a very specific, heavily monitored window to look through, and then start incrementally trusting it to work with you, not as you.