ProbableOdyssey | Blake Cook

Your AI Agent is Unsafe and Sandboxes Won't Save It

· 3 min read · 586 words

Table of Contents

It has been impossible to avoid the sheer amount of coverage of OpenClaw (formerly MoltBot, formerly ClawdBot). For those who have managed to avoid it, OpenClaw is an open source implementation of an AI agent that “can manage your life”. You can hook it up to a messaging app (such as Telegram or WhatsApp), install plugins to give access to your email and your calendars and hook it all up to an LLM backend such as ChatGPT, Claude, or Gemini.

Essentially, it’s an AI secretary. OpenClaw has drastically simplified the process of integrating a CRON job and an LLM with large aspects of your digital life. While people are installing this in a “sandboxed” VPS and letting loose with this program, every part of my being shudders at the complete disregard for security. Adding insult to injury, this post on XDA claims to have found over 21,000 unsecured instances of OpenClaw!

Did the last 40+ years never happen in terms of software security? Have we seriously learned nothing? Or are we so willing to throw these valuable lessons out the window as soon as a shiny toy comes along

One of my favourite presentations on this topic is “the lethal trifecta” by Simon Willison on LLM security. I highly recommend checking it out, but one of the key takeaways from the talk is that an LLM that has

  1. Access to private data
  2. Ability to externally communicate
  3. Exposure to untrusted content

Then you’ve stumbled into the lethal trifecta, and a major security breach is one prompt injection attack away

In systems that interface with databases, tremendous care is taken to sanitise any user input so that a malicious actor can’t execute a query on a database by feeding in an SQL scripts into the user input. Today with LLMs, this type of attack is no longer bound to simple SQL statements — LLM prompt injections are written in plain natural language, which can easily be obfuscated within a larger body of text or hidden in a webpage via text that doesn’t render (such as a comment in HTML or SVG, or even plan text which is styled to render invisibly on a web browser)

For the love of all that is holy and otherwise — DO NOT DO THIS

Filtering out prompt injections is still an ongoing field of research, and it’s remarkably easy to circumvent any current measures put in place

Using a sandbox for OpenClaw might be enough to prevent any destructive actions happening to the computer, but access to private data and ability to communicate with the web are fundamentally flawed modes of operation

This is yet another LLM/AI software that is incredibly unsafe to use for people (especially those with a little expertise, at the top of the Dunning-Krueger curve). Another recent example is Google Antigravity, which again has almost unrestricted input from arbitrary webpages it processes, providing one of the largest attack vectors I have ever seen

And what is the response or protection? A small disclaimer that people are already accustomed to tuning out

While it is an exciting time, I’m incredibly disappointed by the negligent practices of AI developers churning out software to chase the hype at the expense of safety

I take issue with a lot of the culture surrounding AI and LLMs and how their outputs are trusted more than they ought to be. We are on track for a challenger-level disaster, adding these massive security holes on top of that larger issue is beyond horrifying to me

Reply to this post by email blZake@proZbableodyssey.blog (remove Z characters) ↪