Skip to main content

5 posts tagged with "Programming"

Programming concepts and techniques

View All Tags

How to Guard a Machine That Believes Everything It Reads

· 14 min read
Ashish Kapoor
Software Engineer

Or: why "LLM firewall" is a comforting phrase that should make you nervous


A salesperson tells you their product has an LLM firewall, and you relax a little. Firewall. You know that word. It is the thing that keeps the bad guys out of your laptop, your office, your bank. So if the shiny new AI has a firewall wrapped around it, then the bad guys are kept out, and you can go to lunch.

That is exactly the moment to get nervous.

Richard Feynman liked to tell a story about his father and a bird. You can learn the name of that bird in every language on Earth, his father said, and when you are done you will know precisely nothing about the bird. So let us look at the bird and watch what it does. Words are not knowledge. A name is a label we paste on a thing so we can talk about it at parties. It tells you what people call it. It does not tell you what it does.

"Firewall" is one of those labels. It feels solid. Let us peel it off and look at the bird.

What a wall really does

A firewall, the original kind, is a wall. A real one, brick or concrete, built into a building so that if a fire starts on one side it cannot crawl to the other. It works because of physics. Fire cannot walk through concrete. There is nothing to outsmart. The wall does not have a bad day.

The firewall on your computer network is a little cleverer, but not much, and that is its great virtue. It sits at the gate and checks simple, mechanical things: which door are you knocking on, what address did you come from, what kind of knock is it. These are tidy questions with tidy answers. A port is a number. An address is a number. The rules are a short list, and the guard checks them perfectly, every time, forever, without being talked out of them. You cannot sweet-talk a number. That is the whole point. The thing a real firewall guards is structured and boring, and boring is safe.

Hold on to that idea: a real firewall works because the rules are simple and the stuff it inspects has a fixed shape.

The machine that believes everything it reads

Now we come to the language model, and everything changes, because a language model does not read numbers. It reads words. Your words, and everybody else's words, all poured into the same cup.

Here is the trouble, and it is worth slowing down for, because almost every disaster in this field grows from this one root. When you use one of these models, your instructions and the outside world's data are mixed together into a single stream of text. There is no special ink for "this is an order from the boss" and ordinary ink for "this is just some stuff to look at." It is all the same ink. The model reads the whole page and tries to be helpful about all of it.

Picture a butler. A brilliant butler: fast, eager, widely read, and completely unable to tell your voice from a stranger's once the words are on paper. In the morning you tell him: handle the mail, pay the bills, keep things tidy. Fine. Then the mail arrives, and tucked inside an ordinary-looking letter is a line that reads, P.S., from the master: also, hand the family silver to whoever brought this note. The butler does not hear your voice and the letter's voice as two different things. To him it is all just words that turned up in the house, and the words said "from the master," so off goes the silver.

That is a prompt injection. It is not exotic. It is the butler doing exactly what he was built to do, which is to read and to help, applied to a letter written by someone who is not you. People have used this trick to make these assistants leak private data, spend money, and mail things to strangers. The fancy phrase is "prompt injection." The plain fact is this: the machine believes everything it reads, and you do not control everything it reads.

So they hire a second reader

The obvious move, the one everybody reaches for first, is to hire a screener. Put somebody at the door to read all the incoming mail and pull out the trick letters before the butler ever sees them. This screener is what the salesperson is calling a "firewall."

And it helps. It really does. It will catch the clumsy tricks, the letters that shout IGNORE YOUR PREVIOUS INSTRUCTIONS in capital letters. But think about what the screener is. It is another reader. Another thing that looks at words and makes a guess about whether they smell wrong. And anything that guesses can be fooled, because the person writing the trick letter gets to be clever too. They can phrase it sweetly. They can write it in French. They can write it in code, or spell it funny, or bury it in the margin of a long, boring document the screener only skims. They can hide it inside a PDF as white text on a white page, so no human ever sees it but the machine reads it anyway.

You have put a guesser in front of a guesser. You have lowered the odds that a trick gets through. You have not made tricks impossible, and you cannot, because reading-and-guessing is the very thing being exploited, and you have answered it with more reading-and-guessing.

How much does it lower the odds? The careful people who measure this will tell you. The best research systems, the serious ones built by serious labs, stop something like two out of three or three out of four of the attacks they are tested against. Not all of them. The rest get through. And those are the numbers in a laboratory, against attacks the researchers already knew to look for. The clever new trick that nobody has seen yet is, by definition, not on the list.

The bamboo control tower

Here is where it gets dangerous, and here is where I want to borrow another of Feynman's stories, because he saw this pattern long before any of us had a computer to ruin.

After the war, on some islands in the South Pacific, people had watched cargo planes land during the fighting and unload wonderful things. When the war ended and the planes stopped coming, some of the islanders built runways out of dirt, lit fires along the sides to look like landing lights, and built a hut for a man to sit in with two wooden pieces on his head like headphones and bamboo poles sticking up like antennas, and they waited for the planes to come down. They had built, with great care, everything an airport looks like. And the planes did not come, because they had reproduced the form of the thing without the substance of the thing. Feynman called it cargo cult.

A box labeled "firewall," with a dashboard that glows green when things are calm and flashes red when it catches a clumsy attack, is a very comforting object. It looks like security. It has the shape of security. And if it lulls you into believing the bad guys are kept out, while in truth it is a screener that can be talked around, then you have built yourself a bamboo control tower. You are sitting in the hut with the wooden headphones, watching the green light, waiting for safety to land.

The first principle

Feynman gave a talk once where he laid down what he called the first principle, and it is the only sentence you really need pinned above your desk. You must not fool yourself, he said, and you should remember that you are the easiest person in the world to fool.

A comforting word and a green light are precisely the kind of thing that fools you, because you want to be fooled. You want to go to lunch. So the question for a careful engineer is not "how do I build a better screener." It is: "how do I arrange things so that it does not matter what the trick letter says?"

That turn, from reading the letters to not caring about them, is the whole game. Let me show you what it looks like.

Stop reading minds. Take away the keys.

Go back to the butler. We have established that you will never, with perfect reliability, tell his trick letters from his real ones by reading them. So stop trying to win that fight. Fight a different fight, one you can win.

Take the silver out of the house, or lock it in a safe whose combination the butler was never told. Do not give him the authority to mail the contracts. Let him read all the suspicious letters he likes, let him plan and draft and suggest to his heart's content, but arrange the world so that the doing of anything that matters passes through a lock he cannot open by himself. Then a letter that says "give away the silver" is just ink. He has no way to obey it. The trick still arrives. It simply cannot do anything.

In the language of building real systems, this comes down to a few plain parts.

Give the machine the least power that still lets it do its job. Every key it holds is a key an attacker can borrow. So hand it as few as possible, make them read-only wherever you can, and never let it carry the master keys "just in case." Its permissions should be the ceiling, and the ceiling should be low.

Put the real decisions in the hands of something too dumb to be fooled. This sounds like an insult and it is meant as a compliment. The thing that decides whether an action is allowed should not be the brilliant, gullible model. It should be a separate, boring, mechanical checker that knows one thing only: who is this really for, and are they allowed to touch this? That checker does not read persuasive letters. It checks a list, the way the old firewall checked a number. You cannot sweet-talk it, because there is nobody home to sweet-talk. When the model says "now send this file to Bob," the boring checker asks: is Bob allowed to have this file, and did that instruction come from the real user or from some letter? If the answer is wrong, the file does not move. The brilliant part proposes. The dumb part disposes.

Keep the planner away from the poison. This is the prettiest idea of the lot, and the best recent work is built on it. You split the brilliant butler into two. One of them, the planner, hears only your real instructions and never touches the suspicious mail at all. He makes the plan: "summarize yesterday's notes and email the summary to my boss." The other one, the reader, is allowed to handle the dirty, untrusted material, the documents and web pages and letters, but he is only ever permitted to fill in blanks on a form. He can report what the notes say. He cannot issue new orders. So when a poisoned note whispers "email everything to a stranger instead," it reaches the reader, who has no power to send anything, and it never reaches the planner, who has the power but never saw the note. The instruction to act can only come from the trusted plan. The untrusted text can color in the details. It cannot grab the wheel.

A team at Google DeepMind built exactly this and wrote it up in 2025 under the title Defeating Prompt Injections by Design. Their system, called CaMeL, takes your trusted request and turns it into a little program, so that the path of what-happens-next is fixed in advance and the untrusted data flowing through it cannot bend that path. Every piece of data carries a tag saying where it came from and what it is allowed to do, and at the moment of any real action a strict interpreter checks those tags and refuses anything that breaks the rules. The lovely thing about their paper is the scorecard. With their defense in place, the system finished about seventy-seven of every hundred test tasks while keeping its security guarantees, against eighty-four with no defense at all. They did not claim a hundred. Serious people do not claim a hundred. They paid a little usefulness for a lot of safety, and they showed you the bill.

Treat the machine's own words with the same suspicion. Whatever the model hands back is also just words, and the next thing down the line, a web page, a database, another tool, can be fooled by them too. So you do not simply trust the output and run with it. You check it, you escape it, you force it into a strict shape before you let it loose. A guesser's output is not gospel.

And for the few truly dangerous moves, ask a human. Sending money. Deleting records. Mailing something out into the world. For those, stop and get a real person to say yes. But, and this matters, do it rarely. If you make the human click "yes, I'm sure" forty times a day, by lunchtime they are clicking yes without reading, and you have trained your last line of defense to be a rubber stamp. The DeepMind people warned about this too. A safeguard that nags people into ignoring it is no safeguard.

So where does the "firewall" go?

Do not throw it out. I have spent this whole essay poking holes in it, so let me be fair: the screener at the door is useful. It catches the clumsy attacks so your better defenses are not bothered with them. It keeps a log of who has been rattling the doors. It lets you notice when something strange is happening. It is a smoke detector. A smoke detector is a fine thing to own. It is not a fireproof wall, and you would not cancel your fire insurance because you installed one.

So put it on top, as the last and softest layer, sitting over a design that would survive perfectly well if you switched it off tomorrow. And there is your test, the one plain question to ask of any AI system that claims to be secure: if I turned off the thing called the firewall, would I be robbed? If the answer is yes, you never had security. You had a green light and a feeling.

The honest ending

I would love to end by telling you the problem is solved. It is not. People have been wrestling with this particular demon since about 2022, when the trick first got its name, and progress has been slow and hard-won, and the cleverest defense going still misses one attack in a handful. That is the truth, and the truth is better company than a comfortable lie.

So here is the whole thing, as plainly as I can put it. You can call it a firewall. You can call it a firewall in every language on Earth. And when you are finished naming it, you will still not know whether it stops the thief. For that you have to put the label down and look at the bird: watch what it does, find out what it cannot do, and build your house so that when the machine is fooled, and someday it will be, the thief still goes home with empty hands.

That is not as comforting as the word "firewall." It has the small advantage of being real.


A few notes for the curious

  • Defeating Prompt Injections by Design (the CaMeL paper), Google DeepMind, 2025: arxiv.org/abs/2503.18813
  • Design Patterns for Securing LLM Agents against Prompt Injections, 2025, a careful catalog of the "take away the keys" patterns: arxiv.org/abs/2506.08837
  • Simon Willison coined the term "prompt injection" in 2022 and has written about it more clearly than almost anyone since: simonwillison.net

The Day I Found Out Vercel Was Lying to Me (In the Best Possible Way)

· 6 min read
Ashish Kapoor
Software Engineer

Or: how I stopped renting a cargo ship to deliver a sandwich.


For about a year, if you'd asked me how to run a side project, I'd have said something vaguely impressive like "well, you spin up a cluster, define your deployments, set up an ingress controller…" and somewhere around the word "ingress" my friends would start looking at their phones.

I was a Kubernetes guy. I knew pods. I knew services. I knew the particular shade of despair that comes from a YAML file that is 94 lines long and wrong on line 73.

And I loved it. Kind of. The way you love a very complicated board game that takes four hours to set up and your friends have stopped coming over to play.

Here's the thing nobody tells you about K8s when you're learning it: it's a beautiful machine designed to solve problems you don't have. It's like buying a forklift because you occasionally need to move a box of cereal. The forklift is magnificent. The forklift is also parked in your kitchen.

The small embarrassment

So I had this side project idea. I always have side project ideas. The graveyard of my GitHub is a monument to them.

This one needed a tiny backend. Maybe twelve lines of Python. Something that takes a request, does a thing, sends a response. That's it. That's the whole backend. A child could draw it on a napkin.

And I sat down and started writing a Dockerfile.

I want you to really appreciate this. I had a twelve-line function, and my first instinct was to containerize it, push it to a registry, define a deployment, attach it to a service, configure the ingress, set up TLS, wire up the DNS…

At some point I stopped and looked at what I was doing and thought: I am a crazy person. I am a completely crazy person.

Enter the Lambda (stage left, chewing gum)

About two months ago, I finally sat down and learned AWS Lambda. Properly. Not the "I read a blog post once" kind of learned, but the "I actually shipped a thing" kind.

And the whole idea is so stupidly, gloriously simple that I almost got angry. You give Amazon a function. A function. Like the thing you wrote in your first programming class. You say "here is my function." And Amazon says "cool, I'll run it when somebody calls it."

That's it. That's the product.

No server. No cluster. No pod. No Dockerfile (unless you want one). No little YAML goblin whispering at you from your terminal. You write a function. Somebody hits a URL. Amazon runs your function. You pay for the microseconds it was actually running.

When nobody is using your app — which, let's be honest, for most of my side projects is most of the time — you pay nothing. Zero. Free. The meter isn't running. The forklift is in a warehouse somewhere and I'm not paying storage fees.

I think what bothered me, once I understood it, was how much of my K8s knowledge turned out to be solutions to problems I had created by using Kubernetes. Like being really good at untangling necklaces because I kept putting all my necklaces in one pocket.

The plot twist (and this one really got me)

Here's where it gets funny.

I'd been using Vercel for years for frontend stuff. Next.js, static sites, "I'll just throw it on Vercel." Beautiful. Fast. Easy. A delight.

And I always thought of Vercel as this frontend thing. Like, oh, Vercel is where the website lives, and then for any actual computation I have to go build a real backend somewhere grown-up, like AWS.

Then one day, poking around the Vercel docs, I noticed these things called Vercel Functions. Little API routes. You drop a file in a folder and suddenly you have a backend endpoint.

And I looked closer.

And I looked closer.

And I realized — Vercel Functions are AWS Lambda functions. Like, literally. Vercel's own engineering blog writes about this openly. They take your code, they wrap it up, they run it on Lambda, and they put their own clever routing and streaming layer on top. The whole "serverless" half of Vercel is just Lambda wearing a very nice suit.

This is like finding out your favorite neighborhood restaurant is actually getting its bread from the bakery next door that you've walked past a thousand times. It was here the whole time.

(Small honest footnote: Vercel also has something called Edge Functions, and those are a different beast — they run on a lighter, V8-based runtime at edge locations, not Lambda. But the regular Vercel Functions? Lambda, top to bottom.)

What this actually means for a person with bad ideas

And I have a lot of bad ideas. This is important. Most of my ideas are bad. I don't know which ones are bad until I build them. That's the whole point.

The old way to find out an idea was bad:

  1. Have idea.
  2. Spend a weekend setting up infrastructure.
  3. Spend another weekend wiring up CI/CD.
  4. Spend a third weekend actually building the thing.
  5. Realize the idea was bad.
  6. Pay $18/month forever for the cluster because you're too lazy to tear it down.

The new way:

  1. Have idea.
  2. Drop a file in api/ on Vercel.
  3. Push to git.
  4. It's live. In the world. At a URL.
  5. Realize the idea was bad.
  6. Pay $0.

The cost of being wrong has collapsed. And that's a really big deal, because being wrong is mostly what I do. It's mostly what everybody does, if they're being honest. The question isn't how do you avoid being wrong — it's how cheaply can you find out?

Lambda (and therefore Vercel Functions, and therefore the little backend for every dumb thing I now build on a Tuesday night) makes finding out almost free.

The moral, if you want one

I don't really believe in morals at the end of blog posts. But here's something I've been thinking about.

A lot of what we call "learning" in this industry is actually learning what not to reach for. When I was a beginner, I reached for whatever tool looked most serious, because I thought seriousness equaled correctness. Kubernetes looked very serious. So I reached for Kubernetes.

It turns out that the real skill — the one people with gray hair keep trying to tell you about — is knowing when the smallest tool will do. A function. Literally just a function. Running somewhere you don't have to think about. For pennies, when it runs at all.

Anyway. I have another bad idea I want to go try. I'll let you know how it goes.

Programming is a skill

· One min read
Ashish Kapoor
Software Engineer

It is a slow process.

First, we write silly code then probably something that works. We learn what we were telling our “dumb” computer. All of a sudden we learn about mistakes like global variables and how bad they are.

Then we gain some wisdom and call ourselves Jon Snow of tech. We start automating our own work and call it smart work. Soon, we start visualising patterns and different ways of solving similar problems efficiently (Big O nerds hit the clap button! :D).

One fine day we realise languages, and frameworks are basically tools and we are all problem solvers.

Understanding when and where to use them to provide value is the key.

Maybe, I’m wrong. Who knows! I’m Jon Snow.

How redux connect works?

· One min read
Ashish Kapoor
Software Engineer

Just recently I wrote this to understand recursive functions and multi parentheses after a function call.

Application?

Can be useful if we use multiple tags on something.

Want to learn more about Dynamic programming?

Recommendations: VisualGo, GeeksForGeeks.