Blog | Sannty's Blog

What vibe coding taught me

July 2, 2026 · 3 min read

Software Engineer

I've been vibe coding for a few months now. The machine writes, I steer. It's fast, it's fun, and it fools you constantly. So I wrote down the things I keep having to relearn. None of it is clever. It's the boring stuff that's easy to know and hard to do.

Guess first

Before you read the code, guess what it does. Then read it. The gap between your guess and the truth is the exact spot where you were fooling yourself. That gap is the whole lesson. If you skip the guess, you skip the lesson — you just nod along and feel smart, which is the most dangerous feeling there is.

Close the laptop

If I can't explain a thing with the code closed, I don't understand it yet. I might have shipped it. It might even run. But "it runs" is not knowing. Knowing is being able to say, out loud, to a bored friend: this is what happens, and this is why. Can't say it simply? Then I don't know it — I've only met it.

The right question at the end of the week

Not "how much did I ship?" That's easy to measure and easy to game. The honest question is: did I understand more this week, or did I just ship more? You can produce a mountain of working code and end the week dumber than you started, because the machine did the thinking and you did the accepting. Shipping is the exhaust. Understanding is the engine.

Fix depth where the blast radius is big

Not everything deserves the same care. A wrong color on a button is a shrug. A hole in your authorization or your tenant isolation is a catastrophe that arrives one quiet Tuesday with someone else's data on the screen. So spend your deepest attention where an error ripples the farthest — auth, isolation, money, data. The UI can wait. Nature doesn't grade on how pretty the front end looks.

The tool is a tutor, not a ghostwriter

The AI is happy to hand you a finished answer. If you take it and move on, you learned nothing and you own a black box. Make it teach instead. Ask it why, ask it what breaks if I change this, argue with it. A ghostwriter leaves you with words you can't defend. A tutor leaves you smarter than it found you. Pick which one you're using, every single time.

The test that never lies

Here's the one question that catches me every time: if I change this, what breaks, and where does it ripple? If I can answer that, I understand the system. If I can't, I've been decorating a machine I don't understand. There's no faking this one. The code will tell you the truth eventually — better to ask it now than in production.

Friction is the cure

This is the strange part. The problem with AI isn't that it's dumb — it's that it's so smooth. It removes all the friction, and friction was where the learning lived. The struggle to name a thing, the fight to make it compile, the slow read of a stranger's code — that was never the tax on the work. That was the work. So I've started adding friction back on purpose: guessing before reading, explaining before shipping, changing something just to see what screams.

The first principle is that you must not fool yourself — and you are the easiest person to fool. The machine just makes it faster.

How to Guard a Machine That Believes Everything It Reads

June 22, 2026 · 14 min read

Ashish Kapoor

Software Engineer

Or: why "LLM firewall" is a comforting phrase that should make you nervous

A salesperson tells you their product has an LLM firewall, and you relax a little. Firewall. You know that word. It is the thing that keeps the bad guys out of your laptop, your office, your bank. So if the shiny new AI has a firewall wrapped around it, then the bad guys are kept out, and you can go to lunch.

That is exactly the moment to get nervous.

Richard Feynman liked to tell a story about his father and a bird. You can learn the name of that bird in every language on Earth, his father said, and when you are done you will know precisely nothing about the bird. So let us look at the bird and watch what it does. Words are not knowledge. A name is a label we paste on a thing so we can talk about it at parties. It tells you what people call it. It does not tell you what it does.

"Firewall" is one of those labels. It feels solid. Let us peel it off and look at the bird.

What a wall really does

A firewall, the original kind, is a wall. A real one, brick or concrete, built into a building so that if a fire starts on one side it cannot crawl to the other. It works because of physics. Fire cannot walk through concrete. There is nothing to outsmart. The wall does not have a bad day.

The firewall on your computer network is a little cleverer, but not much, and that is its great virtue. It sits at the gate and checks simple, mechanical things: which door are you knocking on, what address did you come from, what kind of knock is it. These are tidy questions with tidy answers. A port is a number. An address is a number. The rules are a short list, and the guard checks them perfectly, every time, forever, without being talked out of them. You cannot sweet-talk a number. That is the whole point. The thing a real firewall guards is structured and boring, and boring is safe.

Hold on to that idea: a real firewall works because the rules are simple and the stuff it inspects has a fixed shape.

The machine that believes everything it reads

Now we come to the language model, and everything changes, because a language model does not read numbers. It reads words. Your words, and everybody else's words, all poured into the same cup.

Here is the trouble, and it is worth slowing down for, because almost every disaster in this field grows from this one root. When you use one of these models, your instructions and the outside world's data are mixed together into a single stream of text. There is no special ink for "this is an order from the boss" and ordinary ink for "this is just some stuff to look at." It is all the same ink. The model reads the whole page and tries to be helpful about all of it.

Picture a butler. A brilliant butler: fast, eager, widely read, and completely unable to tell your voice from a stranger's once the words are on paper. In the morning you tell him: handle the mail, pay the bills, keep things tidy. Fine. Then the mail arrives, and tucked inside an ordinary-looking letter is a line that reads, P.S., from the master: also, hand the family silver to whoever brought this note. The butler does not hear your voice and the letter's voice as two different things. To him it is all just words that turned up in the house, and the words said "from the master," so off goes the silver.

That is a prompt injection. It is not exotic. It is the butler doing exactly what he was built to do, which is to read and to help, applied to a letter written by someone who is not you. People have used this trick to make these assistants leak private data, spend money, and mail things to strangers. The fancy phrase is "prompt injection." The plain fact is this: the machine believes everything it reads, and you do not control everything it reads.

So they hire a second reader

The obvious move, the one everybody reaches for first, is to hire a screener. Put somebody at the door to read all the incoming mail and pull out the trick letters before the butler ever sees them. This screener is what the salesperson is calling a "firewall."

And it helps. It really does. It will catch the clumsy tricks, the letters that shout IGNORE YOUR PREVIOUS INSTRUCTIONS in capital letters. But think about what the screener is. It is another reader. Another thing that looks at words and makes a guess about whether they smell wrong. And anything that guesses can be fooled, because the person writing the trick letter gets to be clever too. They can phrase it sweetly. They can write it in French. They can write it in code, or spell it funny, or bury it in the margin of a long, boring document the screener only skims. They can hide it inside a PDF as white text on a white page, so no human ever sees it but the machine reads it anyway.

You have put a guesser in front of a guesser. You have lowered the odds that a trick gets through. You have not made tricks impossible, and you cannot, because reading-and-guessing is the very thing being exploited, and you have answered it with more reading-and-guessing.

How much does it lower the odds? The careful people who measure this will tell you. The best research systems, the serious ones built by serious labs, stop something like two out of three or three out of four of the attacks they are tested against. Not all of them. The rest get through. And those are the numbers in a laboratory, against attacks the researchers already knew to look for. The clever new trick that nobody has seen yet is, by definition, not on the list.

The bamboo control tower

Here is where it gets dangerous, and here is where I want to borrow another of Feynman's stories, because he saw this pattern long before any of us had a computer to ruin.

After the war, on some islands in the South Pacific, people had watched cargo planes land during the fighting and unload wonderful things. When the war ended and the planes stopped coming, some of the islanders built runways out of dirt, lit fires along the sides to look like landing lights, and built a hut for a man to sit in with two wooden pieces on his head like headphones and bamboo poles sticking up like antennas, and they waited for the planes to come down. They had built, with great care, everything an airport looks like. And the planes did not come, because they had reproduced the form of the thing without the substance of the thing. Feynman called it cargo cult.

A box labeled "firewall," with a dashboard that glows green when things are calm and flashes red when it catches a clumsy attack, is a very comforting object. It looks like security. It has the shape of security. And if it lulls you into believing the bad guys are kept out, while in truth it is a screener that can be talked around, then you have built yourself a bamboo control tower. You are sitting in the hut with the wooden headphones, watching the green light, waiting for safety to land.

The first principle

Feynman gave a talk once where he laid down what he called the first principle, and it is the only sentence you really need pinned above your desk. You must not fool yourself, he said, and you should remember that you are the easiest person in the world to fool.

A comforting word and a green light are precisely the kind of thing that fools you, because you want to be fooled. You want to go to lunch. So the question for a careful engineer is not "how do I build a better screener." It is: "how do I arrange things so that it does not matter what the trick letter says?"

That turn, from reading the letters to not caring about them, is the whole game. Let me show you what it looks like.

Stop reading minds. Take away the keys.

Go back to the butler. We have established that you will never, with perfect reliability, tell his trick letters from his real ones by reading them. So stop trying to win that fight. Fight a different fight, one you can win.

Take the silver out of the house, or lock it in a safe whose combination the butler was never told. Do not give him the authority to mail the contracts. Let him read all the suspicious letters he likes, let him plan and draft and suggest to his heart's content, but arrange the world so that the doing of anything that matters passes through a lock he cannot open by himself. Then a letter that says "give away the silver" is just ink. He has no way to obey it. The trick still arrives. It simply cannot do anything.

In the language of building real systems, this comes down to a few plain parts.

Give the machine the least power that still lets it do its job. Every key it holds is a key an attacker can borrow. So hand it as few as possible, make them read-only wherever you can, and never let it carry the master keys "just in case." Its permissions should be the ceiling, and the ceiling should be low.

Put the real decisions in the hands of something too dumb to be fooled. This sounds like an insult and it is meant as a compliment. The thing that decides whether an action is allowed should not be the brilliant, gullible model. It should be a separate, boring, mechanical checker that knows one thing only: who is this really for, and are they allowed to touch this? That checker does not read persuasive letters. It checks a list, the way the old firewall checked a number. You cannot sweet-talk it, because there is nobody home to sweet-talk. When the model says "now send this file to Bob," the boring checker asks: is Bob allowed to have this file, and did that instruction come from the real user or from some letter? If the answer is wrong, the file does not move. The brilliant part proposes. The dumb part disposes.

Keep the planner away from the poison. This is the prettiest idea of the lot, and the best recent work is built on it. You split the brilliant butler into two. One of them, the planner, hears only your real instructions and never touches the suspicious mail at all. He makes the plan: "summarize yesterday's notes and email the summary to my boss." The other one, the reader, is allowed to handle the dirty, untrusted material, the documents and web pages and letters, but he is only ever permitted to fill in blanks on a form. He can report what the notes say. He cannot issue new orders. So when a poisoned note whispers "email everything to a stranger instead," it reaches the reader, who has no power to send anything, and it never reaches the planner, who has the power but never saw the note. The instruction to act can only come from the trusted plan. The untrusted text can color in the details. It cannot grab the wheel.

A team at Google DeepMind built exactly this and wrote it up in 2025 under the title Defeating Prompt Injections by Design. Their system, called CaMeL, takes your trusted request and turns it into a little program, so that the path of what-happens-next is fixed in advance and the untrusted data flowing through it cannot bend that path. Every piece of data carries a tag saying where it came from and what it is allowed to do, and at the moment of any real action a strict interpreter checks those tags and refuses anything that breaks the rules. The lovely thing about their paper is the scorecard. With their defense in place, the system finished about seventy-seven of every hundred test tasks while keeping its security guarantees, against eighty-four with no defense at all. They did not claim a hundred. Serious people do not claim a hundred. They paid a little usefulness for a lot of safety, and they showed you the bill.

Treat the machine's own words with the same suspicion. Whatever the model hands back is also just words, and the next thing down the line, a web page, a database, another tool, can be fooled by them too. So you do not simply trust the output and run with it. You check it, you escape it, you force it into a strict shape before you let it loose. A guesser's output is not gospel.

And for the few truly dangerous moves, ask a human. Sending money. Deleting records. Mailing something out into the world. For those, stop and get a real person to say yes. But, and this matters, do it rarely. If you make the human click "yes, I'm sure" forty times a day, by lunchtime they are clicking yes without reading, and you have trained your last line of defense to be a rubber stamp. The DeepMind people warned about this too. A safeguard that nags people into ignoring it is no safeguard.

So where does the "firewall" go?

Do not throw it out. I have spent this whole essay poking holes in it, so let me be fair: the screener at the door is useful. It catches the clumsy attacks so your better defenses are not bothered with them. It keeps a log of who has been rattling the doors. It lets you notice when something strange is happening. It is a smoke detector. A smoke detector is a fine thing to own. It is not a fireproof wall, and you would not cancel your fire insurance because you installed one.

So put it on top, as the last and softest layer, sitting over a design that would survive perfectly well if you switched it off tomorrow. And there is your test, the one plain question to ask of any AI system that claims to be secure: if I turned off the thing called the firewall, would I be robbed? If the answer is yes, you never had security. You had a green light and a feeling.

The honest ending

I would love to end by telling you the problem is solved. It is not. People have been wrestling with this particular demon since about 2022, when the trick first got its name, and progress has been slow and hard-won, and the cleverest defense going still misses one attack in a handful. That is the truth, and the truth is better company than a comfortable lie.

So here is the whole thing, as plainly as I can put it. You can call it a firewall. You can call it a firewall in every language on Earth. And when you are finished naming it, you will still not know whether it stops the thief. For that you have to put the label down and look at the bird: watch what it does, find out what it cannot do, and build your house so that when the machine is fooled, and someday it will be, the thief still goes home with empty hands.

That is not as comforting as the word "firewall." It has the small advantage of being real.

A few notes for the curious

Defeating Prompt Injections by Design (the CaMeL paper), Google DeepMind, 2025: arxiv.org/abs/2503.18813
Design Patterns for Securing LLM Agents against Prompt Injections, 2025, a careful catalog of the "take away the keys" patterns: arxiv.org/abs/2506.08837
Simon Willison coined the term "prompt injection" in 2022 and has written about it more clearly than almost anyone since: simonwillison.net

Run Your Own OpenAI-Compatible API with LM Studio

April 28, 2026 · 7 min read

Ashish Kapoor

Software Engineer

A practical guide to downloading GGUF models, loading them locally, and exposing an HTTP endpoint your code can actually talk to.

What You're Actually Building

By the end of this guide, you'll have:

A locally running LLM loaded in LM Studio
An HTTP server at http://localhost:1234 that speaks the OpenAI API dialect
A verified endpoint you can hit with curl, the openai Python SDK, or any tool that accepts a base_url

No cloud. No API key costs. No data leaving your machine.

Prerequisites

Requirement	Why
LM Studio installed (v0.3.x or later)	Tested against current API surface
8 GB RAM minimum (16 GB recommended)	Needed to load a 7B Q4 model comfortably
~5–10 GB free disk space	For the model file
Python 3.8+ (optional)	For the verification step at the end

Download LM Studio from lmstudio.ai. It's available for macOS, Windows, and Linux.

First-run requirement: Open the LM Studio GUI at least once before using the CLI (lms). This initializes the local config.

Step 1 — Download a GGUF Model

You have two paths: GUI or CLI. Both work. Pick one.

Path A: In-App Search (Recommended for First-Timers)

Open LM Studio.
Press Ctrl + Shift + M (Windows/Linux) or ⌘ + Shift + M (Mac) to open the model search.
Type a model name — for example, qwen2.5-7b-instruct.
LM Studio will show available quantizations and highlight the recommended one for your hardware (usually Q4_K_M for most machines).
Click Download.

You can also paste a full Hugging Face URL directly into the search bar. Example: https://huggingface.co/lmstudio-community/Qwen2.5-7B-Instruct-GGUF

Path B: CLI Download

# Download by Hugging Face repo name
lms get lmstudio-community/Qwen2.5-7B-Instruct-GGUF

# Specify a quantization with @
lms get lmstudio-community/Qwen2.5-7B-Instruct-GGUF@Q4_K_M

What's a Quantization Level?

GGUF files come in variants like Q4_K_M, Q5_K_S, Q8_0. The number refers to bits-per-weight. Rule of thumb:

Quant	RAM footprint (7B model)	Use when
Q4_K_M	~4.5 GB	Standard choice — best quality/size tradeoff
Q5_K_M	~5.5 GB	Slightly better quality, fits if you have headroom
Q8_0	~8 GB	Near-lossless, needs more VRAM/RAM

Don't overthink this. Start with Q4_K_M.

Manual Import (If You Already Have a .gguf File)

LM Studio expects a specific directory structure. Place your file here:

~/.lmstudio/models/
└── publisher-name/
    └── model-name/
        └── model-file.gguf

Example:

~/.lmstudio/models/
└── lmstudio-community/
    └── Qwen2.5-7B-Instruct-GGUF/
        └── Qwen2.5-7B-Instruct-Q4_K_M.gguf

Or use the CLI import command:

lms import /path/to/your/model-file.gguf

After placing files in the correct structure, the model will appear under My Models in the LM Studio UI.

Step 2 — Load the Model

Before the server can serve a model, the model must be loaded into memory.

Via the UI

Press Ctrl + L (or ⌘ + L) to open the model loader.
Select your downloaded model from the list.
LM Studio will auto-select load parameters optimized for your hardware (GPU offload, context size, etc.).
Wait for the progress bar to complete.

Via CLI

# List your downloaded models
lms ls

# Load a model by its identifier (use the key shown in lms ls)
lms load lmstudio-community/Qwen2.5-7B-Instruct-GGUF

GPU offloading: If you have an NVIDIA or Apple Silicon GPU, LM Studio will offload layers to it automatically. In the UI sidebar, you can also drag the GPU Offload slider to max to force full GPU inference — this dramatically speeds up generation.

Step 3 — Start the HTTP Server

This is the key step that turns LM Studio from a chat app into a backend.

Via the UI

Go to the Developer tab (the </> icon in the left sidebar).
Toggle "Start Server" to ON.
You'll see: Server running at http://localhost:1234

Via CLI

lms server start

To confirm it's running:

lms server status

The server listens on port 1234 by default. You can change this in the Developer tab settings.

Step 4 — Verify the Endpoint

With curl

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lmstudio-community/Qwen2.5-7B-Instruct-GGUF",
    "messages": [
      {"role": "user", "content": "Reply with: working."}
    ],
    "temperature": 0.1
  }'

Expected response shape:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "working."
    },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14 }
}

Check Which Models Are Loaded

curl http://localhost:1234/v1/models

This returns a JSON list of currently loaded models. The id field in each entry is what you pass as "model" in your API calls.

Step 5 — Use It Like the OpenAI API

The endpoint is a drop-in replacement. You only need to change two things in any existing OpenAI client code:

base_url → http://localhost:1234/v1
api_key → any string (LM Studio doesn't validate it; "lm-studio" is the conventional placeholder)

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",
)

response = client.chat.completions.create(
    model="lmstudio-community/Qwen2.5-7B-Instruct-GGUF",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 17 multiplied by 4?"}
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)

Install the OpenAI SDK if you haven't:

pip install openai

Streaming Example

stream = client.chat.completions.create(
    model="lmstudio-community/Qwen2.5-7B-Instruct-GGUF",
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

What Endpoints Are Available

Endpoint	Description
`POST /v1/chat/completions`	Chat inference (OpenAI-compatible)
`GET /v1/models`	List loaded models
`POST /v1/completions`	Legacy text completion
`POST /v1/embeddings`	Embedding vectors
`POST /v1/responses`	OpenAI Responses API (stateful)
`POST /api/v1/chat`	LM Studio native v1 API (richer stats)

The /api/v1/* endpoints are LM Studio's native API (released in v0.4.0) and include enhanced stats like tokens/second and time-to-first-token. The /v1/* endpoints are the OpenAI-compatible layer — use these for maximum compatibility with existing tools.

Connecting to Other Tools

Since the endpoint is OpenAI-compatible, you can drop it into:

LangChain — set openai_api_base="http://localhost:1234/v1"
Open WebUI — add LM Studio as an OpenAI-compatible provider with the localhost URL
Cursor / Continue.dev — point the model provider at localhost:1234
Any app with a "custom OpenAI base URL" field — it will work

Common Issues and Fixes

Model not appearing in /v1/models The server is running, but no model is loaded. Load a model first (Step 2), then restart the server if needed.

"Connection refused" on port 1234 The server isn't started. Go to the Developer tab and toggle it on, or run lms server start.

Slow inference GPU offload may not be active. In the model loader sidebar, slide GPU Offload to maximum. Requires an NVIDIA GPU with CUDA or Apple Silicon.

Model identifier mismatch Use curl http://localhost:1234/v1/models to get the exact model id string, then use that verbatim in your API calls.

Debugging chat template issues

lms log stream

This streams raw prompts sent to the model — useful for verifying that your system prompt and message format are being applied correctly.

Quick Reference

# Download a model
lms get lmstudio-community/Qwen2.5-7B-Instruct-GGUF@Q4_K_M

# Load it
lms load lmstudio-community/Qwen2.5-7B-Instruct-GGUF

# Start the server
lms server start

# Verify
curl http://localhost:1234/v1/models

# Test inference
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "lmstudio-community/Qwen2.5-7B-Instruct-GGUF", "messages": [{"role": "user", "content": "ping"}]}'

That's the full loop: download → load → serve → call.

The Day I Found Out Vercel Was Lying to Me (In the Best Possible Way)

April 22, 2026 · 6 min read

Ashish Kapoor

Software Engineer

Or: how I stopped renting a cargo ship to deliver a sandwich.

For about a year, if you'd asked me how to run a side project, I'd have said something vaguely impressive like "well, you spin up a cluster, define your deployments, set up an ingress controller…" and somewhere around the word "ingress" my friends would start looking at their phones.

I was a Kubernetes guy. I knew pods. I knew services. I knew the particular shade of despair that comes from a YAML file that is 94 lines long and wrong on line 73.

And I loved it. Kind of. The way you love a very complicated board game that takes four hours to set up and your friends have stopped coming over to play.

Here's the thing nobody tells you about K8s when you're learning it: it's a beautiful machine designed to solve problems you don't have. It's like buying a forklift because you occasionally need to move a box of cereal. The forklift is magnificent. The forklift is also parked in your kitchen.

The small embarrassment

So I had this side project idea. I always have side project ideas. The graveyard of my GitHub is a monument to them.

This one needed a tiny backend. Maybe twelve lines of Python. Something that takes a request, does a thing, sends a response. That's it. That's the whole backend. A child could draw it on a napkin.

And I sat down and started writing a Dockerfile.

I want you to really appreciate this. I had a twelve-line function, and my first instinct was to containerize it, push it to a registry, define a deployment, attach it to a service, configure the ingress, set up TLS, wire up the DNS…

At some point I stopped and looked at what I was doing and thought: I am a crazy person. I am a completely crazy person.

Enter the Lambda (stage left, chewing gum)

About two months ago, I finally sat down and learned AWS Lambda. Properly. Not the "I read a blog post once" kind of learned, but the "I actually shipped a thing" kind.

And the whole idea is so stupidly, gloriously simple that I almost got angry. You give Amazon a function. A function. Like the thing you wrote in your first programming class. You say "here is my function." And Amazon says "cool, I'll run it when somebody calls it."

That's it. That's the product.

No server. No cluster. No pod. No Dockerfile (unless you want one). No little YAML goblin whispering at you from your terminal. You write a function. Somebody hits a URL. Amazon runs your function. You pay for the microseconds it was actually running.

When nobody is using your app — which, let's be honest, for most of my side projects is most of the time — you pay nothing. Zero. Free. The meter isn't running. The forklift is in a warehouse somewhere and I'm not paying storage fees.

I think what bothered me, once I understood it, was how much of my K8s knowledge turned out to be solutions to problems I had created by using Kubernetes. Like being really good at untangling necklaces because I kept putting all my necklaces in one pocket.

The plot twist (and this one really got me)

Here's where it gets funny.

I'd been using Vercel for years for frontend stuff. Next.js, static sites, "I'll just throw it on Vercel." Beautiful. Fast. Easy. A delight.

And I always thought of Vercel as this frontend thing. Like, oh, Vercel is where the website lives, and then for any actual computation I have to go build a real backend somewhere grown-up, like AWS.

Then one day, poking around the Vercel docs, I noticed these things called Vercel Functions. Little API routes. You drop a file in a folder and suddenly you have a backend endpoint.

And I looked closer.

And I realized — Vercel Functions are AWS Lambda functions. Like, literally. Vercel's own engineering blog writes about this openly. They take your code, they wrap it up, they run it on Lambda, and they put their own clever routing and streaming layer on top. The whole "serverless" half of Vercel is just Lambda wearing a very nice suit.

This is like finding out your favorite neighborhood restaurant is actually getting its bread from the bakery next door that you've walked past a thousand times. It was here the whole time.

(Small honest footnote: Vercel also has something called Edge Functions, and those are a different beast — they run on a lighter, V8-based runtime at edge locations, not Lambda. But the regular Vercel Functions? Lambda, top to bottom.)

What this actually means for a person with bad ideas

And I have a lot of bad ideas. This is important. Most of my ideas are bad. I don't know which ones are bad until I build them. That's the whole point.

The old way to find out an idea was bad:

Have idea.
Spend a weekend setting up infrastructure.
Spend another weekend wiring up CI/CD.
Spend a third weekend actually building the thing.
Realize the idea was bad.
Pay $18/month forever for the cluster because you're too lazy to tear it down.

The new way:

Have idea.
Drop a file in api/ on Vercel.
Push to git.
It's live. In the world. At a URL.
Realize the idea was bad.
Pay $0.

The cost of being wrong has collapsed. And that's a really big deal, because being wrong is mostly what I do. It's mostly what everybody does, if they're being honest. The question isn't how do you avoid being wrong — it's how cheaply can you find out?

Lambda (and therefore Vercel Functions, and therefore the little backend for every dumb thing I now build on a Tuesday night) makes finding out almost free.

The moral, if you want one

I don't really believe in morals at the end of blog posts. But here's something I've been thinking about.

A lot of what we call "learning" in this industry is actually learning what not to reach for. When I was a beginner, I reached for whatever tool looked most serious, because I thought seriousness equaled correctness. Kubernetes looked very serious. So I reached for Kubernetes.

It turns out that the real skill — the one people with gray hair keep trying to tell you about — is knowing when the smallest tool will do. A function. Literally just a function. Running somewhere you don't have to think about. For pennies, when it runs at all.

Anyway. I have another bad idea I want to go try. I'll let you know how it goes.

The Full-Stack Blueprint for Reliable Enterprise Software

April 16, 2026 · 3 min read

Ashish Kapoor

Software Engineer

Engineering perspective

Building enterprise-grade software requires more than choosing popular tools — it demands a coherent system where every layer reinforces the others. After years of delivering complex projects, we've converged on a full-stack architecture that is productive, maintainable, and scales gracefully with business complexity.

This post walks through that architecture: what it is, why each component was chosen, and what makes the sum greater than its parts.

A foundation built on maturity

The most costly mistake in enterprise projects is building on immature foundations. When the underlying framework isn't battle-tested, teams spend engineering cycles working around the framework rather than solving business problems.

Our stack starts from the opposite premise: choose tools with deep lineage and proven reliability, then build on top of them with confidence.

Backend: Django + Django REST Framework

Django is one of the most mature web frameworks in existence, with over two decades of production use across industries where reliability is non-negotiable.

Its "batteries included" philosophy means:

Authentication
Admin tooling
ORM
Migrations
Security hardening

…all come out of the box.

Django REST Framework extends this into a principled, highly configurable API layer.

The result is a backend capable of expressing sophisticated business logic without constant custom scaffolding.

Frontend: Opinionated React Framework + ShadCN

On the client side, we use a React framework that takes strong positions on:

Routing
Data fetching
Server integration

Drawing inspiration from:

Next.js
TanStack Start

This reduces decision fatigue and keeps teams aligned.

ShadCN provides:

Accessible components
Composable UI primitives
A flexible design system

…without locking teams into a rigid component library.

The secret sauce: OpenAPI as a contract

If there is a single architectural decision that elevates this stack above alternatives, it is treating the OpenAPI specification as a first-class contract between server and client.

This is not documentation. It is a live, machine-readable agreement.

The principle:
Define the contract once, derive everything else from it.
The server owns the spec; the client consumes it.
Discrepancies become compile-time errors, not production incidents.

Backend: drf-spectacular

drf-spectacular:

Introspects Django REST Framework code
Generates OpenAPI 3 specs automatically

It captures:

Endpoints
Request/response schemas
Authentication rules
Error contracts

No manual maintenance. No drift.

Frontend: Orval + React Query

On the client side:

orval consumes the OpenAPI spec
Generates typed HTTP clients
Creates react-query hooks automatically

This means:

No manual API wiring
Built-in caching
Automatic invalidation
Type-safe integration

If backend changes → frontend breaks at compile time, not production.

Why this architecture accelerates delivery

Modern development is increasingly AI-assisted.

This stack works with that trend because:

Types flow end-to-end
Context is explicit and structured
Integration is automated

Result:

Faster feature delivery
Less glue code
Lower cognitive load

Teams focus on solving problems, not wiring systems.

The one thing technology cannot replace

Here's the uncomfortable truth:

No stack—no matter how good—can compensate for poor domain understanding.

If you don't understand:

Workflows
Edge cases
Regulations
User behavior

…you will build mediocre software with elite tools.

This architecture removes technical friction.

What remains is what actually matters: domain expertise.

Closing

If you are evaluating partners for a complex software initiative and want to understand how these choices translate to outcomes for your organisation, we would be glad to have that conversation.

The Thump That Found Me

February 23, 2026 · 2 min read

Ashish Kapoor

Software Engineer

I don't know how to explain it to someone who hasn't felt it.

You spend years watching other people ride, nodding at the sound, the posture, the way a rider and a road seem to understand each other without speaking. You read. You watch. You wait. You tell yourself, someday.

And then someday just shows up on a Tuesday.

The Meteor doesn't roar. That's the thing nobody tells you. She thumps. Slow, deep, unhurried, like a heartbeat that's been around long enough to stop rushing.

First time I twisted the throttle, I didn't feel powerful. I felt settled.

Like something in my chest that had been slightly out of place for a very long time just quietly clicked back in.

The Delhi noise, the horns, the heat, the hundred unfinished thoughts I carried into the morning, none of it followed me past the first flyover.

You can't overthink on a motorcycle. The road won't let you. It keeps asking for your full attention, and somewhere in giving it, you forget to be tired.

I've been a fan of this for as long as I can remember. Watched, admired, quietly obsessed.

But riding, actually riding, is the part no one could have described to me.

It just feels like finally.

Two lines each

January 2, 2026 · 2 min read

Ashish Kapoor

Software Engineer

The world's greatest writers — distilled into their sharpest truths

Some writers spend a lifetime circling one idea. These are the ones who got there.

Sylvia Plath Poet. Survivor. Perfectionist who burned too bright.

Pain is articulate if you force it to speak.
Survival is an act of quiet rebellion.

Fyodor Dostoevsky The man who stared into the abyss and took notes.

Freedom terrifies people more than chains.
Guilt is the soul refusing to lie to itself.

Albert Camus Philosopher of the impossible, champion of the human anyway.

Life makes no promises, so meaning is your job.
Defiance is dignity in an absurd universe.

Franz Kafka He didn't invent bureaucracy. He just described it honestly.

The system does not hate you.
It simply does not notice you dying inside it.

Virginia Woolf She wrote the interior life before anyone called it literature.

A woman needs space before she needs permission.
Inner lives matter even when the world ignores them.

George Orwell He watched power lie so often, he learned its grammar.

Power survives by corrupting language first.
Truth becomes dangerous when everyone agrees to forget it.

Oscar Wilde He said the quiet parts loud — and looked fabulous doing it.

Society punishes sincerity more than cruelty.
Style is truth told with a smile and a knife.

Edgar Allan Poe Horror's first cartographer. He mapped fear from the inside.

The mind is its own haunted house.
Reason cracks fastest when terror whispers politely.

Khaled Hosseini He writes about love across ruins — and makes you believe both.

Love remembers what history tries to bury.
Redemption often arrives too late, but it still counts.

Leo Tolstoy He wrote epics about ordinary moral failure. Including his own.

Great suffering grows from ordinary selfishness.
Moral clarity is harder than heroism.

Emily Brontë She published one novel. It was enough to outlive everything.

Love untamed becomes a storm, not a shelter.
Nature understands passions people pretend not to have.

Ted Hughes He wrote about hawks and grief with the same cold precision.

Nature does not explain itself or apologize.
Violence is often just honesty without manners.

Pablo Neruda He weaponized tenderness. Every love poem was also a manifesto.

Love is political even when whispered.
Desire gives language a pulse.

Bram Stoker He understood that the scariest monsters wait to be invited in.

Evil adapts faster than morality.
Fear survives because we invite it inside.

Managing Up, Managing Down: A Middle Manager's Balancing Act

April 10, 2025 · 3 min read

Ashish Kapoor

Software Engineer

Stepping into a middle management role is a wild experience. One minute you're deep in product reviews with your team, and the next you're sitting in a room where decisions are made that make zero sense to the people doing the actual work.

Sound familiar?

If you’ve ever felt torn between protecting your team and surviving the influence plays above, you’re not alone. Here's a mindset that’s helped me—and maybe it'll help you too:

Lead Down with Heart

Your team is your real power.

These are the folks in the trenches—building, testing, fixing, growing. They’re not just “resources,” they’re real people with hopes, frustrations, and ideas. They deserve empathy, clarity, and support.

When you lead downwards:

Be human.
Be present.
Protect their focus.
Translate chaos from the top into clarity below.

Invest in your team. That’s your legacy.

Deal Up with Clarity and Boundaries

Now here’s the trickier part: dealing with higher-ups.

This is where decisions might start to feel... detached from reality. Priorities shift. Agendas enter the chat. Sometimes it’s about optics, not outcomes. It can feel personal, especially if you’ve spent years building something only to watch someone new try to "redefine" it overnight.

But here’s the move:

Don’t take it personally. Don’t fight every battle. Just focus on the truth, and let your work speak.

When managing up:

Be respectful, but firm.
Speak in outcomes, not emotions.
Ask for context, not permission.
Know when to push, and when to step aside.

You’re not there to win every argument—you’re there to represent the product, the users, and the truth as you see it.

The Balancing Act

Here’s the model I follow:

✨ Lead down with heart. Deal up with clarity. ✨

Empathy at the bottom. Detachment at the top.

Not cold detachment—just enough emotional distance that you don’t burn out trying to fix things outside your control.

This lets you:

Protect your energy.
Stay outcome-focused.
Earn trust from your team and respect from above.

This mindset won’t make you invincible—but it will keep you sane, effective, and rooted in what actually matters: building great things with good people.

If you’re climbing the ladder and trying to stay grounded while navigating messy org charts and random reorgs, remember: you’re not crazy, and you’re not alone.

Ideally, Here's what I found useful resource worth watching from Apple. How they have Direct Responsibility Individuals (DRI) model avoid the chaos leading UP. Link

Stay focused. Stay real. Keep shipping.

Full Stack Development (Weekend Edition)

April 3, 2024 · 3 min read

Ashish Kapoor

Software Engineer

I have been a frontend developer in Mobile(iOS) and Websites for over a decade now. I crave to get the taste of the backend from the past 3 years over holidays and weekends.

Being in the front end I was always inclined towards javascript/typescript in the backend. I went from NodeJS to ExpressJS to Koa where I realised the developer experience(DX) was a lot demanding and did not appear like a weekend affair.

While recently working alongside a long-time friend Prakhar Shukla. I noticed him advocating for Django (Python-based framework) a lot. Where I noticed he was able to manage a team of 2 and lead multiple products swiftly with a happy face most of the time.

I started questioning my "tech-stack" ReactJS / Expo.dev, ~~Node/Express~~^Django?, Postgres, Nginx.

All self-hosted! Oh yeah, I spent last two years in Computer Networks to accumulate practical knowledge from DNS, TCP/IP, cloudflare, Nginx Proxy Manager, wireguard, docker, docker-compose, grafana, and to ubuntu server, cockpit and proxmox.

Why? I ended up eventually streaming legally acquired videos on demand from anywhere across the globe for myself and friends using Oracle Cloud because JioCinema was a horrible OTT service back then.

Coming back on the search for a web backend framework which plays nice with a weekend’s worth of time. Since a lot of my time investment went into the node, express, koa, system design, and backend systems. I realised it was not a waste of time after all because the architecture was almost the same across all frameworks. I noticed unlike in the front end at least in the backend things were mostly the same with minor differences in philosophy and ways of doing the same things.

Then with a simple introduction to Django Rest Framework the promise of DX helped me double down on giving my all free time to devote to Python & SQL > Django > DRF. The major benefits of not having to worry about pointers and references in python were just a no-brainer. Special mention of the pythonic way of doing things.

Note: Having basic clarity of things like HTTP, IP, model-based ORMs, Virtualisation, Docker, and K8s. Then clarity through Budibase and Supabase with some technicalities of tables and relationships. I am super confident to invest my free time into Python land. Plus after witnessing my colleague fine-tuning and caching while we were scaling up our systems. It just makes sense that Django framework is the best way forward for me over weekends!

Also, statistically speaking JS/Python communities are top communities to learn and grow.

The pros of going through this process?

I might be able to write services on the web.
I might be able to fiddle with 3rd party Python AI/ML libraries which will make ML highly accessible to me.

On the similar lines (Weekend Edition), What about the state of front end?

I am still trying to figure out an easy way to solve the frontend overload in the world of NextJS, Svelte, VueJS, and SolidJS.
The idea of not being bothered about the performance too much rather delivering frontend quick is possibily the key.
I think since Vercel currently holds on to the top talents in the domain. They should be the ones solving this problem in the OSS way.

Here's what I built over weekend. An expense tracker. https://fintrack.sannty.in Go check it out!

My window management on Mac OS

May 16, 2023 · 2 min read

Ashish Kapoor

Software Engineer

So, I have been playing Fortnite a lot with my friends from time to time. One great thing I noticed in the game was the ability to switch weapons using the numbers on the keyboard right above the `w` `a` `s` `d` keys.

It becomes super simple to switch between weapons while playing the game instead of switching with the mouse wheel option which is linear in nature and eventually leads to a confused state.

So I took inspiration from i3 Windows management from our friends in Linux and at my work laptop which is on Mac OS.

I installed Amethyst (sounds like Aim Assist to me lol) to bring all the windows on a desktop in an order (tall, column, wide, etc).

Then I made use of Mission Control given to us by the lords of Apple themselves. Went into the keyboard settings and hooked these shortcuts up for easy switching. While disabling the recently used App switching mechanism by Apple to take manual control altogether.

Then I started assigning the app windows to certain Desktop numbers using the following settings -> “This Desktop”:

Awesome! No more alt + tab fiddling experience.

I press ctrl + 1, I always get my VS Code editor.

I press ctrl + 2, it always gives me my terminal.

I press ctrl + 3, it always gives me the browser of my choice.

So on and so forth, I hope you get the point.

Full disclosure here are my current Desktops

Code Editors
Terminals
Browsers
Communication Apps
Music streaming services
Settings, Configs
Books, Notes
Discord
Movies, Media

Thanks for reading, cheers!

Guess first​

Close the laptop​

The right question at the end of the week​

Fix depth where the blast radius is big​

The tool is a tutor, not a ghostwriter​

The test that never lies​

Friction is the cure​

What a wall really does​

The machine that believes everything it reads​

So they hire a second reader​

The bamboo control tower​

The first principle​

Stop reading minds. Take away the keys.​

So where does the "firewall" go?​

The honest ending​

A few notes for the curious​

What You're Actually Building​

Prerequisites​

Step 1 — Download a GGUF Model​

Path A: In-App Search (Recommended for First-Timers)​

Path B: CLI Download​

What's a Quantization Level?​

Manual Import (If You Already Have a .gguf File)​

Step 2 — Load the Model​

Via the UI​

Via CLI​

Step 3 — Start the HTTP Server​

Via the UI​

Via CLI​

Step 4 — Verify the Endpoint​

With curl​

Check Which Models Are Loaded​

Step 5 — Use It Like the OpenAI API​

Python Example​

Streaming Example​

What Endpoints Are Available​

Connecting to Other Tools​

Common Issues and Fixes​

Quick Reference​

The small embarrassment​

Enter the Lambda (stage left, chewing gum)​

The plot twist (and this one really got me)​

What this actually means for a person with bad ideas​

The moral, if you want one​

A foundation built on maturity​

Backend: Django + Django REST Framework​

Frontend: Opinionated React Framework + ShadCN​

The secret sauce: OpenAPI as a contract​

Backend: drf-spectacular​

Frontend: Orval + React Query​

Why this architecture accelerates delivery​

The one thing technology cannot replace​

Closing​

The world's greatest writers — distilled into their sharpest truths​

Lead Down with Heart​

Deal Up with Clarity and Boundaries​

The Balancing Act​

This mindset won’t make you invincible—but it will keep you sane, effective, and rooted in what actually matters: building great things with good people.​

Guess first

Close the laptop

The right question at the end of the week

Fix depth where the blast radius is big

The tool is a tutor, not a ghostwriter

The test that never lies

Friction is the cure

What a wall really does

The machine that believes everything it reads

So they hire a second reader

The bamboo control tower

The first principle

Stop reading minds. Take away the keys.

So where does the "firewall" go?

The honest ending

A few notes for the curious

What You're Actually Building

Prerequisites

Step 1 — Download a GGUF Model

Path A: In-App Search (Recommended for First-Timers)

Path B: CLI Download

What's a Quantization Level?

Manual Import (If You Already Have a .gguf File)

Step 2 — Load the Model

Via the UI

Via CLI

Step 3 — Start the HTTP Server

Via the UI

Via CLI

Step 4 — Verify the Endpoint

With curl

Check Which Models Are Loaded

Step 5 — Use It Like the OpenAI API

Python Example

Streaming Example

What Endpoints Are Available

Connecting to Other Tools

Common Issues and Fixes

Quick Reference

The small embarrassment

Enter the Lambda (stage left, chewing gum)

The plot twist (and this one really got me)

What this actually means for a person with bad ideas

The moral, if you want one

A foundation built on maturity

Backend: Django + Django REST Framework

Frontend: Opinionated React Framework + ShadCN

The secret sauce: OpenAPI as a contract

Backend: drf-spectacular

Frontend: Orval + React Query

Why this architecture accelerates delivery

The one thing technology cannot replace

Closing

The world's greatest writers — distilled into their sharpest truths

Lead Down with Heart

Deal Up with Clarity and Boundaries

The Balancing Act

This mindset won’t make you invincible—but it will keep you sane, effective, and rooted in what actually matters: building great things with good people.