Why I Rewrote Four Services in Go

Python on Knative was costing me 5–7 seconds every cold start. On a small Kubernetes cluster, that latency was the difference between a usable AI agent and a broken one. So I rewrote four services in Go on three weekends. Here are the measured numbers, the unexpected wins, the unexpected pains, and when you absolutely should not do the same.

I had four small services. Each one was a Model Context Protocol adapter — a thin wrapper that lets an AI agent call out to some external thing. One talked to Replicate for image generation. One talked to a Nostr-friendly social poster. One was a Git-aware research helper. One was a Tavily-powered web search.

They were all written in Python. They all ran on Knative on a small Kubernetes cluster. They all worked. And they were all just slightly too slow to use.

A six-second cold start is fine for nothing. It is precisely the wrong amount of time — slow enough to be noticed, fast enough to feel almost loaded. An AI agent waiting six seconds for a single tool call does not know it is waiting for a cold start; it just knows the tool is sluggish. The user does not know either. The user just thinks the agent is broken.

And six seconds was a good day. Some of the services took longer.

So I rewrote them in Go. This is what that cost me, and what the measurements actually were before and after.

The actual problem

Cold starts on serverless platforms are an old problem with a well-known shape. The platform spins your container up only when traffic arrives, so the first request after an idle period pays the full startup tax — image pull (or warm cache hit), container start, language runtime initialisation, application bootstrap.

For Python, application bootstrap is where the bill arrives. The interpreter has to start. import statements run. The dependency tree gets walked. If you have ever wondered why a hello world Flask app feels so much heavier than a hello world Go binary, this is why. Python is doing real work before your code runs. Go has already started.

On a small Kubernetes cluster — small as in I am paying for it personally — you do not keep a fleet of warm replicas around. You scale to zero. You scale to zero because that is the entire point of using serverless on small infrastructure. The trade-off is that every idle service eats a cold start the next time it is invoked.

For my four MCPs, “the next time it is invoked” was approximately every time an agent decided to use them. Which was constantly. Which meant cold starts were not the rare edge case. They were the common case.

The measured cold-start latency on each Python MCP, taken from production:

  • postiz-mcp: 5.9 seconds
  • git-mcp: 4.9 seconds
  • research-mcp: 5.4 seconds
  • image-mcp: 7.2 seconds

For comparison, the one MCP I had already written in Go (clickup-mcp) was cold-starting in 200 to 400 milliseconds. More than an order of magnitude faster, on the same cluster, in the same Knative configuration.

But the headline number was the workflow chain. A real agent run — drafting a blog post — hits four MCPs sequentially in its draft phase. On Python, that meant 20 to 25 seconds of cumulative cold-start latency every time the chain ran cold. On Go, the same chain came down to roughly 2 seconds.

Twenty seconds of staring at a loading spinner, every time, in steady state. That was the ceiling I was bumping into.

The rewrite

I would like to tell you the rewrite was elegant. It was not. It was four small slogs, in a row, on weekends, with the same set of decisions made four times over because I was too lazy to extract the boilerplate properly until service three.

The actual work, per service, looked like this.

Setting up the project. Pick a Go module name, decide on a directory structure, pick an HTTP framework or just use the standard library, pick a logger. Repeat per service. (I eventually settled on net/http plus slog. The standard library leaves you zero HTTP decisions to make, and that is why this step goes faster in Go than in any other ecosystem.)
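
For scale, the entire skeleton that decision buys you per service is roughly the following. This is a minimal sketch; the port and health path are illustrative placeholders, not the real services' values.

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	// A trivial health endpoint for the platform's readiness probe.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Knative actually injects the serving port via $PORT; it is hard-coded
	// here only to keep the sketch self-contained.
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```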

Translating the SDK calls. The Python services used native Python SDKs for Replicate, the Nostr toolkit, and so on. Go either had a vendor SDK, a community one, or — in two cases — nothing usable. In those two cases I just hand-rolled the HTTP requests because the APIs were small enough. This took less time than I expected. Most modern APIs are thin wrappers over REST and JSON, which is exactly what net/http and encoding/json are built for.
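
Hand-rolling one of those calls looks roughly like this. The endpoint, auth header, and request/response shapes below are hypothetical stand-ins, not the real Replicate or Tavily APIs:

```go
package client

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// searchRequest and searchResponse are hypothetical shapes, not a real vendor API.
type searchRequest struct {
	Query string `json:"query"`
}

type searchResponse struct {
	Results []string `json:"results"`
}

func search(ctx context.Context, baseURL, apiKey, query string) (*searchResponse, error) {
	body, err := json.Marshal(searchRequest{Query: query})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/search", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("search: unexpected status %d", resp.StatusCode)
	}

	var out searchResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return &out, nil
}
```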

Translating the state. Two of the services held a small amount of in-memory state — caches, mostly. Python made this trivial; Go made me think about it. In both cases the right answer was “use sync.Map and stop overengineering.”
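
The “stop overengineering” version is about fifteen lines. A sketch, with []byte values standing in for whatever the real caches held:

```go
package cache

import "sync"

// Cache is the whole caching layer: sync.Map handles concurrent readers
// and writers without an explicit mutex, which is all these services needed.
type Cache struct {
	m sync.Map
}

func (c *Cache) Get(key string) ([]byte, bool) {
	v, ok := c.m.Load(key)
	if !ok {
		return nil, false
	}
	return v.([]byte), true
}

func (c *Cache) Set(key string, value []byte) {
	c.m.Store(key, value)
}
```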

Containerising. This is where the win started becoming visible. A multi-stage Go Dockerfile (a golang:1.25-alpine build stage feeding a FROM scratch final stage) produces a static binary in an image of about 20 megabytes. The Python images had been about 300 megabytes with all their dependencies. Pull times went from “perceptible” to “negligible.” This is a real cold-start contributor that has nothing to do with the language runtime — it is image-pull latency, which is dominated by image size.
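
The Dockerfile in question, more or less. The module layout and binary name are placeholders; CGO_ENABLED=0 and the CA-certificate copy are the two details that make a scratch image workable for a service calling external APIs over TLS:

```dockerfile
# Build stage: compile a fully static binary.
FROM golang:1.25-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/mcp ./cmd/mcp

# Final stage: the binary, plus CA roots so outbound HTTPS works.
FROM scratch
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /out/mcp /mcp
ENTRYPOINT ["/mcp"]
```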

Wiring observability. The Python services had structured logging via Loguru. The Go services got slog with the same JSON output. Same dashboards still worked. Same alerts still worked. Nobody in the observability stack knew or cared that the language had changed, which is a very nice property for a rewrite.
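
That wiring is one handler swap plus, if you want per-request logs, a few lines of middleware. A sketch; the field names are whatever your existing dashboards already query on, not anything mandated:

```go
package server

import (
	"log/slog"
	"net/http"
	"os"
	"time"
)

// setupLogging swaps the default logger for a JSON handler on stdout,
// so the lines land in the same pipeline the Loguru output used.
func setupLogging() {
	slog.SetDefault(slog.New(slog.NewJSONHandler(os.Stdout, nil)))
}

// withLogging emits one structured line per request.
func withLogging(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		slog.Info("request",
			"method", r.Method,
			"path", r.URL.Path,
			"duration_ms", time.Since(start).Milliseconds(),
		)
	})
}
```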

The total time, across four services and a lot of starting-and-stopping, was something like three weekends. None of them were full weekends. None of them were free of context-switching to other work. So call it twenty hours, generously.

The unexpected wins

A few things landed in my lap that I had not planned for.

No more Python image dance. Building Python container images involves a lot of dancing — the right base image, the right Python version, the right system packages, the right pip cache strategy, the right --no-deps if you have already got a vendored wheel directory. Half the Dockerfiles in my old repo were Python wheel-and-glue. The Go equivalents are six lines and they all look the same.

Single binary deploys. When the only artefact your service produces is a compiled binary, deployment becomes a COPY instruction in a FROM scratch image. There is no virtualenv. There is no missing system library. There is no “works on my laptop, fails on the cluster” because you have shipped exactly the same bytes.

Easier to read in production. Go’s lack of magic — its terrible-by-design lack of expressiveness — turned out to be a virtue when I was on-call at two in the morning trying to figure out why an MCP was throwing 500s. There is one way to do error handling. There is one way to do concurrency. There is one way to read a JSON body. The code is boring, and the boring code was easier to debug.

Smaller blast radius for dependency updates. Go modules with go.sum produce reproducible builds. A go mod tidy followed by a build verifies the entire dependency graph. When you update a Python package, you find out at runtime whether it was a breaking change. When you update a Go package, you find out at compile time. This is not a small difference at two in the morning.

The unexpected pains

Three things hurt that I had underestimated.

Boilerplate. Go has a verbose-by-design philosophy. Where Python lets me read a JSON request body in two lines, Go takes six. Where Python has list comprehensions, Go has for loops. Where Python has decorators, Go has wrapper functions. Multiply this by the surface area of four services and you write a lot of code that, in Python, was implicit. I learned to like this. But I had to learn.
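
For the record, here is what “two lines in Python, six in Go” looks like: a representative decode helper rather than any one service's actual handler.

```go
package server

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// decodeBody is roughly the Go spelling of Python's json.loads(request.body):
// explicit decoder, explicit error path, explicit target type.
func decodeBody(r *http.Request, dst any) error {
	dec := json.NewDecoder(r.Body)
	dec.DisallowUnknownFields() // fail loudly on misspelled fields
	if err := dec.Decode(dst); err != nil {
		return fmt.Errorf("decode request body: %w", err)
	}
	return nil
}
```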

Error handling discipline. if err != nil is the most-mocked line in Go and it deserves the mockery — until you spend an afternoon debugging a Python service that swallowed an exception three layers deep. Go’s “every error is in your face” model is exhausting to write and reassuring to read. I now know exactly what every one of my services does when replicate.Run() returns an error. I did not know that for the Python equivalents.

The library gap on niche packages. For Replicate and Tavily, the Go SDKs were either community-maintained or non-existent, and I rolled my own HTTP. That is fine if the API is small. It is painful if the API is large or undocumented. One of my services lost a weekend to “what does this API actually do when you pass it an empty string for a required field” research that I would not have had to do if a vendor-maintained SDK existed.

The numbers

These are the actual measurements from before and after — taken on the same cluster, the same Knative configuration, the same workload.

  • Cold-start latency — Python services measured 4.9 to 7.2 seconds (about 6 seconds average across the four). The Go rewrites came in at 200 to 400 milliseconds. Roughly 20× faster at the per-call level.
  • Memory at idle — Python services were sitting at about 80 MiB each. Go binaries sit at about 15 MiB. Across five services, that is the difference between 400 MiB always reserved and 75 MiB always reserved — and on a small cluster with limited CPU per VM, this is the kind of footprint that matters.
  • Image sizes — about 300 MB Python images became about 20 MB Go images. 15× smaller, with proportional improvements to image-pull latency and registry storage churn.
  • The headline number — workflow chain latency. A real agent run that chains four MCPs sequentially in its draft phase used to eat 20–25 seconds of cumulative cold-start tax every time the chain ran cold. After the rewrite, the same chain takes about 2 seconds end-to-end. That is the user-facing win, and it is the one I actually care about. The rest is plumbing.

The agents, which were the actual customers, started working better. Tool-call chains that had felt sluggish now feel responsive. That is the only metric I deeply care about. Everything else is vanity.

When you should not do this

You should not rewrite a Python service in Go if:

  • The service is fast enough. If your cold start is already under a second and you are not running on a small cluster, this is not your problem. Do not make it your problem.
  • You are doing data science. Python’s library ecosystem for ML, statistics, image processing, anything-with-numpy is incomparable. Go is not where your model-serving service should live.
  • You are using a heavy framework that Go does not have. If your Python service is built on Django or some specific async framework with deep ecosystem dependencies, the Go port is not a port — it is a rewrite. Different math.
  • You are a Python team. If you do not already have Go in production, the operational cost of introducing a new language for one service is not worth the latency win on that service. Pick your battles.

I had four small adapters with simple request-response shapes, no heavy library deps, and a latency floor that mattered. That is the exact shape Go was good for.

What I would do differently

If I had to do this over, I would extract the boilerplate before the second service, not the third. Two-thirds of each MCP is the same — HTTP setup, JSON request-and-response handling, MCP protocol scaffolding, error envelopes, structured logging. By service three I had a small internal library that did all of that. By service four it was a copy-paste. Services one and two had a lot of code I would have happily deleted in retrospect.
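
For flavour, the kind of helper that library is full of. A hypothetical error-envelope writer, since the real package is internal:

```go
package mcpkit

import (
	"encoding/json"
	"log/slog"
	"net/http"
)

// errorEnvelope is the one error shape every service returns.
type errorEnvelope struct {
	Error string `json:"error"`
	Code  int    `json:"code"`
}

// WriteError logs and serialises an error identically in every service,
// which is most of what "extract the boilerplate" meant in practice.
func WriteError(w http.ResponseWriter, code int, msg string) {
	slog.Error("request failed", "code", code, "msg", msg)
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	_ = json.NewEncoder(w).Encode(errorEnvelope{Error: msg, Code: code})
}
```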

I would also keep at least one Python service around as a comparison baseline, so when someone asks “could you not have just optimised the Python,” I have a same-shape service to point to. (I can confidently tell you the answer is no, not at the cold-start level, but I cannot show you, which is annoying.)

The bigger meta-lesson — small services on a small cluster are a different optimisation problem than big services on big infrastructure. On EKS with a fleet of warm replicas, none of this matters; keep your Python. On a six-node cluster scaling-to-zero between requests, every second of cold start is a second the user is staring at a loading spinner. Pick the language that respects the loading spinner.

I respect the loading spinner now. The agents do too. The four services do not time out any more.

That is the whole win, in one sentence.

