What is cloud repatriation?

Cloud repatriation is moving workloads off public cloud and back onto infrastructure you own or colocate. It is not a rejection of cloud — it is a recalculation. The case for it gets strong when you have heavy, steady-state compute that runs whether it is busy or not, because that is exactly the usage pattern public cloud prices least efficiently. Always-on GPU instances for AI training and inference are the most common trigger. The right way to do it is selectively and reversibly: move the workloads where the math has flipped, keep the ability to burst back to cloud for everything else.

Why move GPU workloads off the cloud?

Cloud GPUs are priced for elasticity, and elasticity is exactly what an always-on training or inference workload does not use. If a GPU instance runs around the clock, you pay the on-demand premium every hour whether the card is saturated or idle. For a team with two always-on A100-class instances, that premium dominates the bill. Owning the hardware converts a large recurring operating cost into a one-time capital cost plus electricity, and for steady-state usage the hardware frequently pays for itself within a year. The trade-off is that you take on capacity planning and maintenance, so it only makes sense once the workload is genuinely steady.

How does Cloudflare Zero Trust replace VPNs and tunnels for dev environments?

Cloudflare Zero Trust provides one control plane for the things teams usually stitch together from separate tools: cloudflared replaces ad-hoc tunneling like ngrok and editor tunnels, and Cloudflare's mesh networking replaces the cloud VPCs used to connect development pods. Access policies live in the same place. The connector establishes an outbound-only tunnel, so the machine it runs on does not need to be a publicly reachable cloud VM — it can be a server on-prem. That single property is what makes it possible to move environments off cloud VMs without losing remote access.

Does moving dev environments on-prem hurt developer experience?

It does not have to — and in this case it improved. Defining tunnels, access policies, and networking in Terraform made the whole setup version-controlled and reproducible, and an internal dashboard handled pod management, onboarding, and resource leasing. Standing up a new project — all its pods, resources, security, and access management — dropped from about half a day to under 15 minutes, and onboarding a new developer went from roughly a day of access plumbing to under an hour. The gains came from consolidating onto one control plane and codifying it, not from the location of the hardware.

When should a company reconsider its cloud setup?

Treat infrastructure like any other major recurring cost: re-evaluate it on a schedule rather than assuming the decision you made a year or two ago still holds. The strongest signal to look closely is heavy, always-on compute — GPUs especially — sitting at full on-demand price regardless of utilization. Other signals include onboarding and environment setup that have quietly become multi-tool chores, and a meaningful share of an engineer's week spent babysitting infrastructure plumbing. Pilot any change on a non-critical environment first, measure, and keep a path back to cloud for the workloads that still belong there.

When the Cloud Math Stops Working: Consolidating Dev Environments and Moving GPUs On-Prem

TL;DR.

A 25-developer engineering team was running dev environments on three separate tools — cloud VPCs, ngrok, and GitHub/VS Code tunnels — on top of a GCP fleet dominated by two always-on A100 GPU VMs.
Consolidating onto Cloudflare Zero Trust (cloudflared for tunneling, Cloudflare mesh for networking) collapsed three tools into one Terraform-defined control plane.
The hinge: Cloudflare’s tunnel connector only dials out, so it needs no publicly reachable cloud VM. That single property is what made it possible to move the whole environment, GPUs included, on-prem.
The economics: tunneling spend went from ~$7,500/year to $0, and a ~$7,600/month cloud fleet became a ~$73,000 one-time on-prem build plus ~$500/month in power. Payback came in under a year, with ~$7,000/month saved on an ongoing basis.
The point is not “leave the cloud.” It is that infrastructure deserves the same periodic re-evaluation as every other recurring cost — especially once you have always-on GPUs sitting idle at full price.

Two problems that turned out to be one

I spend a lot of my time inside other people’s infrastructure, and the most expensive problems are rarely the ones a team flags. They are the ones that have quietly become “just how things are.”

This is a story about one of those. The client is an engineering team running about 25 developers, and they had two problems they thought of as separate.

The first was tooling sprawl. Their dev environments were held together by three tools duct-taped into one workflow: cloud VPCs to link development pods, ngrok for ad-hoc tunnels, and GitHub/VS Code tunnels for remote editing. Each was its own thing to configure, pay for, and babysit. Tunnels died and needed regenerating. Onboarding a single developer meant wiring up access across all three systems before they could write a line of code.

The second was cost. The VPCs were tied to cloud VMs running 24/7, including two always-on A100 GPU instances for AI training and workflows. Those cost a fortune whether they were busy or idle — and they were idle a lot.

These read like two tickets for two different teams. They were actually the same problem wearing two hats, and solving the first one is what made the second one solvable.

Thread A: three tools become one control plane

The first move was consolidation. I moved everything to Cloudflare Zero Trust — cloudflared for tunneling, Cloudflare’s mesh for networking. One control plane instead of three tools, each with its own configuration, billing, and failure modes.

A few things mattered about how this was done, because the how is where the durable wins came from.

I defined the whole setup in Terraform. Tunnels, access policies, and networking became version-controlled infrastructure that anyone on the team could read, review, and reproduce. The configuration stopped living in someone’s head and three dashboards, and started living in a repo.
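The team expressed this in Terraform, which isn't reproduced here. For orientation, the sketch below shows the shape a locally-managed cloudflared `config.yml` takes, built as a Python dict: a tunnel ID, a credentials file, and ordered `ingress` rules mapping hostnames to local services, ending in the catch-all rule cloudflared requires as its last entry. The hostnames, paths, and the helper names `build_tunnel_config` and `validate` are hypothetical; serialize the dict with a YAML library of your choice to produce the actual file.

```python
"""Build a locally-managed cloudflared config as a plain dict (sketch)."""
from typing import Dict, List


def build_tunnel_config(tunnel_id: str, creds_path: str,
                        pods: Dict[str, str]) -> dict:
    """Map public hostnames to local dev-pod services.

    pods: hostname -> local service URL, e.g. "http://localhost:8080".
    cloudflared checks ingress rules top to bottom, and the final rule
    must be a catch-all with no hostname or path.
    """
    ingress: List[dict] = [
        {"hostname": host, "service": service}
        for host, service in sorted(pods.items())  # stable, reviewable order
    ]
    # Catch-all: any request that matched no hostname gets a 404.
    ingress.append({"service": "http_status:404"})
    return {
        "tunnel": tunnel_id,
        "credentials-file": creds_path,
        "ingress": ingress,
    }


def validate(config: dict) -> None:
    """Reject configs with no rules or without a trailing catch-all."""
    rules = config.get("ingress", [])
    if not rules:
        raise ValueError("ingress must contain at least one rule")
    last = rules[-1]
    if "hostname" in last or "path" in last:
        raise ValueError("last ingress rule must be a catch-all")


# Placeholder tunnel ID, path, and hostnames for illustration only.
example = build_tunnel_config(
    "TUNNEL-UUID",
    "/etc/cloudflared/TUNNEL-UUID.json",
    {"pod-a.dev.example.com": "http://localhost:8080",
     "pod-b.dev.example.com": "ssh://localhost:22"},
)
validate(example)
```

Keeping the catch-all inside the builder, rather than trusting each edit to remember it, is the same reasoning that favors codifying the setup at all: the invariant lives in one reviewable place.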

Then I built an in-house tool to manage the dev pods, onboarding, and access from a single dashboard. Instead of three systems to wire up per developer, there was one place to grant access and spin up an environment.

There was no procurement fight to get going, either — Cloudflare’s free tier covers teams up to 50 users, which comfortably fit a 25-person team. That mattered more than it sounds: it meant the migration could start as an experiment rather than a budget line that needed sign-off before anyone could prove it worked.

The results here were workflow results. Standing up a new project — provisioning all its pods, resources, security, and access management — dropped from about half a day of fiddling to under 15 minutes. Onboarding a new developer went from a day of access plumbing to under an hour. Those gains came from consolidating onto one control plane and codifying it in Terraform — not from anything about where the hardware lived.

But one property of this switch is what turned a tooling cleanup into something much bigger.

The hinge: the connector needs no VMs

Here is the technical detail that connects the two threads.

Cloudflare’s tunnel connector establishes an outbound-only tunnel. The machine running cloudflared reaches out to Cloudflare; nothing needs to reach in. That means the connector does not have to run on a publicly reachable cloud VM. It can run on a server sitting in an office.
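The outbound-only property is the whole trick, so it is worth seeing in miniature. The sketch below is a toy built on Python's asyncio streams, not cloudflared's actual protocol: an "edge" is the only thing that listens, the "connector" dials out to it, and requests then travel back down that already-open connection to a local handler. `origin_handler`, `connector`, and `demo` are hypothetical names. Nothing on the origin side ever binds a listening port, which is why it can sit behind an office firewall.

```python
"""Toy outbound-only tunnel: the origin dials out and never listens."""
import asyncio


def origin_handler(request: str) -> str:
    # Stand-in for a dev pod's local service (hypothetical).
    return f"200 OK {request}"


async def connector(edge_host: str, edge_port: int) -> None:
    # Outbound connection only: this side never calls start_server.
    reader, writer = await asyncio.open_connection(edge_host, edge_port)
    while True:
        line = await reader.readline()
        if not line:  # edge closed the tunnel
            break
        reply = origin_handler(line.decode().strip())
        writer.write((reply + "\n").encode())
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def demo() -> str:
    tunnel = asyncio.get_running_loop().create_future()

    async def on_connector(reader: asyncio.StreamReader,
                           writer: asyncio.StreamWriter) -> None:
        # Edge side: keep the connector's stream for later requests.
        tunnel.set_result((reader, writer))

    # Only the edge listens; port 0 lets the OS pick a free port.
    server = await asyncio.start_server(on_connector, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    conn_task = asyncio.create_task(connector("127.0.0.1", port))

    reader, writer = await tunnel
    # A user request reaches the edge and is pushed down the open tunnel.
    writer.write(b"GET /health\n")
    await writer.drain()
    response = (await reader.readline()).decode().strip()

    # Tear down: closing the edge side ends the connector's read loop.
    writer.close()
    await writer.wait_closed()
    await conn_task
    server.close()
    await server.wait_closed()
    return response


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Running `asyncio.run(demo())` returns `"200 OK GET /health"`: the request was served by a process that only ever made an outbound connection.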

The old stack quietly assumed the opposite. The VPCs were tied to cloud VMs because that is where the connectivity terminated. Remote access depended on paying for always-on cloud machines. Once the tunneling layer no longer needed a VM to terminate on, the dependency that pinned the whole environment to the cloud was gone.

That is the moment the second problem became solvable. Dropping the dependence on cloud VMs let us move the whole environment — including the AI workloads — onto on-prem hardware.

And because the connector needs no cloud VM, we could migrate one environment at a time instead of attempting a risky big-bang cutover. Each environment moved when it was ready, with the option to roll back if something went wrong. Cloud-to-on-prem migrations get a bad reputation precisely because they are usually framed as all-or-nothing. This one wasn’t.
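The post doesn't describe the cutover mechanics, so the following is a hypothetical sketch of the one-environment-at-a-time pattern: each environment is cut over, health-checked, and rolled back on its own if the check fails, so one bad move never drags the rest with it. `cut_over`, `healthy`, and `roll_back` are placeholder hooks you would supply; repointing an environment's tunnel route from the cloud origin to the on-prem one is an assumed example, not a detail from the engagement.

```python
"""Migrate environments one at a time, rolling back failures (sketch)."""
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MigrationReport:
    migrated: List[str] = field(default_factory=list)
    rolled_back: List[str] = field(default_factory=list)


def migrate_incrementally(
    envs: List[str],
    cut_over: Callable[[str], None],   # hypothetical: move env to on-prem
    healthy: Callable[[str], bool],    # hypothetical: post-move health check
    roll_back: Callable[[str], None],  # hypothetical: return env to cloud
) -> MigrationReport:
    """Cut over each environment independently; a failure stays local."""
    report = MigrationReport()
    for env in envs:
        try:
            cut_over(env)
            ok = healthy(env)
        except Exception:
            ok = False  # a failed cutover counts as unhealthy
        if ok:
            report.migrated.append(env)
        else:
            roll_back(env)
            report.rolled_back.append(env)
    return report
```

The loop deliberately continues past a failed environment, on the assumption that environments are independent; stopping at the first failure is the more conservative choice if they are not.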

Thread B: the cloud math, recalculated

With the dependency removed, the economics were straightforward to run — and stark.

The tunneling line was the easy one. Paid ngrok was running roughly $7,500/year. On Cloudflare’s free tier, that went to $0.

The infrastructure line was the real story. The old GCP fleet ran about $7,600/month, dominated by those two always-on A100 GPU VMs. We replaced it with a one-time on-prem build of about $73,000 — four servers, a NAS, a UPS, networking, the rest of it — plus ongoing electricity of roughly $500/month.

The shape of that change is the whole point. A large recurring operating cost became a one-time capital cost plus a small recurring one. For a workload that runs around the clock regardless of utilization, that trade almost always favors ownership, because cloud GPUs are priced for elasticity and an always-on training workload uses none of it. You pay the on-demand premium every hour whether the card is saturated or sitting idle.

The hardware paid for itself in under a year. The client now saves around $7,000/month.
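The payback claim is easy to check from the figures quoted above. This sketch uses only those numbers and deliberately leaves out hardware maintenance, depreciation, and staff time, none of which the post quantifies.

```python
"""Check the payback claim from the figures quoted in the post."""

CLOUD_MONTHLY = 7_600   # old GCP fleet, USD/month
ONPREM_CAPEX = 73_000   # one-time on-prem build, USD
POWER_MONTHLY = 500     # on-prem electricity, USD/month
TUNNEL_YEARLY = 7_500   # paid ngrok, USD/year (now $0)


def payback_months(capex: float, monthly_saving: float) -> float:
    """Months until cumulative savings cover the up-front spend."""
    if monthly_saving <= 0:
        raise ValueError("no monthly saving, so no payback")
    return capex / monthly_saving


compute_saving = CLOUD_MONTHLY - POWER_MONTHLY       # 7,100 USD/month
total_saving = compute_saving + TUNNEL_YEARLY / 12   # 7,725 USD/month

print(f"compute only:   {payback_months(ONPREM_CAPEX, compute_saving):.1f} months")
print(f"with tunneling: {payback_months(ONPREM_CAPEX, total_saving):.1f} months")
```

It prints about 10.3 months on compute savings alone and about 9.4 months once the retired ngrok spend is counted, both consistent with "under a year". The ~$7,100/month compute saving also lines up with the ~$7,000/month figure.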

Line item	Before (cloud)	After (on-prem)
Tunneling	~$7,500/year (paid ngrok)	$0 (Cloudflare free tier)
Compute	~$7,600/month (GCP, two always-on A100 VMs)	~$73,000 one-time + ~$500/month power
Project setup (pods, resources, security, access)	~half a day	under 15 minutes
Developer onboarding	~a day	under an hour
Control plane	three tools (VPCs, ngrok, editor tunnels)	one (Cloudflare Zero Trust)

This only works because the workload was genuinely steady-state. The discipline is in being honest about that: ownership makes sense for the always-on core, not for spiky or experimental capacity. Which is why we never burned the bridge — we can still spin up cloud capacity whenever a workload genuinely needs it.
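One way to make "genuinely steady-state" concrete is a break-even utilization: the fraction of hours a workload must run before owned hardware costs less than paying for cloud time. The sketch assumes cloud cost scales linearly with hours run (that is, you could shut instances off when idle) and a 36-month hardware life. Both are assumptions, not figures from the engagement, and staff time and repairs are ignored.

```python
"""Break-even utilization: owned hardware vs. pay-per-hour cloud (sketch)."""


def onprem_monthly(capex: float, amortize_months: int, power_monthly: float) -> float:
    """Spread the one-time build over its assumed useful life, plus power."""
    return capex / amortize_months + power_monthly


def breakeven_utilization(cloud_always_on_monthly: float, owned_monthly: float) -> float:
    """Fraction of hours a workload must run for ownership to cost less.

    Assumes cloud cost scales linearly with hours actually run.
    A result above 1.0 means ownership never wins.
    """
    return owned_monthly / cloud_always_on_monthly


# 36-month amortization is an assumption; capex, power, and cloud cost are from the post.
owned = onprem_monthly(73_000, amortize_months=36, power_monthly=500)
u = breakeven_utilization(7_600, owned)
print(f"owned ~${owned:,.0f}/month; ownership wins above ~{u:.0%} utilization")
```

Under those assumptions the owned setup works out to about $2,528/month and ownership wins above roughly 33% utilization. An always-on workload sits at 100%, well clear of the line; one that runs a few hours a day may fall below it, and that is the capacity worth keeping in the cloud.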

The wins we didn’t plan for

The numbers above were the goal. The wins that came from owning the hardware were not on the plan, and a few of them mattered as much as the savings.

We freed up one of the team’s two DevOps engineers. He had been spending much of his week playing musical chairs with VMs, ngrok domains, and broken tunnels — work that produced nothing except keeping the lights on. That time got rerouted to higher-value work.

Easier, direct access to on-prem servers changed how engineers used the GPUs. When every GPU hour is a metered cloud cost, people ration experiments. When the hardware is sitting there already paid for, they run far more of them. The internal dashboard we built leans into this directly: engineers and pods can “lease” resources on demand rather than filing requests for them.
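The dashboard itself isn't described beyond "leasing", so this is a hypothetical in-memory model of the idea: a fixed pool of GPUs, time-limited leases, and automatic release on expiry. `GpuLeases`, `Lease`, and the GPU IDs are illustrative names. A real tool would persist state and read the clock with `time.time()`; `now` is passed explicitly here to keep the example deterministic.

```python
"""In-memory GPU lease pool with expiry (hypothetical sketch)."""
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Lease:
    holder: str        # engineer or pod name
    expires_at: float  # seconds, same clock as `now`


class GpuLeases:
    def __init__(self, gpu_ids: List[str]) -> None:
        # Every GPU starts free (no lease).
        self._leases: Dict[str, Optional[Lease]] = {g: None for g in gpu_ids}

    def _expire(self, now: float) -> None:
        # Free any GPU whose lease has run out.
        for gpu, lease in self._leases.items():
            if lease is not None and lease.expires_at <= now:
                self._leases[gpu] = None

    def acquire(self, holder: str, duration_s: float, now: float) -> Optional[str]:
        """Lease the first free GPU for duration_s, or return None if all are busy."""
        self._expire(now)
        for gpu, lease in self._leases.items():
            if lease is None:
                self._leases[gpu] = Lease(holder, now + duration_s)
                return gpu
        return None

    def release(self, gpu: str, holder: str) -> bool:
        """Return a GPU early; only the current holder may release it."""
        lease = self._leases.get(gpu)
        if lease is not None and lease.holder == holder:
            self._leases[gpu] = None
            return True
        return False
```

With two GPUs, a third request returns `None` until the first lease's expiry time passes, at which point that GPU is handed out again without anyone filing a ticket.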

Between the freed-up time and the increase in experimentation, we measured a 14–18% productivity boost. I’d caveat that heavily: it is an early signal from a small sample, since we launched this only a few months ago. But the direction is clear, and the mechanism behind it is not mysterious: remove the friction tax on compute and people compute more.

What I’d tell another owner

The cloud is a great way to go, and honestly it is probably the right call for most teams most of the time. I want to be precise about that, because the lesson here is easy to over-read into “leave the cloud,” which is the wrong lesson.

The actual lesson is narrower and more useful. A lot of companies treat cloud as just the way things are done — an assumption nobody revisits. But your infrastructure deserves the same ongoing evaluation you give every other part of the business. Costs and usage change. The math that made cloud obvious a year ago might not hold today, especially once you have heavy, always-on compute like GPUs sitting idle at full price.

So, concretely:

Re-evaluate on a schedule, not on a crisis. Put infrastructure cost on the same review cadence as any other major recurring line. The trigger for a closer look is steady-state, always-on compute — GPUs above all.
Look for the dependency, not just the bill. The savings here were unlocked by removing a single technical dependency (cloud VMs for tunnel termination), not by negotiating a better rate. The expensive part is often pinned in place by an assumption one layer down.
Pilot on something non-critical first. The migration worked because it went one environment at a time with a way back. Make your first move reversible.
Keep the bridge. Repatriation is not exile. Move the workloads where the math has flipped and keep the ability to burst back to cloud for the ones where it hasn’t.
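To make the review cadence repeatable, a small script can surface the strongest signal automatically: instances that run nearly all the time, GPUs first. This is a sketch under assumptions: the `Instance` record, the 90% uptime threshold, and the idea of feeding it from a billing or monitoring export are all hypothetical, not part of the engagement described here.

```python
"""Flag always-on instances worth re-pricing at the next review (sketch)."""
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str
    hours_running: float  # hours running over the review window
    monthly_cost: float   # USD
    has_gpu: bool


def repricing_candidates(instances: List[Instance], window_hours: float,
                         uptime_threshold: float = 0.9) -> List[Instance]:
    """Return near-always-on instances, GPUs first, then highest cost first."""
    steady = [i for i in instances
              if i.hours_running / window_hours >= uptime_threshold]
    # False sorts before True, so `not has_gpu` puts GPU instances first.
    return sorted(steady, key=lambda i: (not i.has_gpu, -i.monthly_cost))
```

An instance that clears the threshold is not automatically a repatriation target; it is simply the first line item worth re-pricing against owned hardware.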

You may be surprised what you find. This client thought they had a tooling problem. They also had a $7,000-a-month one hiding behind it — and the two turned out to be the same problem.

If you have always-on compute you have never re-priced, or a dev-environment setup that has quietly grown into three tools and a babysitting habit, a conversation costs nothing and usually surfaces the dependency worth looking at first.
