When the Cloud Math Stops Working: Consolidating Dev Environments and Moving GPUs On-Prem

A 25-developer team was running its dev environments on three duct-taped tools and a GCP fleet dominated by two always-on A100 GPUs. Consolidating onto Cloudflare Zero Trust removed the dependency on cloud VMs — and that one property is what made moving the whole environment, GPUs included, on-prem possible. Here's the switch, the math, and what I'd tell another owner.

Segev Shmueli

TL;DR.

  • A 25-developer engineering team was running dev environments on three separate tools — cloud VPCs, ngrok, and GitHub/VS Code tunnels — on top of a GCP fleet dominated by two always-on A100 GPU VMs.
  • Consolidating onto Cloudflare Zero Trust (cloudflared for tunneling, Cloudflare mesh for networking) collapsed three tools into one Terraform-defined control plane.
  • The hinge: Cloudflare’s tunnel connector makes only outbound connections, so it needs no publicly reachable cloud VM to terminate on. That single property is what made it possible to move the whole environment, GPUs included, on-prem.
  • The economics: tunneling spend went from ~$7,500/year to $0, and a ~$7,600/month cloud fleet became a ~$73,000 one-time on-prem build plus ~$500/month in power. The hardware paid for itself in under a year and now saves about $7,000/month.
  • The point is not “leave the cloud.” It is that infrastructure deserves the same periodic re-evaluation as every other recurring cost — especially once you have always-on GPUs sitting idle at full price.

Two problems that turned out to be one

I spend a lot of my time inside other people’s infrastructure, and the most expensive problems are rarely the ones a team flags. They are the ones that have quietly become “just how things are.”

This is a story about one of those. The client is an engineering team running about 25 developers, and they had two problems they thought of as separate.

The first was tooling sprawl. Their dev environments were held together by three tools duct-taped into one workflow: cloud VPCs to link development pods, ngrok for ad-hoc tunnels, and GitHub/VS Code tunnels for remote editing. Each was its own thing to configure, pay for, and babysit. Tunnels died and needed regenerating. Onboarding a single developer meant wiring up access across all three systems before they could write a line of code.

The second was cost. The VPCs were tied to cloud VMs running 24/7, including two always-on A100 GPU instances for AI training and workflows. Those cost a fortune whether they were busy or idle — and they were idle a lot.

These read like two tickets for two different teams. They were actually the same problem wearing two hats, and solving the first one is what made the second one solvable.

Thread A: three tools become one control plane

The first move was consolidation. I moved everything to Cloudflare Zero Trust: cloudflared for tunneling, Cloudflare’s mesh for networking. One control plane instead of three tools, each with its own configuration, billing, and failure modes.

A few things mattered about how this was done, because the how is where the durable wins came from.

I defined the whole setup in Terraform. Tunnels, access policies, and networking became version-controlled infrastructure that anyone on the team could read, review, and reproduce. The configuration stopped living in someone’s head and three dashboards, and started living in a repo.

Then I built an in-house tool to manage the dev pods, onboarding, and access from a single dashboard. Instead of three systems to wire up per developer, there was one place to grant access and spin up an environment.

There was no procurement fight to get going, either — Cloudflare’s free tier covers teams up to 50 users, which comfortably fit a 25-person team. That mattered more than it sounds: it meant the migration could start as an experiment rather than a budget line that needed sign-off before anyone could prove it worked.

The results here were workflow results. Standing up a new project — provisioning all its pods, resources, security, and access management — dropped from about half a day of fiddling to under 15 minutes. Onboarding a new developer went from a day of access plumbing to under an hour. Those gains came from consolidating onto one control plane and codifying it in Terraform — not from anything about where the hardware lived.

But one property of this switch is what turned a tooling cleanup into something much bigger.

The hinge: the connector needs no cloud VMs

Here is the technical detail that connects the two threads.

Cloudflare’s tunnel connector establishes an outbound-only tunnel. The machine running cloudflared reaches out to Cloudflare; nothing needs to reach in. That means the connector does not have to run on a publicly reachable cloud VM. It can run on a server sitting in an office.
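
To make the outbound-only idea concrete, here is a toy model in Python using only `asyncio`. It is a sketch under loud assumptions, not cloudflared's actual protocol: the `Relay` class stands in for Cloudflare's edge, `run_connector` stands in for cloudflared on the office server, and each request is a single newline-terminated line over one connection. The property it illustrates is the one that matters here: the office side only ever calls `open_connection` and never opens a listening socket, yet a client can still reach it through the relay.

```python
import asyncio

# Toy model of an outbound-only tunnel. NOT cloudflared's real protocol:
# one connector, one newline-terminated request/response at a time.


class Relay:
    """Stands in for the public edge: the only component that listens."""

    def __init__(self):
        self.connector = None  # (reader, writer) the office side dialed out with
        self.ready = asyncio.Event()
        self.lock = asyncio.Lock()  # one request at a time over the tunnel

    async def on_connector(self, reader, writer):
        # The office server connected OUT to us; keep that socket as the tunnel.
        self.connector = (reader, writer)
        self.ready.set()

    async def on_client(self, reader, writer):
        request = await reader.readline()
        await self.ready.wait()  # wait until a connector has dialed in
        c_reader, c_writer = self.connector
        async with self.lock:
            c_writer.write(request)  # push the request down the outbound tunnel
            await c_writer.drain()
            response = await c_reader.readline()
        writer.write(response)
        await writer.drain()
        writer.close()
        await writer.wait_closed()


async def run_connector(host, port, handle):
    """Stands in for cloudflared: dials out to the relay, never listens."""
    reader, writer = await asyncio.open_connection(host, port)
    try:
        while line := await reader.readline():
            reply = handle(line.decode().strip())  # call the "local service"
            writer.write((reply + "\n").encode())
            await writer.drain()
    finally:
        writer.close()


async def demo():
    relay = Relay()
    # Port 0 lets the OS pick free ports for this local demo.
    edge_connectors = await asyncio.start_server(relay.on_connector, "127.0.0.1", 0)
    edge_clients = await asyncio.start_server(relay.on_client, "127.0.0.1", 0)
    connector_port = edge_connectors.sockets[0].getsockname()[1]
    client_port = edge_clients.sockets[0].getsockname()[1]

    # The office side makes only an outbound connection.
    connector = asyncio.create_task(
        run_connector("127.0.0.1", connector_port, lambda req: f"office says: {req}")
    )

    # A remote client talks to the relay, never directly to the office server.
    reader, writer = await asyncio.open_connection("127.0.0.1", client_port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = (await reader.readline()).decode().strip()
    writer.close()
    await writer.wait_closed()

    # Tear down the toy.
    connector.cancel()
    await asyncio.gather(connector, return_exceptions=True)
    if relay.connector is not None:
        relay.connector[1].close()
    edge_connectors.close()
    edge_clients.close()
    return reply


print(asyncio.run(demo()))  # office says: hello
```

The real edge adds authentication, encryption, and multiplexing of many requests over redundant connections; the toy keeps only the direction of the connection, which is why the office machine needs no inbound reachability.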

The old stack quietly assumed the opposite. The VPCs were tied to cloud VMs because that is where the connectivity terminated. Remote access depended on paying for always-on cloud machines. Once the tunneling layer no longer needed a VM to terminate on, the dependency that pinned the whole environment to the cloud was gone.

That is the moment the second problem became solvable. Dropping the dependence on cloud VMs let us move the whole environment — including the AI workloads — onto on-prem hardware.

And because the connector needs no VMs, we could migrate one environment at a time instead of attempting a risky big-bang cutover. Each environment moved when it was ready, with the option to roll back if something went wrong. Cloud-to-on-prem migrations get a bad reputation precisely because they are usually framed as all-or-nothing. This one wasn’t.
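
The one-environment-at-a-time pattern can be sketched as a loop in which each environment is cut over, health-checked, and rolled back on its own if the check fails. The environment names and the `cut_over`, `healthy`, and `roll_back` callables are hypothetical placeholders. In a setup like this one, cutting over might mean repointing an environment's routes at the on-prem connector, but that mapping is an assumption, not a description of the client's tooling.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MigrationReport:
    migrated: list[str] = field(default_factory=list)
    rolled_back: list[str] = field(default_factory=list)


def migrate_one_at_a_time(
    envs: list[str],
    cut_over: Callable[[str], None],
    healthy: Callable[[str], bool],
    roll_back: Callable[[str], None],
) -> MigrationReport:
    """Move environments one by one; a failed check rolls back only that one."""
    report = MigrationReport()
    for env in envs:
        cut_over(env)
        if healthy(env):
            report.migrated.append(env)
        else:
            roll_back(env)  # the other environments are untouched
            report.rolled_back.append(env)
    return report


# Hypothetical example: track where each environment is served from.
location = {env: "cloud" for env in ["web-dev", "ml-training", "data-pipeline"]}


def cut_over(env: str) -> None:
    location[env] = "on-prem"


def roll_back(env: str) -> None:
    location[env] = "cloud"


report = migrate_one_at_a_time(
    list(location),
    cut_over=cut_over,
    healthy=lambda env: env != "ml-training",  # pretend this one fails its check
    roll_back=roll_back,
)
print(report.migrated, report.rolled_back)
print(location)
```

The design choice worth noting is that a failure is scoped to one environment: the loop records it and moves on, so a bad cutover never forces the already-migrated environments back.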

Thread B: the cloud math, recalculated

With the dependency removed, the economics were straightforward to run — and stark.

The tunneling line was the easy one. Paid ngrok was running roughly $7,500/year. On Cloudflare’s free tier, that went to $0.

The infrastructure line was the real story. The old GCP fleet ran about $7,600/month, dominated by those two always-on A100 GPU VMs. We replaced it with a one-time on-prem build of about $73,000 — four servers, a NAS, a UPS, networking, the rest of it — plus ongoing electricity of roughly $500/month.

The shape of that change is the whole point. A large recurring operating cost became a one-time capital cost plus a small recurring one. For a workload that runs around the clock regardless of utilization, that trade almost always favors ownership, because cloud GPUs are priced for elasticity and an always-on training workload uses none of it. You pay the on-demand premium every hour whether the card is saturated or sitting idle.

The hardware paid for itself in under a year. The client now saves around $7,000/month.
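
The payback claim is simple arithmetic on the article's rounded figures. A minimal sketch that uses only those figures; it deliberately ignores depreciation, maintenance, spares, and staff time, so treat the output as approximate. Note that the ngrok saving comes from the Cloudflare switch rather than from owning hardware, so it is passed in separately.

```python
def payback(capex, old_monthly, new_monthly, extra_monthly_savings=0.0):
    """Return (monthly savings, months to recover capex) from rounded figures."""
    monthly_savings = old_monthly - new_monthly + extra_monthly_savings
    if monthly_savings <= 0:
        raise ValueError("no savings, so no payback")
    return monthly_savings, capex / monthly_savings


CAPEX = 73_000              # one-time on-prem build (article figure)
CLOUD_MONTHLY = 7_600       # old GCP fleet (article figure)
POWER_MONTHLY = 500         # ongoing electricity (article figure)
NGROK_MONTHLY = 7_500 / 12  # paid ngrok, ~$625/month (article figure, annualized)

compute_only = payback(CAPEX, CLOUD_MONTHLY, POWER_MONTHLY)
with_ngrok = payback(CAPEX, CLOUD_MONTHLY, POWER_MONTHLY, NGROK_MONTHLY)

print(f"compute only: ${compute_only[0]:,.0f}/month, payback {compute_only[1]:.1f} months")
print(f"with ngrok:   ${with_ngrok[0]:,.0f}/month, payback {with_ngrok[1]:.1f} months")
```

With compute alone, savings come to $7,100/month and payback to about 10.3 months; folding in the ngrok line brings savings to about $7,725/month and payback to about 9.4 months. Either way it lands inside a year.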

|  | Before (cloud) | After (on-prem) |
|---|---|---|
| Tunneling | ~$7,500/year (paid ngrok) | $0 (Cloudflare free tier) |
| Compute | ~$7,600/month (GCP, two always-on A100 VMs) | ~$73,000 one-time + ~$500/month power |
| Project setup (pods, resources, security, access) | ~half a day | under 15 minutes |
| Developer onboarding | ~a day | under an hour |
| Control plane | three tools (VPCs, ngrok, editor tunnels) | one (Cloudflare Zero Trust) |

This only works because the workload was genuinely steady-state. The discipline is in being honest about that: ownership makes sense for the always-on core, not for spiky or experimental capacity. That is why we never burned the bridge; we can still spin up cloud capacity whenever a workload genuinely needs it.
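
One way to see why steady state is the deciding factor is to compare monthly costs across duty cycles. This sketch rests on assumptions that are not from the article: cloud cost scales linearly with the hours the VMs run (so a fleet that is on 30% of the time costs 30% of $7,600), the $73,000 build is amortized over a hypothetical 36 months, and power is treated as a flat $500/month regardless of load.

```python
ALWAYS_ON_CLOUD_MONTHLY = 7_600.0  # article figure, fleet running 24/7
CAPEX = 73_000.0                   # article figure
POWER_MONTHLY = 500.0              # article figure, treated as flat (assumption)
AMORTIZE_MONTHS = 36               # hypothetical write-off period


def cloud_monthly(duty_cycle):
    """Assumption: cloud cost scales linearly with the fraction of hours running."""
    return ALWAYS_ON_CLOUD_MONTHLY * duty_cycle


def owned_monthly():
    """Owned cost is fixed: amortized hardware plus power, busy or idle."""
    return CAPEX / AMORTIZE_MONTHS + POWER_MONTHLY


def break_even_duty_cycle():
    """Fraction of hours above which owning beats renting, under these assumptions."""
    return owned_monthly() / ALWAYS_ON_CLOUD_MONTHLY


for duty in (0.1, 0.25, 0.5, 1.0):
    cheaper = "own" if owned_monthly() < cloud_monthly(duty) else "rent"
    print(f"duty {duty:>4.0%}: cloud ${cloud_monthly(duty):>6,.0f}  "
          f"owned ${owned_monthly():>6,.0f}  cheaper: {cheaper}")
print(f"break-even duty cycle ~ {break_even_duty_cycle():.0%}")
```

Under those assumptions ownership wins only above roughly a one-third duty cycle; below that, renting by the hour is cheaper, which is exactly the spiky or experimental capacity worth leaving in the cloud.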

The wins we didn’t plan for

The numbers above were the goal. The wins that came from owning the hardware were not on the plan, and a few of them mattered as much as the savings.

We freed up one of the team’s two DevOps engineers. He had been spending much of his week playing musical chairs with VMs, ngrok domains, and broken tunnels — work that produced nothing except keeping the lights on. That time got rerouted to higher-value work.

Easier, more direct access to the on-prem servers changed how engineers used the GPUs. When every GPU hour is a metered cloud cost, people ration experiments. When the hardware is sitting there already paid for, they run far more of them. The internal dashboard we built leans into this directly: engineers and pods can “lease” resources on demand rather than filing requests for them.
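
The post doesn't describe how the internal dashboard implements leasing, so what follows is a hypothetical sketch of the general idea: a pool of GPUs where a holder (an engineer or a pod) takes a time-limited lease, and expired leases return to the pool automatically. The class name, GPU IDs, and TTL semantics are illustrative assumptions, and the clock is injectable so expiry can be exercised without sleeping.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class GpuLeasePool:
    """Hypothetical pool: holders take time-limited leases on GPUs."""

    def __init__(self, gpu_ids, clock: Callable[[], float] = time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._leases: dict[str, Optional[Lease]] = {gpu: None for gpu in gpu_ids}

    def _reclaim_expired(self) -> None:
        now = self._clock()
        for gpu, lease in self._leases.items():
            if lease is not None and lease.expires_at <= now:
                self._leases[gpu] = None  # lease ran out; GPU returns to the pool

    def acquire(self, holder: str, ttl_seconds: float) -> Optional[str]:
        """Lease the first free GPU to `holder`, or return None if all are busy."""
        self._reclaim_expired()
        for gpu, lease in self._leases.items():
            if lease is None:
                self._leases[gpu] = Lease(holder, self._clock() + ttl_seconds)
                return gpu
        return None

    def release(self, gpu: str, holder: str) -> bool:
        """Only the current holder can hand a GPU back early."""
        lease = self._leases.get(gpu)
        if lease is not None and lease.holder == holder:
            self._leases[gpu] = None
            return True
        return False


now = [0.0]  # fake clock for the example
pool = GpuLeasePool(["gpu0", "gpu1"], clock=lambda: now[0])
print(pool.acquire("alice", ttl_seconds=3600))  # gpu0
print(pool.acquire("pod-42", ttl_seconds=600))  # gpu1
print(pool.acquire("bob", ttl_seconds=3600))    # None: both are leased
now[0] = 601.0
print(pool.acquire("bob", ttl_seconds=3600))    # gpu1: pod-42's lease expired
```

Time-limited leases are one common way to keep "on demand" from quietly turning into "held forever": if a holder forgets to release, the GPU comes back on its own.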

Between the freed-up time and the increase in experimentation, we measured a 14–18% productivity boost. I’d caveat that hard: it is an early signal from a small sample, since we only launched this a few months ago. But the direction is clear, and the mechanism behind it is not mysterious — remove the friction tax on compute and people compute more.

What I’d tell another owner

The cloud is a great way to go, and honestly it is probably the right call for most teams most of the time. I want to be precise about that, because the lesson here is easy to over-read into “leave the cloud,” which is the wrong lesson.

The actual lesson is narrower and more useful. A lot of companies treat cloud as just the way things are done — an assumption nobody revisits. But your infrastructure deserves the same ongoing evaluation you give every other part of the business. Costs and usage change. The math that made cloud obvious a year ago might not hold today, especially once you have heavy, always-on compute like GPUs sitting idle at full price.

So, concretely:

  • Re-evaluate on a schedule, not on a crisis. Put infrastructure cost on the same review cadence as any other major recurring line. The trigger for a closer look is steady-state, always-on compute — GPUs above all.
  • Look for the dependency, not just the bill. The savings here were unlocked by removing a single technical dependency (cloud VMs for tunnel termination), not by negotiating a better rate. The expensive part is often pinned in place by an assumption one layer down.
  • Pilot on something non-critical first. The migration worked because it went one environment at a time with a way back. Make your first move reversible.
  • Keep the bridge. Repatriation is not exile. Move the workloads where the math has flipped and keep the ability to burst back to cloud for the ones where it hasn’t.

You may be surprised what you find. This client thought they had a tooling problem. They also had a $7,000-a-month one hiding behind it — and the two turned out to be the same problem.


If you have always-on compute you have never re-priced, or a dev-environment setup that has quietly grown into three tools and a babysitting habit, a conversation costs nothing and usually surfaces the dependency worth looking at first.

Tags

cloud repatriation cloudflare zero trust cloudflared on-prem gpu cloud cost optimization platform engineering developer experience infrastructure as code