Field Notes

Platform engineering notes and essays.

Practical writing drawn from real client work: migrations, autoscaling, SRE rollout, and operating production systems under pressure.

What a Good GKE Migration Actually Looks Like

This GKE migration was about more than moving workloads. The real win came from simplifying the platform, adopting GitOps, and leaving behind a cluster shape the team could actually operate.

GKE GitOps ArgoCD Terraform

One of the easiest ways to misunderstand a Kubernetes migration is to treat it like a hosting move. Copy workloads, point traffic, call it done. In practice, the move itself is rarely the most important part.

The more valuable outcome was not simply getting onto a new GKE footprint. It was using the migration to simplify a dual-cluster setup, establish a cleaner operational model, and codify the platform so future changes stopped depending on tribal knowledge.

The starting point

The environment supported multiple high-traffic sites and had accumulated the kind of complexity that makes everyday work more expensive than it should be. There were two separate clusters, inconsistent deployment patterns, and too much friction around changes that should have been routine.

The goal was to move toward a unified cluster model with proper namespace isolation for development and production, while improving change control and reducing recurring platform overhead.

Why GitOps mattered more than the migration itself

The strongest decision in the project was creating a dedicated GitOps repository to codify Kubernetes applications and image versions. That shifted the source of truth into versioned configuration instead of scattered runtime state.
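To make "codify applications and image versions" concrete, here is a minimal Python sketch under assumed conventions. It treats the repository's contents as a hypothetical mapping from environment to application specs and checks two properties a repository like this usually wants to enforce: every image is pinned to an explicit tag or digest (never `latest`), and development and production never share a namespace. `AppSpec`, `DESIRED_STATE`, `validate_desired_state`, and the `registry.example.com` paths are illustrative names, not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSpec:
    name: str
    namespace: str
    image: str  # full image reference, e.g. "registry.example.com/web:1.4.2"


# Hypothetical desired state, as it might be rendered from a GitOps repository.
DESIRED_STATE = {
    "development": [
        AppSpec("web", "dev", "registry.example.com/web:1.5.0-rc1"),
        AppSpec("api", "dev", "registry.example.com/api:2.1.0"),
    ],
    "production": [
        AppSpec("web", "prod", "registry.example.com/web:1.4.2"),
        AppSpec("api", "prod", "registry.example.com/api:2.0.3"),
    ],
}


def is_pinned(image: str) -> bool:
    """Return True if the image uses a digest or an explicit, non-'latest' tag."""
    if "@sha256:" in image:
        return True
    # A ':' only starts a tag if it comes after the last '/';
    # otherwise it is a registry port, as in "localhost:5000/web".
    last_slash = image.rfind("/")
    last_colon = image.rfind(":")
    if last_colon <= last_slash:
        return False
    return image[last_colon + 1:] not in ("", "latest")


def validate_desired_state(state: dict[str, list[AppSpec]]) -> list[str]:
    """Collect human-readable problems; an empty list means the state is valid."""
    errors = []
    namespaces_by_env = {}
    for env, apps in state.items():
        namespaces_by_env[env] = {app.namespace for app in apps}
        for app in apps:
            if not is_pinned(app.image):
                errors.append(f"{env}/{app.name}: image {app.image!r} is not pinned")
    # Namespace isolation: no two environments may deploy into the same namespace.
    envs = list(namespaces_by_env)
    for i, a in enumerate(envs):
        for b in envs[i + 1:]:
            shared = namespaces_by_env[a] & namespaces_by_env[b]
            if shared:
                errors.append(f"{a} and {b} share namespaces: {sorted(shared)}")
    return errors


print(validate_desired_state(DESIRED_STATE))  # → []
```

A check like this could run in CI on every pull request to the repository, so an unpinned image or a namespace collision is caught in review rather than discovered in the cluster.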

Once ArgoCD was in place, the platform stopped being something people “kept in sync” by memory. Deployments became more consistent, rollback paths became clearer, and it was much easier to explain the system to developers and auditors.

That is usually the real platform upgrade in this kind of project: not just better infrastructure, but a better operating model.
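The idea behind ArgoCD is a reconciliation loop: compare what Git declares with what the cluster is running, report the drift, and converge. The toy Python model below illustrates only that idea and is not how ArgoCD is implemented; the dict-based "cluster", the `diff_state` and `reconcile` helpers, and the drift categories are hypothetical. The `prune` flag loosely mirrors ArgoCD's option of the same name, which deletes resources no longer defined in Git. The model also shows why rollback gets clearer: rolling back means reconciling against an earlier commit instead of hand-editing the cluster.

```python
# Both desired and live state map "namespace/app" -> image reference.
State = dict[str, str]


def diff_state(desired: State, live: State) -> dict[str, list[str]]:
    """Classify drift between the Git-declared state and the live cluster."""
    return {
        "missing": sorted(k for k in desired if k not in live),
        "outdated": sorted(k for k in desired if k in live and live[k] != desired[k]),
        "unmanaged": sorted(k for k in live if k not in desired),
    }


def reconcile(desired: State, live: State, prune: bool = False) -> State:
    """Return the live state after converging it toward the desired state."""
    result = dict(live)
    result.update(desired)  # create missing apps and update outdated ones
    if prune:
        # Remove anything running that Git no longer declares.
        for key in list(result):
            if key not in desired:
                del result[key]
    return result


# Hypothetical Git history, oldest commit first.
history: list[State] = [
    {"prod/web": "web:1.4.1", "prod/api": "api:2.0.3"},
    {"prod/web": "web:1.4.2", "prod/api": "api:2.0.3"},
]

# The cluster is behind on "web" and has a manually created debug pod.
live: State = {"prod/web": "web:1.4.1", "prod/api": "api:2.0.3", "prod/debug": "busybox:1.36"}

print(diff_state(history[-1], live))
# → {'missing': [], 'outdated': ['prod/web'], 'unmanaged': ['prod/debug']}

live = reconcile(history[-1], live, prune=True)
print(live)
# → {'prod/web': 'web:1.4.2', 'prod/api': 'api:2.0.3'}

# Rollback: converge on the previous commit rather than patching the cluster by hand.
live = reconcile(history[-2], live, prune=True)
print(live)
# → {'prod/web': 'web:1.4.1', 'prod/api': 'api:2.0.3'}
```

`prune` defaults to `False` here, in the same cautious spirit as ArgoCD's automated sync, which does not delete resources unless pruning is explicitly enabled.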

What we put in place

  • A new Terraform-managed GKE cluster with the platform components needed to support the application estate cleanly
  • ArgoCD for GitOps-based rollout and reconciliation
  • cert-manager and ExternalDNS to make TLS certificate issuance and DNS record management for ingress more predictable
  • Google Cloud Monitoring for clearer platform visibility
  • Documentation that developers could actually use and auditors could actually follow
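One way the ArgoCD and namespace-isolation pieces fit together is one ArgoCD `Application` per app and environment, each pointing at a path in the GitOps repository and targeting its own namespace in the single cluster. The Python sketch below renders such manifests as plain dicts, using fields from ArgoCD's `argoproj.io/v1alpha1` Application resource. The repository URL, the `apps/<app>/overlays/<env>` path layout (which assumes Kustomize overlays), the `default` project, and the helper name `argocd_application` are all assumptions for illustration, not details of the project.

```python
import json


def argocd_application(app: str, env: str, namespace: str, repo_url: str,
                       revision: str = "main") -> dict:
    """Build an ArgoCD Application manifest for one app in one environment."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"{app}-{env}", "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                "targetRevision": revision,
                "path": f"apps/{app}/overlays/{env}",  # hypothetical repository layout
            },
            "destination": {
                # The in-cluster API server, i.e. the cluster ArgoCD itself runs in.
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
            # Automatically sync from Git, delete undeclared resources, and undo manual drift.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


REPO = "https://git.example.com/platform/gitops.git"  # hypothetical URL
ENV_NAMESPACES = {"dev": "dev", "prod": "prod"}  # environment -> namespace

manifests = [argocd_application("web", env, ns, REPO) for env, ns in ENV_NAMESPACES.items()]
print(json.dumps(manifests, indent=2))
```

Generating these from a small table keeps the per-environment differences (path and namespace) explicit and reviewable, rather than copied by hand into near-identical files.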

What changed for the team

Developer productivity improved because the deployment path became more legible. Security and compliance improved because the platform became easier to reason about. Operational risk dropped because there were fewer hidden corners where manual drift could accumulate.

There was also a very tangible cost outcome: the migration eliminated a meaningful line item in networking spend over a six-month period. That mattered, but it was a consequence of simplification, not the whole point of the work.

The pattern I keep seeing

Good migrations are rarely pure infrastructure exercises. The best ones use the transition window to fix a deeper operating problem:

  • too many clusters for the actual workload shape
  • too much manual deployment behavior
  • too little documentation around platform intent
  • too much ambiguity about what the desired state should be

If those problems survive the migration, the team often ends up with a newer platform but the same old friction.

What I would carry into the next migration

If I were planning a similar move again, I would keep the same priorities:

  1. simplify the target platform first
  2. make desired state explicit in code
  3. improve the day-two operating model, not just the cutover plan

That is what turns a migration into a lasting platform improvement instead of a very expensive lateral move.