
The Platform Engineering Podcast is a show about the real work of building and running internal platforms — hosted by Cory O’Daniel, longtime infrastructure and software engineer, and CEO/cofounder of Massdriver. Each episode features candid conversations with the engineers, leads, and builders shaping platform engineering today. Topics range from org structure and team ownership to infrastructure design, developer experience, and the tradeoffs behind every “it depends.” Cory brings two decades of experience building platforms — and now spends his time thinking about how teams scale infrastructure without creating bottlenecks or burning out ops. This podcast isn’t about trends. It’s about how pl...

<p>What happens when your “coworker” can generate code and changes faster than your team can review them, and production still has to stay up?</p><p>William Collins breaks down what AI-Native Ops looks like when you take reliability seriously: where reasoning should stop, where deterministic automation should begin, and how guardrails like compliance checks, version pinning, and controlled workflows keep AI from turning into outage fuel. Cory and William also dig into why context windows and tool sprawl matter in real systems, how protocols like MCP and agent-to-agent communication are shaping day-to-day automation, and why regulated environments can’t adop...

<p>Terraform drift, state wrangling, and a growing “tools for tools” stack are still daily work for many platform teams - despite a decade of DevOps talk and cloud maturity. Why does ops automation so often feel like it needs babysitting?</p><p>Pavlo Baron pinpoints where Infrastructure as Code tends to break down in real organizations: manual drift management, low-level state complexity, and a lack of practical abstractions that let developers self-serve without inheriting the entire ops burden.</p><p>The conversation digs into what a more use-case-driven approach could look like - where teams can choose when to e...

<p>Billions of requests a month on AWS Lambda can cost less than a single engineer’s laptop budget, but only if the architecture and developer workflow are designed for it.</p><p>Justin Masse, Senior Platform DevOps Engineer at Extend, shares how Extend committed early to a serverless-first approach and built a platform that prioritizes developer speed and low operational toil. The conversation breaks down what it takes to run active-active, multi-region systems in a serverless world, how the team keeps services small and fast, and why asynchronous, event-driven design changes both reliability and cost.</p><p>You’ll also...

<p>What happens when nobody wrote the code running in your production environment? As AI-generated software becomes standard practice, platform engineers face a new challenge: operating systems without experts to consult.</p><p>Nic Benders, Chief Technical Strategist at New Relic, has spent 15 years watching observability evolve from basic server monitoring to understanding complex distributed systems. Now he's tackling the next frontier: how to maintain and operate software when there's no human author to ask why something was built a certain way.</p><p>The conversation covers the shift from instrumentation being the hard problem to understanding being the bottleneck...

<p>Why do so many “modern” platforms feel slow, fragile, and painful to work on?</p><p>Platform engineer and fractional CTO Brian Childress joins Cory to discuss how over-engineering, resume‑driven development, and scattered tooling quietly block teams from shipping value. They explore why simplicity is a competitive advantage for platform teams, especially as AI becomes part of everyday development.</p><p>You’ll learn:</p><ul><li>How to design a simple platform MVP that developers actually like using</li><li>What a good local‑to‑prod story looks like (and why it’s the real scaling superpower)</li><li>Practical ways to onboard humans and AI tools s...</li></ul>

<p>What if changing a single flag could save you from a failed migration, a broken API, or a late-night rollback?</p><p>Join us as we dive into how feature flags become a practical tool for changing application behavior at runtime, not just toggling UI elements. Cory talks with Mike Zorn about real stories from LaunchDarkly and Rippling, covering how teams use flags to ship safely, debug faster, and simplify complex systems.</p><p>You’ll hear about:</p><ul><li>Using feature flags to avoid staging overload and ship directly to production</li><li>Migrating critical systems and databases with minimal downtime and risk</li><li>Controlling lo...</li></ul>

<p>Most Kubernetes security breaches don't come from zero-day exploits - they come from misconfigurations. While your team runs scanners and reviews reports, containers are already running as root, network policies are missing, and compliance violations are piling up across dozens of repositories.</p><p>Jim Bugwadia, co-founder and CEO of Nirmata and creator of Kyverno, joins Cory to talk about a different approach: policy as code. Instead of asking developers to remember security best practices across every repo, what if your cluster automatically enforced secure defaults and blocked non-compliant deployments before they ever reached production?</p><p>You'll learn...

<p>Is your Git repo really the source of truth for infrastructure - or just a suggestion?</p><p>Guest host Kelsey Hightower sits down with Cory O’Daniel to unpack why many teams hit dead ends with CI/CD for provisioning, where GitOps struggles with drift, and when TicketOps helps or hurts. They explore a different model: infrastructure as data with typed contracts, shared artifacts, and workflows that embed policy, validation, and upgrades from the start. You’ll hear practical ways to reduce cognitive load for developers while giving operations reliable control and better day‑2 levers.</p><p>You’ll learn...

<p>What if your production environment had a live, trustworthy blueprint you could zoom in and out of on demand?</p><p>Kelsey Hightower guest-hosts a candid conversation with Cory about why CI/CD pipelines and GitOps often break down for cloud infrastructure. They explore a simpler operational model: treat infrastructure as data, lean on clear checkpoints instead of rigid “golden paths,” and make production legible for both developers and ops.</p><p>You’ll learn:</p><ul><li>Where CI/CD adds friction for infra and what to do instead</li><li>Why GitOps works for apps but hits limits for databases, networks, and multi...</li></ul>

<p>Ever wonder why strong Terraform modules still lead to long review queues and fragile pipelines? From hand-built scripts and early data center migrations to cloud sprawl and Kubernetes, configuration management has changed a lot - but the core struggle remains: too many decisions, not enough guardrails. Guest host Kelsey Hightower sits down with Cory O’Daniel to unpack where Infrastructure as Code succeeds and where teams get stuck.</p><p>What you’ll learn:</p><ul><li>How to avoid “choice overload” in cloud configs by moving decisions upstream</li><li>Practical ways to pair IaC with UX, policies, and SLAs to reduce toil</li><li>When click-op...</li></ul>