
The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers, and interviews in Software 3.0. We cover the Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra, and more, directly from the founders, builders, and thinkers pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to a first introduction to the tech you'll be using in the next 3 months! We break news and run exclusive interviews with OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny...

From building internal AI labs to becoming CTO of Brex, James Reggio has helped lead one of the most disciplined AI transformations inside a real financial institution, where compliance, auditability, and customer trust actually matter.

We sat down with Reggio to unpack Brex's three-pillar AI strategy (corporate, operational, and product AI) [https://www.brex.com/journal/brex-ai-native-operations], how SOP-driven agents beat overengineered RL in ops, why Brex lets employees "build their own AI stack" instead of picking winners [https://www.conductorone.com/customers/brex/], and how a small, founder-heavy AI team is shipping production agents to 40,000+ companies...
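For readers unfamiliar with the pattern, here is a toy illustration of what an SOP-driven agent can look like: the standard operating procedure is written down as explicit, ordered steps that the model executes one at a time, rather than a workflow learned end-to-end with RL. The `llm` callable and the dispute-handling steps below are hypothetical, not Brex's actual stack.

```python
# A toy sketch of the "SOP-driven agent" idea: encode the standard
# operating procedure as explicit ordered steps and have the model
# execute them one at a time, carrying prior results forward.
# `llm` is a hypothetical completion function, not Brex's real system.
SOP = [
    "Verify the customer's identity against the account record.",
    "Summarize the disputed transaction and collect evidence.",
    "Apply the dispute policy and draft a resolution for human review.",
]

def run_sop(case: str, llm) -> list[str]:
    outputs, context = [], case
    for step in SOP:
        result = llm(f"Context:\n{context}\n\nExecute this step:\n{step}")
        outputs.append(result)
        context += f"\n[{step}] -> {result}"  # each step sees prior results
    return outputs
```

The appeal of this shape is auditability: every step, input, and output is explicit, which matters in a compliance-heavy setting in a way a learned end-to-end policy does not.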

Don't miss George's AIE talk: https://www.youtube.com/watch?v=sRpqPgKeXNk

---

From launching a side project in a Sydney basement to becoming the independent gold standard for AI benchmarking—trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities—George Cameron and Micah Hill-Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs? And how open is "open" really?

We d...

[State of Evals] LMArena's $1.7B Vision — Anastasios Angelopoulos, LMArena
We are re-upping this episode after LMArena announced their fresh Series A (https://www.theinformation.com/articles/ai-evaluation-startup-lmarena-valued-1-7-billion-new-funding-round?rc=luxwz4), raising $150M at a $1.7B valuation, with $30M annualized consumption revenue (aka $2.5M MRR) after their September evals product launch.

---

From building LMArena in a Berkeley basement to raising $100M and becoming the de facto leaderboard for frontier AI, Anastasios Angelopoulos returns to Latent Space to recap 2025 at one of the most influential platforms in AI—trusted by millions of users, every major lab, and the entire industry to answer one question: which...
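As background on how arena-style leaderboards work, here is a minimal sketch of turning pairwise preference votes into a ranking with Elo-style updates. The model names and votes are made up, and LMArena's production leaderboard uses more careful statistical fitting than this toy online update.

```python
# Minimal online Elo update over pairwise preference votes.
# Model names and votes are hypothetical; a real arena aggregates
# votes with more rigorous statistics (e.g., Bradley-Terry fitting
# with confidence intervals) rather than this sequential update.

K = 32  # update step size

def expected(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str) -> None:
    """Shift both ratings toward the observed outcome of one vote."""
    e = expected(ratings[winner], ratings[loser])
    ratings[winner] += K * (1 - e)
    ratings[loser] -= K * (1 - e)

ratings = {"model-a": 1000.0, "model-b": 1000.0, "model-c": 1000.0}
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
for winner, loser in votes:
    update(ratings, winner, loser)

for name, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.1f}")
```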
[NeurIPS Best Paper] 1000 Layer Networks for Self-Supervised RL — Kevin Wang et al, Princeton
From undergraduate research seminars at Princeton to winning the Best Paper award at NeurIPS 2025, Kevin Wang, Ishaan Javali, Michał Bortkiewicz, Tomasz Trzcinski, and Benjamin Eysenbach defied conventional wisdom by scaling reinforcement learning networks to 1,000 layers deep—unlocking performance gains the RL community thought impossible. We caught up with the team live at NeurIPS to dig into the story behind RL1000: why deep networks have worked in language and vision but failed in RL for over a decade (spoiler: it's not just about depth, it's about the objective), how they discovered that self-supervised RL (learning representations of states, actions, and future states...
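To make that objective concrete, here is a minimal sketch of a contrastive self-supervised RL loss: embed (state, action) pairs and future states, then train so each pair scores highest against the future state from its own trajectory (InfoNCE over the batch). The dimensions, depth, and random data are illustrative only; the paper's actual 1,000-layer setup involves far more than naively stacking layers.

```python
# Minimal sketch of a contrastive self-supervised RL objective:
# phi(s, a) should be closest to psi(s_future) for states sampled
# from the same trajectory, and far from everyone else's futures.
import torch
import torch.nn as nn

def mlp(in_dim: int, out_dim: int, depth: int = 4, width: int = 256) -> nn.Sequential:
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

state_dim, action_dim, embed_dim, batch = 17, 6, 64, 128
sa_encoder = mlp(state_dim + action_dim, embed_dim)  # phi(s, a)
goal_encoder = mlp(state_dim, embed_dim)             # psi(s_future)

s = torch.randn(batch, state_dim)
a = torch.randn(batch, action_dim)
s_future = torch.randn(batch, state_dim)  # stand-in for same-trajectory futures

phi = sa_encoder(torch.cat([s, a], dim=-1))
psi = goal_encoder(s_future)
logits = phi @ psi.T  # (batch, batch) similarity matrix
# Diagonal entries are the positive pairs; everything else is a negative.
loss = nn.functional.cross_entropy(logits, torch.arange(batch))
loss.backward()
print(f"InfoNCE loss: {loss.item():.3f}")
```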
[State of Code Evals] After SWE-bench, Code Clash & SOTA Coding Benchmarks recap — John Yang
From creating SWE-bench in a Princeton basement to shipping CodeClash, SWE-bench Multimodal, and SWE-bench Multilingual, John Yang has spent the last year and a half watching his benchmark become the de facto standard for evaluating AI coding agents—trusted by Cognition (Devin), OpenAI, Anthropic, and every major lab racing to solve software engineering at scale. We caught up with John live at NeurIPS 2025 to dig into the state of code evals heading into 2026: why SWE-bench went from ignored (October 2023) to the industry standard after Devin's launch (and how Walden emailed him two weeks before the big reveal), how the be...
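For readers new to the benchmark, the core evaluation loop is simple to state: check out the repo at a pinned commit, apply the model's patch, and verify that the issue's failing tests now pass without breaking previously passing ones. The sketch below is a schematic with placeholder commands, not SWE-bench's actual harness (which runs inside containerized per-repo environments).

```python
# Schematic of a SWE-bench-style evaluation: apply the model's patch
# to the repository at the pinned base commit, then check that the
# issue's failing tests now pass and no prior tests regress.
# Paths, commands, and test IDs are placeholders.
import subprocess

def run(cmd: list[str], cwd: str) -> bool:
    return subprocess.run(cmd, cwd=cwd, capture_output=True).returncode == 0

def evaluate(repo: str, base_commit: str, patch_file: str,
             fail_to_pass: list[str], pass_to_pass: list[str]) -> bool:
    run(["git", "checkout", base_commit], cwd=repo)
    if not run(["git", "apply", patch_file], cwd=repo):
        return False  # patch doesn't even apply cleanly
    # Tests that reproduced the issue must now pass...
    resolved = all(run(["pytest", t], cwd=repo) for t in fail_to_pass)
    # ...and previously passing tests must not regress.
    no_regressions = all(run(["pytest", t], cwd=repo) for t in pass_to_pass)
    return resolved and no_regressions
```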
[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI
From pre-training data curation to shipping GPT-4o, o1, o3, and now GPT-5 thinking and the shopping model, Josh McGrath has lived through the full arc of OpenAI's post-training evolution—from the PPO vs DPO debates of 2023 to today's RLVR era, where the real innovation isn't optimization methods but data quality, signal trust, and token efficiency. We sat down with Josh at NeurIPS 2025 to dig into the state of post-training heading into 2026: why RLHF and RLVR are both just policy gradient methods (the difference is the input data, not the math), how GRPO from DeepSeek Math was underappreciated as a...
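Josh's point about RLHF and RLVR being the same math is easy to see in pseudocode: both apply a policy-gradient update to sampled responses, and only the source of the scalar reward differs (a learned preference model versus a programmatic verifier). The functions below are a hedged sketch, not any particular lab's training loop.

```python
# RLHF vs RLVR share one policy-gradient update; only the reward
# source changes. `reward_model` and `verifier` are stand-ins,
# not any real library's API.
import torch

def policy_gradient_step(logprobs: torch.Tensor, reward: float, baseline: float,
                         optimizer: torch.optim.Optimizer) -> None:
    """REINFORCE: push up log-probs of sampled tokens, weighted by advantage."""
    advantage = reward - baseline
    loss = -(logprobs.sum() * advantage)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def rlhf_reward(prompt: str, response: str, reward_model) -> float:
    return reward_model(prompt, response)  # learned human-preference score

def rlvr_reward(prompt: str, response: str, verifier) -> float:
    return 1.0 if verifier(prompt, response) else 0.0  # programmatic check (unit tests, math answer)
```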
[State of RL/Reasoning] IMO/IOI Gold, OpenAI o3/GPT-5, and Cursor Composer — Ashvin Nair, Cursor
From Berkeley robotics and OpenAI's 2017 Dota-era internship to shipping RL breakthroughs on GPT-4o, o1, and o3, and now leading model development at Cursor, Ashvin Nair has done it all. We caught up with Ashvin at NeurIPS 2025 to dig into the inside story of OpenAI's reasoning team (spoiler: it went from a dozen people to 300+), why IOI Gold felt reachable in 2022 but somehow didn't change the world when o1 actually achieved it, how RL doesn't generalize beyond the training distribution (and why that means you need to bring economically useful tasks into distribution by co-designing products and models), the...
[State of AI Startups] Memory/Learning, RL Envs & DBT-Fivetran — Sarah Catanzaro, Amplify
From investing through the modern data stack era (DBT, Fivetran, and the analytics explosion) to now investing at the frontier of AI infrastructure and applications at Amplify Partners, Sarah Catanzaro has spent years at the intersection of data, compute, and intelligence—watching categories emerge, merge, and occasionally disappoint. We caught up with Sarah live at NeurIPS 2025 to dig into the state of AI startups heading into 2026: why $100M+ seed rounds with no near-term roadmap are now the norm (and why that terrifies her), what the DBT-Fivetran merger really signals about the modern data stack (spoiler: it's not dead, just re...

One year ago, Anthropic launched the Model Context Protocol (MCP)—a simple, open standard to connect AI applications to the data and tools they need. Today, MCP has exploded from a local-only experiment into the de facto protocol for agentic systems, adopted by OpenAI, Microsoft, Google, Block, and hundreds of enterprises building internal agents at scale. And now, MCP is joining the newly formed Agentic AI Foundation (AAIF) under the Linux Foundation, alongside Block's Goose coding agent, with founding members spanning the biggest names in AI and cloud infrastructure.

We sat down with David Soria Parra (MCP lead...
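To ground what "a simple, open standard" means in practice, here is a minimal MCP server sketch using the FastMCP helper from the official Python SDK (installed via `pip install mcp`); the server name and toy tool are our own inventions, and real servers typically expose resources and prompts alongside tools.

```python
# Minimal MCP server sketch using the official Python SDK's FastMCP
# helper. The server name and the `add` tool are toy examples.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, so a local client can connect
```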