Episode 36

AI and the Judgment Problem in Data Science

March 19, 2026
Listen on Spotify and Apple Podcasts.
Guest
Dawn Woodward

LinkedIn


Dawn Woodward is a technical and executive leader in machine learning, data science, and statistics. With her team, Dawn partners cross-functionally to create deeply data-driven user experiences and the algorithmic platforms that power them. Before moving to tech, she was a tenured professor of statistics and operations research at Cornell. She has publications on topics including predictive modeling for vehicle fleet management, pricing technologies for ride-sharing, and statistical methods for datacenter management.

Guest
Andrés Bucchi

LATAM Airlines


Andrés Bucchi is the Chief Data Officer at LATAM Airlines, where he leads the company’s transformation into a data- and AI-driven organization. His team spans domains from pricing and marketing to operations and safety, scaling experimentation and applied machine learning across one of the world’s most complex industries.

Before LATAM, Andrés was VP of Data & Analytics at Sodimac, Latin America’s largest home improvement retailer, where he built experimentation and analytics capabilities at scale. He previously spent four years at Uber in San Francisco and Chile, working across operations, strategy, and applied machine learning in pricing.

An entrepreneur at heart, Andrés also co-founded Experimento Social, a software and analytics consultancy in Chile, and today advises early-stage AI companies on strategy and adoption.

Guest
Jeremy Hermann

Delphina


Jeremy Hermann is the Co-Founder of Delphina. Prior to this, he was Head of ML Platform at Uber, architect of Michelangelo, and Co-Founder of Tecton.

Host
Hugo Bowne-Anderson

Delphina

Hugo Bowne-Anderson is an independent data and AI consultant with extensive experience in the tech industry. He is the host of Vanishing Gradients, an industry podcast exploring developments in data science and AI. Previously, Hugo served as Head of Developer Relations at Outerbounds and held roles at Coiled and DataCamp, where his work in data science education reached over 3 million learners. He has taught at Yale University, Cold Spring Harbor Laboratory, and conferences like SciPy and PyCon, and is a passionate advocate for democratizing data skills and open-source tools.


Key Takeaways

Semantic ambiguity kills AI utility. 

Dawn highlights that at Uber, having multiple conflicting definitions for a single concept like "user sessions" makes AI-driven analysis impossible. Rigorous data annotation and a single "source of truth" catalog are now more critical for AI than they ever were for human analysts.
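To make the "source of truth" idea concrete, here is an illustrative sketch (not from the episode) of a minimal metric catalog that gives each concept exactly one canonical definition. The catalog entries, the 30-minute session timeout, and the `resolve` helper are all hypothetical.

```python
# Sketch of a single-source-of-truth metric catalog: one canonical
# definition per concept, so an AI analyst cannot choose between
# conflicting versions of "user session". Entries are illustrative.
CATALOG = {
    "user_session": {
        "definition": "app events grouped with a 30-minute inactivity timeout",
        "owner": "analytics-platform",
    },
}

def resolve(metric: str) -> str:
    # Fail loudly on unknown metrics instead of letting a model improvise one.
    if metric not in CATALOG:
        raise KeyError(f"no canonical definition for {metric!r}")
    return CATALOG[metric]["definition"]

print(resolve("user_session"))
```

The point of the hard failure is that an agent querying the catalog either gets the one blessed definition or gets nothing, rather than silently inventing its own.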

Upstream validation replaces downstream cleaning. 

Andrés argues that the traditional data engineering workflow is being flipped. By implementing "strong validation first" at the point of ingestion, organizations can use AI to maintain downstream data quality that was previously too expensive or complex to manage manually.
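As an illustrative sketch of "strong validation first" (not from the episode), the snippet below rejects malformed records at the ingestion gate instead of cleaning them downstream. The schema, field names, and sample records are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RideEvent:
    """Illustrative ingestion schema; fields are assumptions."""
    trip_id: str
    fare_usd: float

def validate(record: dict) -> RideEvent:
    # Fail fast at the point of ingestion rather than cleaning downstream.
    if not record.get("trip_id"):
        raise ValueError("trip_id is required")
    fare = float(record.get("fare_usd", -1))
    if fare < 0:
        raise ValueError("fare_usd must be non-negative")
    return RideEvent(trip_id=record["trip_id"], fare_usd=fare)

clean, rejected = [], []
for raw in [{"trip_id": "t1", "fare_usd": 12.5}, {"trip_id": "", "fare_usd": 3.0}]:
    try:
        clean.append(validate(raw))
    except ValueError as err:
        rejected.append((raw, str(err)))

print(len(clean), len(rejected))  # one clean record, one rejected at the gate
```

Everything downstream of the gate can then assume the schema holds, which is what makes AI-maintained pipelines tractable.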

A/B testing platforms are the new bottleneck.

While AI allows teams to build features and variants at record speed, legacy experimentation platforms are hitting their limits. Dawn observed that the sheer volume of AI-generated experiments is creating statistical bias issues and "babysitting" requirements that legacy internal tools weren't designed to handle.
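One way the bias shows up is multiple comparisons: the more experiments you run, the more false positives a naive p < 0.05 gate lets through. The numbers below are made up; a Bonferroni correction is just one blunt guardrail, shown here as a sketch rather than a recommendation from the panel.

```python
# Sketch: with many AI-generated experiments, naive p < 0.05 gating
# inflates false positives; a Bonferroni correction tightens the
# per-experiment threshold. The p-values are illustrative.
p_values = [0.001, 0.04, 0.049, 0.2]
alpha, m = 0.05, len(p_values)

naive_wins = [p for p in p_values if p < alpha]
corrected_wins = [p for p in p_values if p < alpha / m]  # threshold 0.0125

print(len(naive_wins), len(corrected_wins))  # 3 naive "wins", 1 corrected
```

At the volume AI-generated variants arrive, this kind of correction (or a false-discovery-rate control) stops being optional.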

AI requires "headless" security architectures. 

Standard hierarchical permission models break when agents are introduced. Andrés advocates for building "headless" AI services that live in the middleware, allowing agents to inherit and enforce the specific identity tokens and access rights of the human user they are representing.
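The inheritance idea can be sketched in a few lines (a toy, not a real security design): the agent runs every query under the calling user's token and grants, never under its own elevated role. All tokens, grant sets, and dataset names are hypothetical.

```python
# Toy sketch: an agent acts with the calling user's token and grants,
# never with its own superuser role. Tokens and grants are illustrative.
USER_GRANTS = {"token-ana": {"pricing", "marketing"}, "token-luis": {"safety"}}

def agent_query(user_token: str, dataset: str) -> str:
    # The agent inherits exactly the caller's access rights.
    grants = USER_GRANTS.get(user_token, set())
    if dataset not in grants:
        raise PermissionError(f"token lacks access to {dataset}")
    return f"rows from {dataset}"  # stand-in for the real data access

print(agent_query("token-ana", "pricing"))
```

A real middleware would verify signed tokens against an identity provider, but the invariant is the same: the agent can never see data its human principal could not.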

The "Analyst" is now a "Verifier." 

As non-technical leaders use GenAI to self-serve data queries, the role of the data expert is shifting. Instead of wrangling data to answer a question, technical teams must now focus on "verifiable outputs"—auditing the AI’s chain of thought to ensure the analysis isn't based on a hallucination or a biased dataset.

AI imitates causal workflows without causal reasoning. 

Despite their next-token prediction capabilities, LLMs are not inherently causal or probabilistic. Dawn notes that while an agent can be prompted to imitate the steps of a causal analysis, true inference still requires the AI to execute specialized, bespoke statistical frameworks (like PyMC) rather than relying on its own reasoning.
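The episode names PyMC as the kind of framework an agent should call into. As a dependency-free stand-in, here is a toy difference-in-means treatment-effect estimate on synthetic data, illustrating the kind of computation an agent should execute rather than reason out in tokens. All data here is synthetic and the true effect (2.0) is baked in.

```python
import random

random.seed(0)
# Synthetic A/B data: treatment lifts the outcome by ~2.0 on average.
control = [random.gauss(10.0, 1.0) for _ in range(5000)]
treated = [random.gauss(12.0, 1.0) for _ in range(5000)]

def mean(xs):
    return sum(xs) / len(xs)

# Difference-in-means estimate of the average treatment effect.
ate = mean(treated) - mean(control)
print(round(ate, 1))
```

A real causal analysis adds identification assumptions, confounder adjustment, and uncertainty quantification, which is exactly why handing the agent a statistical framework beats trusting its own chain of thought.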

Brownfield codebases remain the final frontier. 

There is a massive gap between "vibe coding" a greenfield application and integrating AI into a "brownfield" enterprise codebase. Andrés points out that while AI can generate snippets, it cannot yet verify whether those changes will scale or break existing complex systems, so high-level human architectural judgment is still required.

Qualitative evals are replacing traditional metrics. 

In the era of conversational interfaces, traditional ranking metrics like AUC or precision are becoming insufficient. The panel suggests that evaluation is shifting toward "AI-on-AI" qualitative assessments, where models are used to grade the helpfulness and nuance of a conversational experience.
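A minimal sketch of "AI-on-AI" evaluation (not from the episode): a judge model scores each response against a rubric. Here the judge is a stub keyword check standing in for a real LLM call; the rubric text and scoring logic are assumptions.

```python
# Sketch of "AI-on-AI" evaluation: a judge model grades a response on a
# rubric. judge() is a stub standing in for a real LLM call.
RUBRIC = "Score 1-5 for helpfulness; answer with the number only."

def judge(question: str, answer: str) -> int:
    # Stub heuristic: a real implementation would prompt a strong model
    # with RUBRIC, the question, and the answer, then parse the score.
    topic = question.split()[-1].rstrip("?").lower()
    return 5 if topic in answer.lower() else 2

scores = [
    judge("What drives churn?", "Churn is driven by onboarding friction..."),
    judge("What drives churn?", "I cannot help with that."),
]
print(scores)
```

The scaffolding (rubric, judge call, score aggregation) stays the same when the stub is swapped for a real model, which is what makes these evals cheap to scale alongside conversational products.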

Data engineering is shrinking to expand.

LATAM Airlines reduced its data engineering headcount by 20% by automating routine pipeline tasks. This wasn't a cost-cutting measure, but a strategic reallocation: moving those engineers to high-value areas where AI can bridge "tech gaps" that were previously prohibitively expensive to close.

You can find the full transcript here.

