Beyond Online Experimentation: Generative Software That Optimizes Itself

Microsoft
Martin Tingley is Head of Windows Experimentation at Microsoft and former Head of the Experimentation Platform Analysis Team at Netflix.

Delphina
Hugo Bowne-Anderson is an independent data and AI consultant with extensive experience in the tech industry. He hosts Vanishing Gradients, an industry podcast exploring developments in data science and AI. Previously, Hugo served as Head of Developer Relations at Outerbounds and held roles at Coiled and DataCamp, where his work in data science education reached over 3 million learners. He has taught at Yale University, Cold Spring Harbor Laboratory, and conferences like SciPy and PyCon, and is a passionate advocate for democratizing data skills and open-source tools.
Key Takeaways
Experimentation capability is no longer a competitive edge.
With the proliferation of vendor solutions, the ability to run an A/B test has become a commodity. True competitive advantage now comes from how an organization climbs what Tingley describes as a five-level experimentation maturity ladder: moving beyond basic hypothesis testing into automated generative optimization.
Success is the biggest trap for experimentation teams.
Most organizations are stuck at the second level of that ladder: shipping high-investment features based on individual hypotheses. Because these experiments work and get celebrated, teams don't notice that everything is just "okay" and that there's a better way.
Shift from testing variants to optimizing parameter spaces.
Level 3 requires a mental leap: stop viewing A/B testing as a scientific lab report and start viewing it as hill-climbing. Build optionality into every decision point and use iterative testing to optimize over that space, as in the sketch below.
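To make the hill-climbing framing concrete, here is a minimal Python sketch. The parameter space and `run_ab_test` are illustrative stand-ins (not anything discussed on the show): each round perturbs one decision point and promotes the challenger only if it beats the champion.

```python
import random

# Illustrative parameter space: every decision point becomes a tunable option.
PARAM_SPACE = {
    "cta_text": ["Start free trial", "Try it now", "Get started"],
    "rows_on_homepage": [8, 10, 12],
    "artwork_style": ["character", "scene", "logo"],
}

def run_ab_test(challenger):
    """Placeholder for launching a real challenger-vs-champion experiment;
    simulated here with a noisy lift so the loop runs end to end."""
    bonus = 0.02 if challenger["cta_text"] == "Get started" else 0.0
    return bonus + random.gauss(0.0, 0.01)

def hill_climb(start, rounds=20):
    champion = dict(start)
    for _ in range(rounds):
        # Propose a neighbor: change one decision point at a time.
        challenger = dict(champion)
        key = random.choice(list(PARAM_SPACE))
        challenger[key] = random.choice(PARAM_SPACE[key])
        # In practice: promote only on a statistically significant win.
        if run_ab_test(challenger) > 0:
            champion = challenger
    return champion

best = hill_climb({k: v[0] for k, v in PARAM_SPACE.items()})
```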
Humans are too expensive for micro-optimization.
At Level 4, organizations cede decision-making to machines via contextual bandits. Human product managers are a bottleneck for high-frequency, low-stakes decisions like artwork selection or email subject lines: those decisions only create business value when automated at scale.
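A toy epsilon-greedy bandit illustrates the mechanic. The arms, contexts, and reward signal here are assumptions; production systems typically use Thompson sampling or LinUCB with richer context features.

```python
import random
from collections import defaultdict

class EpsilonGreedyBandit:
    """Toy contextual bandit: tracks a running mean reward for each
    (context, arm) pair; explores with probability epsilon, otherwise
    exploits the best-known arm for the given context."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = arms
        self.epsilon = epsilon
        self.counts = defaultdict(int)    # (context, arm) -> pulls
        self.values = defaultdict(float)  # (context, arm) -> mean reward

    def choose(self, context):
        if random.random() < self.epsilon:
            return random.choice(self.arms)  # explore
        return max(self.arms, key=lambda a: self.values[(context, a)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        # Incremental running-mean update.
        self.values[key] += (reward - self.values[key]) / self.counts[key]

# Pick a subject line per user segment; an email open counts as reward 1.
bandit = EpsilonGreedyBandit(["subject_a", "subject_b", "subject_c"])
arm = bandit.choose(context="new_user")
bandit.update(context="new_user", arm=arm, reward=1.0)
```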
Generative AI turns software into a self-optimizing system.
The Level 5 frontier is a closed loop: GenAI generates production-level variants, an experimentation platform evaluates them, and results feed back to generate better versions. Coframe is already doing this for Fortune 500 e-commerce companies, producing production-ready landing page variants in hours instead of weeks.
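The shape of that loop, sketched with stand-in functions (this is not Coframe's actual API; `generate_variants` and `evaluate` are placeholders for an LLM call and an experimentation platform):

```python
import random

def generate_variants(prompt, history, n=5):
    """Stand-in for an LLM call that drafts n production-ready variants,
    conditioned on which past variants won and lost."""
    return [f"{prompt} v{len(history) + i}" for i in range(n)]

def evaluate(variants):
    """Stand-in for the experimentation platform: run each variant
    against control and return its measured lift (simulated here)."""
    return {v: random.gauss(0.0, 0.01) for v in variants}

history = []
for generation in range(10):
    variants = generate_variants("landing page hero copy", history)
    results = evaluate(variants)
    history.extend(results.items())           # results feed the next generation
    champion = max(results, key=results.get)  # promote the winner
```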
Map experiments to product areas to inform strategy.
An "experimentation programs" concept Tingley and colleagues developed at Netflix: plot the distribution of treatment effects by product area. One team runs many small experiments with occasional wins: they should automate. Another team runs few experiments but finds high customer sensitivity: they need more throughput. This turns experiment-level data into a capital allocation tool.
Looking at the mean is not enough.
A/B tests can have wildly different results across user segments: power users, geographic regions, cost-conscious vs. premium customers. Only looking at the average hides what matters: always examine heterogeneous treatment effects.
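A minimal per-segment cut, with made-up data; a real analysis would add confidence intervals and correct for multiple comparisons:

```python
import pandas as pd

# Made-up experiment results: one row per user.
df = pd.DataFrame({
    "segment": ["power", "power", "power", "casual", "casual", "casual"],
    "group":   ["treat", "control", "control", "treat", "treat", "control"],
    "metric":  [1.00, 0.80, 0.85, 0.40, 0.45, 0.50],
})

means = df.groupby(["segment", "group"])["metric"].mean().unstack()
means["lift"] = means["treat"] - means["control"]
print(means)  # the overall average hides the opposite-signed lifts here
```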
Every failed experiment is a learning opportunity.
An experiment that "didn't work" overall may have worked for a subset of customers. It may have confused users in a way that reveals a new customer need. The culture of humility required for experimentation isn't just about accepting losses: it's about mining them for signal.
Respect your product's "permission to play."
The amount of change users will tolerate varies by product. Netflix users might find a new UI exciting. Windows users open their machine to get a task done fast: radical UI changes break mental models and trust. Experimentation velocity must match the product's core utility.
Incentivize shots on goal over perfect wins.
To democratize experimentation, shift incentives from rewarding "successful ships" to rewarding throughput and learning. Even if teams game the system by running more tests, the institutional capacity built by high-volume experimentation eventually surfaces non-obvious, high-impact winners that no one would have hypothesized.
You can read the full transcript here.
LINKS
- Martin on LinkedIn
- Want Your Company to Get Better at Experimentation? by Iavor Bojinov, David Holtz, Ramesh Johari, Sven Schmit and Martin Tingley (Harvard Business Review)
- Avoid the Pitfalls of A/B Testing by Iavor Bojinov, Guillaume Saint-Jacques and Martin Tingley (Harvard Business Review)
- Martin & Co.'s Seven Part Blog Series on Experimentation at Netflix
- Roberto Medri (Meta) on High Signal: The Incentive Problem in Shipping AI Products — and How to Change It
- Tim O’Reilly on High Signal: The End of Programming As We Know It
- Watch the podcast episode on YouTube
- Delphina's Newsletter