
Simpson's paradox

  • Leandro Santos
  • Nov 12, 2024
  • 2 min read

Updated: Oct 15

Consider the following situation:


Imagine a company planning to launch a new version of its product. It must choose between two flavors: spicy or smooth. To make an informed decision, the company randomly surveys 200 people for their preferences. The overall result is shown in the table below:

[Table: overall survey results for the spicy and smooth flavors]

The result shows that 80% of respondents liked the spicy flavor and 75% liked the smooth flavor, which suggests that launching the spicy product would be the better decision.


However, when we analyze the same data subdivided by the gender of users (male or female), an unexpected pattern emerges:

[Table: survey results for the spicy and smooth flavors, split by gender]

The result now points to a preference for the smooth flavor among both male and female users. This reversal is known as Simpson's paradox.
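
The reversal can be reproduced with a few lines of Python. This is only a sketch: the counts below are hypothetical (the original tables are not reproduced here) and were chosen to match the 80% and 75% overall rates quoted above. It also assumes each of the 200 respondents rated a single flavor. The names `counts`, `overall`, and `by_gender` are illustrative.

```python
# Hypothetical counts, NOT the article's data: (liked, surveyed) per flavor and gender.
# Assumption: each of the 200 respondents rated exactly one flavor.
counts = {
    "spicy":  {"male": (76, 90), "female": (4, 10)},
    "smooth": {"male": (36, 40), "female": (39, 60)},
}


def like_rate(liked, surveyed):
    """Share of surveyed people who liked the flavor."""
    return liked / surveyed


def overall_rate(flavor):
    """Like rate for a flavor with both genders pooled together."""
    liked = sum(l for l, _ in counts[flavor].values())
    surveyed = sum(n for _, n in counts[flavor].values())
    return like_rate(liked, surveyed)


# Aggregated view: one rate per flavor.
overall = {flavor: overall_rate(flavor) for flavor in counts}

# Segmented view: one rate per flavor within each gender.
by_gender = {
    gender: {flavor: like_rate(*counts[flavor][gender]) for flavor in counts}
    for gender in ("male", "female")
}

print("overall:", overall)
for gender, rates in by_gender.items():
    print(gender, rates)
```

With these made-up counts, spicy wins overall (80% against 75%), yet smooth wins within each gender. The reversal comes from the unbalanced group sizes: most spicy ratings come from men, who rate both flavors higher than women do in this illustrative data.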


Simpson's paradox is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined. It can have serious implications for businesses, especially in KPI reporting, where leaders may make decisions based on aggregated data without considering variation across subgroups. Simpson's paradox can also affect predictive analytics, since a model fitted to pooled data may capture the aggregate trend rather than the trend within each group.


Consider another example:


A company decides to analyze the relationship between ad spending and the number of clicks (in thousands). The aggregated graph (TOTAL) shows a positive relationship. However, when the data is analyzed by group (perhaps by age), this relationship may not exist or may even be reversed, from an upward to a downward trend.


[Figure: ad spending versus clicks (thousands), aggregated (TOTAL) and by group]

While an overall positive trend might be expected (i.e., more ads leading to more clicks), the downward trend within a group could be attributed to ad fatigue. This occurs when excessive exposure to the same advertisements causes consumers to lose interest. They may engage less with the ads or even develop a negative perception of the brand. Moreover, if customers feel overwhelmed or annoyed by too many ads, they might tune out altogether, which could lead to a decline in sales.
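
The same check can be sketched for the ad spending example by comparing the slope of a least-squares line fitted to the pooled data with the slopes fitted within each group. The data points and age-group labels below are hypothetical and serve only to illustrate the pattern. `ols_slope` is an illustrative helper written for this sketch, not a library function.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical (ad_spend, clicks_in_thousands) points per age group.
groups = {
    "18-29": [(1, 4.0), (2, 3.5), (3, 3.0)],
    "30-44": [(4, 7.0), (5, 6.5), (6, 6.0)],
    "45+":   [(7, 10.0), (8, 9.5), (9, 9.0)],
}

# Slope within each group.
group_slopes = {
    name: ols_slope([x for x, _ in pts], [y for _, y in pts])
    for name, pts in groups.items()
}

# Slope over all points pooled together (the TOTAL view).
pooled = [pt for pts in groups.values() for pt in pts]
pooled_slope = ols_slope([x for x, _ in pooled], [y for _, y in pooled])

print("pooled slope:", pooled_slope)
print("group slopes:", group_slopes)
```

In this made-up data the pooled slope is positive while every within-group slope is negative. The groups that spend more also start from a higher baseline of clicks, which is what drives the aggregate upward trend.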


These are two simple examples, but Simpson's paradox can appear in various contexts, including production, customer service, sales, and more. By carefully segmenting data (by product, customer demographics, region, or other factors), you can check whether an aggregate trend still holds within each group and obtain more accurate and actionable insights.

