Simpson's paradox
- Leandro Santos
- Nov 12, 2024
Updated: Oct 15
Consider the following situation:
Imagine a company planning to launch a new version of its product. It must choose between two flavors: spicy or smooth. To make an informed decision, the company randomly surveys 200 people for their preferences. The overall result is shown in the table below:

The result shows that 80% of respondents liked the spicy flavor and 75% liked the smooth flavor, which suggests that launching the spicy product would be the better decision.
However, when we analyze the same data subdivided by the gender of users (male or female), an unexpected pattern emerges:

The result now points to a preference for the smooth flavor, for both male and female users. This is an interesting effect called Simpson’s paradox.
Simpson's paradox is a phenomenon in probability and statistics in which a trend that appears in several groups of data disappears or reverses when the groups are combined. It can have serious consequences for businesses, especially with KPIs, where leaders may make decisions based on aggregated data without checking for variation across subgroups. Simpson's paradox can also affect predictive analytics.
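To make the reversal concrete, here is a minimal Python sketch of the flavor survey. The counts are hypothetical: they are chosen only to reproduce the 80% and 75% overall figures quoted above, and they assume that 100 respondents rated each flavor. They are not the numbers from the original tables.

```python
# Hypothetical counts as (liked, surveyed). These are NOT the article's
# actual table values; they only match the 80% / 75% overall figures.
survey = {
    "spicy":  {"male": (72, 80), "female": (8, 20)},
    "smooth": {"male": (19, 20), "female": (56, 80)},
}


def rate(liked, surveyed):
    """Share of respondents in one group who liked the flavor."""
    return liked / surveyed


def overall_rate(groups):
    """Share of all respondents who liked the flavor, pooling the groups."""
    liked = sum(l for l, _ in groups.values())
    surveyed = sum(n for _, n in groups.values())
    return liked / surveyed


for flavor, groups in survey.items():
    by_gender = ", ".join(
        f"{gender}: {rate(*counts):.0%}" for gender, counts in groups.items()
    )
    print(f"{flavor:<6} overall: {overall_rate(groups):.0%} | {by_gender}")
```

With these assumed counts, spicy wins overall (80% vs. 75%), yet smooth wins among both men (95% vs. 90%) and women (70% vs. 40%). The reversal comes from unbalanced group sizes: in this sketch most spicy respondents are men, who rate both flavors higher, so the pooled spicy figure is pulled up.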
Consider another example:
A company decides to analyze the relationship between ad spending and the number of clicks (in thousands). The aggregated graph (TOTAL) shows a positive relationship. However, when the data is analyzed by group (perhaps by age), this relationship may disappear or even reverse, from an upward to a downward trend.

While an overall positive trend might be expected (i.e., more ads leading to more clicks), the downward trend within each group can be attributed to ad fatigue. This occurs when excessive exposure to the same advertisements causes consumers to lose interest. They may engage less with the ads or even develop a negative perception of the brand. Moreover, if customers feel overwhelmed or annoyed by too many ads, they might tune out altogether, leading to a decline in sales.
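The same effect can be sketched numerically by comparing the least-squares slope of the pooled data with the slope inside each group. The age groups and data points below are hypothetical, invented only to illustrate the pattern; they are not taken from the graph described above.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: (ad spending, clicks in thousands) per age group.
groups = {
    "18-29": ([1, 2, 3], [5, 4, 3]),
    "30-49": ([4, 5, 6], [8, 7, 6]),
    "50+":   ([7, 8, 9], [11, 10, 9]),
}

# Slope within each group: more spending, fewer clicks.
group_slopes = {name: slope(xs, ys) for name, (xs, ys) in groups.items()}

# Slope after pooling every point into one data set (the TOTAL view).
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
total_slope = slope(all_x, all_y)

for name, value in group_slopes.items():
    print(f"{name}: slope = {value:+.2f}")
print(f"TOTAL: slope = {total_slope:+.2f}")
```

In this sketch every group has a slope of -1.00, while the pooled slope is +0.80. The groups sit at different baseline levels of spending and clicks, and that difference between groups is what produces the upward trend in the aggregate.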
These are two simple examples, but Simpson's paradox can appear in many contexts, including production, customer service, and sales. By carefully segmenting data (by product, customer demographics, region, or other factors), you can obtain more accurate and actionable insights.
