Why Averages Can Mislead Decision-Making in Companies
- Feb 26
Imagine you find yourself in the following situation at your company: the person responsible for purchasing says they need to analyze which supplier to choose based on delivery time. Then, the project manager approaches and states that they have doubts about whether the implementation of the WMS system brought any operational benefit, suggesting an analysis of picking time before and after the system implementation.
Since you can easily capture the historical data for these two cases, you agree to help them with the analysis. After capturing the data from the last 15 periods, you create 2 tables with the respective times:


Based on this, you perform the following analysis:
“There is a difference in times in both cases; however, in the first case the difference between delivery times is small and probably not very significant. In percentage terms, it corresponds to a difference of only about 2.7% between the average delivery times (19.3 days versus 18.8 days). In the second case, there was a significant reduction in picking time, as it decreased from 8.01 minutes to 7.44 minutes (a 7.2% reduction).”
And you conclude:
“We can say that the delivery time of the two suppliers is practically the same, with an insignificant difference. The same does not occur in the second case, as the implementation of the WMS brought a visible reduction in picking time.”
Well, here's the problem: statistically speaking, this analysis is wrong, because it simply disregards the dispersion of the data. In fact, we can state with 95% confidence that the average delivery time of supplier 1 is significantly longer than that of supplier 2, and that the implementation of the WMS did not significantly reduce the picking time.
We'll now illustrate this visually, and to make things clearer, we'll start by looking at two datasets that follow a normal distribution.


A review of fundamental statistical principles indicates that, for a normal distribution, the shaded blue region under each curve represents 95% of the total area. Specifically, values within the interval from μ-1.96σ to μ+1.96σ encompass 95% of the observed data. Note that in the second curve this range is much smaller, because the observations are less dispersed.
To exemplify what this means: suppose that the first curve represents observations with a mean equal to 0 and a standard deviation equal to 1, and the second curve represents observations with a mean equal to 0 and a standard deviation equal to 0.5. Although both have the same average, we can say that in the first case, 95% of the values are between -1.96 and +1.96. In the second case, 95% of the values are between -0.98 and +0.98. If a distribution has a larger standard deviation, this interval will be wider. That is: greater dispersion means greater uncertainty about the mean.
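We can check these intervals directly with scipy (the means and standard deviations below are the ones from the example above):

```python
from scipy import stats

# Central 95% interval of a normal distribution: mu - 1.96*sigma to mu + 1.96*sigma
lo1, hi1 = stats.norm.interval(0.95, loc=0, scale=1)    # sigma = 1
lo2, hi2 = stats.norm.interval(0.95, loc=0, scale=0.5)  # sigma = 0.5

print(f"sigma = 1.0 -> [{lo1:.2f}, {hi1:.2f}]")  # [-1.96, 1.96]
print(f"sigma = 0.5 -> [{lo2:.2f}, {hi2:.2f}]")  # [-0.98, 0.98]
```

Halving the standard deviation halves the width of the interval, which is exactly the point: less dispersion, less uncertainty about the mean.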
Returning to our initial tables: how do we compare whether the means of two samples differ at a 95% confidence level? In such cases we perform the t-test, which relies on the Student's t sampling distribution rather than on the normal distribution of the population.
Unlike the normal distribution, the Student's t distribution has heavier tails when the number of observations in the sample is small, and it approaches the normal distribution as the number of observations increases. The t-test compares the means of up to two samples; to compare the means of three or more samples, another test, ANOVA, should be applied.
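This convergence is easy to see numerically: the one-tailed 5% critical value of the t-distribution shrinks toward the normal value of 1.645 as the degrees of freedom grow.

```python
from scipy import stats

# One-tailed 5% critical values of Student's t for increasing degrees of freedom
for df in (5, 15, 30, 1000):
    print(f"df = {df:4d}  t_crit = {stats.t.ppf(0.95, df):.3f}")

# Normal-distribution counterpart for comparison
print(f"normal     z_crit = {stats.norm.ppf(0.95):.3f}")
```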
Based on the data tables, let's make a visual representation of the data dispersions:
1 – Visual representation of the distribution of supplier lead times (independent samples).

By graphically observing the 95% confidence intervals for each supplier, it is noticeable that the upper limit of the supplier with the lower average does not overlap the lower limit of the supplier with the higher average. This visual analysis is only illustrative; the formal decision criterion is given by the t-statistic and the p-value.
It is also important to emphasize that, in this example, we assume equal variances. Traditionally, this verification is done using the F-test. However, in modern applications, it is common to directly use Welch's t-test, which does not assume equal variances and adjusts the degrees of freedom automatically (we will address these two topics in later posts).
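As a quick sketch of that choice in code (the numbers below are illustrative, not the supplier tables), scipy exposes both variants of the test through the `equal_var` flag of `ttest_ind`:

```python
from scipy import stats

# Illustrative delivery times in days; hypothetical, not the article's data.
a = [19.8, 18.5, 20.1, 19.0, 19.6, 18.9, 19.4, 20.0, 19.2, 19.5]
b = [18.7, 18.9, 18.4, 19.0, 18.6, 18.8, 18.5, 19.1, 18.3, 18.7]

# Classic Student's t-test (assumes equal variances)
t_student, p_student = stats.ttest_ind(a, b, equal_var=True)

# Welch's t-test (no equal-variance assumption, degrees of freedom adjusted)
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)

print(f"Student: t = {t_student:.3f}, p = {p_student:.4f}")
print(f"Welch:   t = {t_welch:.3f}, p = {p_welch:.4f}")
```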
T-test for two independent samples and equal variances
Our interest is to verify if supplier 1 has a longer average time than supplier 2. Since the hypothesis is directional, we use a one-tailed test.
Hypotheses:
H₀: μ₁ ≤ μ₂
H₁: μ₁ > μ₂
The formula for the t-test for 2 independent samples and equal variances is given by:
t = (x̄₁ − x̄₂) / (sₚ · √(1/n₁ + 1/n₂)),  where sₚ = √[((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)]

with n₁ + n₂ − 2 degrees of freedom (here, 15 + 15 − 2 = 28).
And applying the formula we have:

The one-tailed critical value for a significance level of 5% is approximately 1.701. The t-value of 3.775 is well above the critical value, therefore, we reject H₀.
The p-value of this test is approximately 0.0004, meaning the result would remain statistically significant even at a confidence level of about 99.96%!
Confidence levels close to 100% are commonly adopted in fields such as medicine, for example to attest to the effectiveness of a drug in a group of patients (typically with a paired t-test, as we will see below). In our case, however, we can say with a 95% confidence level that supplier 1 has a significantly longer average delivery time than supplier 2 (p-value below 0.05).
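The pooled formula above can be implemented directly and cross-checked against scipy's one-tailed test. The samples below are hypothetical stand-ins for the supplier tables, so the resulting t-statistic will not match the article's 3.775:

```python
import math
from scipy import stats

# Hypothetical lead times (days) for 15 periods; not the article's tables.
s1 = [19.5, 20.1, 18.9, 19.8, 19.2, 19.7, 19.0, 19.9, 19.4, 19.6, 18.8, 19.3, 20.0, 19.1, 19.5]
s2 = [18.6, 18.9, 18.4, 19.0, 18.7, 18.8, 18.5, 19.1, 18.3, 18.9, 18.6, 18.8, 18.4, 19.0, 18.7]

n1, n2 = len(s1), len(s2)
m1, m2 = sum(s1) / n1, sum(s2) / n2
v1 = sum((x - m1) ** 2 for x in s1) / (n1 - 1)  # sample variances
v2 = sum((x - m2) ** 2 for x in s2) / (n2 - 1)

# Pooled standard deviation, then the t-statistic with n1 + n2 - 2 df
sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
t_manual = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# One-tailed test (H1: mean of sample 1 is greater) via scipy
t_scipy, p_one_tailed = stats.ttest_ind(s1, s2, equal_var=True, alternative="greater")
print(f"manual t = {t_manual:.3f}, scipy t = {t_scipy:.3f}, p = {p_one_tailed:.4f}")
```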
Let's now analyze the picking time data...
2 – Representation of the distribution of picking times before and after the implementation of the WMS (paired samples).

Note that, in our illustrative representation, the upper limit of the time with the lower average exceeds the lower limit of the time with the higher average; this intersection is shown by the shaded green area. In cases like this, we cannot, at a 95% confidence level, rule out the hypothesis that the times are equal.
T-test for two paired samples
Unlike the first case, where we compared two independent samples, here we observe the same sample at two different moments (before and after the WMS). For that we use the paired t-test.
Hypotheses (with d = picking time after − before):
H₀: the mean of the differences is greater than or equal to zero (no reduction)
H₁: the mean of the differences is less than zero (the WMS reduced picking time)
The formula for the paired t-test is given by:
t = d̄ / (s_d / √n)

where d̄ is the mean of the differences, s_d their standard deviation, and the test has n − 1 degrees of freedom (here, 15 − 1 = 14).
Applying the formula:

The one-tailed critical value for a significance level of 5% is approximately −1.761. The t-value of −1.675 does not reach the critical region (−1.675 > −1.761), and the p-value is 0.058. Thus, at a 95% confidence level, we cannot reject the hypothesis that the times are equal (mean of the differences equal to zero). That is, although the percentage reduction seems relevant (7.2%), the variability of the data prevents this difference from being considered statistically significant.
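The paired version is just as short in scipy, using `ttest_rel` with a one-tailed alternative. Again, the before/after times below are made up for illustration, so the p-value will differ from the article's 0.058:

```python
from scipy import stats

# Hypothetical picking times (minutes) for the same 15 orders, before and after the WMS.
before = [8.2, 7.9, 8.4, 7.8, 8.1, 8.3, 7.7, 8.0, 8.5, 7.9, 8.2, 7.8, 8.1, 8.0, 8.3]
after  = [7.6, 7.8, 7.5, 7.9, 7.4, 7.2, 7.8, 7.5, 7.3, 7.7, 7.4, 7.6, 7.5, 7.8, 7.2]

# One-tailed paired t-test: H1 says the "after" times are lower
t_stat, p_value = stats.ttest_rel(after, before, alternative="less")
print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.4f}")
```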
Important considerations
The t-test works well even with small samples (such as 15 observations), provided the data are not strongly skewed. Although the Central Limit Theorem indicates that sample means tend toward normality as the sample size grows, with small samples we use the t-distribution precisely to account for this additional uncertainty.
Conclusion
Comparing only percentages can lead to flawed decisions.
In the first case, a seemingly small difference (about 2.7%) proved to be statistically significant. In the second case, a seemingly larger reduction (7.2%) was not sufficient to guarantee statistical significance.
The lesson is clear: in data analysis, averages don't tell the whole story. Dispersion is as important as the magnitude of the difference.
