Statistical significance & how it acts as the benchmark for ‘valid’ theories
- Michelle Ncube
- Aug 1, 2025
- 3 min read
“Faster-than-light neutrinos?!” In 2011, the scientific world went into a frenzy. Einstein’s laws appeared to have been violated, as nothing can possibly travel faster than light! However, the result was later debunked after repeated experiments.
In science, how do we distinguish between results caused by random error and results that reflect a real effect, making them valid foundations for supporting or developing new theories?
This is where statistical significance comes in.
Statistical significance and p-values
Every experiment has a null hypothesis (H₀): the idea we assume to be true until the evidence shows otherwise.
For example, for a fair coin we’d expect the probability of landing on tails on any given flip to be 50%, so the null hypothesis is that the coin is fair: P(tails) = 0.5.
There is also an alternative hypothesis (H₁): the competing idea we accept as the conclusion if the null hypothesis is deemed false.
In the coin example, H₁ would be that the probability the coin lands on tails is greater than (or less than) 0.5, i.e. 50%.
As experiments always involve random error, scientists use a p-value to measure how strong the collected evidence is against H₀.
The p-value tells us how likely it is to see data at least as extreme as ours, under the assumption that the null hypothesis is true.
If the p-value is above a chosen significance level, the data are reasonably consistent with the null hypothesis: what we observed could plausibly have happened by chance, not because of a real phenomenon. However, if the p-value is less than the significance level, what happened is very unlikely to have occurred due to chance, so we can reject the null hypothesis and accept H₁.
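This decision rule can be written out as a tiny sketch (the names `decide`, `p_value`, and `alpha` are just illustrative; `alpha` is the significance level):

```python
# Minimal sketch of the significance-test decision rule:
# reject H0 only when the p-value falls below the significance level.
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the conclusion of a significance test at level alpha."""
    if p_value < alpha:
        return "reject H0 in favour of H1"
    return "fail to reject H0"

print(decide(0.03))   # below the 5% threshold, so H0 is rejected
print(decide(0.20))   # not enough evidence against H0
```

Note that "fail to reject" is the standard phrasing: a large p-value never *proves* H₀, it just means the data don’t contradict it.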
Significance levels
Different fields use different significance levels. The standard across most fields is 5%, but in physics the benchmark is drastically lower: the standard for a discovery is 5σ, roughly a 1 in 3.5 million chance that a result this extreme would be observed purely by chance.
5σ is equivalent to p < 0.0000003 (about 3 × 10⁻⁷), which is a significance level of roughly 0.00003%.
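The conversion from sigmas to a p-value follows from the tail of the standard normal distribution; a quick sketch using only the standard library (the function name `one_tailed_p` is mine):

```python
import math

# One-tailed p-value for a z-score of `sigma` standard deviations,
# using the standard normal tail: p = 0.5 * erfc(z / sqrt(2)).
def one_tailed_p(sigma: float) -> float:
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))

p5 = one_tailed_p(5.0)
print(f"5-sigma p-value: {p5:.1e}")     # roughly 3e-07
print(f"about 1 in {round(1 / p5):,}")  # roughly 1 in 3.5 million
```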
In the coin example, let’s say the significance level is 0.01. If the p-value we get is less than 0.01, the number of tails we observed in 100 throws would be very unlikely to occur by chance with a fair coin, suggesting the coin is biased. So we’d reject the null hypothesis and accept H₁.
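As a concrete sketch, suppose we observed 70 tails in 100 flips (a hypothetical count, not from the article). The exact binomial tail probability under a fair coin can be computed with the standard library:

```python
from math import comb

# P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least
# k tails in n flips if the coin's tails probability is p.
def binomial_p_at_least(k: int, n: int, p: float = 0.5) -> float:
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, k, alpha = 100, 70, 0.01          # hypothetical experiment
p_value = 2 * binomial_p_at_least(k, n)  # two-tailed, by symmetry when p = 0.5
print(f"p-value: {p_value:.6f}")
if p_value < alpha:
    print("Reject H0: the coin appears biased.")
```

Here the p-value comes out far below 0.01, so 70 tails out of 100 would indeed lead us to reject the fair-coin hypothesis.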
Wait, if we assume H₀ is true and then observe data that has such a low probability under that assumption, why do scientists say this disproves the null hypothesis?
This is because if the experiment is repeated and the p-value is consistently around 0.01, it raises the question of why data that supposedly has such a low probability of appearing keeps showing up. At that point we can say the null hypothesis is likely false: if it were true, data that should appear only 1% of the time is appearing far too frequently.
So, why do we use statistical significance?
When scientists carry out experiments, they need a way to distinguish between random fluctuations and real results.
Experiments that disprove a null hypothesis can lead to scientific discoveries that not only help us understand more about the world we live in, but also improve lives: assessing the suitability of drugs, and giving us more confidence in proposed economic theories.
Further reading prompts
Discovering the significance of 5σ
Higgs boson particle
Binomial distributions
How to find the p value