Recently I came across an article that talked about how statistics is a funny language. Statistics often borrows words from our daily vocabulary and gives them a whole new meaning. These words regularly baffle non-statisticians, even when used in context. Let’s look at an example before we proceed. The Oxford dictionary defines the word ‘significance’ as “the quality of being worthy of attention; importance” or “the meaning to be found in words or events”. In statistics, however, the word means roughly that an observed result is unlikely to be just a chance result. Similarly, the two words we are going to talk about, which confuse people and even statisticians, are correlation and causation. The two terms are often interchanged when interpreting results and, depending on the type of project, mixing them up can be costly.
Correlation means ‘relationship’. Two events may appear to happen together, but we cannot conclude that one of them is causing the other, even when they always happen together. For example, every year, when a particular brand of sweaters is sold at a store, more of that store’s employees than usual fall sick. The obvious conclusion would be that the sweaters are making the employees sick by causing allergies. However, on digging deeper, we’ll find that these sweaters are only manufactured and sold during the winter season, and it is the extreme cold, not the sweaters, that is causing the employees to fall sick. In this case, the weather is causing the sickness (causation: cause and effect!) while the sweaters and the sickness are merely correlated.
Let’s put our geeky hats on and get a bit technical. Causation applies only when event A causes event B. When a whole other event, say C, causes both event A and event B, events A and B are still correlated, but the causal factor is event C. If you think this is confusing, brace yourself, the next section is going to be a doozy!
The following are examples of cases where A and B are correlated but A does not directly cause B:
1. The opposite is true: B is causing A
2. Events A and B happen together and both are caused by event C
3. A causes B but only when event C happens
4. A causes Z, which in turn causes C and that causes B (chain reaction but we see only events A and B)
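Case 2 is easy to see in a small simulation. The sketch below is only an illustration with made-up numbers: the events A, B and C are random values, and the helper names `pearson` and `residuals` are invented here. C drives both A and B, and neither A nor B affects the other. The raw correlation between A and B comes out clearly positive, but once the part explained by C is removed from each of them (a simple form of partial correlation), almost nothing is left.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: C is the hidden common cause of both A and B.
c = [rng.gauss(0, 1) for _ in range(5000)]
a = [ci + rng.gauss(0, 1) for ci in c]  # A depends on C plus noise
b = [ci + rng.gauss(0, 1) for ci in c]  # B depends on C plus noise

raw = pearson(a, b)
# Correlation between A and B after removing what C explains in each.
partial = pearson(residuals(a, c), residuals(b, c))

print(f"correlation of A and B:         {raw:.2f}")
print(f"after controlling for C:        {partial:.2f}")
```

With this setup the first number should land around 0.5 and the second close to zero. Real data is rarely this tidy, and this trick only works if you have thought of C and measured it in the first place.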
One of the main reasons these two words are easily interchanged is that the human mind loves to find patterns, even when no pattern exists. The confusion also arises because we do not understand what causation is. It only applies to cases where one event directly causes the other. In all other cases, it is purely correlation and not causation.
Here is an example of what happens when correlation is mistaken for causation. In business terms, crediting the success of a product or service to a factor that isn’t responsible for it can spell trouble. For example, during summer, both sunglasses and ice cream sell at a higher rate than during any other season. However, to attribute the sales of sunglasses or ice cream to one another and not to ‘summer’ would be catastrophic for a business. If a particular year has an abnormally warm winter, ice cream would continue to sell. Sunglasses, however, probably would not (due to cloudier skies, maybe). If the business had taken the rise in ice cream sales as the cause of the rise in sunglasses sales, it would now be sitting on a bulky closing stock of unsold sunglasses. If the business had rightly attributed sunglasses sales to the weather, this misfortune wouldn’t have happened. Correctly understanding which factor is causing which is the key to strategic decision making.
Long working hours leading to employee dissatisfaction and fatigue is an example of causation. One factor directly leads to the other. To not take action on long working hours, in this case, would be a mistake for the business in the long run.
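To make the sunglasses and ice cream example concrete, here is a rough sketch with invented sales figures (the numbers, the two-season setup and the `pearson` helper are all assumptions for illustration, not real data). Daily sales of each product depend only on the season plus some random noise. Across the whole year the two look strongly correlated, but when we look inside a single season the relationship all but disappears, which hints that the season is doing the work.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical daily sales: each product depends on the season only.
days = []
for _ in range(4000):
    season = rng.choice(["summer", "winter"])
    if season == "summer":
        base_ice_cream, base_sunglasses = 100, 60
    else:
        base_ice_cream, base_sunglasses = 30, 10
    ice_cream = base_ice_cream + rng.gauss(0, 10)
    sunglasses = base_sunglasses + rng.gauss(0, 5)
    days.append((season, ice_cream, sunglasses))

# Correlation over the whole year, ignoring the season.
overall = pearson([d[1] for d in days], [d[2] for d in days])

# Correlation within each season separately.
by_season = {}
for season in ("summer", "winter"):
    subset = [d for d in days if d[0] == season]
    by_season[season] = pearson([d[1] for d in subset],
                                [d[2] for d in subset])

print(f"whole year: {overall:.2f}")
for season, r in by_season.items():
    print(f"{season} only: {r:.2f}")
```

Splitting the data by a suspected hidden factor like this is a quick sanity check rather than proof of anything, but it is often enough to stop a business from ordering sunglasses on the strength of ice cream sales.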
It can get a bit confusing, but the vital point to keep in mind is not to jump to conclusions. The most difficult part of this exercise is finding the real cause of an effect when multiple factors are intertwined. Take your time to verify the factors and always check for any hidden ones. Finding the real cause can have multiple benefits, such as:
- Explaining the reason for the current situation
- Forecasting future outcomes based on the current scenario and
- Investing in resources to correct unfavourable current outcomes
There is a multitude of statistical tools and concepts available for finding out whether two factors are merely correlated or whether one is causing the other. Make sure to be critical of the analysis. The best approach is to break the problem down into smaller chunks and assess them one at a time before making a generalized ‘big-picture’ conclusion.
Remember, it always gets easier with practice!