Correlation vs. Causation & No free lunch

Şengül Karaderili
3 min read · Jan 24, 2021

What is correlation?

Correlation measures the degree to which two variables are related to each other. It is expressed as a coefficient between -1 and +1.

+1: A perfect positive relationship; values close to +1 indicate a strong positive relationship. E.g., the amount of time you study and your GPA.

-1: A perfect negative relationship; values close to -1 indicate a strong negative relationship. E.g., the cost of a car wash and how long it takes to buy a soda inside the station.

0: There is no linear relationship between the two variables.

If one variable tends to lie above its mean when the other variable lies above its mean (and below when the other is below), we can say that there is a positive correlation. If the two variables tend to deviate from their means by similar amounts but in opposite directions, we can say that there is a negative correlation.
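The "distance from the mean" idea can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and all of the numbers are made up for the example, and the last data set is included to show that a clear but non-linear relationship can still give a coefficient of 0.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations from each mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # distance of each x from its mean
    dy = [y - mean_y for y in ys]  # distance of each y from its mean
    # A product is positive when both deviations have the same sign.
    co_deviation = sum(a * b for a, b in zip(dx, dy))
    return co_deviation / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Made-up data, for illustration only.
hours = [1, 2, 3, 4, 5]
gpa_rising = [2.0, 2.5, 3.0, 3.5, 4.0]   # moves with hours
gpa_falling = [4.0, 3.5, 3.0, 2.5, 2.0]  # moves against hours
u_shaped = [4, 1, 0, 1, 4]               # related to hours, but not linearly

r_pos = pearson_r(hours, gpa_rising)
r_neg = pearson_r(hours, gpa_falling)
r_none = pearson_r(hours, u_shaped)
print(r_pos, r_neg, r_none)
```

The first two data sets lie exactly on a straight line, so they sit at the two ends of the scale, while the U-shaped one lands at 0 even though the variables are clearly related.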

Let’s look at the image below for a better understanding. (A correlation is assumed to be linear)

Correlation with Scatter Plot

What is causation?

Even if there is a correlation between two variables, we cannot conclude that one variable causes a change in the other. The relationship could be accidental, or a third factor could cause both variables to change. For any two correlated events, A and B, their possible relationships include:

A causes B (direct causation);

B causes A (reverse causation);

A and B are both caused by C (common cause);

A causes B and B causes A (bidirectional or cyclic causation);

There is no connection between A and B; the correlation is a coincidence.

The difference between correlation and causation with an example.

A study conducted in the USA revealed a positive correlation between a student’s SAT score and the number of televisions the student’s family owns. Does this mean that families who want their children to be successful should buy 10 televisions? No.

More likely, highly educated parents both own more than one TV and have children who score higher on the exam. The education level of the parents is the third variable here, and it affects both of the correlated variables. If the parents’ education level is higher, they may be earning more money, sending their children to better schools, and buying more than one TV for their home. According to the College Board, the average SAT math score of children from families with an income of more than $200,000 is 586, which supports the idea that there is a third variable.
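A small simulation can make the third-variable idea concrete. The sketch below is a made-up model, not real data: it assumes a hidden `education` value that drives both `tvs` and `sat`, with no direct link between the two. It then checks the correlation again while holding education fixed, using the standard first-order partial correlation formula.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )

random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical model: C (education) causes A (tvs) and B (sat).
# There is no arrow from tvs to sat or from sat to tvs.
education = [random.gauss(0, 1) for _ in range(n)]
tvs = [c + random.gauss(0, 1) for c in education]
sat = [c + random.gauss(0, 1) for c in education]

r_ab = pearson_r(tvs, sat)
r_ac = pearson_r(tvs, education)
r_bc = pearson_r(sat, education)

# Partial correlation of A and B with C held fixed.
r_ab_given_c = (r_ab - r_ac * r_bc) / sqrt((1 - r_ac**2) * (1 - r_bc**2))

print(f"corr(tvs, sat)             = {r_ab:.2f}")
print(f"corr(tvs, sat | education) = {r_ab_given_c:.2f}")
```

In this toy model the raw correlation between TVs and SAT is clearly positive, but it almost disappears once education is accounted for, which is the pattern you would expect when a common cause is behind the relationship.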

No free lunch theorem

In economics, this term implies that every option has an opportunity cost.

In machine learning, it means that there is no universal learning algorithm that works best on every problem. For example, we cannot say that an SVM always predicts better than a decision tree. So we are not looking for a general algorithm that will solve all problems. We are looking for the model that best solves a particular problem.

Performance vs Type of Problem

Averaged over all possible problems, every algorithm performs the same. So if an algorithm works very well on one class of problems, it must perform worse on the remaining problems to compensate. For example, on the set of problems f1, a1 is the better algorithm, while on the remaining problems (F - f1), a2 is better.
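The claim can be checked by brute force on a tiny problem. The sketch below is a toy illustration, not a proof of the theorem: it assumes four inputs with binary labels, treats every possible labelling as one "problem", trains on two inputs, and scores on the two unseen ones. The learners `always_zero` and `majority_label` are hypothetical examples invented for this demo.

```python
from itertools import product

inputs = [0, 1, 2, 3]
train_x, test_x = [0, 1], [2, 3]  # the learner never sees the test labels

def always_zero(train):
    """Ignores the training data and predicts 0 everywhere."""
    return lambda x: 0

def majority_label(train):
    """Predicts the most common training label (0 on a tie)."""
    labels = [y for _, y in train]
    guess = 1 if 2 * sum(labels) > len(labels) else 0
    return lambda x: guess

def average_test_accuracy(learner, targets):
    """Mean accuracy on unseen inputs, averaged over the given problems."""
    total = 0.0
    for target in targets:  # target[x] is the true label of input x
        model = learner([(x, target[x]) for x in train_x])
        total += sum(model(x) == target[x] for x in test_x) / len(test_x)
    return total / len(targets)

# Every possible labelling of the four inputs: 2**4 = 16 problems.
all_targets = list(product([0, 1], repeat=len(inputs)))
constant = [t for t in all_targets if len(set(t)) == 1]  # all 0s or all 1s
others = [t for t in all_targets if len(set(t)) > 1]

for learner in (always_zero, majority_label):
    print(
        learner.__name__,
        average_test_accuracy(learner, all_targets),
        average_test_accuracy(learner, constant),
        average_test_accuracy(learner, others),
    )
```

Over all 16 problems both learners end up with the same average accuracy. `majority_label` wins on the constant labellings, and it pays for that by doing worse than `always_zero` on the rest.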

NFL Theorem

Our aim should not be to find a universally best algorithm. The goal is to improve data quality, generate insightful data, understand the assumptions each model makes, and choose the best algorithm for our problem.

Thank you for reading.

Resources:

https://www.khanacademy.org/test-prep/praxis-math/praxis-math-lessons/gtp--praxis-math--lessons--statistics-and-probability/a/gtp--praxis-math--article--correlation-and-causation--lesson

https://books.google.com.tr/books?id=BgFJfC_CrTAC&printsec=frontcover&hl=tr&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false ( Chapter 4)

https://www.statisticshowto.com/probability-and-statistics/correlation-analysis/

https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation

https://lytongblog.wordpress.com/2018/12/21/correlation-between-two-variables/

https://www.americansforthearts.org/sites/default/files/cbs2011_total_group_report.pdf

https://towardsdatascience.com/intuitions-behind-no-free-lunch-theorem-1d160f754513

https://towardsdatascience.com/a-blog-about-lunch-and-data-science-how-there-is-no-such-a-thing-as-free-lunch-e46fd57c7f27#:~

https://www.coursera.org/lecture/data-machine-learning/no-free-lunch-theorem-vn3jx
