Lesson 4: Correlation vs. Causation

In this lesson, we are going to use our new spreadsheet skills, as well as our data visualization skills, to analyze some interesting patterns that occur in our everyday life.

Correlation vs. Causation

Just because two things happen to occur at the same time doesn’t mean that one of them caused the other. Many superstitious sports fans might disagree with this, but the fact remains: correlation does not inherently imply causation.
What are some examples of this that you can think of in your life?
The article at the link below profiles Tyler Vigen, who has been exploring this phenomenon by plunging into open data sets to reveal some stunning coincidences. One example: as the per capita consumption of cheese has risen steadily since the year 2000, so too has the number of deaths related to becoming tangled in bedsheets.
Why is it important to be able to both parse data AND think critically about what that data means?


Data Mining

Now let’s take a look at the same data that Tyler Vigen looked at to see if we can reveal any other “stunning” correlations. You can view the data set below, and then download it from Google Classroom.

How To

For this activity we are going to use the correlation function in the formula bar: =correl(x#:x#,x#:x#). This formula calculates the correlation between two ranges of cells. The closer the result is to 1, the more strongly the two ranges rise together. The closer the result is to -1, the more strongly one range falls as the other rises. A result near 0 means there is little or no straight-line relationship between the two ranges.
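If you are curious what the correlation function is doing behind the scenes, here is a minimal sketch in Python of the standard Pearson correlation calculation, which is the kind of value a spreadsheet correlation function returns. The function name correl and the three lists of numbers are made up for illustration; they are not taken from Tyler Vigen's data set.

```python
import math

def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two ranges of the same length, with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two ranges move together around their averages.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each range moves on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a range never changes")
    return co_movement / math.sqrt(spread_x * spread_y)

# Made-up numbers for illustration only (hypothetical, not real data).
year_index = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # goes up every year
falling = [10, 8, 6, 4, 2]   # goes down every year

print(correl(year_index, rising))   # both rise together: result is 1
print(correl(year_index, falling))  # one rises, one falls: result is -1
```

Notice that the calculation only looks at how the numbers move together. Nothing in it knows, or checks, whether one thing actually causes the other.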

Why Does this Matter?

Gizmodo.com shared a story in early 2014 that examined how certain media outlets occasionally engage in far-from-best practices when visualizing data. View the article at the link below and consider the following questions:

  • What motives might someone have in defying data visualization conventions?
  • Why is it important to think critically about data – especially when the results are surprising?