The Gamma is a research project that aims to make it easier to create open data-driven articles that are linked directly to the data source. Open means both that you can see how an article is created (as in open source software) and that you can further improve it (open as in Wikipedia). This makes the articles:

  • Transparent and accountable — As a reader, you can review how data is used and find out when data is used in a misleading way.
  • Reproducible — You can run the data analysis again on your own and verify that it produces the same result.
  • Interactive and collaborative — You can adapt the article to explore different aspects of the data and share your visualizations.

This web site is an early prototype of what such open data-driven articles might look like. It shows a number of interactive visualizations built around the Olympic medals data set, partly inspired by the nice work of The Guardian around the London 2012 Olympics. Could we make data-driven storytelling easy enough that it does not require a dedicated team of professionals?

Every article on this page includes source code that you can run to reproduce the visualization. You can edit the code, change parameters and run the code in your browser to explore other interesting facts and share your visualizations.

All Time Olympic Medals Table

Everyone who has been following the Olympic Games in London 2012 or Rio 2016 knows that the person with the largest number of medals of all time is Michael Phelps, but do you know who is second and third? As you can see in the following table, the second person (for summer Olympic games) is Larisa Latynina, who won 9 gold medals for the Soviet Union between 1956 and 1964, and the third is Finnish runner Paavo Nurmi, also with 9 gold medals, from the 1920s.

The table shows the most important facts, but there is a lot more information that you can get from the data if you change the options of the visualization or the source code that generates it. Here are a few simple things you can try on your own:

  • Find out where each athlete competed — To do this, click on the "options" button. This analyzes the visualization and automatically lets you change some parameters. In the "Group by athlete" table, you can add aggregated attributes for the table. Add "concatenate values of Games" and drop "concatenate values of Teams".

  • Who is the least lucky athlete — Counting gold medals is easy, but who has the largest number of bronze and silver medals? To find out, remove all items from "Sort the data" in "options" and specify your own criteria. Choose "by Bronze descending" to find the person with most bronze medals!

  • Look at medals from London 2012 only — You can find this in an alternative version of the visualization, but to do this on your own, click on "source" and change the second line from olympics.data to olympics.'by game'.'London (2012)'.data. This filters the data to only medals from London 2012. As you type olympics., the editor will let you specify other filters too. You can, for example, look at specific teams rather than specific games.
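The three experiments above boil down to filtering, grouping and sorting. The following Python sketch is not the code that runs behind the table; it only illustrates the idea on a handful of made-up rows. The athlete names, the row layout and the medal_table function are all assumptions made for this example.

```python
from collections import defaultdict

# Made-up sample rows for illustration only (not the real data set).
MEDALS = [
    {"athlete": "Athlete A", "games": "Beijing (2008)", "medal": "Gold"},
    {"athlete": "Athlete A", "games": "London (2012)", "medal": "Gold"},
    {"athlete": "Athlete A", "games": "London (2012)", "medal": "Silver"},
    {"athlete": "Athlete B", "games": "London (2012)", "medal": "Gold"},
    {"athlete": "Athlete B", "games": "London (2012)", "medal": "Bronze"},
    {"athlete": "Athlete B", "games": "Beijing (2008)", "medal": "Bronze"},
    {"athlete": "Athlete C", "games": "London (2012)", "medal": "Bronze"},
]

# Column position of each medal type in the result tuples.
COLUMN = {"Gold": 1, "Silver": 2, "Bronze": 3}


def medal_table(rows, games=None, sort_by=("Gold", "Silver", "Bronze")):
    """Return (athlete, gold, silver, bronze) tuples, sorted descending."""
    # Optional filter, similar in spirit to picking one of the games.
    if games is not None:
        rows = [r for r in rows if r["games"] == games]

    # Group by athlete and count each kind of medal.
    counts = defaultdict(lambda: {"Gold": 0, "Silver": 0, "Bronze": 0})
    for r in rows:
        counts[r["athlete"]][r["medal"]] += 1

    table = [(name, c["Gold"], c["Silver"], c["Bronze"])
             for name, c in counts.items()]

    # Sort by the chosen medal columns (descending), then by name for ties.
    return sorted(
        table,
        key=lambda t: tuple(-t[COLUMN[m]] for m in sort_by) + (t[0],),
    )


all_time = medal_table(MEDALS)
london_only = medal_table(MEDALS, games="London (2012)")
most_bronze = medal_table(MEDALS, sort_by=("Bronze",))
```

Passing sort_by=("Bronze",) corresponds to the "by Bronze descending" experiment, and the games argument plays the role of the London 2012 filter.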

Olympic Medals Timeline

How has the geographical distribution of medals in Olympic games changed over the last century? In the first Olympic games in 1896, medals were awarded to 11 teams, all of them from either Europe or the United States. The number of teams with medals started growing rapidly after 1980, from 36 teams to 85 teams in 2012. The visualization tracks the number of medals awarded to different countries over time.

As the visualization shows, the number of different countries winning medals in the Olympic games started growing rapidly after 1980. You can see this visualized in a separate chart. The visualization above is also easily adapted to show medals in different disciplines.

  • To see the timeline for a specific discipline, go to "options" and select one or more disciplines in the first control. This will make the bubbles smaller, but you can make them bigger by changing the size function in the code (change 0.5 to a bigger number between 0.5 and 2.0).

  • You can also edit the code to show not just specific disciplines, but individual events, for example to see medals in long-distance running. To do this, change 'by disciplines'.then on the second line to 'by sport' and then choose the sports you want to visualize. You can also use this to see only women Olympic medalists by using olympics.'by gender'.Women.data.

Long-distance Running Medalists

This is a variation on the Olympic Medals Timeline visualization, showing the countries winning Olympic medals in long-distance running (marathon, 10 km or 5 km, men or women). You can see the dominance of the Flying Finns in the 1920s and 1930s. Starting in the 1960s, independent African countries begin competing in Olympic games, and Ethiopian and Kenyan athletes become dominant winners in the 1980s and even more so in the 1990s.

Number of Teams Winning a Medal

In the early days of the Olympic games, much of the world was under European colonial rule, and the number of teams winning Olympic medals reflects this. Before the First World War, the maximum number of teams with a medal was 20. In the interwar period, the number grew to 32. Olympic games started becoming more diverse after the Second World War and especially after the end of colonialism. In London 2012, the number of teams winning a medal grew to 85.
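The count behind this chart is a "distinct teams per games" aggregation. Here is a minimal Python sketch of that idea, assuming the data is available as (year, team) pairs with one pair per medal. The rows and the teams_per_year function are invented for illustration and are not taken from the real data set.

```python
from collections import defaultdict

# Invented (year, team) pairs, one per medal, for illustration only.
MEDAL_ROWS = [
    (1896, "Team X"),
    (1896, "Team Y"),
    (1896, "Team X"),  # a second medal for the same team is counted once
    (1900, "Team X"),
    (1900, "Team Y"),
    (1900, "Team Z"),
]


def teams_per_year(rows):
    """Count how many distinct teams won at least one medal in each year."""
    teams = defaultdict(set)
    for year, team in rows:
        teams[year].add(team)
    return {year: len(names) for year, names in sorted(teams.items())}


counts = teams_per_year(MEDAL_ROWS)
```

Using a set per year is what keeps a team with many medals from being counted more than once.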

A Visual History of the Olympics

In this visualization, we draw a timeline showing the number of medals of the top 5 countries over the entire history of the Olympic games. To find out who the top medalists are, you can look at our 'medals by country' table. The visualization is inspired by the fantastic article A Visual History of Which Countries Have Dominated the Summer Olympics by the New York Times. We're drawing the chart using a simple area chart, so it is not as beautiful, but it shows many of the interesting facts.

The visualization shows a number of interesting facts that go well beyond the history of the Olympic games and reveal something about the last century:

  • World Wars — As you can see, no medals were awarded in 1916, 1940 and 1944. The Olympic games were cancelled during the First and Second World Wars.

  • Olympic Boycotts — The United States did not get any medals in 1980 because of the Moscow Olympic games boycott, and the Soviet Union did not get any medals in 1984 because of the L.A. Olympic games boycott.

To build the visualization (see the "source"), we had to explicitly list the countries to show and also add their names as labels for the chart. This could be a bit easier, but you can still quite easily modify the visualization to see different aspects of the history. Below, you can see one modification which uses the same method to visualize the countries dominating long distance runs. Alternatively, you can choose to compare different countries by modifying the list of countries written as [ .. ]. For example, compare medals by the teams of Germany, West Germany and East Germany as separate entities!

Countries Dominating Long Distance Runs

This visualization adapts the Visual History of Olympics to only show medals of long distance runs, including the marathon, 10 km and 5 km runs for both men and women. The visualization is based on the nice article A Visual History of Which Countries Have Dominated the Summer Olympics in the New York Times. The difference is that now you can verify how we processed the data, and you can also add other disciplines to the list!

Medals per Country Table

This visualization creates a simple table showing the countries sorted by the total number of medals. However, it is written in a way that makes it easy to adapt it to sort the countries by medals in a specific discipline. The USA has the most medals overall, but what if you instead look at road and track cycling? Or perhaps Cricket, which appeared at the Olympic games only in 1900 with exactly one match?

If Michael Phelps Were a Country

Back in 2012, The Guardian put together an amazing table treating Michael Phelps as a country. We can do the same and count the total number of medals per country and the total number of medals for Michael Phelps. Sorted by the number of gold medals, Michael Phelps beats, for example, Belarus and Kazakhstan. And after Rio 2016, probably also Zimbabwe, Nigeria and a few more countries!

This visualization involves a bit more logic, so it is not as easy to modify, but you can still change it to look at another athlete or even add multiple athletes. When you look at "options", you will see a number of parameters for both parts of the calculation, but the very first one lets you select a different athlete. When you do that, you'll also need to change what range of countries you are selecting. In the code, look for skip(47).take(10). This skips the first 47 countries (which have way more medals than Michael Phelps) and takes the next 10, so that we get a nice chart. You'll need to guess the right number for your favorite athlete.
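If you are unsure what skip(47).take(10) selects, the following Python sketch mimics the two operations with list slicing. It is only an analogy: the skip and take functions below are plain Python stand-ins written for this example, the ranked list is a placeholder, and countries_ahead is a hypothetical helper showing one way to replace the guess with a count.

```python
def skip(items, n):
    """Drop the first n items, like skip(n) in the visualization code."""
    return items[n:]


def take(items, n):
    """Keep only the first n items, like take(n) in the visualization code."""
    return items[:n]


def countries_ahead(gold_counts, athlete_gold):
    """Hypothetical helper: how many countries have more gold medals
    than the athlete. This is a starting point for the skip value."""
    return sum(1 for gold in gold_counts if gold > athlete_gold)


# Placeholder ranking: positions 1..60 stand in for countries sorted by gold.
ranked = list(range(1, 61))

# Skip the first 47 entries, then keep the next 10 (positions 48 to 57).
window = take(skip(ranked, 47), 10)

# Made-up gold counts, only to show how the helper behaves.
ahead = countries_ahead([50, 40, 30, 20, 18, 10], athlete_gold=18)
```

With such a count in hand, you could skip slightly fewer countries than are ahead of the athlete, so that the athlete appears in the middle of the chart rather than at its top.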

Aside from showing one athlete, you can also modify the visualization to include multiple athletes. For example, see the visualization If Phelps and Latynina were Countries, which shows a similar chart with the two top athletes when sorted by gold medals.

All Medals of Michael Phelps

Before Rio 2016, Michael Phelps had won 22 medals, including 18 gold ones. This makes you wonder if he can remember all the medals he got. To make his life easier, we can easily generate a table with all the medals. To do this, we use 'by athlete' to filter all the Olympic medals down to those of Michael Phelps, and then we display the result in a table.

Where Phelps Got All His Medals

Another way of looking at the data is to group it by the different games. Michael Phelps competed in 3 different games (excluding Rio 2016) and got 8 medals in Athens 2004, 8 medals in Beijing 2008 and 6 medals in London 2012. How many medals will he get in Rio 2016? We'll update the visualization when we know!
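As a quick sanity check, the per-games numbers quoted above should add up to the 22 medals Phelps had before Rio 2016. The short Python sketch below does just that; the dictionary is typed in by hand from the text (the key labels are an assumed naming style), not loaded from the data service.

```python
# Medal counts per games, copied from the paragraph above.
phelps_by_games = {
    "Athens (2004)": 8,
    "Beijing (2008)": 8,
    "London (2012)": 6,
}

# Total across the three games he competed in before Rio 2016.
total_before_rio = sum(phelps_by_games.values())

# The games in which he won the most medals (ties are all kept).
best = max(phelps_by_games.values())
best_games = [games for games, n in phelps_by_games.items() if n == best]
```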

About the data

The ultimate goal of The Gamma project is to make data-driven articles such as the ones presented on this page fully open and reproducible. This means that they should contain all the code needed to obtain the data from the original source. The current version is not quite there yet. It focuses on letting readers reproduce all the computations that were done when building visualizations, and on letting them create and share custom perspectives on the data.

However, you can still get the raw data in CSV format from the project GitHub. This was obtained by combining data from The Guardian, which has a fantastic data set of medals until 2008, and adding results from 2012 by scraping data from the BBC. If you are interested, you can find the F# source code here (the file also tries to get data from olympic.org, but ironically, this is not nearly as complete as the Guardian table...).

When you run any visualization on this site, it accesses data live from a simple REST service that exposes the raw data and a more sophisticated REST service that implements the grouping operations. The services follow the protocol described here and can also be called from F# via the REST provider.

Visualizations Shared by Users

Code for building all the visualizations can be executed in the web browser, and The Gamma project also uses the Monaco editor to let readers edit source code directly in the browser. This means that you can modify existing visualizations, create new ones and share your results.

Some of the existing articles suggest interesting changes that you can make to the existing visualizations, but you can also just delete the existing code in any visualization and create a new one from scratch. When you then click the "share" button, you can add your visualization to the growing list of visualizations posted by users...