A More Appropriate Look at the Olympic Medal Data

Here is a new look at the Olympic medal data I began analyzing last week. My initial hypothesis was that the more winning countries would have a higher percentage of Summer Olympic medals, while the less winning countries would have a higher percentage of Winter Olympic medals. This hypothesis was based on the idea that the Summer Olympics are more popular, so I expected the more winning countries to thrive in those events compared with the winter sports. In addition, here in the US we are a bit spoiled, since we are among the most winning countries in the world. My thought was that the Winter Olympics might be more popular in smaller and colder countries. Since there are fewer events in the Winter Olympics, the data might reveal less-discussed countries that are more dominant in winter sports than in summer sports.

My initial attempt at analyzing this data was a bit misleading. There are clear outliers: some countries have very few total medals, while others have an abundance of them. At first, I arbitrarily included only countries with at least 500 medals. However, that cutoff removed a substantial amount of the data. Instead, I used the interquartile range (IQR), which divides the data in a more systematic way. To find it, we sort the countries by total medals and split the sorted list into quarters. The first quartile (Q1) marks the 25th percentile, the median marks the 50th, and the third quartile (Q3) marks the 75th; the IQR is the span from Q1 to Q3. Focusing on the countries inside the IQR means looking at the middle half of the data, so the extreme values at either end do not dominate the picture. Based on these splits, we can better understand the distribution of the data.
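To make the quartile split concrete, here is a minimal Python sketch. It assumes the data is a dictionary mapping each country to its total medal count, and it uses the standard library's `statistics.quantiles` with the "inclusive" interpolation convention; the post does not say which convention was used, so treat that choice as an assumption. The country labels and counts below are hypothetical placeholders, not real Olympic figures, and the helper names `iqr_split` and `middle_half` are made up for illustration.

```python
import statistics

def iqr_split(totals):
    """Return (Q1, median, Q3) for a dict of country -> total medals.

    Assumption: "inclusive" interpolation; other quartile conventions
    can give slightly different cut points.
    """
    q1, median, q3 = statistics.quantiles(sorted(totals.values()), n=4, method="inclusive")
    return q1, median, q3

def middle_half(totals):
    """Split the countries inside the IQR into a lower (Q1 to median)
    and an upper (median to Q3) group, dropping both tails."""
    q1, median, q3 = iqr_split(totals)
    lower = {c: t for c, t in totals.items() if q1 <= t < median}
    upper = {c: t for c, t in totals.items() if median <= t <= q3}
    return lower, upper

# Hypothetical medal totals for eight made-up countries.
totals = {"A": 2, "B": 5, "C": 9, "D": 14, "E": 30, "F": 60, "G": 250, "H": 900}
q1, median, q3 = iqr_split(totals)
lower, upper = middle_half(totals)
print(q1, median, q3, q3 - q1)  # 8.0 22.0 107.5 99.5
print(sorted(lower), sorted(upper))  # ['C', 'D'] ['E', 'F']
```

With these toy numbers, the two smallest and two largest countries fall outside the IQR, which mirrors how the top and bottom of the real list are left out.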


After analyzing the data, it turns out that the more winning countries have a higher percentage of winter medals than the less winning countries. Most countries clearly have more summer medals than winter medals. However, the split between summer and winter medals is very similar for the more winning and the less winning countries.
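The comparison above rests on each country's share of medals that came from the Winter Olympics. A minimal Python sketch of that calculation follows, assuming each group is a dictionary mapping a country to a `(summer, winter)` medal pair. The groups and counts are hypothetical placeholders, and `winter_share` and `group_mean_share` are illustrative names rather than the method actually used in the post.

```python
def winter_share(summer, winter):
    """Fraction of a country's medals that came from the Winter Olympics."""
    total = summer + winter
    return winter / total if total else 0.0

def group_mean_share(group):
    """Average winter share across the countries in a group.

    group maps country -> (summer_medals, winter_medals).
    """
    shares = [winter_share(s, w) for s, w in group.values()]
    return sum(shares) / len(shares)

# Hypothetical (summer, winter) counts for made-up countries.
less_winning = {"C": (8, 1), "D": (12, 2)}
more_winning = {"E": (24, 6), "F": (45, 15)}

print(round(group_mean_share(less_winning), 3))  # 0.127
print(round(group_mean_share(more_winning), 3))  # 0.225
```

In both toy groups the summer share is well above the winter share, which matches the general pattern described in the post.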

Another concern I have is how I built my pie charts. For these charts, I took the average of the data in the lower half of the IQR (25th to 50th percentile) and the average of the data in the upper half of the IQR (50th to 75th percentile). I am unsure whether this was the appropriate approach. In addition, labeling these groups "more winning" and "less winning" could be misread as "most winning" and "least winning." In other words, the top 10 and bottom 10 countries were not included in the data, since we used the IQR to assess it. However, I find that this approach lets us analyze the dataset as cleanly as possible, without outliers skewing the results.
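The post does not spell out how the group averages were computed, so here is a hedged Python sketch of two common options for one pie chart per group. It assumes the same hypothetical `(summer, winter)` dictionary layout as before, with made-up counts. `mean_of_shares` gives every country equal weight; `pooled_share` adds up all medals first, so countries with more medals count for more. Neither is presented as the correct one, and both function names are illustrative.

```python
def mean_of_shares(group):
    """Average each country's winter share (every country weighted equally)."""
    shares = [w / (s + w) for s, w in group.values()]
    return sum(shares) / len(shares)

def pooled_share(group):
    """Total winter medals over total medals (weighted by medal count)."""
    summer = sum(s for s, _ in group.values())
    winter = sum(w for _, w in group.values())
    return winter / (summer + winter)

def pie_slices(winter_fraction):
    """Summer and winter slice sizes, in percent, for a pie chart."""
    return 100 * (1 - winter_fraction), 100 * winter_fraction

# Hypothetical (summer, winter) counts for a made-up group.
group = {"E": (24, 6), "F": (45, 15)}
print(round(mean_of_shares(group), 3))  # 0.225
print(round(pooled_share(group), 3))    # 0.233
```

The two numbers differ because the pooled version gives country "F", with twice as many medals, more influence. Choosing between them depends on whether each country or each medal should count equally.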

Comments

  1. Hi Kevin,

    This is really impressive. I'm not an expert in statistical methods, but your IQR method is really creative. It's a little difficult for me to understand, as I'm not super versed in statistics. So it appears that the data does not support your original claim? Also, why was the averaging method insufficient? IQR intuitively seems better, as you can group the data more effectively to spot the trends more clearly. Excellent graphs. I love how the pie chart looks like Pac-Man (really, I do!)!

    Best,

    Mr. Baldi 2021


