
antitoine/ADAEPFL-Project

Applied Data Analysis - EPFL: project about DataSport

See the project presentation website.

The source code of the study is in the Study folder.

The source of the website is in the Website folder.

Note: the output of the plotly visualizations does not display well, so please download and run the notebooks if you want to see our entire work.

Abstract

This project focuses on the Lausanne Marathon and, more precisely, on the analysis of its different races (10 km, half marathon and marathon).

To do so, we plan to work with the available data from 1999 to 2016, using the information about the runners provided by DataSport.

First part: analysis of the 2016 Lausanne Marathon

In this part, we will study the performance of runners according to different parameters, such as sex or age. We also plan to analyze runner behavior in different contexts (speed by type of race, speed of male and female runners, ...) and see whether we can find statistically significant differences or patterns.
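As a rough illustration of the speed comparison, the sketch below computes the mean speed of each group of runners. It is only a sketch under assumptions: the `Result` record, its fields and the race keys are hypothetical names, not the layout of the DataSport data, and finishing times are assumed to be already converted to seconds. The distances are the standard 10 km, half-marathon and marathon distances.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Standard race distances in km; the keys are hypothetical labels.
RACE_DISTANCE_KM = {"10km": 10.0, "half": 21.0975, "marathon": 42.195}


@dataclass
class Result:
    race: str       # hypothetical key into RACE_DISTANCE_KM
    sex: str        # e.g. "M" or "F"
    age: int
    time_s: float   # finishing time in seconds (assumed already parsed)


def speed_kmh(r: Result) -> float:
    """Average speed over the whole race, in km/h."""
    return RACE_DISTANCE_KM[r.race] / (r.time_s / 3600.0)


def mean_speed_by(results, key):
    """Group results by key(result) and return the mean speed of each group."""
    groups = defaultdict(list)
    for r in results:
        groups[key(r)].append(speed_kmh(r))
    return {k: mean(speeds) for k, speeds in groups.items()}
```

Calling `mean_speed_by(results, lambda r: (r.race, r.sex))` compares male and female runners within each race; grouping with `lambda r: (r.sex, r.age // 10 * 10)` instead gives the mean speed per sex and age decade.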

Second part: analysis over the years (1999-2016)

This part of the study will focus on all editions of the Lausanne Marathon between 1999 and 2016.

Among the planned analyses, we will present how performance evolved over the years. We will also check whether running in a group leads to better times, and possibly whether identified groups improved their performance over the years.
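To check whether running in a group is associated with better times, one possible approach (our assumption, not a method fixed by this plan) is Welch's t-test comparing the finishing times of runners registered in a team with those of runners competing alone. The sketch below only computes the t statistic and the Welch-Satterthwaite degrees of freedom; how group membership is read from the data is left open.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples.

    A negative t means sample `a` has the smaller mean (e.g. faster times
    if both samples are finishing times).
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard errors of each mean (statistics.variance uses n - 1).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

A p-value then comes from the t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test directly.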

We also plan to conduct the same basic analyses as in the first part.

Finally, if possible, we will check the influence of the weather.

Third part: analysis of some runners

In the third and last part, we will select a sufficient number of runners to conduct statistical analyses, such as the evolution of individual performance (type of race chosen, average speed, etc.).
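For the evolution of individual performance, a simple option is the least-squares slope of a runner's average speed against the year. In this sketch the `(name, birth_year, year, speed_kmh)` row layout and the `min_years` threshold are hypothetical, and identifying a runner by name and birth year is an assumption (two homonyms born the same year would be merged).

```python
from collections import defaultdict


def yearly_trend(points):
    """Least-squares slope of (year, speed_kmh) points, in km/h per year.

    A positive slope means the runner got faster over the years.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two participations")
    mean_year = sum(year for year, _ in points) / n
    mean_speed = sum(speed for _, speed in points) / n
    sxx = sum((year - mean_year) ** 2 for year, _ in points)
    if sxx == 0:
        raise ValueError("participations must span at least two different years")
    sxy = sum((year - mean_year) * (speed - mean_speed) for year, speed in points)
    return sxy / sxx


def trends_by_runner(rows, min_years=3):
    """Trend per runner for runners seen in at least `min_years` distinct years.

    rows: iterable of (name, birth_year, year, speed_kmh) tuples (hypothetical layout).
    """
    by_runner = defaultdict(list)
    for name, birth_year, year, speed in rows:
        by_runner[(name, birth_year)].append((year, speed))
    return {
        runner: yearly_trend(points)
        for runner, points in by_runner.items()
        if len({year for year, _ in points}) >= min_years
    }
```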

Data description

This project will use data from the www.datasport.com website, which contains the sport performances of athletes over time.

The website contains running race results from all over Switzerland. Moreover, since it covers races throughout the year, it will be useful for studying the evolution of sport in our society, especially running, which has grown strongly over the last decade.

Feasibility and risks

Feasibility

  • Gather the data by screen scraping the website (see homework 2)
  • Use visualization tools to show the evolution of sport in our society over the years (see homework 3)
  • Statistical study of different effects on performance (terrain elevation, weather, season of the year)
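The scraping step could start from a parser like the one below, which collects the text of every table cell row by row using only Python's standard `html.parser`. It assumes, hypothetically, that the results are laid out as an HTML `<table>`; the real DataSport page structure and URLs are not described here, so downloading the pages (e.g. with `urllib.request.urlopen`) is left out.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the chunks and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_results(html):
    """Return the table rows found in an HTML string as lists of cell texts."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```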

Difficulties

  • Different data formats
  • Size of the data
  • The language of the data (some data are in German, others in French…)
  • Getting the weather for each race
  • Building visualizations (with tools we do not know yet)
  • Interpreting the results correctly and building reliable statistics
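One way to handle the mixed formats and languages is to normalize every scraped row to common keys and convert finishing times to seconds. The header aliases below are hypothetical examples of German and French column names, not the actual DataSport headers; decimal commas are accepted in times because both languages commonly use them.

```python
# Hypothetical mapping from German/French/English column headers to common keys.
HEADER_ALIASES = {
    "rang": "rank", "rank": "rank",
    "name": "name", "nom": "name",
    "zeit": "time", "temps": "time",
    "jahrgang": "birth_year", "année": "birth_year",
}


def parse_time(text):
    """Convert 'h:mm:ss' or 'mm:ss' (optionally with decimal seconds) to seconds."""
    parts = text.strip().replace(",", ".").split(":")
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unrecognized time: {text!r}")
    seconds = 0.0
    for part in parts:
        seconds = seconds * 60 + float(part)
    return seconds


def normalize_row(header, row):
    """Map one scraped row to a dict with common keys; unknown columns are dropped."""
    out = {}
    for column, value in zip(header, row):
        key = HEADER_ALIASES.get(column.strip().lower())
        if key is not None:
            out[key] = value.strip()
    if "time" in out:
        out["time_s"] = parse_time(out["time"])
    return out
```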

Deliverables

The deliverable will be a website presenting our results. We will focus on visualizing the different data over the years.

Our study should cover several points:

  • General overview of running in Switzerland
  • Demographics (possibly a comparison between Swiss and foreign runners)
  • Analysis of results (possibly following some people through the years)
  • Analysis focused on teams (effect on performance of competing alone or in a team)
  • Measure the effect of elevation and how people react to it (lack of oxygen, etc.), possibly with a map visualization
  • Focus on running through the years (youtu.be/jbkSRLYSojo)

Timeplan

We plan to organize our project into the following major steps:

  • Data gathering and wrangling (2 months)
  • Key point study (1 month)
  • Visualization / Website (3 months)

Posters

The posters used during the project presentation are available in the Doc folder.