Methods workshop at the Africalics 2018 PhD Academy, Marrakesh

RJuro/Africalics-PhD-Academy-2018

Africalics-PhD-Academy-2018

Methods workshop at the Africalics 2018 PhD Academy, Marrakesh

Dr. Daniel S. Hain, Dr. Roman Jurowetzki

Aalborg University, Denmark


In this repository, you will find all notebooks, presentations and materials from the workshop. We will also use it to link to the Kaggle kernels that will be used in the interactive exercises.

Please register on kaggle.com to follow the exercises.

In this workshop, we will not teach you one particular trending method or approach, but rather introduce you to Data Science as a field and its approach to working with data.

Of course, we can only do so much in 3 days, so we tried to strike a good balance between broad overview and specific applications.

Hopefully, this will give you a good foundation, or at least a starting point, to learn more. Today, it is really easy to find excellent resources and become skilled at sophisticated analytical techniques. But you need to know what to look for and how all the different things out there relate to each other.

While, for several reasons (mostly path dependency), the innovation studies and general social science communities rely on expensive proprietary packages (e.g. SPSS, Stata, SAS or EViews), the people who work with Big Data analytics work with R and/or Python. We decided not to focus on just one language but to present both, so you can decide which one you find most approachable.

Below you will find links to the different things presented during the workshop. We will update this repository during and after the workshop.


Useful resources

Bibliometrics

Fresh application paper by Daniel, mapping the "Innovation Systems" research field. Gives a good intro to the methodology and also maps the whole IS field (might be handy for you):

Rakas, M and Hain, D.S. (under review). "Innovation System Research: Where It Came From, and What It Is Now – A Bibliometric Network Analysis" https://www.sciencedirect.com/science/article/pii/S0048733319301027 --> working paper also available open access here

VOSviewer: Easy software for bibliometrics

CiteSpace: More complex bibliometric software, including geospatial features and mapping.

Papers on Machine Learning & Applied Econometrics

Athey, S., & Imbens, G. W. (2017). "The state of applied econometrics: Causality and policy evaluation". Journal of Economic Perspectives, 31(2), 3-32. --> Available open access here

Athey, S. (forthcoming). "The Impact of Machine Learning on Economics", in The Economics of Artificial Intelligence: An Agenda. University of Chicago Press, January 2018 --> Available open access here

Varian, H. R. (2014). "Big data: New tricks for econometrics". Journal of Economic Perspectives, 28(2), 3-28. --> Available open access here

Einav, L., & Levin, J. (2014). "The data revolution and economic analysis". Innovation Policy and the Economy, 14(1), 1-24. --> Available open access here

Mullainathan, S., & Spiess, J. (2017). "Machine learning: an applied econometric approach". Journal of Economic Perspectives, 31(2), 87-106. --> Available open access here

Demonstration

MIT Startup Cartography Project: Initiative by Scott Stern (MIT) using Machine Learning for real-time forecasts of entrepreneurial activity and quality.

AAU Global Patent Explorer: Our own project on mapping patent quality and knowledge flows across the globe. It recently won the 1st OECD "Big Data Analytics" challenge.

Courses

DataCamp: Online courses. The introductions to R, Python, GitHub, Excel and Sheets are free. Recommended courses:

  • R basics: "Introduction to R" (free course)
  • R unsupervised ML: "Unsupervised Learning in R" (chapter 1 free)
  • R supervised ML: "The Machine Learning Toolbox" (chapter 1 free, introduces the caret train-control workflow)
  • R Data visualization: "Data Visualization with ggplot2 (Part 1)" (chapter 1 free)
  • Datacamp Tutorials: Free R & Python Tutorials for specific problems and methods

Dataquest: Similar to DataCamp, but Python-focused. Also offers more advanced courses on data engineering.

Open Data Science Masters Curriculum: Collection of free online resources on all kinds of Data Science topics.

Data and scripts from the ML A-Z course from Udemy: R and Python scripts from the course, including the course data. The course can be found on Udemy and is usually available for around 12 USD.

Software

Installing R on your machine

Installing the RStudio IDE on your machine

Installing Python on Windows

Installing Python on Mac

Network analysis and visualization software

Help

Stack Overflow: Programming help & advice forum

Others

Informative podcast about professional analytics

R-Bloggers: R news and tutorials


Day 1: Intro and Data Preprocessing

Session 1 - Introduction

Slides

Notebook - Additional information, will not be covered

Session 2 - Data Munging

Notebook (read-only version)

Notebook (Kaggle) executable
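
To give a rough idea of what "data munging" means in practice, here is a minimal Python sketch using only the standard library. It is not taken from the workshop notebooks: the toy data, the column names and the helper functions `clean_rows` and `impute_mean` are all hypothetical and only illustrate typical steps (trimming whitespace, normalizing text, handling missing values).

```python
import csv
import io
from statistics import mean

# Hypothetical toy data for illustration; not from the workshop materials.
RAW = """city,country,cost_usd
Marrakesh,Morocco,1200
 aalborg ,Denmark,
Nairobi,Kenya,1500
"""


def clean_rows(text):
    """Parse CSV text, trim whitespace, title-case city names,
    and convert costs to float (None when the value is missing)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        cost = row["cost_usd"].strip()
        rows.append({
            "city": row["city"].strip().title(),
            "country": row["country"].strip(),
            "cost_usd": float(cost) if cost else None,
        })
    return rows


def impute_mean(rows, key):
    """Replace missing values of `key` with the mean of the observed values."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = mean(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]


rows = impute_mean(clean_rows(RAW), "cost_usd")
for r in rows:
    print(r)
```

Mean imputation is only one of several ways to treat missing values, and it is used here purely because it is short; the notebooks may well take a different approach.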

Day 2: Unsupervised and Supervised Machine Learning

Session 3 - Unsupervised Machine Learning

Notebook (read-only version)

Notebook (Kaggle) executable

Notebook (Kaggle) executable: Exercise in Python - Exploring the nomadlist.io city dataset

Session 4 - Supervised Machine Learning

Notebook (read-only version)

Notebook (Kaggle) executable

Notebook (Kaggle) executable: Exercise in Python - Predicting Italian Wines

Day 3: Natural Language Processing

Session 5 - Natural Language Processing

Notebook (read-only version): Intro: Natural Language Processing

Notebook (Kaggle) executable: Intro: Natural Language Processing

Notebook (Kaggle) executable: Exercise in Python - Natural Language Processing
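
As a small taste of the topic, the following standard-library Python sketch builds a bag-of-words representation, a common first step in text analysis. It is an illustration under assumptions, not the content of the notebooks above: the stop-word list, the example sentences and the function names `tokenize` and `bag_of_words` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list and example sentences, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return one token-count Counter per document."""
    return [Counter(tokenize(d)) for d in docs]


docs = [
    "Innovation systems shape the diffusion of innovation.",
    "Patents are a noisy measure of innovation.",
]
bows = bag_of_words(docs)

# Shared vocabulary and a document-term matrix (rows: documents, columns: words).
vocab = sorted(set().union(*bows))
matrix = [[bow[w] for w in vocab] for bow in bows]

print(vocab)
for row in matrix:
    print(row)
```

Real projects usually rely on dedicated NLP libraries for tokenization and stop-word handling; the regular expression above is a deliberate simplification that only handles plain ASCII letters.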
