Buy me a coffee or Venmo me (@JoscelinRocha)

I separated this section from the statistics one because, when I was learning how to run statistics in R, I got frustrated when the documents I found did not directly cover the specific tests I needed to learn about. This split may not be generalizable, but it makes sense to me now. Sorry!

What is this?
Excerpt from ebook: Over the course of this book, you will develop your “data science toolbox,” equipping yourself with tools such as data visualization, data formatting, data wrangling, and data modeling using regression.

In particular, this book will lean heavily on data visualization. In today’s world, we are bombarded with graphics that attempt to convey ideas. We will explore what makes a good graphic and what the standard ways are used to convey relationships within data. In general, we’ll use visualization as a way of building almost all of the ideas in this book.

  1. Link for free ebook here: https://moderndive.com/
  2. Link to repo here: https://github.com/moderndive/ModernDive_book
  3. Links to buy it here: Amazon or CRC Press using promo code ASA18 for a discounted price.

Added Sep 12th, 2020
What is this?
This course is designed for PhD students at Johns Hopkins Bloomberg School of Public Health. We are usually pretty flexible about permitting outside students but we want everyone to be aware of the goals and assumptions so no one feels like they are surprised by how the class works.

The primary goal of the course is to teach you how to deconstruct, perform, and communicate professional data analyses across diverse media.

The goal is to help you to organize your thinking around how to combine the things you have learned about statistics, data manipulation, and visualization into complete data analyses that answer important questions about the world around you.

  1. Link to ebook here: http://jtleek.com/ads2020/

What is this?
Excerpt from ebook: This book will cover several of the statistical concepts and data analytic skills needed to succeed in data-driven life science research. We go from relatively basic concepts related to computing p-values to advanced topics related to analyzing high-throughput data. While statistics textbooks focus on mathematics, this book focuses on using a computer to perform data analysis. Instead of explaining the mathematics and theory, and then showing examples, we start by stating a practical data-related challenge. This book also includes the computer code that provides a solution to the problem and helps illustrate the concepts behind the solution.

  1. Link to ebook: https://leanpub.com/dataanalysisforthelifesciences

What is this?
Excerpt from site: This is a graduate economics seminar taught by Grant McDermott at the University of Oregon.

Please read the syllabus before you go through any of the lectures. This will detail software requirements and installation, and give you a better sense of the aims and scope of the course. I also have an "FAQ" section at the end that covers frequently asked questions (or, at least, potentially asked questions). Speaking of which, here follow answers to some questions that are more specifically related to this repo.

  1. Link to lectures repo: https://github.com/uo-ec607/lectures#data-science-for-economists

What is it?
Excerpt from site: The core content of the course focuses on data acquisition and wrangling, exploratory data analysis, data visualization, inference, modelling, and effective communication of results. Time permitting, the course also introduces additional concepts and tools like interactive visualization and reporting, text analysis, and Bayesian inference.

  1. Link to site: https://datasciencebox.org/
  2. Link to repo: https://github.com/rstudio-education/datascience-box

Added Apr 15th, 2021
What is it?
Excerpt from site: We wrote this book assuming you’re at the start of your journey learning R and using data science in your education job. The book takes you from installing R to practicing more advanced data science skills like text analysis.

If you’ve never written a line of R code, we welcome you to the community! We wrote this book for you. Consider reading the book cover to cover and doing all the analysis walkthroughs. Remember that you’ll get more from a few minutes of practice every day than you will from long hours of practice every once in awhile. Typing code every day, even if it doesn’t always run, is a daily practice that invites learning and “a-ha” moments.

  1. Link to e-book: https://datascienceineducation.com/

What is this?
Data Science course by the wonderful Danielle Navarro.

  1. Link to e-course here: http://robust-tools.djnavarro.net/

What is it?
Excerpt from site: This course provides an overview of skills needed for reproducible research and open science using the statistical programming language R. Students will learn about data visualisation, data tidying and wrangling, archiving, iteration and functions, probability and data simulations, general linear models, and reproducible workflows. Learning is reinforced through weekly assignments that involve working with different types of data.

  1. Course here: https://psyteachr.github.io/msc-data-skills/

What is this?
Excerpt from e-book: This book covers the essential exploratory techniques for summarizing data with R. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data you have. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing informative data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.

  1. Link to e-book here: https://bookdown.org/rdpeng/exdata/

What is this?
Excerpt from site: This vignette aims to work with the following three questions, using the tools developed in naniar and another package, visdat. Namely, how do we: start looking at missing data? Explore missingness mechanisms? Model missingness?

  1. Link to site here: https://cran.r-project.org/web/packages/naniar/vignettes/getting-started-w-naniar.html

What is this?
Excerpt from site: Gain experience in data collection, wrangling, and visualization, exploratory data analysis, predictive modeling, and effective communication of results while working on problems and case studies inspired by and based on real-world questions. The course will focus on the R statistical computing language.

  1. Link to e-course here: https://introds.org/

What is this?
Excerpt from e-book: This book introduces concepts from probability, statistical inference, linear regression and machine learning and R programming skills.

  1. Link to e-book here: https://leanpub.com/datasciencebook

What is this?
Excerpt from ebook: This chapter presented Spark as a modern and powerful computing platform, R as an easy-to-use computing language with solid foundations in statistical methods, and sparklyr as a project bridging both technologies and communities. In a world in which the total amount of information is growing exponentially, learning how to analyze data at scale will help you to tackle the problems and opportunities humanity is facing today. However, before we start analyzing data, Chapter 2 will equip you with the tools you will need throughout the rest of this book. Be sure to follow each step carefully and take the time to install the recommended tools, which we hope will become familiar resources that you use and love.

  1. Link to free ebook: https://therinspark.com/

What is this?
Excerpt from site: This is the website for “R for Data Science”. This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualize it and model it. In this book, you will find a practicum of skills for data science. Just as a chemist learns how to clean test tubes and stock a lab, you’ll learn how to clean data and draw plots—and many other things besides. These are the skills that allow data science to happen, and here you will find the best practices for doing each of these things with R. You’ll learn how to use the grammar of graphics, literate programming, and reproducible research to save time. You’ll also learn how to manage cognitive resources to facilitate discoveries when wrangling, visualising, and exploring data.

  1. Web-book here: https://r4ds.had.co.nz/
  2. Want to buy it?: Amazon Link
  3. Want to give back? You can donate here: https://www.doc.govt.nz/kakapo-donate

What is this?
Excerpt from site: This book is about the fundamentals of R programming. You will get started with the basics of the language, learn how to manipulate datasets, how to write functions, and how to debug and optimize code. With the fundamentals provided in this book, you will have a solid foundation on which to build your data science toolbox.

  1. Web-book here: https://bookdown.org/rdpeng/rprogdatascience/

Added Fri Oct 8th, 2021
What is this?
A one-day workshop covering the following topics: motivations, manipulating data in the tidyverse, visualising data in the tidyverse, writing dynamic and reproducible documents with R Markdown, versioning with Git and GitHub in RStudio, and take-home messages.

  1. Link to materials and videos here: https://oliviergimenez.github.io/reproducible-science-workshop/

Added Thu Dec 31st, 2020
What is this?
Excerpt from e-book: Here’s an intro about why R is great and the cool things you can do with it and new problems you can address.

  1. Link to e-book here: https://www.sds.pub/index.html

What is this?
Excerpt from ebook: This book serves as an introduction of text mining using the tidytext package and other tidy tools in R. The functions provided by the tidytext package are relatively simple; what is important are the possible applications. Thus, this book provides compelling examples of real text mining problems.

  1. Link to free ebook here: https://www.tidytextmining.com/
  2. Buy the book here: Amazon
  3. Link to repo here: https://github.com/dgrtwo/tidy-text-mining

What is this?
Excerpt from site: You know R, especially the dplyr 📦. Even though the dplyr 📦 is so well written to mimic the SQL syntax - select(), group_by(), left_join() etc. there is still a cognitive load when you switch between using R syntax, and SQL syntax (ask me, who has often written == in SQL syntax on Athena only to wonder why I am getting an error 🤐).

You only have so much memory in your local environment, and may want your RDBMS to do the heavy lifting (most of the computation), and only pull data into R when you need to (e.g. pull in aggregated data to create plots for a report).

In this tutorial you will learn how to use dbplyr, which is a database back-end of dplyr, to execute queries directly in your RDBMS all the while writing R tidyverse syntax 😮 ⭐.

  1. Blog Part 1 here: https://sciencificity-blog.netlify.app/posts/2020-12-12-using-the-tidyverse-with-databases/
  2. Blog Part 2 here: https://sciencificity-blog.netlify.app/posts/2020-12-20-using-the-tidyverse-with-dbs-partii/
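
To give a flavor of what the tutorial covers, here is a minimal sketch of the dbplyr workflow. It is not taken from the blog posts: the in-memory SQLite database, the `mtcars` table, and the variable names are assumptions chosen for illustration, and it presumes the DBI, dplyr, dbplyr, and RSQLite packages are installed.

```r
library(DBI)
library(dplyr)

# Assumption: an in-memory SQLite database stands in for a real RDBMS.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in data frame into the database so there is a table to query.
copy_to(con, mtcars, "mtcars")

# A lazy reference to the remote table; no rows are pulled into R yet.
cars_db <- tbl(con, "mtcars")

# Ordinary dplyr verbs; dbplyr translates them to SQL behind the scenes.
by_cyl <- cars_db %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg, na.rm = TRUE))

# Inspect the SQL that would be sent to the database.
show_query(by_cyl)

# Only now is the (small, aggregated) result brought into R.
result <- collect(by_cyl)

dbDisconnect(con)
```

The point of the design is that the aggregation runs inside the database, and `collect()` pulls back only the summarized rows, which is what you would want before plotting or reporting.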