
pohjois-savon-tietoallas/dbricks_test


A test project with examples of big data analysis on Databricks.

This project contains RStudio + Databricks examples. You can either run RStudio on a Databricks cluster or use desktop RStudio together with Databricks Connect. A third option is an RStudio Server with a connection to Databricks, but this has not been tested yet.

Guides

  • rstudio_cluster_sparklyr.Rmd: how to use sparklyr in the Databricks environment

  • rstudio_cluster_sparklyr_text-mining.Rmd: how to do text mining in the Databricks environment

  • rstudio_desktop_sparkr.Rmd: how to use the SparkR package in RStudio Desktop with Databricks Connect

How to clone the project in a Databricks RStudio cluster

  1. Start an RStudio cluster in Databricks.
  2. Identify yourself to git in the terminal:
     git config --global user.email "you@example.com"
     git config --global user.name "User Name"
  3. Get an access token from GitHub and add it to Databricks: https://docs.databricks.com/notebooks/github-version-control.html
  4. Clone the project.
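
The clone step gives no command, so here is a minimal sketch of what it might look like in the RStudio terminal. The repository URL is an assumption inferred from the repository name on this page, and `clone_project` is a hypothetical helper name, not part of the project. Adjust both if you work from a fork.

```shell
# Assumed URL, inferred from the repository name; change it for a fork.
REPO_URL="https://github.com/pohjois-savon-tietoallas/dbricks_test.git"

# Derive the target directory name from the URL by stripping the .git suffix.
TARGET_DIR="$(basename "$REPO_URL" .git)"

# Hypothetical helper: clone into the current directory unless a checkout
# already exists there, so the step is safe to rerun.
clone_project() {
  if [ -d "$TARGET_DIR/.git" ]; then
    echo "Already cloned: $TARGET_DIR"
  else
    git clone "$REPO_URL" "$TARGET_DIR"
  fi
}
```

Running `clone_project` from the directory where the project should live performs the clone. If git asks for credentials over HTTPS, the GitHub access token from the earlier step is normally entered in place of a password.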


Material adapted from the project: https://github.com/rstudio-conf-2020/big-data

This work is licensed under a Creative Commons Attribution 4.0 International License.
