Cloud computing has made data science and AI development easier than ever. This GitHub repository collects R packages, tools, and case studies for doing data science with R on the Azure cloud.
These packages and tools are categorized into four groups, each representing a typical task that data scientists or AI developers frequently work on.
Category | Features |
---|---|
Cloud resource operation and administration | Simplify interacting with the Azure cloud platform and operating resources on Azure for various tasks. |
Scalable and advanced analytics | Enable large-scale and parallel data analytics in R environment. |
Remote interaction and access to Cloud instance | Make R-based analytics on remote cloud instances more efficient. |
Application and service deployment | Simplify operationalizing a solution and deploying it as a service. |
R packages and tools in the cloud resource operation and administration category offer a simplified way to interact with the Azure cloud platform and operate resources on Azure (e.g., blob storage, Data Science Virtual Machine, Azure Batch Service) for various tasks.
- AzureSMR - R package for managing a selection of Azure resources. Targeted at data scientists who need to control Azure resources without needing to bother administrators. APIs include Storage Blobs, HDInsight (Nodes, Hive, Spark), ARM, VMs.
- AzureDSVM - R package that makes it convenient to use the Azure Data Science Virtual Machine (DSVM), run scalable and elastic data science work remotely, and monitor on-demand resource consumption.
- doAzureParallel - R package that allows users to submit parallel workloads in Azure.
- rAzureBatch - an HTTP proxy library written in R for Azure.
- AzureML - an R interface to AzureML experiments, datasets, and web services.
- AzureR - Family of packages for interacting with Azure from R.
  - AzureRMR - Base functionality for Azure Resource Management: authenticate, get subscriptions, get resource groups.
  - AzureAuth - R package for OAuth 2.0 authentication with Azure Active Directory.
  - AzureKeyVault - R interface to Azure Key Vault.
  - AzureKusto - R interface to Azure Data Explorer, aka Kusto.
  - AzureStor - R package for Azure Storage management.
  - AzureContainers - R support for container-related services on Azure: Azure Container Instances, Azure Kubernetes Service, and Azure Container Registry.
  - AzureGraph - a simple interface to the Microsoft Graph API.
  - AzureVM - R package for managing virtual machines in Azure.
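
doAzureParallel, listed above, works as a backend for the `foreach` package, so a parallel workload is written as an ordinary `foreach` loop. The sketch below is only an illustration of that pattern: it registers a local `doParallel` backend as a stand-in so that it runs without an Azure subscription, and the commented lines indicate where the doAzureParallel calls would go, assuming `credentials.json` and `cluster.json` configuration files exist. All variable names and parameter values are hypothetical.

```r
library(foreach)
library(doParallel)

# Local stand-in backend with two worker processes.
# With doAzureParallel, the two lines below would instead look roughly like:
#   doAzureParallel::setCredentials("credentials.json")
#   cluster <- doAzureParallel::makeCluster("cluster.json")
#   doAzureParallel::registerDoAzureParallel(cluster)
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Hypothetical simulation settings
n_sims      <- 100     # number of simulated price paths
n_days      <- 250     # trading days per path
start_price <- 100     # opening price
mean_return <- 0.001   # average daily return
sd_return   <- 0.01    # daily return volatility

# Each iteration simulates one price path and returns its closing price;
# the iterations are independent, so they can run on separate workers.
closing_prices <- foreach(i = seq_len(n_sims), .combine = c) %dopar% {
  set.seed(i)  # one seed per path keeps the run reproducible
  daily_returns <- rnorm(n_days, mean = mean_return, sd = sd_return)
  start_price * prod(1 + daily_returns)
}

parallel::stopCluster(cl)

print(summary(closing_prices))
```

Because only the backend registration changes, the same loop body can be tested locally first and moved to an Azure Batch pool afterwards.
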
R packages and tools in the scalable and advanced analytics category make it possible to perform large-scale R-based analytics on the cloud with cutting-edge frameworks such as Spark, Hadoop, Microsoft Cognitive Toolkit, TensorFlow, and Keras. NOTE: many of these tools are pre-installed and configured for direct use on the Azure Data Science Virtual Machine.
- dplyrXdf - a dplyr backend for Revolution Analytics xdf files.
- sparklyr - R interface for Apache Spark.
- SparkR - R package that provides a lightweight front end for using Apache Spark from R.
- CNTK-R - R bindings to the CNTK library.
- tensorflow - R interface to TensorFlow.
- mxnet - The MXNet R package brings flexible and efficient GPU computing and state-of-the-art deep learning to R.
- keras - R interface to Keras.
- darch - Create deep architectures in R.
- deepnet - Implements some deep learning architectures and neural network algorithms, including BP, RBM, DBN, deep autoencoder, and so on.
- gpuR - R interface for GPU computing.
- RevoScaleR - a collection of portable, scalable, and distributable R functions for importing, transforming, and analyzing data at scale.
- MicrosoftML - a package that provides state-of-the-art fast, scalable machine learning algorithms and transforms for R.
- h2o - R interface to H2O.
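
Several of the packages above expose a distributed engine through familiar R idioms. As a hedged illustration, the sketch below writes a `dplyr` summary once and runs it on a local data frame; with sparklyr, the same function could be applied to a Spark table instead. The commented Spark lines assume a working Spark installation (for example, on a Data Science Virtual Machine), and the helper name `mean_mpg_by_cyl` is hypothetical.

```r
library(dplyr)

# Hypothetical helper: average fuel economy per cylinder count.
# collect() is a no-op for a local data frame and pulls the result
# back into R when the input is a remote (e.g. Spark) table.
mean_mpg_by_cyl <- function(tbl) {
  tbl %>%
    group_by(cyl) %>%
    summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
    arrange(cyl) %>%
    collect()
}

# Run locally on the built-in mtcars data set
local_result <- mean_mpg_by_cyl(mtcars)
print(local_result)

# With sparklyr, the same helper could run against a Spark table
# (assumes Spark is installed and reachable):
#   library(sparklyr)
#   sc <- spark_connect(master = "local")
#   mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)
#   spark_result <- mean_mpg_by_cyl(mtcars_tbl)
#   spark_disconnect(sc)
```

Keeping the analysis in a function that accepts either kind of table makes it easier to prototype on a sample and then scale out.
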
The R packages and tools in the remote interaction and access category help data scientists and developers remotely access or interact with Azure cloud instances and services for convenient development.
- mrsdeploy - an R package that provides functions for establishing a remote session in a console application and for publishing and managing a web service that is backed by the R code block or script you provide.
- R Tools for Visual Studio - IDE with R support.
- RStudio Server - IDE for remote R session with access via Internet browser.
- JupyterHub - Jupyter notebook with multi-user access.
- IRkernel - R kernel for Jupyter notebook.
The R packages and tools in the application and service deployment category are used for deploying R-based analytics or applications as services or interfaces that can be conveniently consumed by end users or developers.
- mrsdeploy - an R package that provides functions for deploying easily consumable services from within an R session.
- AzureML - an R package for interacting with Azure Machine Learning Studio to publish R functions as API services.
- Azure Container Instances - Azure service for running containerized R analytics in the cloud.
- Azure Container Services - Azure service that simplifies deployment, management, and operation of orchestrated containers of R analytics.
- Shiny server - Develop and publish Shiny based web applications online.
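
To give a sense of what gets deployed in the Shiny case, here is a minimal sketch of a Shiny application built on the `faithful` data set that ships with R. It is not taken from any of the use cases; the input and output IDs (`bins`, `hist`) are arbitrary, and how the app is published (Shiny Server, a container, or another host) is left open.

```r
library(shiny)

# User interface: one slider and one plot
ui <- fluidPage(
  titlePanel("Old Faithful waiting times"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider changes
server <- function(input, output) {
  output$hist <- renderPlot({
    waiting <- faithful$waiting
    breaks <- seq(min(waiting), max(waiting), length.out = input$bins + 1)
    hist(waiting, breaks = breaks, main = NULL,
         xlab = "Waiting time to next eruption (minutes)")
  })
}

# Build the app object without starting a server;
# shiny::runApp(app) would serve it locally.
app <- shinyApp(ui = ui, server = server)
```
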
The real-world use cases below showcase Azure cloud-based analytical solutions that use the R packages and tools listed above.
Use case | Key R packages or tools |
---|---|
Campaign management | RevoScaleR, RTVS/RStudio |
Customer churn prediction | RevoScaleR, MicrosoftML, RTVS/RStudio |
Energy demand forecasting | RevoScaleR, MicrosoftML, RTVS/RStudio |
Fraud detection | RevoScaleR, RTVS/RStudio |
Galaxies classification | RevoScaleR, mrsdeploy, MicrosoftML, RTVS/RStudio |
Performance test tuning | RevoScaleR, RTVS/RStudio |
Predictive maintenance | RevoScaleR, RTVS/RStudio |
Retail forecasting | RevoScaleR, RTVS/RStudio |
Credit risk scoring | MicrosoftML, mrsdeploy, Shiny, RTVS/RStudio |
Drop-out prediction | MicrosoftML, Jupyter Notebook |
Product demand forecasting | RevoScaleR, RTVS/RStudio |
Solar panel forecasting | AzureSMR, AzureDSVM, keras, RTVS/RStudio |
Employee attrition prediction | AzureSMR, AzureDSVM, Azure Container Services, Shiny, RTVS/RStudio |
Flight delay prediction | AzureSMR, AzureDSVM, MicrosoftML, SparkR, RTVS/RStudio |
Monte Carlo price simulation | doAzureParallel, RTVS/RStudio |