Although I have not actively maintained this site for a while, people are still starring and forking the repository. I am glad its content is helping them. I will keep an eye on the repository. Readers and users, please feel free to contribute updates to the listed resources and references whenever you see fit. Thanks!
Cloud computing has made data science and AI development easier than ever. This GitHub repository collects R packages, tools, and case studies for doing data science with R on the Azure cloud.
These packages and tools are categorized into four groups, each representing a typical task that data scientists or AI developers frequently work on.
Category | Features |
---|---|
Cloud resource operation and administration | Simplify interaction with the Azure cloud platform and the operation of resources on Azure for various tasks. |
Scalable and advanced analytics | Enable large-scale and parallel data analytics in the R environment. |
Remote interaction and access to Cloud instance | Improve the efficiency of R-based analytics work on the cloud. |
Application and service deployment | Make it easy to operationalize a solution and deploy it as a service. |
R packages and tools in the cloud resource operation and administration category offer a simplified way to interact with the Azure cloud platform and operate resources (e.g., blob storage, Data Science Virtual Machine, Azure Batch Service, etc.) on Azure for various tasks.
- AzureSMR - R package for managing a selection of Azure resources. Targeted at data scientists who need to control Azure resources without needing to bother administrators. APIs include Storage Blobs, HDInsight (Nodes, Hive, Spark), ARM, VMs.
- AzureDSVM - R package for conveniently working with Azure DSVM: remote execution of scalable and elastic data science work, and monitoring of on-demand resource consumption.
- doAzureParallel - R package that allows users to submit parallel workloads in Azure.
- rAzureBatch - an HTTP proxy library written in R for Azure Batch.
- AzureML - an R interface to AzureML experiments, datasets, and web services.
- AzureR - Family of packages for interacting with Azure from R.
  - AzureRMR - Base functionality for Azure Resource Management: authenticate, get subscriptions, get resource groups.
  - AzureAuth - R package for OAuth 2.0 authentication with Azure Active Directory.
  - AzureKeyVault - R interface to Azure Key Vault.
  - AzureKusto - R interface to Azure Data Explorer, aka Kusto.
  - AzureStor - R package for Azure Storage management.
  - AzureContainers - R support for container-related services on Azure, namely Azure Container Instances, Azure Kubernetes Service, and Azure Container Registry.
  - AzureGraph - a simple interface to the Microsoft Graph API.
  - AzureVM - R package for managing virtual machines in Azure.
R packages and tools in the scalable and advanced analytics category allow one to perform large-scale R-based analytics on the cloud with bleeding-edge frameworks such as Spark, Hadoop, Microsoft Cognitive Toolkit, TensorFlow, Keras, etc. NOTE: many of the tools are pre-installed and configured for direct use on the Azure Data Science Virtual Machine.
- dplyrXdf - a dplyr backend for Revolution Analytics xdf files.
- sparklyr - R interface for Apache Spark.
- SparkR - an R package that provides a lightweight front end for using Apache Spark from R.
- CNTK-R - R bindings to the CNTK library.
- tensorflow - R interface to TensorFlow.
- mxnet - The MXNet R package brings flexible and efficient GPU computing and state-of-the-art deep learning to R.
- keras - R interface to Keras.
- darch - Create deep architectures in R.
- deepnet - Implements some deep learning architectures and neural network algorithms, including BP, RBM, DBN, deep autoencoder, and so on.
- gpuR - R interface for using GPUs.
- RevoScaleR - a collection of portable, scalable, and distributable R functions for importing, transforming, and analyzing data at scale.
- MicrosoftML - a package that provides state-of-the-art fast, scalable machine learning algorithms and transforms for R.
- h2o - R interface to H2O.
The R packages and tools in the remote interaction and access category help data scientists and developers remotely access or interact with Azure cloud instances or services for convenient development.
- mrsdeploy - an R package that provides functions for establishing a remote session in a console application and for publishing and managing a web service backed by an R code block or script that you provide.
- R Tools for Visual Studio - IDE with R support.
- RStudio Server - IDE for remote R session with access via Internet browser.
- JupyterHub - Jupyter notebook with multi-user access.
- IRkernel - R kernel for Jupyter notebook.
The R packages and tools in the application and service deployment category are used for deploying R-based analytics or applications as services or interfaces that can be conveniently consumed by end users or developers.
- mrsdeploy - an R package that provides functions for deploying easily consumable services from within an R session.
- AzureML - an R package for interacting with Azure Machine Learning Studio to publish R functions as API services.
- Azure Container Instances - Azure service for running containerized R analytics on the cloud.
- Azure Container Services - Azure service that simplifies deployment, management, and operation of orchestrated containers of R analytics.
- Shiny Server - Develop and publish Shiny-based web applications online.
The real-world use cases below showcase Azure cloud-based analytical solutions that involve the aforementioned R packages or tools.
Use case | Key R packages or tools |
---|---|
Campaign management | RevoScaleR, RTVS/RStudio |
Customer churn prediction | RevoScaleR, MicrosoftML, RTVS/RStudio |
Energy demand forecasting | RevoScaleR, MicrosoftML, RTVS/RStudio |
Fraud detection | RevoScaleR, RTVS/RStudio |
Galaxies classification | RevoScaleR, mrsdeploy, MicrosoftML, RTVS/RStudio |
Performance test tuning | RevoScaleR, RTVS/RStudio |
Predictive maintenance | RevoScaleR, RTVS/RStudio |
Retail forecasting | RevoScaleR, RTVS/RStudio |
Credit risk scoring | MicrosoftML, mrsdeploy, Shiny, RTVS/RStudio |
Drop-out prediction | MicrosoftML, Jupyter Notebook |
Product demand forecasting | RevoScaleR, RTVS/RStudio |
Solar panel forecasting | AzureSMR, AzureDSVM, keras, RTVS/RStudio |
Employee attrition prediction | AzureSMR, AzureDSVM, Azure Container Services, Shiny, RTVS/RStudio |
Flight delay prediction | AzureSMR, AzureDSVM, MicrosoftML, SparkR, RTVS/RStudio |
Monte Carlo price simulation | doAzureParallel, RTVS/RStudio |
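
To give a feel for the Monte Carlo price simulation row, here is a minimal sketch of the `foreach` pattern that doAzureParallel is built around (it acts as a parallel backend for `foreach`). This is not code from the linked use case. The helper `simulate_closing_price`, the log-normal random walk, and every parameter value are assumptions made for illustration, and the sketch registers the sequential backend that ships with `foreach` so it can run locally without an Azure account.

```r
library(foreach)

# Register the built-in sequential backend so the sketch runs locally.
# On Azure, a doAzureParallel backend would be registered here instead.
registerDoSEQ()

# Hypothetical helper: simulate one closing price after `days` trading days
# under a simple log-normal random walk (an illustrative assumption).
simulate_closing_price <- function(start_price, mu, sigma, days) {
  daily_log_returns <- rnorm(days, mean = mu, sd = sigma)
  start_price * exp(sum(daily_log_returns))
}

set.seed(2024)
n_paths <- 500  # number of simulated price paths (illustrative)

# Each iteration is independent, which is what makes the loop easy to
# distribute once a parallel backend is registered.
closing_prices <- foreach(i = seq_len(n_paths), .combine = c) %dopar% {
  simulate_closing_price(start_price = 100, mu = 0.0005, sigma = 0.01, days = 252)
}

summary(closing_prices)
```

Because the loop body does not change when the backend does, the same code can be tested on a laptop first and then scaled out. Note that `set.seed` only makes the sequential run reproducible. Reproducible random numbers across parallel workers need extra care.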