
Commit 2d41c00 (version 1.0.39)
Parent: 8b1bffc

76 files changed: +4428 additions, -3717 deletions

NBSETUP.md

Lines changed: 2 additions & 2 deletions
@@ -24,8 +24,8 @@ pip install azureml-sdk
 git clone https://github.com/Azure/MachineLearningNotebooks.git
 
 # below steps are optional
-# install the base SDK and a Jupyter notebook server
-pip install azureml-sdk[notebooks]
+# install the base SDK, Jupyter notebook server and tensorboard
+pip install azureml-sdk[notebooks,tensorboard]
 
 # install model explainability component
 pip install azureml-sdk[explain]
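
A quick sanity check after installing (not part of this diff; the version string below assumes this 1.0.39 release) confirms that the SDK and the new tensorboard extra resolved correctly:

```python
# Verify the azureml-sdk install and the [notebooks,tensorboard] extras.
import azureml.core
print(azureml.core.VERSION)  # expect '1.0.39' for this release

# The [tensorboard] extra installs the azureml-tensorboard package.
from azureml.tensorboard import Tensorboard  # noqa: F401 -- import check only
```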

README.md

Lines changed: 3 additions & 8 deletions
@@ -11,8 +11,7 @@ pip install azureml-sdk
 Read more detailed instructions on [how to set up your environment](./NBSETUP.md) using Azure Notebook service, your own Jupyter notebook server, or Docker.
 
 ## How to navigate and use the example notebooks?
-If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [Configuration](./configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace.
-It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples.
+If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, you should always run the [Configuration](./configuration.ipynb) notebook first when setting up a notebook library on a new machine or in a new environment. It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples.
 
 If you want to...
 
@@ -21,7 +20,7 @@ If you want to...
 * ...learn about experimentation and tracking run history, first [train within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then try [training on remote VM](./how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) and [using logging APIs](./how-to-use-azureml/training/logging-api/logging-api.ipynb).
 * ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb).
 * ...deploy models as a realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [register and manage models, and create Docker images](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), and [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
-* ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](./how-to-use-azureml/machine-learning-pipelines/pipeline-mpi-batch-prediction.ipynb).
+* ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](https://aka.ms/pl-batch-scoring).
 * ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb) and [model data collection](./how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb).
 
 ## Tutorials
@@ -55,9 +54,5 @@ Visit following repos to see projects contributed by Azure ML users:
 
 - [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
 - [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)
-
-
-![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/README.png)
 
-
-
+![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/README.png)
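
For context on the Configuration notebook the README now points to: it amounts to persisting workspace details once and reloading them in every other notebook. A minimal sketch using the SDK's `Workspace` API (the subscription and resource-group values are placeholders, not from this commit):

```python
# Minimal sketch of what configuration.ipynb sets up; placeholder values.
from azureml.core import Workspace

# One-time setup: create (or fetch) the workspace and persist config.json.
ws = Workspace.create(name='myworkspace',                  # placeholder
                      subscription_id='<subscription-id>', # placeholder
                      resource_group='myresourcegroup',    # placeholder
                      location='eastus2',
                      exist_ok=True)
ws.write_config()

# Any later notebook reconnects with a single call.
ws = Workspace.from_config()
print(ws.name, ws.location)
```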

configuration.ipynb

Lines changed: 0 additions & 92 deletions
@@ -32,7 +32,6 @@
 " 1. Workspace parameters\n",
 " 1. Access your workspace\n",
 " 1. Create a new workspace\n",
-" 1. Create compute resources\n",
 "1. [Next steps](#Next%20steps)\n",
 "\n",
 "---\n",
@@ -235,97 +234,6 @@
 "ws.write_config()"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### Create compute resources for your training experiments\n",
-"\n",
-"Many of the sample notebooks use Azure ML managed compute (AmlCompute) to train models using a dynamically scalable pool of compute. In this section you will create default compute clusters for use by the other notebooks and any other operations you choose.\n",
-"\n",
-"To create a cluster, you need to specify a compute configuration that specifies the type of machine to be used and the scalability behaviors. Then you choose a name for the cluster that is unique within the workspace that can be used to address the cluster later.\n",
-"\n",
-"The cluster parameters are:\n",
-"* vm_size - this describes the virtual machine type and size used in the cluster. All machines in the cluster are the same type. You can get the list of vm sizes available in your region by using the CLI command\n",
-"\n",
-"```shell\n",
-"az vm list-skus -o tsv\n",
-"```\n",
-"* min_nodes - this sets the minimum size of the cluster. If you set the minimum to 0 the cluster will shut down all nodes while not in use. Setting this number to a value higher than 0 will allow for faster start-up times, but you will also be billed when the cluster is not in use.\n",
-"* max_nodes - this sets the maximum size of the cluster. Setting this to a larger number allows for more concurrency and a greater distributed processing of scale-out jobs.\n",
-"\n",
-"\n",
-"To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy."
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"from azureml.core.compute import ComputeTarget, AmlCompute\n",
-"from azureml.core.compute_target import ComputeTargetException\n",
-"\n",
-"# Choose a name for your CPU cluster\n",
-"cpu_cluster_name = \"cpucluster\"\n",
-"\n",
-"# Verify that cluster does not exist already\n",
-"try:\n",
-" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
-" print(\"Found existing cpucluster\")\n",
-"except ComputeTargetException:\n",
-" print(\"Creating new cpucluster\")\n",
-" \n",
-" # Specify the configuration for the new cluster\n",
-" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n",
-" min_nodes=0,\n",
-" max_nodes=4)\n",
-"\n",
-" # Create the cluster with the specified name and configuration\n",
-" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
-" \n",
-" # Wait for the cluster to complete, show the output log\n",
-" cpu_cluster.wait_for_completion(show_output=True)"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). "
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"from azureml.core.compute import ComputeTarget, AmlCompute\n",
-"from azureml.core.compute_target import ComputeTargetException\n",
-"\n",
-"# Choose a name for your GPU cluster\n",
-"gpu_cluster_name = \"gpucluster\"\n",
-"\n",
-"# Verify that cluster does not exist already\n",
-"try:\n",
-" gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
-" print(\"Found existing gpu cluster\")\n",
-"except ComputeTargetException:\n",
-" print(\"Creating new gpucluster\")\n",
-" \n",
-" # Specify the configuration for the new cluster\n",
-" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
-" min_nodes=0,\n",
-" max_nodes=4)\n",
-" # Create the cluster with the specified name and configuration\n",
-" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n",
-"\n",
-" # Wait for the cluster to complete, show the output log\n",
-" gpu_cluster.wait_for_completion(show_output=True)"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},

how-to-use-azureml/automated-machine-learning/classification-with-onnx/auto-ml-classification-with-onnx.ipynb

Lines changed: 1 addition & 1 deletion
@@ -249,7 +249,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from azureml.train.automl._vendor.automl.client.core.common.onnx_convert import OnnxConverter\n",
+"from azureml.automl.core.onnx_convert import OnnxConverter\n",
 "onnx_fl_path = \"./best_model.onnx\"\n",
 "OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
 ]
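
The import now comes from the public `azureml.automl.core` namespace rather than a vendored private path, so it is less likely to break across releases. To check the exported file independently, onnxruntime (an assumption on our part; it is not used in this diff) can load it and report the expected input schema:

```python
# Sketch (not from this commit): load the exported model with onnxruntime
# and inspect the inputs it expects before scoring.
import onnxruntime as rt

sess = rt.InferenceSession("./best_model.onnx")
for inp in sess.get_inputs():
    print(inp.name, inp.shape, inp.type)
# sess.run(None, {name: array}) with correctly shaped features then scores it.
```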

how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb

Lines changed: 6 additions & 0 deletions
@@ -328,6 +328,12 @@
 " print()\n",
 " for estimator in step[1].estimators:\n",
 " print_model(estimator[1], estimator[0]+ ' - ')\n",
+" elif hasattr(step[1], '_base_learners') and hasattr(step[1], '_meta_learner'):\n",
+" print(\"\\nMeta Learner\")\n",
+" pprint(step[1]._meta_learner)\n",
+" print()\n",
+" for estimator in step[1]._base_learners:\n",
+" print_model(estimator[1], estimator[0]+ ' - ')\n",
 " else:\n",
 " pprint(step[1].get_params())\n",
 " print()\n",

how-to-use-azureml/automated-machine-learning/dataprep-remote-execution/auto-ml-dataprep-remote-execution.ipynb

Lines changed: 58 additions & 77 deletions
@@ -117,21 +117,34 @@
 "outputs": [],
 "source": [
 "# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
-"# The data referenced here was pulled from `sklearn.datasets.load_digits()`.\n",
-"simple_example_data_root = 'https://dprepdata.blob.core.windows.net/automl-notebook-data/'\n",
-"X = dprep.auto_read_file(simple_example_data_root + 'X.csv').skip(1) # Remove the header row.\n",
-"\n",
+"# The data referenced here is a 1MB simple random sample of the Chicago Crime dataset.\n",
 "# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
 "# and convert column types manually.\n",
-"# Here we read a comma delimited file and convert all columns to integers.\n",
-"y = dprep.read_csv(simple_example_data_root + 'y.csv').to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
+"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
+"dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
+"dflow.get_profile()"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"# Since `Primary Type` is our target (y) column, drop the rows where it is null.\n",
+"dflow = dflow.drop_nulls('Primary Type')\n",
+"dflow.head(5)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets."
+"### Review the Data Preparation Result\n",
+"\n",
+"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets.\n",
+"\n",
+"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
 ]
 },
 {
@@ -140,7 +153,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"X.skip(1).head(5)"
+"X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
+"y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
 ]
 },
 {
@@ -162,9 +176,8 @@
 " \"iteration_timeout_minutes\" : 10,\n",
 " \"iterations\" : 2,\n",
 " \"primary_metric\" : 'AUC_weighted',\n",
-" \"preprocess\" : False,\n",
-" \"verbosity\" : logging.INFO,\n",
-" \"n_cross_validations\": 3\n",
+" \"preprocess\" : True,\n",
+" \"verbosity\" : logging.INFO\n",
 "}"
 ]
 },
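
Tracing the settings change: `preprocess` flips to `True` so AutoML featurizes the raw Chicago Crime columns, and `n_cross_validations` is dropped. These keys are typically splatted into `AutoMLConfig`; a sketch under the assumption that the surrounding (unchanged) cell follows the standard pattern, with `X` and `y` being the Dataflows prepared above:

```python
# Sketch of how automl_settings is consumed; the consuming cell is not part
# of this diff, and the X/y Dataflow arguments are assumptions.
import logging
from azureml.train.automl import AutoMLConfig

automl_settings = {
    "iteration_timeout_minutes": 10,
    "iterations": 2,
    "primary_metric": 'AUC_weighted',
    "preprocess": True,        # featurize raw columns before training
    "verbosity": logging.INFO
}

automl_config = AutoMLConfig(task='classification',
                             X=X,   # feature Dataflow from earlier
                             y=y,   # label Dataflow
                             **automl_settings)
```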
@@ -181,7 +194,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"dsvm_name = 'mydsvmc'\n",
+"dsvm_name = 'mydsvmb'\n",
 "\n",
 "try:\n",
 " while ws.compute_targets[dsvm_name].provisioning_state == 'Creating':\n",
@@ -257,6 +270,23 @@
 "remote_run"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Pre-process cache cleanup\n",
+"The preprocessed data is cached in the user's default file store. Once the run has completed, the cache can be cleaned up by running the cell below."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"remote_run.clean_preprocessor_cache()"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -376,7 +406,8 @@
 "source": [
 "## Test\n",
 "\n",
-"#### Load Test Data"
+"#### Load Test Data\n",
+"The test data must go through the same preparation steps as the training data; otherwise scoring may fail at the preprocessing step."
 ]
 },
 {
@@ -385,20 +416,16 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from sklearn import datasets\n",
-"\n",
-"digits = datasets.load_digits()\n",
-"X_test = digits.data[:10, :]\n",
-"y_test = digits.target[:10]\n",
-"images = digits.images[:10]"
+"dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
+"dflow_test = dflow_test.drop_nulls('Primary Type')"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Testing Our Best Fitted Model\n",
-"We will try to predict 2 digits and see how our model works."
+"We will use a confusion matrix to see how our model performs."
 ]
 },
 {
@@ -407,65 +434,19 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"#Randomly select digits and test\n",
-"from matplotlib import pyplot as plt\n",
-"import numpy as np\n",
+"from pandas_ml import ConfusionMatrix\n",
 "\n",
-"for index in np.random.choice(len(y_test), 2, replace = False):\n",
-" print(index)\n",
-" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
-" label = y_test[index]\n",
-" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
-" fig = plt.figure(1, figsize=(3,3))\n",
-" ax1 = fig.add_axes((0,0,.8,.8))\n",
-" ax1.set_title(title)\n",
-" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
-" plt.show()"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"## Appendix"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### Capture the `Dataflow` Objects for Later Use in AutoML\n",
+"y_test = dflow_test.keep_columns(columns=['Primary Type']).to_pandas_dataframe()\n",
+"X_test = dflow_test.drop_columns(columns=['Primary Type', 'FBI Code']).to_pandas_dataframe()\n",
 "\n",
-"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"# sklearn.digits.data + target\n",
-"digits_complete = dprep.auto_read_file('https://dprepdata.blob.core.windows.net/automl-notebook-data/digits-complete.csv')"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"`digits_complete` (sourced from `sklearn.datasets.load_digits()`) is forked into `dflow_X` to capture all the feature columns and `dflow_y` to capture the label column."
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"print(digits_complete.to_pandas_dataframe().shape)\n",
-"labels_column = 'Column64'\n",
-"dflow_X = digits_complete.drop_columns(columns = [labels_column])\n",
-"dflow_y = digits_complete.keep_columns(columns = [labels_column])"
+"\n",
+"ypred = fitted_model.predict(X_test)\n",
+"\n",
+"cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n",
+"\n",
+"print(cm)\n",
+"\n",
+"cm.plot()"
 ]
 }
 ],
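
One portability note on the new test cell: `pandas_ml` is pinned to older pandas releases, so the same confusion matrix can be produced with scikit-learn instead (an alternative sketch, not what this commit uses; it assumes `fitted_model`, `X_test`, and `y_test` from the cells above):

```python
# Alternative (not in this commit): confusion matrix via scikit-learn,
# avoiding the pandas_ml dependency.
import pandas as pd
from sklearn.metrics import confusion_matrix

ypred = fitted_model.predict(X_test)
labels = sorted(set(y_test['Primary Type']) | set(ypred))
cm = pd.DataFrame(confusion_matrix(y_test['Primary Type'], ypred, labels=labels),
                  index=labels, columns=labels)
print(cm)
```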
