- Create a new resource group.
- In the resource group, create a regular blob storage account. In the storage account, create two private containers named `staging` and `models`.
- In the resource group, create an empty Azure Synapse Analytics workspace.
- Create the following file systems in the primary storage account of the workspace: `dev`, `staging`, and `wwi`.
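
  These storage steps can also be scripted. A minimal sketch using the Azure SDK for Python, assuming the `azure-storage-blob` and `azure-storage-file-datalake` packages (account names and keys are placeholders):

  ```python
  from azure.storage.blob import BlobServiceClient
  from azure.storage.filedatalake import DataLakeServiceClient

  # Blob storage account: create the two private containers.
  blob_service = BlobServiceClient(
      account_url="https://<blob_storage>.blob.core.windows.net",
      credential="<blob_storage_account_key>",
  )
  for name in ("staging", "models"):
      blob_service.create_container(name)  # no public access by default

  # Primary ADLS Gen2 account of the workspace: create the three file systems.
  lake_service = DataLakeServiceClient(
      account_url="https://<primary_storage>.dfs.core.windows.net",
      credential="<primary_storage_account_key>",
  )
  for name in ("dev", "staging", "wwi"):
      lake_service.create_file_system(name)
  ```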
- Create a linked service to the first SQL pool of the workspace. Configure it to connect with username and password and use the workspace's SQL admin account credentials. It is recommended to name the linked service `sqlpool01` to simplify importing datasets, data flows, and pipelines later.
- Create a linked service to the primary storage account. Configure it to connect with the account key. It is recommended to name the linked service `asadatalake01` to simplify importing datasets, data flows, and pipelines later.
- Create a linked service to the blob storage account. Configure it to connect with the account key. It is recommended to name the linked service `asastore01` to simplify importing datasets, data flows, and pipelines later.
- Create an Apache Spark pool named `SparkPool01`. Ensure the Apache Spark version is set to 3.1.
- Create another Apache Spark pool named `SparkPool02`. Ensure the Apache Spark version is set to 2.4.

  NOTE: This second Spark pool is required for Exercise 08, which runs AutoML in the Spark session. AutoML libraries are currently not supported by the Synapse Spark 3.1 runtime.
- For the remainder of this guide, the following terms will be used for various ASA-related resources (make sure you replace them with actual names and values):

  ASA resource | To be referred to as |
  ---|---|
  Workspace resource group | `WorkspaceResourceGroup` |
  Workspace | `Workspace` |
  Identity used to create `Workspace` | `MasterUser` |
  Primary storage account | `PrimaryStorage` |
  Blob storage account | `BlobStorage` |
  First Spark pool | `SparkPool01` |
  Second Spark pool | `SparkPool02` |
  First SQL pool | `SQLPool01` |
  SQL admin account | `asa.sql.admin` |
  Linked service to first SQL pool | `sqlpool01` |
  Linked service to primary storage account | `asadatalake01` |
  Linked service to blob storage account | `asastore01` |
- Ensure the `Workspace` security principal (which has the same name as the `Workspace`) and the `MasterUser` (the one used to create the `Workspace`) are added with the `Storage Blob Data Owner` role to the `PrimaryStorage`.
- Create a folder named `bronze` in the `dev` file system of `PrimaryStorage`.
- Create a folder named `sentiment` in the `bronze` folder in the `dev` file system of `PrimaryStorage`.
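
  A minimal sketch of the folder-creation steps, assuming the `azure-storage-file-datalake` package (account name and key are placeholders):

  ```python
  from azure.storage.filedatalake import DataLakeServiceClient

  lake_service = DataLakeServiceClient(
      account_url="https://<primary_storage>.dfs.core.windows.net",
      credential="<primary_storage_account_key>",
  )
  dev_fs = lake_service.get_file_system_client("dev")
  dev_fs.create_directory("bronze")            # dev/bronze
  dev_fs.create_directory("bronze/sentiment")  # dev/bronze/sentiment
  ```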
- Upload the following data files to the `bronze` folder created above:
- Create a file system named `database1` in `PrimaryStorage`.
- Create a folder named `fact-sale` in the `database1` file system.
- Create a folder named `Day=20191201` in the `fact-sale` folder.
- Upload the file `sale-small-20191201-snappy.parquet` (https://solliancepublicdata.blob.core.windows.net/wwi-02/sale-small/Year%3D2019/Quarter%3DQ4/Month%3D12/Day%3D20191201/sale-small-20191201-snappy.parquet) to the `Day=20191201` folder.
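
  A minimal sketch of the `database1` steps above, assuming the `azure-storage-file-datalake` package; it downloads the public parquet file and uploads it to the `Day=20191201` folder (account name and key are placeholders):

  ```python
  import urllib.request

  from azure.storage.filedatalake import DataLakeServiceClient

  lake_service = DataLakeServiceClient(
      account_url="https://<primary_storage>.dfs.core.windows.net",
      credential="<primary_storage_account_key>",
  )
  fs = lake_service.create_file_system("database1")
  fs.create_directory("fact-sale")
  fs.create_directory("fact-sale/Day=20191201")

  url = ("https://solliancepublicdata.blob.core.windows.net/wwi-02/sale-small/"
         "Year%3D2019/Quarter%3DQ4/Month%3D12/Day%3D20191201/"
         "sale-small-20191201-snappy.parquet")
  data = urllib.request.urlopen(url).read()
  fs.get_file_client(
      "fact-sale/Day=20191201/sale-small-20191201-snappy.parquet"
  ).upload_data(data, overwrite=True)
  ```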
- Create a file system named `database1-staging` in `PrimaryStorage`.
- Upload the file `customer.csv` (https://solliancepublicdata.blob.core.windows.net/wwi-02/data-generators/customer.csv) to the root of the `database1-staging` file system.
- Create a folder named `sale-small-telemetry` in the `wwi` file system in `PrimaryStorage`.
- Upload the following data files to the `sale-small-telemetry` folder:
- Navigate to the Create Cognitive Services page and fill in the appropriate values exemplified below to create an All-In-One Cognitive Services account.
- Go to Cognitive Service > Keys and Endpoint (1). Copy Key 1 (2) and Endpoint (3) to a text editor of your choice to be used in the following steps.
- Navigate to the Create Key Vault page and fill in the appropriate values exemplified below to create a Key Vault account.
- Go to Key Vault > Access policies (1), and grant (2) the Azure Synapse workspace Managed Service Identity permissions (3) to read secrets from Azure Key Vault.
- Go to Key Vault > Secret to create a new secret. Select + Generate/Import to continue.
- Set the secret's name to CognitiveKey, and then paste the key from the previous step into the Value field. Finally, select Create.
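
  If you prefer to script this step, a minimal sketch using the Azure SDK for Python, assuming the `azure-keyvault-secrets` and `azure-identity` packages and that your identity can set secrets on the vault (the vault name and key value are placeholders):

  ```python
  from azure.identity import DefaultAzureCredential
  from azure.keyvault.secrets import SecretClient

  client = SecretClient(
      vault_url="https://<key_vault_name>.vault.azure.net",
      credential=DefaultAzureCredential(),
  )
  client.set_secret("CognitiveKey", "<cognitive_services_key_1>")
  ```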
- In the Synapse Workspace, create an Azure Key Vault linked service pointing to the key vault you just created. It is recommended to name the linked service `KeyVault`. Verify the connection by selecting the Test connection button. If the connection is green, select Create and then select Publish all to save your change.
- In the Synapse Workspace, create an Azure Cognitive Services linked service pointing to the cognitive service and key vault you just created. Make sure Secret name is set to CognitiveKey. It is recommended to name the linked service `CognitiveService`. Select Create (2) and then select Publish all to save your change.
- In the Synapse Workspace, create a REST linked service. Set Base URL to the previously copied Cognitive Service Endpoint value. Set Authentication Type to Anonymous (3). It is recommended to name the linked service `CognitiveRESTEndpoint`. Select Create (2) and then select Publish all to save your change.
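
  Optionally, you can sanity-check the copied key and endpoint with a direct REST call before wiring up the pipeline. A minimal sketch, assuming the `requests` package and the Text Analytics v3.0 sentiment route (the endpoint and key values are the ones copied earlier):

  ```python
  import requests

  endpoint = "<cognitive_services_endpoint>"  # Endpoint value copied earlier
  key = "<cognitive_services_key_1>"          # Key 1 value copied earlier

  response = requests.post(
      f"{endpoint.rstrip('/')}/text/analytics/v3.0/sentiment",
      headers={"Ocp-Apim-Subscription-Key": key},
      json={"documents": [
          {"id": "1", "language": "en", "text": "The store experience was great!"}
      ]},
  )
  print(response.status_code, response.json())
  ```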
- Create a new, empty dataset named `cognitive_rest_dataset`.
- Switch to code view and replace the code with the content of the cognitive_rest_dataset.json JSON file.
- If the name used for the REST linked service is not `CognitiveRESTEndpoint`, replace the `properties.linkedServiceName.referenceName` value in the JSON with the actual name of the linked service.
- Save and publish the dataset.
- Open the Exercise 2 - Enrich Data.json file in a text editor. Find the `REPLACE-WITH-YOUR-COGNITIVE-KEY` text and replace it with the previously copied Cognitive Service Key 1. Save the file for future use. This is the updated pipeline definition file you will use when creating the pipeline in the following steps.
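
  A one-off sketch of that text replacement in plain Python (the key value is a placeholder):

  ```python
  from pathlib import Path

  path = Path("Exercise 2 - Enrich Data.json")
  text = path.read_text(encoding="utf-8")
  path.write_text(
      text.replace("REPLACE-WITH-YOUR-COGNITIVE-KEY", "<cognitive_services_key_1>"),
      encoding="utf-8",
  )
  ```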
- In Synapse Studio, go to the `Manage` hub and create a new Data Explorer pool. Use the `Compute optimized` workload and the `Extra Small (2 cores)` size. Leave all other options with defaults.
- Add the Synapse workspace managed identity as `Contributor` to the Data Explorer workspace.
- Once the Data Explorer pool is created, create a linked service pointing to it. Make sure you use the options highlighted below. Once the linked service is created, make sure you publish it to the Synapse workspace.
- In the resource group, create an Azure Purview workspace.
- In Synapse Studio, navigate to the `Manage` hub and select the `Azure Purview` section. Connect the Azure Purview workspace to the Synapse Analytics workspace.
- Add the Purview managed identity to the `Reader` role of the Synapse workspace object.
- Add the Purview managed identity to the `Storage Blob Data Reader` role of the primary data lake storage account assigned to the Synapse workspace.
- Create a login for the Purview managed identity in the Synapse serverless SQL pool by running the following script on the `Built-in`, `master` database (make sure you replace `<purview_workspace_name>` with the name of your Purview workspace):

  ```sql
  CREATE LOGIN [<purview_workspace_name>] FROM EXTERNAL PROVIDER;
  ```

- Provide read permissions for the Purview managed identity in the `SQLPool01` Synapse dedicated SQL pool by running the following script on the `SQLPool01` SQL pool (replace `asapurview529607` with the name of your Purview workspace):

  ```sql
  CREATE USER [asapurview529607] FROM EXTERNAL PROVIDER
  GO
  EXEC sp_addrolemember 'db_datareader', [asapurview529607]
  GO
  ```
- In Purview Studio, navigate to the `Data map` hub and select the root collection. Add the Synapse workspace managed identity to the `Data curators` role.
- In Purview Studio, register your Synapse workspace as a source into the root collection.
- Initiate a new scan using the newly registered source. Make sure the scan completes successfully.
- Configure the scan to target the `SQLPool01` dedicated SQL pool and make sure you test the connection before selecting `Continue`.
- Select the `AzureSynapseSQL` scan rule set and `Continue`.
- Select `Once` in the scan trigger section and `Continue`.
- Select `Save and run` to start the scan. Wait until the scan completes successfully.
Perform the following steps for each dataset to be imported:

- Create a new, empty dataset with the same name as the one to be imported.
- Switch to code view and replace the code with the content of the associated JSON file.
- If the name used for the linked service to `PrimaryStorage` is not `asadatalake01`, replace the `properties.linkedServiceName.referenceName` value in the JSON with the actual name of the linked service (see the sketch after these steps).
- Save and publish the dataset. Optionally, you can publish all datasets at once at the end of the import procedure.
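
If you prefer to patch the JSON before pasting it into the code view, a minimal sketch (the file name and linked service name are placeholders):

```python
import json

# Load the dataset definition that ships with the guide.
with open("<dataset_name>.json", encoding="utf-8") as f:
    dataset = json.load(f)

# Point the dataset at your actual linked service.
dataset["properties"]["linkedServiceName"]["referenceName"] = "<your_linked_service_name>"

# Paste this output into the dataset's code view in Synapse Studio.
print(json.dumps(dataset, indent=2))
```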
The following datasets pointing to `PrimaryStorage` must be imported:
Perform the following steps for each dataset to be imported:

- Create a new, empty dataset with the same name as the one to be imported.
- Switch to code view and replace the code with the content of the associated JSON file.
- If the name used for the linked service to `SQLPool01` is not `sqlpool01`, replace the `properties.linkedServiceName.referenceName` value in the JSON with the actual name of the linked service.
- Save and publish the dataset. Optionally, you can publish all datasets at once at the end of the import procedure.

The following datasets pointing to `SQLPool01` must be imported:
Dataset | Source code |
---|---|
wwi_dimcity_asa | wwi_dimcity_asa.json |
wwi_dimcustomer_asa | wwi_dimcustomer_asa.json |
wwi_dimdate_asa | wwi_dimdate_asa.json |
wwi_dimemployee_asa | wwi_dimemployee_asa.json |
wwi_dimpaymentmethod_asa | wwi_dimpaymentmethod_asa.json |
wwi_dimstockitem_asa | wwi_dimstockitem_asa.json |
wwi_dimsupplier_asa | wwi_dimsupplier_asa.json |
wwi_dimtransactiontype_asa | wwi_dimtransactiontype_asa.json |
wwi_factmovement_asa | wwi_factmovement_asa.json |
wwi_factorder_asa | wwi_factorder_asa.json |
wwi_factpurchase_asa | wwi_factpurchase_asa.json |
wwi_factsale_asa | wwi_factsale_asa.json |
wwi_factstockholding_asa | wwi_factstockholding_asa.json |
wwi_facttransaction_asa | wwi_facttransaction_asa.json |
wwi_perf_factsale_fast_asa | wwi_perf_factsale_fast_asa.json |
wwi_perf_factsale_slow_asa | wwi_perf_factsale_slow_asa.json |
wwi_staging_dimcustomer_asa | wwi_staging_dimcustomer_asa.json |
wwi_staging_enrichedcustomer_asa | wwi_staging_enrichedcustomer_asa.json |
Perform the following steps for each dataset to be imported:

- Create a new, empty dataset with the same name as the one to be imported.
- Switch to code view and replace the code with the content of the associated JSON file.
- If the name used for the linked service does not match that of the previously created Data Explorer linked service, replace the `properties.linkedServiceName.referenceName` value in the JSON with the actual name of the linked service.
- Save and publish the dataset. Optionally, you can publish all datasets at once at the end of the import procedure.

The following datasets pointing to the Data Explorer pool must be imported:
Dataset | Source code |
---|---|
wwi_sale_small_telemetry_ade | wwi_sale_small_telemetry_ade.json |
Perform the following steps for each data flow to be imported:

- Create a new, empty data flow with the same name as the one to be imported.
- Switch to code view and replace the code with the content of the associated JSON file.
- Save and publish the data flow. Optionally, you can publish all data flows at once at the end of the import procedure.

The following data flows must be imported:

Data flow | Source code |
---|---|
EnrichCustomerData | EnrichCustomerData.json |
Perform the following steps for each pipeline to be imported:

- Create a new, empty pipeline with the same name as the one to be imported.
- Switch to code view and replace the code with the content of the associated JSON file.
- Save and publish the pipeline. Optionally, you can publish all pipelines at once at the end of the import procedure.

The following pipelines must be imported:
Pipeline | Source code |
---|---|
Exercise 2 - Enrich Data * | Exercise 2 - Enrich Data.json |
Import WWI Data | Import WWI Data.json |
Import WWI Data - Fact Sale Full | Import WWI Data - Fact Sale Full.json |
Import WWI Perf Data - Fact Sale Fast | Import WWI Perf Data - Fact Sale Fast.json |
Import WWI Perf Data - Fact Sale Slow | Import WWI Perf Data - Fact Sale Slow.json |
Exercise 5 - Import sales telemetry data | Exercise 5 - Import sales telemetry data |

* Make sure you are using the updated pipeline file where you replaced the `REPLACE-WITH-YOUR-COGNITIVE-KEY` value with the actual Cognitive Services Key 1 value.
- Import the Setup - Export Sales to Data Lake notebook.
- Replace `<primary_storage>` with the actual data lake account name of `PrimaryStorage` in cells 1, 3, 4, and 5.
- Run the notebook to populate `PrimaryStorage` with data.
- Create a SQL on-demand database by running the following script on the `master` database of the SQL on-demand pool:

  ```sql
  CREATE DATABASE SQLOnDemand01
  ```
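
  If you prefer to run this from a script rather than from Synapse Studio, a minimal sketch using `pyodbc` against the serverless (on-demand) SQL endpoint; the workspace name and password are placeholders:

  ```python
  import pyodbc

  # Serverless (on-demand) SQL endpoint of the Synapse workspace.
  conn = pyodbc.connect(
      "DRIVER={ODBC Driver 17 for SQL Server};"
      "SERVER=tcp:<workspace>-ondemand.sql.azuresynapse.net,1433;"
      "DATABASE=master;UID=asa.sql.admin;PWD=<sql_admin_password>",
      autocommit=True,  # CREATE DATABASE cannot run inside a transaction
  )
  conn.cursor().execute("CREATE DATABASE SQLOnDemand01")
  ```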
- Ensure the SQL on-demand pool can query the storage account using the following script:

  ```sql
  CREATE CREDENTIAL [https://<primary_storage>.dfs.core.windows.net] WITH IDENTITY='User Identity';
  ```

  In the script above, replace `<primary_storage>` with the name of `PrimaryStorage`.
- Connect with either the SQL Active Directory admin or the `asa.sql.admin` account to `SQLPool01` using the tool of your choice.
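
  For example, a minimal `pyodbc` connection sketch for the dedicated pool (the workspace name and password are placeholders):

  ```python
  import pyodbc

  # Dedicated SQL pool endpoint of the Synapse workspace.
  conn = pyodbc.connect(
      "DRIVER={ODBC Driver 17 for SQL Server};"
      "SERVER=tcp:<workspace>.sql.azuresynapse.net,1433;"
      "DATABASE=SQLPool01;UID=asa.sql.admin;PWD=<sql_admin_password>",
      autocommit=True,
  )
  cursor = conn.cursor()
  cursor.execute("SELECT @@VERSION")  # simple connectivity check
  print(cursor.fetchone()[0])
  ```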
- Run the WWI-Reset-SQLPoolSchema SQL script to initialize the schema of the SQL pool.
- Create the `asa.sql.staging` login in the `master` database of `Workspace` using the following script:

  ```sql
  CREATE LOGIN [asa.sql.staging] WITH PASSWORD = '<password>'
  GO
  ```

  In the script above, replace `<password>` with the actual password of the login.
- Create the `asa.sql.staging` user in `SQLPool01` using the following script:

  ```sql
  CREATE USER [asa.sql.staging] FOR LOGIN [asa.sql.staging] WITH DEFAULT_SCHEMA = dbo
  GO

  -- Add user to the required roles
  EXEC sp_addrolemember N'db_datareader', N'asa.sql.staging'
  GO
  EXEC sp_addrolemember N'db_datawriter', N'asa.sql.staging'
  GO
  EXEC sp_addrolemember N'db_ddladmin', N'asa.sql.staging'
  GO
  ```
- Configure access control to workspace pipeline runs in `SQLPool01` using the following script:

  ```sql
  --Create user in DB
  CREATE USER [<workspace>] FROM EXTERNAL PROVIDER;

  --Granting permission to the identity
  GRANT CONTROL ON DATABASE::<sqlpool> TO [<workspace>];
  ```

  In the script above, replace `<workspace>` with the actual name of `Workspace` and `<sqlpool>` with the actual name of `SQLPool01`.
- Run the `Import WWI Data` pipeline to import all data except the sale facts into `SQLPool01`.
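
  The pipeline can also be triggered programmatically. A minimal sketch, assuming the `azure-synapse-artifacts` and `azure-identity` packages (the workspace development endpoint is a placeholder):

  ```python
  from azure.identity import DefaultAzureCredential
  from azure.synapse.artifacts import ArtifactsClient

  client = ArtifactsClient(
      credential=DefaultAzureCredential(),
      endpoint="https://<workspace>.dev.azuresynapse.net",
  )
  run = client.pipeline.create_pipeline_run("Import WWI Data")
  print(run.run_id)  # monitor progress in Synapse Studio's Monitor hub
  ```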
- Run the `Import WWI Data - Fact Sale Full` pipeline to import the sale facts into `SQLPool01`.
- Run the `Import WWI Perf Data - Fact Sale Fast` and `Import WWI Perf Data - Fact Sale Slow` pipelines to import the large-sized sale facts into `SQLPool01`.
- Ensure the `MasterUser` has a Power BI Pro subscription assigned.
- Sign in to the Power BI portal using the credentials of `MasterUser` and create a new Power BI workspace. In the remainder of this guide, this workspace will be referred to as `PowerBIWorkspace`.
- Perform all the steps described in Exercise 3 - Task 1. In step 11, instead of using the suggested naming convention, name your dataset `wwifactsales`.
- In the Power BI portal, edit the security settings of the `wwifactsales` dataset and configure it to authenticate to `SQLPool01` using the credentials of the `asa.sql.admin` account. This allows the `Direct Query` option to work correctly for all participants in the exercise.
- In the resource group, create a new Azure Machine Learning workspace (with default settings).
- Add the managed identity of the Synapse Analytics workspace as `Contributor` to the newly created Azure Machine Learning workspace.
- In Synapse Analytics Studio, create a new Azure Machine Learning linked service pointing to the newly created Azure Machine Learning workspace. Set the linked service name to be the same as the Azure Machine Learning workspace name and select it from the subscription. Make sure you test the connection to validate that all settings are correct. After the linked service is created, make sure to publish it in Synapse Studio.
Import the following SQL scripts into `Workspace`:

SQL script name | Source code | Replacements |
---|---|---|
Exercise 1 - Read with serverless SQL Pool | Exercise 1 - Read with serverless SQL Pool.sql | `<primary_storage>` with the actual name of `PrimaryStorage` |
Exercise 3 - Analyze Transactions | Exercise 3 - Analyze Transactions.sql | None |
Exercise 3 - Investigate query performance | Exercise 3 - Investigate query performance.sql | None |
Import the following Spark notebooks into `Workspace`:

Spark notebook name | Source code | Spark Pool to use | Replacements |
---|---|---|---|
Exercise 1 - Read with Spark | Exercise 1 - Read with Spark.ipynb | SparkPool01 | `<primary_storage>` with the actual name of `PrimaryStorage` |
Exercise 2 - Ingest Sales Data | Exercise 2 - Ingest Sales Data.ipynb | SparkPool01 | In cell 1 - `<primary_storage>` with the actual name of `PrimaryStorage` |
Exercise 2 - Bonus Notebook with CSharp | Exercise 2 - Bonus Notebook with CSharp.ipynb | SparkPool01 | In cell 1 - `<primary_storage>` with the actual name of `PrimaryStorage`; in cell 3 - `<sql_staging_password>` with the password of `asa.sql.staging` created above in Task 9, step 3; in cell 3 - `<workspace>` with the name of the `Workspace`; in cell 3 - `<sql_pool>` with the name of `SQLPool01` |
Exercise 8 - AutoML with Spark | Exercise 8 - Automl with Spark.ipynb | SparkPool02 | In cell 4 - `<subscription_id>` with the subscription id, `<resource_group_name>` with the name of the resource group, `<aml_workspace_name>` with the name of the Azure Machine Learning workspace, `<aml_workspace_location>` with the location of the Azure Machine Learning workspace |
For each additional user that needs to have access to `Workspace` and run exercises 1 through 8, the following steps must be performed:

- Assign the `Reader` role on the `WorkspaceResourceGroup` to the user.
- Assign the `Reader` and `Blob Data Contributor` roles on the `PrimaryStorage` to the user.
- Assign the `Workspace admin` role in the `Workspace` to the user.
- Grant the user access to `SQLPool01` using the script below. You must be signed in with the `MasterUser` credentials (use SQL Server Management Studio if the script fails in Synapse Studio).

  ```sql
  CREATE USER [<user_principal_name>] FROM EXTERNAL PROVIDER;
  EXEC sp_addrolemember 'db_owner', '<user_principal_name>';
  ```

  In the script above, replace `<user_principal_name>` with the Azure Active Directory user principal name of the user.
- Assign the `Contributor` role on the Power BI workspace of the `MasterUser` created in Task 10, step 2.
- In the Synapse Workspace, open the Exercise 2 - Enrich Data (2) pipeline. Select the Edit (3) button for Activities in the ForEachComment ForEach activity.
- Select the Sentiment Analysis (1) copy data activity. Switch to the Source (2) tab. Paste in the Cognitive Service access key that you previously copied to a text editor.