feat: support Azure cloud storage service principal authentication - expedite #5926

Open
wants to merge 76 commits into
base: develop
0e81f1b
Fix bug when failure
FrsECM Apr 23, 2024
897435a
Add SPI
FrsECM Apr 23, 2024
d7c7165
Add Dependancy
FrsECM Apr 23, 2024
de5ad0d
Docker Build fix
FrsECM Apr 23, 2024
45603b1
Clean-up properties
FrsECM Apr 23, 2024
dfb85e3
Add Encryption
FrsECM May 6, 2024
a9b1e67
Add storage in io_storages.tavern.yml
FrsECM May 23, 2024
8866176
Merge branch 'feature/azure-service-principal' of https://github.com/…
machov May 28, 2024
5eeb056
run poetry update;
machov May 28, 2024
7c3ac4d
ruff
machov May 28, 2024
71c4bb7
run blue
machov May 28, 2024
c97bb43
merge remote
machov May 28, 2024
381b534
update poetry x2
machov May 28, 2024
1a4c3b5
merge remote
machov Jun 16, 2024
2366f0a
update poetry lock
machov Jun 16, 2024
fc5b8fc
Merge branch 'develop' of https://github.com/HumanSignal/label-studio…
machov Jun 17, 2024
3543450
Merge branch 'develop' of https://github.com/HumanSignal/label-studio…
machov Jun 17, 2024
b9a0b50
update tests
machov Jun 17, 2024
407e308
fix test #2 test_urls_mismatch_with_registered
machov Jun 17, 2024
7f27ec3
fix test io_storages_presign_proxy.tavern.yml::get_import_export_stor…
machov Jun 17, 2024
2d19c20
tmp adls test progress
machov Jun 17, 2024
660ecb4
add sdk vars after comparing
machov Jun 20, 2024
510439b
add models
machov Jun 20, 2024
8e1f980
add updates to azure SP auth
machov Jun 23, 2024
c3f3aa7
Merge branch 'feature/azure-service-principal' of https://github.com/…
machov Jun 23, 2024
f9a3ef0
add io storage
machov Jun 24, 2024
87ff3a9
skip secure creds
machov Jun 24, 2024
d77449b
merge remote
machov Jun 24, 2024
e0bd8c3
add SP tests
machov Jun 24, 2024
577cf87
add SP tests
machov Jun 24, 2024
a275507
add SP tests
machov Jun 24, 2024
26f395c
add SP tests
machov Jun 24, 2024
0f827b5
roll back secure encryption
machov Jun 26, 2024
bc7169e
fix mock pytest azure spi
machov Jun 26, 2024
d7381ef
add tests back up
machov Jun 26, 2024
d7a79da
Merge branch 'develop' into man/friend
machov Jun 26, 2024
df98cfd
clean based on prev set up
machov Jun 26, 2024
f985c69
Merge branch 'feat/sp_all_tests' into man/friend
machov Jun 26, 2024
6bee525
roll back tests
machov Jun 26, 2024
0e1c416
ruff ruff
machov Jun 26, 2024
25b4787
blue
machov Jun 26, 2024
c094e00
update lock
machov Jun 26, 2024
bc01fe8
add pyproject
machov Jun 27, 2024
4db9a11
Delete label_studio/README.md
machov Jun 27, 2024
8e84faa
fix test return value
machov Jun 30, 2024
fdc5e38
Merge branch 'man/friend' of https://github.com/machov/label-studio i…
machov Jul 1, 2024
c2a5fad
Merge branch 'develop' of https://github.com/HumanSignal/label-studio…
machov Jul 1, 2024
b435bc7
add docs
machov Jul 1, 2024
66fdec7
Update dataset_create.md
machov Jul 1, 2024
6c62cdd
Update dataset_create.md
machov Jul 1, 2024
b54a32c
merge remote
machov Jul 4, 2024
163e102
fix resolve uris!
machov Jul 8, 2024
eb824d7
Merge branch 'develop' of https://github.com/HumanSignal/label-studio…
machov Jul 8, 2024
860796f
Merge branch 'dev/man/debug' into man/friend
machov Jul 8, 2024
6d148b2
update migrations
machov Jul 8, 2024
3ea1704
resolve regex pattern
machov Jul 8, 2024
5247e49
update tests to meet new functionality
machov Jul 8, 2024
a7658a4
ruff
machov Jul 8, 2024
a271e6d
blue
machov Jul 8, 2024
24f1bb8
improve docs
machov Jul 8, 2024
1d3b70e
update poetry
machov Jul 8, 2024
acc6443
remove docs
machov Jul 8, 2024
22a6feb
Merge branch 'man/friend' of https://github.com/machov/label-studio i…
machov Jul 8, 2024
2d52d37
Update storage.md
machov Jul 8, 2024
306ad20
update API
machov Jul 8, 2024
8dec377
clean up migrations
machov Jul 8, 2024
b03ea71
update urls
machov Jul 8, 2024
e5d4be5
Merge branch 'man/friend' of https://github.com/machov/label-studio i…
machov Jul 8, 2024
fab0b82
generate new all_urls.json
machov Jul 9, 2024
7c98d52
Merge branch 'develop' into man/friend
machov Jul 9, 2024
7b7dc4e
remove frontend
machov Jul 9, 2024
ac5ed8d
Merge branch 'develop' into man/friend
machov Jul 15, 2024
8f47543
merge master
machov Jul 15, 2024
8a473c8
merge master
machov Jul 15, 2024
1c53fa8
merge master
machov Jul 15, 2024
20465c9
poetry lock
machov Jul 15, 2024
68 changes: 66 additions & 2 deletions docs/source/guide/storage.md
@@ -16,7 +16,8 @@ Integrate popular cloud and external storage systems with Label Studio to collec
Set up the following cloud and other storage systems with Label Studio:
- [Amazon S3](#Amazon-S3)
- [Google Cloud Storage](#Google-Cloud-Storage)
- [Microsoft Azure Blob storage](#Microsoft-Azure-Blob-storage)
- [Microsoft Azure Blob storage - Account Key](#Microsoft-Azure-Blob-storage-Account-Key)
- [Microsoft Azure Blob storage - Service Principal](#Microsoft-Azure-Blob-storage-Service-Principal)
- [Redis database](#Redis-database)
- [Local storage](#Local-storage) <div class="enterprise-only">(for On-prem only)</div>

@@ -461,7 +462,7 @@ You can also create a storage connection using the Label Studio API.
- See [Create new import storage](/api#operation/api_storages_gcs_create) then [sync the import storage](/api#operation/api_storages_gcs_sync_create).
- See [Create export storage](/api#operation/api_storages_export_gcs_create) and after annotating, [sync the export storage](/api#operation/api_storages_export_gcs_sync_create).

## Microsoft Azure Blob storage
## Microsoft Azure Blob storage - Account Key

Connect your [Microsoft Azure Blob storage](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) container with Label Studio. For details about how Label Studio secures access to cloud storage, see [Secure access to cloud storage](security.html#Secure-access-to-cloud-storage).

@@ -500,6 +501,69 @@ You can also create a storage connection using the Label Studio API.
- See [Create new import storage](/api#operation/api_storages_azure_create) then [sync the import storage](/api#operation/api_storages_azure_sync_create).
- See [Create export storage](/api#operation/api_storages_export_azure_create) and after annotating, [sync the export storage](/api#operation/api_storages_export_azure_sync_create).


## Microsoft Azure Blob storage - Service Principal

Connect your [Microsoft Azure Blob storage](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) container with Label Studio using service principal authentication. A service principal can provide more granular access control for your organization. For details about how Label Studio secures access to cloud storage, see [Secure access to cloud storage](security.html#Secure-access-to-cloud-storage).

### Prerequisites
You must set four environment variables in Label Studio to connect to Azure Blob storage with a service principal:
- `AZURE_BLOB_ACCOUNT_NAME` to specify the name of the storage account.
- `AZURE_CLIENT_ID` to specify the client ID of the service principal.
- `AZURE_TENANT_ID` to specify the tenant ID of the service principal.
- `AZURE_CLIENT_SECRET` to specify the client secret of the service principal.
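
As a quick sanity check before starting Label Studio, the variables listed above can be verified from Python. This is a minimal sketch: the variable names come from the list, while the helper `missing_azure_spi_vars` is hypothetical and not part of Label Studio.

```python
import os

# Variable names are taken from the prerequisites list; the helper is hypothetical.
REQUIRED_VARS = (
    "AZURE_BLOB_ACCOUNT_NAME",
    "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID",
    "AZURE_CLIENT_SECRET",
)


def missing_azure_spi_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_spi_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All service principal variables are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.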


Configure the specific Azure Blob container that you want Label Studio to use in the UI. If you run into CORS issues, in most cases you need to add the GET permission (`*/GET/*/Access-Control-Allow-Origin/3600`) within the Resource Sharing tab:

<img src="/images/azure-storage-cors.png" class="gif-border">


###### Create a Service Principal

Create a service principal by following the Azure documentation: [How to Create a Service Principal](https://learn.microsoft.com/en-us/entra/identity-platform/howto-create-service-principal-portal).

###### Create a Service Principal Secret (Client Secret)

To create a new client secret:

1. Browse to **Identity > Applications > App registrations**, then select your application.
2. Select **Certificates & secrets**.
3. Select **Client secrets**, and then select **New client secret**.
4. Provide a description of the secret and a duration.
5. Select **Add**.

###### Assign the Contributor Role (Role-Based Access Control)

When you create a storage account, assign the Contributor role to your service principal. For details, see the [role assignment steps](https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-steps).
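
To confirm that the service principal and its role assignment work outside Label Studio, you can authenticate with the Azure SDK for Python. The following is a minimal sketch, not necessarily how Label Studio connects internally. It assumes the `azure-identity` and `azure-storage-blob` packages are installed and that the storage account uses the public-cloud endpoint; the helper names `blob_account_url` and `make_blob_service_client` are hypothetical.

```python
def blob_account_url(account_name: str) -> str:
    """Build the public-cloud Blob endpoint for a storage account (assumed endpoint form)."""
    return f"https://{account_name}.blob.core.windows.net"


def make_blob_service_client(account_name, tenant_id, client_id, client_secret):
    """Create a BlobServiceClient that authenticates as the service principal."""
    # Imported lazily so blob_account_url works without the Azure SDK installed.
    from azure.identity import ClientSecretCredential
    from azure.storage.blob import BlobServiceClient

    credential = ClientSecretCredential(
        tenant_id=tenant_id,
        client_id=client_id,
        client_secret=client_secret,
    )
    return BlobServiceClient(
        account_url=blob_account_url(account_name),
        credential=credential,
    )
```

With a client in hand, listing a few blobs through `get_container_client("<container>").list_blobs()` is a simple way to check that the assigned role grants the access you expect.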


### Set up connection in the Label Studio UI
In the Label Studio UI, do the following to set up the connection:

1. Open Label Studio in your web browser.
2. For a specific project, open **Settings > Cloud Storage**.
3. Click **Add Source Storage**.
4. In the dialog box that appears, select **Microsoft Azure** as the storage type.
5. In the **Storage Title** field, type a name for the storage to appear in the Label Studio UI.
6. Specify the name of the Azure Blob container, and if relevant, the container prefix to specify an internal folder or container.
7. Adjust the remaining optional parameters:
    - In the **File Filter Regex** field, specify a regular expression to filter bucket objects. Use `.*` to collect all objects.
    - In the **Account Name** field, specify the account name for the Azure storage. You can also set this field as an environment variable, `AZURE_BLOB_ACCOUNT_NAME`.
    - In the **Client ID** field, specify the client ID of the service principal used to access the storage account. You can also set this field as an environment variable, `AZURE_CLIENT_ID`.
    - In the **Tenant ID** field, specify the tenant ID of the service principal used to access the storage account. You can also set this field as an environment variable, `AZURE_TENANT_ID`.
    - In the **Client Secret** field, specify the client secret of the service principal used to access the storage account. You can also set this field as an environment variable, `AZURE_CLIENT_SECRET`.
    - Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, for example `azure-spi://container-name/image.jpg`. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
    - Choose whether to disable **Use pre-signed URLs**, or [shared access signatures](https://docs.microsoft.com/en-us/rest/api/storageservices/delegate-access-with-shared-access-signature). If your tasks contain `azure-spi://...` links, they must be pre-signed in order to be displayed in the browser.
    - Adjust the counter for how many minutes the shared access signatures are valid.
8. Click **Add Storage**.
9. Repeat these steps for **Target Storage** to sync completed data annotations to a container.
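
The `azure-spi://container-name/image.jpg` form mentioned in step 7 can be split into a container and a blob name with the standard library. This is an illustrative sketch only; the helper `split_azure_spi_uri` is hypothetical and does not reflect how Label Studio resolves these URIs internally.

```python
from urllib.parse import urlparse


def split_azure_spi_uri(uri: str):
    """Split azure-spi://container/blob/path into (container, blob_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "azure-spi":
        raise ValueError(f"not an azure-spi URI: {uri!r}")
    # The container is the authority part; the blob name is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


print(split_azure_spi_uri("azure-spi://container-name/image.jpg"))
# → ('container-name', 'image.jpg')
```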

After adding the storage, click **Sync** to collect tasks from the container, or make an API call to [sync import storage](/api#operation/api_storages_azure_spi_sync_create).

### Add storage with the Label Studio API
You can also create a storage connection using the Label Studio API.
- See [Create new import storage](/api#operation/api_storages_azure_spi_create) then [sync the import storage](/api#operation/api_storages_azure_spi_sync_create).
- See [Create export storage](/api#operation/api_storages_export_azure_spi_create) and after annotating, [sync the export storage](/api#operation/api_storages_export_azure_spi_sync_create).
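
As a rough illustration of the import storage call, the sketch below prepares a `POST` request to `/api/storages/azure_spi/` using only the standard library. The payload field names are assumptions modeled on the UI fields described earlier, and the base URL, project ID, and token are placeholders. Check the API reference for the actual schema before relying on it.

```python
import json
from urllib.request import Request


def build_import_storage_request(base_url, api_token, payload):
    """Prepare (but do not send) a POST to the azure_spi import storage endpoint."""
    return Request(
        url=base_url.rstrip("/") + "/api/storages/azure_spi/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Field names are assumptions based on the UI fields, not a confirmed schema.
payload = {
    "project": 1,
    "title": "Azure SPI source",
    "container": "container-name",
    "account_name": "myaccount",
    "client_id": "<client-id>",
    "tenant_id": "<tenant-id>",
    "client_secret": "<client-secret>",
}
request = build_import_storage_request("http://localhost:8080", "<api-token>", payload)
print(request.get_method(), request.full_url)
```

Sending the prepared request is then a matter of passing it to `urllib.request.urlopen`, after which the sync endpoint can be called for the returned storage ID.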

## Redis database

You can also store your tasks and annotations in a [Redis database](https://redis.io/). You must store the tasks and annotations in different databases. You might want to use a Redis database if you find that relying on a file-based cloud storage connection is slow for your datasets.
62 changes: 61 additions & 1 deletion label_studio/core/all_urls.json
@@ -743,6 +743,66 @@
"name": "storages:api:export-storage-azure-form",
"decorators": ""
},
{
"url": "/api/storages/azure_spi/",
"module": "io_storages.azure_serviceprincipal.api.AzureServicePrincipalImportStorageListAPI",
"name": "storages:api:storage-azure_spi-list",
"decorators": ""
},
{
"url": "/api/storages/azure_spi/<int:pk>",
"module": "io_storages.azure_serviceprincipal.api.AzureServicePrincipalImportStorageDetailAPI",
"name": "storages:api:storage-azure_spi-detail",
"decorators": ""
},
{
"url": "/api/storages/azure_spi/<int:pk>/sync",
"module": "io_storages.azure_serviceprincipal.api.AzureServicePrincipalImportStorageSyncAPI",
"name": "storages:api:storage-azure_spi-sync",
"decorators": ""
},
{
"url": "/api/storages/azure_spi/validate",
"module": "io_storages.azure_serviceprincipal.api.AzureServicePrincipalImportStorageValidateAPI",
"name": "storages:api:storage-azure_spi-validate",
"decorators": ""
},
{
"url": "/api/storages/azure_spi/form",
"module": "io_storages.azure_serviceprincipal.api.AzureServicePrincipalImportStorageFormLayoutAPI",
"name": "storages:api:storage-azure_spi-form",
"decorators": ""
},
{
"url": "/api/storages/export/azure_spi",
"module": "io_storages.azure_serviceprincipal.api.AzureServicePrincipalExportStorageListAPI",
"name": "storages:api:export-storage-azure_spi-list",
"decorators": ""
},
{
"url": "/api/storages/export/azure_spi/<int:pk>",
"module": "io_storages.azure_serviceprincipal.api.AzureServicePrincipalExportStorageDetailAPI",
"name": "storages:api:export-storage-azure_spi-detail",
"decorators": ""
},
{
"url": "/api/storages/export/azure_spi/<int:pk>/sync",
"module": "io_storages.azure_serviceprincipal.api.AzureServicePrincipalExportStorageSyncAPI",
"name": "storages:api:export-storage-azure_spi-sync",
"decorators": ""
},
{
"url": "/api/storages/export/azure_spi/validate",
"module": "io_storages.azure_serviceprincipal.api.AzureServicePrincipalExportStorageValidateAPI",
"name": "storages:api:export-storage-azure_spi-validate",
"decorators": ""
},
{
"url": "/api/storages/export/azure_spi/form",
"module": "io_storages.azure_serviceprincipal.api.AzureServicePrincipalExportStorageFormLayoutAPI",
"name": "storages:api:export-storage-azure_spi-form",
"decorators": ""
},
{
"url": "/api/storages/gcs/",
"module": "io_storages.gcs.api.GCSImportStorageListAPI",
@@ -943,7 +1003,7 @@
},
{
"url": "/api/ml/<int:pk>/predict/test",
"module": "ml.api.MLBackendPredictAPI",
"module": "ml.api.MLBackendPredictTestAPI",
"name": "ml:api:ml-predict-test",
"decorators": ""
},
13 changes: 13 additions & 0 deletions label_studio/core/settings/base.py
@@ -400,6 +400,7 @@
# STATIC_URL = FORCE_SCRIPT_NAME + STATIC_URL
logger.info(f'=> Static URL is set to: {STATIC_URL}')


STATIC_ROOT = os.path.join(BASE_DIR, 'static_build')
STATICFILES_DIRS = [os.path.join(BASE_DIR, 'static')]
STATICFILES_FINDERS = (
@@ -532,6 +533,7 @@
'io_storages_s3importstoragelink',
'io_storages_gcsimportstoragelink',
'io_storages_azureblobimportstoragelink',
'io_storages_azureserviceprincipalimportstoragelink',
'io_storages_localfilesimportstoragelink',
'io_storages_redisimportstoragelink',
]
@@ -682,6 +684,17 @@ def collect_versions_dummy(**kwargs):
    AZURE_URL_EXPIRATION_SECS = int(get_env('STORAGE_AZURE_URL_EXPIRATION_SECS', '86400'))
    AZURE_LOCATION = get_env('STORAGE_AZURE_FOLDER', default='')

if get_env('STORAGE_TYPE') == 'azure_spi':
    CLOUD_FILE_STORAGE_ENABLED = True
    DEFAULT_FILE_STORAGE = 'core.storage.CustomAzureStorage'
    AZURE_ACCOUNT_NAME = get_env('STORAGE_AZURE_ACCOUNT_NAME')
    AZURE_CLIENT_ID = get_env('STORAGE_AZURE_CLIENT_ID')
    AZURE_CLIENT_SECRET = get_env('STORAGE_AZURE_CLIENT_SECRET')
    AZURE_TENANT_ID = get_env('STORAGE_AZURE_TENANT_ID')
    AZURE_CONTAINER = get_env('STORAGE_AZURE_CONTAINER_NAME')
    AZURE_URL_EXPIRATION_SECS = int(get_env('STORAGE_AZURE_URL_EXPIRATION_SECS', '86400'))
    AZURE_LOCATION = get_env('STORAGE_AZURE_FOLDER', default='')

if get_env('STORAGE_TYPE') == 'gcs':
    CLOUD_FILE_STORAGE_ENABLED = True
    # DEFAULT_FILE_STORAGE = 'storages.backends.gcloud.GoogleCloudStorage'
2 changes: 2 additions & 0 deletions label_studio/io_storages/azure_serviceprincipal/__init__.py
@@ -0,0 +1,2 @@
"""This file and its contents are licensed under the Apache License 2.0. Please see the included NOTICE for copyright information and LICENSE for a copy of the license.
"""