Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Feature/azure service principal - expedite #5926

Open
wants to merge 49 commits into
base: develop
Choose a base branch
from

Conversation

machov
Copy link

@machov machov commented May 28, 2024

PR fulfills these requirements

  • Commit message(s) and PR title follows the format [fix|feat|ci|chore|doc]: TICKET-ID: Short description of change made ex. fix: DEV-XXXX: Removed inconsistent code usage causing intermittent errors
  • Tests for the changes have been added/updated (for bug fixes/features)
  • Docs have been added/updated (for bug fixes/features)
  • Best efforts were made to ensure docs/code are concise and coherent (checked for spelling/grammatical errors, commented out code, debug logs etc.)
  • Self-reviewed and ran all changes on a local instance (for bug fixes/features)

Change has impacts in these area(s)

  • Product design
  • Backend (Database)
  • Backend (API)
  • Frontend

Describe the reason for change

For the moment, there is only an authentication through account key to Azure. The problem is that account key give a lot of rights on the storage that may not be necessary in order to perform readonly operations.

What does this fix?

Now it is possible to have an azure integration that is based on a service principal.

What is the new behavior?

We have to create an app registration in Azure :
We give this registration specific rights depending on what we want to do in labelstudio :
In the UI, we create a new storage integration :

What is the current behavior?

There is no impact on the previous behavior, it's just a new one.

What libraries were added/updated?

Azure Identity

Does this change affect performance?

Get a delegation key is quite slow, that's why there is :

  • 24h cache for delegation key
  • 15mn cache for sas token
    There is a second minor change :
  • When a "thumbnail" tag is present in the datarow, it will be used in the grid view
  • When there is multiple layer in the image, the gridview will only render the first channel.
    (Today there is a buggy behavior).

Does this change affect security?

It should improve the security for people who just need specific rights on the container. For the moment, the "write" behavior have not been tested / developped on the container.

What alternative approaches were there?

We can use an enterprise application and SSO in order to be linked to a user right and act on behalf of him.

What feature flags were used to cover this change?

None

Does this PR introduce a breaking change?

(check only one)

  • Yes, and covered entirely by feature flag(s)
  • Yes, and covered partially by feature flag(s)
  • No
  • Not sure (briefly explain the situation below)

What level of testing was included in the change?

  • e2e
  • integration
  • unit

Which logical domain(s) does this change affect?

  • Authentication
  • UI

Copy link

netlify bot commented May 28, 2024

👷 Deploy request for label-studio-docs-new-theme pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 6c62cdd

Copy link

netlify bot commented May 28, 2024

👷 Deploy request for heartex-docs pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 6c62cdd

Copy link

sentry-io bot commented May 28, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: label_studio/io_storages/base_models.py

Function Unhandled Issue
info_set_in_progress ValueError: Storage status (in_progress) must be QUEUED to move it IN_PROGRESS io_storages.base...
Event Count: 1

Did you find this useful? React with a 👍 or 👎

@machov machov marked this pull request as ready for review May 28, 2024 05:07
@machov machov changed the title Man/friend feat: Feature/azure service principal May 28, 2024
@machov machov changed the title feat: Feature/azure service principal feat: Feature/azure service principal - expedite May 28, 2024
@makseq
Copy link
Member

makseq commented May 28, 2024

Is it a copy of #5765 ?

@machov machov closed this May 28, 2024
@machov
Copy link
Author

machov commented May 30, 2024

I also created a PR to @FrsECM branch FrsECM#1

@machov machov reopened this Jun 16, 2024
@machov machov marked this pull request as ready for review June 26, 2024 15:41
@machov
Copy link
Author

machov commented Jun 26, 2024

tests are now passing, do you guys want to add this PR to the tests @makseq ?
image

@FrsECM
Copy link
Contributor

FrsECM commented Jun 27, 2024

Hi !
Why did you rollback secure encryption in db ?
Thanks,

@machov
Copy link
Author

machov commented Jun 27, 2024

I was having issues with encryption/decryption at the time of testing credentials manually from the UI

I also thought not necessary cause the other Azure storage (key auth option) is not encrypting keys and we are also popping secure fields out here
https://github.com/machov/label-studio/blob/4db9a11d5a822c3d7e5c7b4efe70fea890356564/label_studio/io_storages/azure_serviceprincipal/serializers.py#L25

@machov
Copy link
Author

machov commented Jun 27, 2024

let me know what you think @FrsECM , we can also add them back

@machov machov closed this Jun 27, 2024
@machov machov reopened this Jun 27, 2024
@FrsECM
Copy link
Contributor

FrsECM commented Jun 27, 2024

I was having issues with encryption/decryption at the time of testing credentials manually from the UI

I also thought not necessary cause the other Azure storage (key auth option) is not encrypting keys and we are also popping secure fields out here https://github.com/machov/label-studio/blob/4db9a11d5a822c3d7e5c7b4efe70fea890356564/label_studio/io_storages/azure_serviceprincipal/serializers.py#L25

I understand that testing with the UI may be an issue. To give you some context, at the beggining i didn't implement this but i did a security review within my company and i've been asked to make sure that credentials are not stored "clear" in the database.
For password, you do sha256 hash, because you have the hand on the verification process on the application side. But an account key is sensitive like if not more than the user password because a leak could not only impact labelling but the data.

A account key that is leaking is a risk that an attacker removes the data directly from the bucket what was not acceptable from our side.

That's why i implemented this quick fix for SPI, but of course it should be relevant the same way for AzureBlob Account key.

Because this credential have to be given to another provider, we can not only hash, we need to encrypt and decrypt, that's why i coded this this way, and in order to do this on the fly, i added this step in the "validation" function.

What is your position concerning that ?

Maybe it can be included in another PR but it should definitely be in your roadmap in order to improve security of your application.

@machov
Copy link
Author

machov commented Jun 27, 2024

I try adding it back but Windows pytest fails for some reason https://github.com/machov/label-studio/actions/runs/9691945908/job/26745179790?pr=10

@machov
Copy link
Author

machov commented Jun 27, 2024

For the current PR, I am not able to successfully create the pre-signed urls, any suggestions?

- name: stage
  request:
    method: GET
    url: '{django_live_url}/api/projects/{project_pk}/next'
  response:
    json:
      data:
        image_url: !re_match "/tasks/\\d+/presign/\\?fileuri=YXp1cmUtYmxvYjovL3B5dGVzdC1henVyZS1pbWFnZXMvYWJj"
    status_code: 200

this test above fails, the endpoint returns "azure-spi://<path/to/file>" instead of "/tasks/\d+/presign/\?fileuri=YX..."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants