Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: make fips compliant by replacing md5 with sha256 in postgres and usedforsecurity=False for python md5 #21014

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

actionjax
Copy link

@actionjax actionjax commented Aug 8, 2022

fix(superset): allows superset to run on fips compliance machines

SUMMARY

added usedforsecurity=False for md5(..) methods and replaced postgres md5(...) with sha256(..)

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Run a fips complaint Linux VM (we used Rocky8) but any Linux with fips enabled should work.

ADDITIONAL INFORMATION

  • Has associated issue: support fips compliance by replacing md5 with sha256 #20904
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@actionjax actionjax requested a review from a team as a code owner August 8, 2022 17:27
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️
We hope to see you in our Slack community too!

md5_obj = md5()
md5_obj.update(seed.encode("utf-8"))
return UUID(md5_obj.hexdigest())
rd = random.Random()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this break existing namespaces? It seems that this logic was intended to keep UUIDs deterministic based on some input

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, at first I thought the uuid was only used in cache, looks like that hash is stored in the db. To address your concern about deterministic I've replaced random with sha256. Will look to see how we can make this backwards compatible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you would need to make a migration that updates all existing entries from the MD5 based UUIDs to using SHA256 as the base. I think it should be doable, but we do need to be very careful with this. Ping @michael-s-molina , do you have any concerns here?

@@ -21,7 +21,7 @@


def md5_sha_from_str(val: str) -> str:
return hashlib.md5(val.encode("utf-8")).hexdigest()
return hashlib.sha256(val.encode("utf-8")).hexdigest()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above

Copy link
Member

@craig-rueda craig-rueda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there's a few potential issues with this. Pls provide some tests that validate backwards compatibility.

@ccravens
Copy link

WIth the recent release of 3.0.0 was hoping to see this as a lot of orgs run on FIPS compliant environments and this prevents superset from operating in those environments.

@rusackas
Copy link
Member

Seems there are some loose threads here... is anyone planning to pick this back up, or should we close it out?

@villebro
Copy link
Member

@actionjax it should be easy to make this configurable, thus keeping backwards compatibility. Let me know if you need help implementing it.

@actionjax
Copy link
Author

@actionjax it should be easy to make this configurable, thus keeping backwards compatibility. Let me know if you need help implementing it.

Hey @villebro, unfortunately I do not have the cycles to look and implement this currently. I would recommend someone else taking over.

@rusackas rusackas added the need:community-volunteer Requires a community volunteer label Feb 1, 2024
@rusackas rusackas marked this pull request as draft February 9, 2024 18:48
@rusackas
Copy link
Member

rusackas commented Feb 9, 2024

Converting this to draft mode, rather than closing this. In due time, if nobody picks it up, we might close it out, but I remain hopeful. I'll close and re-open it just to jump-start CI and see how that plays out.

@rusackas rusackas closed this Feb 9, 2024
@rusackas rusackas reopened this Feb 9, 2024
@github-actions github-actions bot added the risk:db-migration PRs that require a DB migration label Feb 9, 2024
@villebro
Copy link
Member

villebro commented Feb 9, 2024

Ping @ccravens if you're interested in adding optional support for FIPS compliance

@digitalkristurner
Copy link

I would be interested in working on this. I think Superset is a great project and would be really helpful in other projects I'm working on.

@rusackas
Copy link
Member

@digitalkristurner fantastic! We'd love to see that.

@digitalkristurner
Copy link

Hey all, changed all the references from MD5 to SHA256 and then checked and corrected for all the possible overflows within the databases. I've ran the unit tests and they're all passing now. Just need to try and figure out the integration tests - any steer or guidance on that would be much appreciated.

Quick note postgres only added support for SHA256 in Postgres 11 - not sure how much of a problem that'll be?

@actionjax
Copy link
Author

Hey all, changed all the references from MD5 to SHA256 and then checked and corrected for all the possible overflows within the databases. I've ran the unit tests and they're all passing now. Just need to try and figure out the integration tests - any steer or guidance on that would be much appreciated.

Quick note postgres only added support for SHA256 in Postgres 11 - not sure how much of a problem that'll be?

Hey @digitalkristurner, sorry for the late reply, one of my coworkers pointed out that since the md5 hash isn't really used for security purposes in superset but rather just to create a UUID / hash we should be able to get away with just changing the code to have a security flag set to false

md5(..., usedforsecurity=False)

this should pass fips security requirements, if the superset team agrees, I think this is the path of least resistance.

@digitalkristurner
Copy link

digitalkristurner commented Mar 21, 2024

Will that work for the MD5 reference in the PGDialect uuid_by_dialect call? I'm not sure how Postgres deals with FIPS.

@andrewzah
Copy link

andrewzah commented Mar 21, 2024

the md5 hash isn't really used for security purposes

Was this tested on a fips-enabled system? Attempting to use md5 at all should error, giving something like this

$ openssl md5 /dev/null

Error setting digest md5
140647163811744:error:060800A3:digital 

edit: this was last year so I don't have the logs anymore, but we did have runtime errors with md5, so we had to fork and replace all the md5 calls with sha256.

@digitalkristurner
Copy link

I'll build try building the container and deploying it in our dev environment.

@actionjax
Copy link
Author

the md5 hash isn't really used for security purposes

Was this tested on a fips-enabled system? Attempting to use md5 at all should error, giving something like this

$ openssl md5 /dev/null

Error setting digest md5
140647163811744:error:060800A3:digital 

ah yea, I overlooked the postgres part. I don't think there's a workaround for that unlike python. I think it will complain so you can continue the work you've already done.

@digitalkristurner
Copy link

I tried your method and it seemed to work - it ran within a FIPS environment for me. Might just need to change the two strings within the DB but otherwise everything else works as usual.

@digitalkristurner
Copy link

I'm not sure what I'm supposed to do now to share the code changes I made (considering this isn't my pull request).

@github-actions github-actions bot removed the risk:db-migration PRs that require a DB migration label Mar 27, 2024
@actionjax
Copy link
Author

I'm not sure what I'm supposed to do now to share the code changes I made (considering this isn't my pull request).

I updated the code based on your findings @digitalkristurner, does this look correct? I don't mind you forking the code and submitting the same pull request for credit

@digitalkristurner
Copy link

Yeah that looks right and then PGDialect: "uuid_in(sha256(random()::text || clock_timestamp()::text)::cstring)" in superset/migrations/shared.

@@ -21,7 +21,7 @@


def md5_sha_from_str(val: str) -> str:
return hashlib.md5(val.encode("utf-8")).hexdigest()
return hashlib.md5(val.encode("utf-8"), usedforsecurity=False).hexdigest()

Check failure

Code scanning / CodeQL

Use of a broken or weak cryptographic hashing algorithm on sensitive data High

Sensitive data (certificate)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (certificate)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (certificate)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (certificate)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (certificate)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (certificate)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (certificate)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (id)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (id)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (id)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (id)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (id)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (id)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (id)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (id)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (id)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (certificate)
is used in a hashing algorithm (MD5) that is insecure.
Copy link

codecov bot commented Mar 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.89%. Comparing base (349e496) to head (f08a6b1).
Report is 11 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #21014      +/-   ##
==========================================
+ Coverage   69.77%   69.89%   +0.11%     
==========================================
  Files        1911     1911              
  Lines       75056    75060       +4     
  Branches     8362     8362              
==========================================
+ Hits        52374    52461      +87     
+ Misses      20630    20547      -83     
  Partials     2052     2052              
Flag Coverage Δ
hive 48.93% <25.00%> (?)
mysql 77.91% <100.00%> (+0.02%) ⬆️
postgres 78.02% <100.00%> (+<0.01%) ⬆️
presto 53.64% <50.00%> (?)
python 83.15% <100.00%> (+0.23%) ⬆️
sqlite 77.46% <100.00%> (+<0.01%) ⬆️
unit 56.76% <50.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot added the risk:db-migration PRs that require a DB migration label Mar 27, 2024
@villebro
Copy link
Member

@actionjax could you update the description to reflect the current state of the PR? Also, I'm not sure we can universally declare usedforsecurity=False, as different security teams may have a different stance on what constitutes security usage:

image

Source: https://docs.python.org/3/library/hashlib.html#constructors

I'd be curious to hear other security experts chime in here. Also, I think it'd be a good idea to discuss this in the next town hall before proceeding with this change, and maybe even attaching a SIP to it. Ping @michael-s-molina

@actionjax actionjax changed the title fix: make fips compliant by replacing md5 with sha256 and random lib fix: make fips compliant by replacing md5 with sha256 in postgres and usedforsecurity=False for python md5 Mar 27, 2024
@digitalkristurner
Copy link

It's not a lot of work changing it completely to SHA256 - I got that building and the unit tests worked (just had to change the set variables). I never got the integration tests to run properly so can't confirm that.

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.89%. Comparing base (349e496) to head (f08a6b1).
Report is 124 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #21014      +/-   ##
==========================================
+ Coverage   69.77%   69.89%   +0.11%     
==========================================
  Files        1911     1911              
  Lines       75056    75060       +4     
  Branches     8362     8362              
==========================================
+ Hits        52374    52461      +87     
+ Misses      20630    20547      -83     
  Partials     2052     2052              
Flag Coverage Δ
hive 48.93% <25.00%> (?)
mysql 77.91% <100.00%> (+0.02%) ⬆️
postgres 78.02% <100.00%> (+<0.01%) ⬆️
presto 53.64% <50.00%> (?)
python 83.15% <100.00%> (+0.23%) ⬆️
sqlite 77.46% <100.00%> (+<0.01%) ⬆️
unit 56.76% <50.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need:community-volunteer Requires a community volunteer review:draft risk:db-migration PRs that require a DB migration size/S
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants