-
Notifications
You must be signed in to change notification settings - Fork 13.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: make fips compliant by replacing md5 with sha256 in postgres and usedforsecurity=False for python md5 #21014
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️
We hope to see you in our Slack community too!
superset/key_value/utils.py
Outdated
md5_obj = md5() | ||
md5_obj.update(seed.encode("utf-8")) | ||
return UUID(md5_obj.hexdigest()) | ||
rd = random.Random() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this break existing namespaces? It seems that this logic was intended to keep UUIDs deterministic based on some input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, at first I thought the uuid was only used in cache, looks like that hash is stored in the db. To address your concern about deterministic I've replaced random with sha256. Will look to see how we can make this backwards compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you would need to make a migration that updates all existing entries from the MD5 based UUIDs to using SHA256 as the base. I think it should be doable, but we do need to be very careful with this. Ping @michael-s-molina , do you have any concerns here?
superset/utils/hashing.py
Outdated
@@ -21,7 +21,7 @@ | |||
|
|||
|
|||
def md5_sha_from_str(val: str) -> str: | |||
return hashlib.md5(val.encode("utf-8")).hexdigest() | |||
return hashlib.sha256(val.encode("utf-8")).hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there's a few potential issues with this. Pls provide some tests that validate backwards compatibility.
WIth the recent release of 3.0.0 was hoping to see this as a lot of orgs run on FIPS compliant environments and this prevents superset from operating in those environments. |
Seems there are some loose threads here... is anyone planning to pick this back up, or should we close it out? |
@actionjax it should be easy to make this configurable, thus keeping backwards compatibility. Let me know if you need help implementing it. |
Hey @villebro, unfortunately I do not have the cycles to look and implement this currently. I would recommend someone else taking over. |
Converting this to draft mode, rather than closing this. In due time, if nobody picks it up, we might close it out, but I remain hopeful. I'll close and re-open it just to jump-start CI and see how that plays out. |
Ping @ccravens if you're interested in adding optional support for FIPS compliance |
I would be interested in working on this. I think Superset is a great project and would be really helpful in other projects I'm working on. |
@digitalkristurner fantastic! We'd love to see that. |
Hey all, changed all the references from MD5 to SHA256 and then checked and corrected for all the possible overflows within the databases. I've ran the unit tests and they're all passing now. Just need to try and figure out the integration tests - any steer or guidance on that would be much appreciated. Quick note postgres only added support for SHA256 in Postgres 11 - not sure how much of a problem that'll be? |
Hey @digitalkristurner, sorry for the late reply, one of my coworkers pointed out that since the md5 hash isn't really used for security purposes in superset but rather just to create a UUID / hash we should be able to get away with just changing the code to have a security flag set to false
this should pass fips security requirements, if the superset team agrees, I think this is the path of least resistance. |
Will that work for the MD5 reference in the PGDialect uuid_by_dialect call? I'm not sure how Postgres deals with FIPS. |
Was this tested on a fips-enabled system? Attempting to use md5 at all should error, giving something like this
edit: this was last year so I don't have the logs anymore, but we did have runtime errors with md5, so we had to fork and replace all the md5 calls with sha256. |
I'll build try building the container and deploying it in our dev environment. |
ah yea, I overlooked the postgres part. I don't think there's a workaround for that unlike python. I think it will complain so you can continue the work you've already done. |
I tried your method and it seemed to work - it ran within a FIPS environment for me. Might just need to change the two strings within the DB but otherwise everything else works as usual. |
I'm not sure what I'm supposed to do now to share the code changes I made (considering this isn't my pull request). |
880da14
to
be985e9
Compare
I updated the code based on your findings @digitalkristurner, does this look correct? I don't mind you forking the code and submitting the same pull request for credit |
Yeah that looks right and then |
@@ -21,7 +21,7 @@ | |||
|
|||
|
|||
def md5_sha_from_str(val: str) -> str: | |||
return hashlib.md5(val.encode("utf-8")).hexdigest() | |||
return hashlib.md5(val.encode("utf-8"), usedforsecurity=False).hexdigest() |
Check failure
Code scanning / CodeQL
Use of a broken or weak cryptographic hashing algorithm on sensitive data High
Sensitive data (certificate)
Sensitive data (certificate)
Sensitive data (certificate)
Sensitive data (certificate)
Sensitive data (certificate)
Sensitive data (certificate)
Sensitive data (certificate)
Sensitive data (id)
Sensitive data (id)
Sensitive data (id)
Sensitive data (id)
Sensitive data (id)
Sensitive data (id)
Sensitive data (id)
Sensitive data (id)
Sensitive data (id)
Sensitive data (certificate)
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #21014 +/- ##
==========================================
+ Coverage 69.77% 69.89% +0.11%
==========================================
Files 1911 1911
Lines 75056 75060 +4
Branches 8362 8362
==========================================
+ Hits 52374 52461 +87
+ Misses 20630 20547 -83
Partials 2052 2052
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@actionjax could you update the description to reflect the current state of the PR? Also, I'm not sure we can universally declare Source: https://docs.python.org/3/library/hashlib.html#constructors I'd be curious to hear other security experts chime in here. Also, I think it'd be a good idea to discuss this in the next town hall before proceeding with this change, and maybe even attaching a SIP to it. Ping @michael-s-molina |
It's not a lot of work changing it completely to SHA256 - I got that building and the unit tests worked (just had to change the set variables). I never got the integration tests to run properly so can't confirm that. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #21014 +/- ##
==========================================
+ Coverage 69.77% 69.89% +0.11%
==========================================
Files 1911 1911
Lines 75056 75060 +4
Branches 8362 8362
==========================================
+ Hits 52374 52461 +87
+ Misses 20630 20547 -83
Partials 2052 2052
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
fix(superset): allows superset to run on fips compliance machines
SUMMARY
added usedforsecurity=False for md5(..) methods and replaced postgres md5(...) with sha256(..)
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
Run a fips complaint Linux VM (we used Rocky8) but any Linux with fips enabled should work.
ADDITIONAL INFORMATION