-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update licenses.json and parsing without xml conversion #94
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks! Added some minor comments.
As for compatibility, that seems fine to me (#92).
I'm not sure what you mean about changing GPL-2.0
to GPL-2.0-or-later
- don't they still have both?
private def normalizeUrl(url: String): String = url.toLowerCase | ||
.replaceFirst("^https://", "http://") | ||
.replaceFirst("\\.html$", "") | ||
.replaceFirst("\\.txt$", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe? Wouldn't this risk changing the URL in a way that might not resolve anymore?
If we do do this, wouldn't it be nicer to normalize towards using https instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The normalization is really only used for comparing urls. Sadly the licenses that sbt returns often don't match with the urls from the license.json. using these three replacements got me really far to match most license ids. Yes https would probably be better even if it doesn't really matter that much.
|
||
def findByUrl(url: String): Option[License] = licensesByUrl.get(url) | ||
def findByNormalizedUrl(url: String): Option[License] = licensesByNormalizedUrl.get(normalizeUrl(url)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this method? It seems only to be used in tests, or am I misunderstanding that?
|
||
licenses.xml is generated from original licenses.json using online xml converter: | ||
http://convertjson.com/json-to-xml.htm | ||
https://github.com/spdx/license-list-data/blob/main/json/licenses.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 much nicer
Just checked and you are right. There are both but it searches them by url and there are multiple licenses with the GPL-2.0 url and currently it matches the GPL-2.0-or-later. I am currently thinking of ways to fix this. Is this even harmful? |
That's what I was wondering as well - it sounds dangerous to have a method that may return |
I have made the license matching unambiguous. Tbh I don't really like the normalization steps but they catch a whole lot of common variations in license naming. |
👍
Yeah, data quality is a mess out there, anything we can do to (safely) clean things up is nice to have. CI is saying scalafix prefers
|
|
Hi, I have done a few updates to this plugin prior to it being transferred to the sbt org and now I want to start upstreaming these changes now that the repository is active again.
This first change allows to update the now quite outdated licenses.json file without the manual xml conversion step via some random website.
I used circe to parse the json files. I don't know if you are willing to add it as a dependency.
These changes are not source compatible. Also the new license file changes "GPL-2.0" to "GPL-2.0-or-later".