Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update licenses.json and parsing without xml conversion #94

Merged
merged 6 commits into from
Dec 9, 2024

Conversation

lhns
Copy link
Contributor

@lhns lhns commented Dec 6, 2024

Hi, I have done a few updates to this plugin prior to it being transferred to the sbt org and now I want to start upstreaming these changes now that the repository is active again.

This first change allows to update the now quite outdated licenses.json file without the manual xml conversion step via some random website.

I used circe to parse the json files. I don't know if you are willing to add it as a dependency.

These changes are not source compatible. Also the new license file changes "GPL-2.0" to "GPL-2.0-or-later".

@lhns lhns mentioned this pull request Dec 6, 2024
Copy link
Contributor

@raboof raboof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks! Added some minor comments.

As for compatibility, that seems fine to me (#92).

I'm not sure what you mean about changing GPL-2.0 to GPL-2.0-or-later - don't they still have both?

Comment on lines 21 to 24
private def normalizeUrl(url: String): String = url.toLowerCase
.replaceFirst("^https://", "http://")
.replaceFirst("\\.html$", "")
.replaceFirst("\\.txt$", "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe? Wouldn't this risk changing the URL in a way that might not resolve anymore?

If we do do this, wouldn't it be nicer to normalize towards using https instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The normalization is really only used for comparing urls. Sadly the licenses that sbt returns often don't match with the urls from the license.json. using these three replacements got me really far to match most license ids. Yes https would probably be better even if it doesn't really matter that much.


def findByUrl(url: String): Option[License] = licensesByUrl.get(url)
def findByNormalizedUrl(url: String): Option[License] = licensesByNormalizedUrl.get(normalizeUrl(url))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this method? It seems only to be used in tests, or am I misunderstanding that?


licenses.xml is generated from original licenses.json using online xml converter:
http://convertjson.com/json-to-xml.htm
https://github.com/spdx/license-list-data/blob/main/json/licenses.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 much nicer

@lhns
Copy link
Contributor Author

lhns commented Dec 6, 2024

Nice, thanks! Added some minor comments.

As for compatibility, that seems fine to me (#92).

I'm not sure what you mean about changing GPL-2.0 to GPL-2.0-or-later - don't they still have both?

Just checked and you are right. There are both but it searches them by url and there are multiple licenses with the GPL-2.0 url and currently it matches the GPL-2.0-or-later. I am currently thinking of ways to fix this. Is this even harmful?

@raboof
Copy link
Contributor

raboof commented Dec 6, 2024

There are both but it searches them by url and there are multiple licenses with the GPL-2.0 url and currently it matches the GPL-2.0-or-later. I am currently thinking of ways to fix this. Is this even harmful?

That's what I was wondering as well - it sounds dangerous to have a method that may return GPL-2.0-or-later for an artifact that's actually GPL-2.0, but I got the impression that this method is not actually used anywhere - so perhaps we should just remove it? If we do want to keep it, perhaps we should make it return a list, so we make it the callers' problem to decide when there are multiple matches.

@lhns
Copy link
Contributor Author

lhns commented Dec 9, 2024

I have made the license matching unambiguous. Tbh I don't really like the normalization steps but they catch a whole lot of common variations in license naming.

@raboof
Copy link
Contributor

raboof commented Dec 9, 2024

I have made the license matching unambiguous.

👍

Tbh I don't really like the normalization steps but they catch a whole lot of common variations in license naming.

Yeah, data quality is a mess out there, anything we can do to (safely) clean things up is nice to have.

CI is saying scalafix prefers _ over *:

[error] --- /home/runner/work/sbt-sbom/sbt-sbom/src/main/scala/com/github/sbt/sbom/licenses/LicensesArchiveJsonParser.scala
[error] +++ <expected fix>
[error] @@ -2,7 +2,7 @@
[error]  
[error]  import io.circe.Decoder
[error]  import io.circe.generic.semiauto.deriveDecoder
[error] -import io.circe.parser.*
[error] +import io.circe.parser._
[error]  
[error]  import scala.util.control.NonFatal
[error]  
[error] (scalafixAll) scalafix.sbt.ScalafixFailed: TestError
[error] Total time: 13 s, completed Dec 9, 2024 10:38:39 AM

@lhns
Copy link
Contributor Author

lhns commented Dec 9, 2024

CI is saying scalafix prefers _ over *:
I have to fix my IntelliJ settings :)

@raboof raboof merged commit 7b63539 into sbt:main Dec 9, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants