v32.0.0rc1 #3217
pombredanne
announced in
Announcements
This is a major new release with API breaking changes.
v32.0.0rc1 is the first release candidate and we expect to have a few more.
Important API changes:
This is a major release with major API and output format changes and significant
feature updates.
In particular, we changed the output format for licenses and packages, and
we changed some of the command line options.
The output format version is now 3.0.0.
Package detection:
Update GemfileLockParser to track the gem which the Gemfile.lock is for, which
we assign to the new GemfileLockParser.primary_gem field. Update
GemfileLockHandler.parse() to handle the case where there is a primary gem
detected from a Gemfile.lock. If there is a primary gem, a single Package is
created and the detected gem data within the Gemfile.lock are assigned as
dependencies. If there is no primary gem, then all of the dependencies are
collected into a Package with no name and yielded.
Repeated package and dependency results when scanning extracted rubygem #3072
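The primary-gem rule described above can be sketched as follows. This is only an illustration of the decision logic, not the scancode-toolkit implementation: the Gem and Package dataclasses and the assemble_package() helper are hypothetical names introduced here, and the sketch returns the Package where the real handler yields it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Gem:
    # Hypothetical stand-in for a gem entry parsed from a Gemfile.lock.
    name: str
    version: Optional[str] = None


@dataclass
class Package:
    # Hypothetical, simplified stand-in for a scancode Package.
    name: Optional[str]
    version: Optional[str]
    dependencies: List[Gem] = field(default_factory=list)


def assemble_package(primary_gem: Optional[Gem], gems: List[Gem]) -> Package:
    """Return one Package following the primary-gem rule."""
    if primary_gem is not None:
        # A primary gem exists: it becomes the single Package and every
        # other detected gem is attached as a dependency.
        deps = [gem for gem in gems if gem.name != primary_gem.name]
        return Package(primary_gem.name, primary_gem.version, deps)
    # No primary gem: collect all gems as dependencies of a nameless Package.
    return Package(None, None, list(gems))
```

Either way a single Package comes back, which is the behavior the fix for the repeated results aims at.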
Fix issue where dependencies were not reported when scanning an extracted
Python project by modifying BaseExtractedPythonLayout.assemble() to favor
using package data from a PKG-INFO file from an egg-info directory. Package
data from a PKG-INFO file from an egg-info directory contains the dependency
information collected from the requirements.txt file alongside PKG-INFO.
No dependency results when scanning celery-5.2.7.tar.gz #3083
Fix issue where we were returning an incorrect purl package type for
cocoapods: pods was being returned as the purl type, but it should be
cocoapods instead. See
https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#cocoapods
Incorrect purl type for cocoapods #3081
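As a small illustration of the corrected type, the sketch below builds a purl string by hand. The cocoapods_purl() helper is a hypothetical name used only here; it does no percent-encoding and ignores namespaces, qualifiers and subpaths, so a real purl library is the better choice outside of an example.

```python
def cocoapods_purl(name: str, version: str) -> str:
    """Build a minimal purl string for a CocoaPods package."""
    # The purl type is "cocoapods", not "pods".
    return f"pkg:cocoapods/{name}@{version}"


print(cocoapods_purl("AFNetworking", "4.0.1"))
```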
Code for parsing a Maven POM, npm package.json, freebsd manifest and haxelib
JSON has been separated into two functions: one that creates a PackageData
object from the parsed Resource, and another that calls the previous function
and yields the PackageData. This was done so that we can use the package
manifest data parsing code outside of the scancode-toolkit context in other
libraries.
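The two-function split can be pictured with the sketch below, using an npm package.json as the example. The PackageData dataclass and the function names build_package_data() and parse() are simplified, hypothetical stand-ins and do not reproduce the scancode-toolkit signatures.

```python
import json
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class PackageData:
    # Hypothetical, simplified stand-in for scancode's PackageData.
    type: str
    name: Optional[str]
    version: Optional[str]


def build_package_data(manifest: dict) -> PackageData:
    """Create a PackageData from already-parsed package.json content."""
    return PackageData("npm", manifest.get("name"), manifest.get("version"))


def parse(location: str) -> Iterator[PackageData]:
    """Load the manifest at `location`, then delegate and yield the result."""
    with open(location, encoding="utf-8") as manifest_file:
        manifest = json.load(manifest_file)
    yield build_package_data(manifest)
```

A library that already holds the parsed manifest can call build_package_data() directly, without going through the file-based, generator-style entry point.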
License detection:
The SPDX license list has been updated to the latest v3.19.
This is a major update to license detection where we now combine one or more
license matches in a larger license detection. This approach improves the
accuracy of license detection and removes a large number of false positive
or ambiguous license detections. For details, see
RFC: a plan for false positive license detection #2878
There is a new license_detections codebase level attribute with all the unique
license detections in the whole scan, both in resources and packages. It has
the three attributes also present in package and resource level license
detections (license_expression, matches and detection_log) and two additional
attributes:
identifier: the license_expression with a UUID created out of the detection
contents; it is the same for identical detections.
count: the number of times this unique license detection was encountered in
the codebase.
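To make the identifier and count attributes concrete, here is a minimal sketch of how unique detections could be collapsed. It rests on assumptions: the release notes do not say how the UUID is derived, so the use of uuid.uuid5 over a JSON dump of the expression and matches is an illustrative choice, and detection_identifier() and unique_detections() are hypothetical names.

```python
import json
import uuid
from collections import Counter
from typing import Dict, List


def detection_identifier(detection: dict) -> str:
    """Derive a stable identifier: expression plus a content-based UUID."""
    # Assumption: hash the expression and matches; the real scheme may differ.
    content = json.dumps(
        {
            "license_expression": detection["license_expression"],
            "matches": detection["matches"],
        },
        sort_keys=True,
    )
    digest = uuid.uuid5(uuid.NAMESPACE_URL, content)
    return f"{detection['license_expression']}-{digest}"


def unique_detections(detections: List[dict]) -> List[dict]:
    """Collapse identical detections and record how often each occurred."""
    counts: Counter = Counter()
    first_seen: Dict[str, dict] = {}
    for detection in detections:
        identifier = detection_identifier(detection)
        counts[identifier] += 1
        first_seen.setdefault(identifier, detection)
    return [
        {**first_seen[identifier], "identifier": identifier, "count": count}
        for identifier, count in counts.items()
    ]
```

Because the identifier depends only on the detection contents, two files carrying the same detection map to one codebase level entry with a count of 2.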
The data structure of the JSON output has changed for licenses at file level:
The licenses attribute is deleted.
A new for_license_detections attribute is added. It is a list of identifier
strings that reference the codebase level unique license detections.
A new license_detections attribute contains the license detections in that
file. Each detection has three attributes: license_expression, detection_log
and matches. matches is a list of license matches and is roughly the same as
licenses in the previous version, with additional structure changes detailed
below.
A new license_clues attribute contains license matches with the same data
structure as the matches attribute in license_detections. These are license
matches that are mere clues and were not considered to be a proper conclusive
license detection.
The license_expressions list of license expressions is deleted and replaced by
a single detected_license_expression expression. Similarly,
spdx_license_expressions was removed and replaced by
detected_license_expression_spdx.
See the license updates documentation
<https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-resource>
for examples and details.
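For consumers of the JSON output, the file level attributes listed above might be read as in the sketch below. The resource dict is a hand-written placeholder that only uses the attribute names from these notes; its values are illustrative, and all_license_matches() is a hypothetical helper.

```python
from typing import List


def all_license_matches(resource: dict) -> List[dict]:
    """Gather matches from both detections and clues of one file entry."""
    matches = []
    for detection in resource.get("license_detections", []):
        matches.extend(detection.get("matches", []))
    # Clues share the structure of matches but are not conclusive detections.
    matches.extend(resource.get("license_clues", []))
    return matches


# Illustrative file entry; every value here is a placeholder.
resource = {
    "path": "src/example.c",
    "detected_license_expression": "mit",
    "detected_license_expression_spdx": "MIT",
    "for_license_detections": ["mit-placeholder-identifier"],
    "license_detections": [
        {
            "license_expression": "mit",
            "detection_log": [],
            "matches": [{"license_expression": "mit"}],
        }
    ],
    "license_clues": [],
}

print(len(all_license_matches(resource)))
```

Code that previously iterated over the deleted licenses attribute would iterate over these matches instead, keeping in mind the match structure changes described further down.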
The data structure of license attributes in package_data and the codebase
level packages has been updated accordingly:
There is a new license_detections attribute for the primary, top-level
declared licenses of a package and an other_license_detections attribute for
the other secondary detections.
The license_expression is replaced by the declared_license_expression and
other_license_expression attributes with their SPDX counterparts
declared_license_expression_spdx and other_license_expression_spdx. These
expressions are parallel to detections.
The declared_license attribute is renamed extracted_license_statement and is
now a YAML-encoded string.
See the license updates documentation
<https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-package>
for examples and details.
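A tool that has to read package data from both the old and the new output format could bridge the renames roughly as sketched here. The two helper names are hypothetical, and decoding the YAML-encoded extracted_license_statement is left out because it would need a YAML library.

```python
from typing import Any, Optional


def declared_expression(package_data: dict) -> Optional[str]:
    """Return the declared license expression from new- or old-style data."""
    # Output format 3.0.0 uses declared_license_expression; older output
    # used license_expression.
    if "declared_license_expression" in package_data:
        return package_data["declared_license_expression"]
    return package_data.get("license_expression")


def license_statement(package_data: dict) -> Any:
    """Return the raw license statement under its new or old name."""
    # In the new format this is a YAML-encoded string; it is returned as-is.
    if "extracted_license_statement" in package_data:
        return package_data["extracted_license_statement"]
    return package_data.get("declared_license")
```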
The license matches structure has changed: we used to report one match for each
license key of a matched license expression. We now report instead one single
match for each matched license expression, and list the license keys as a
licenses attribute. This avoids data duplication. Inside each match, we list
each match and matched rule attribute directly, avoiding nesting. See the
license updates doc
<https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#licensematch-result-data>
for examples and details.
There are new codebase level attributes, reported by default with --licenses,
that provide reference license metadata and texts once for each license matched
across the scan: license_references and license_rule_references list the
unique detected licenses and license rules. This reference data is also removed
from license matches at all levels, i.e. from codebase, package and resource
level license detections and resource level license clues.
See the license updates documentation
<https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#comparision-before-after-license-references>
for examples and details.
We replaced the scancode --reindex-licenses command line option with a new
separate command named scancode-reindex-licenses. The
--reindex-licenses-for-all-languages CLI option is also moved to the
scancode-reindex-licenses command as an option --all-languages.
We can now detect licenses using custom license texts and license rules
stored in a directory or packaged as a plugin for consistent reuse and deployment.
There is an --additional-directory option with the scancode-reindex-licenses
command to add the licenses from a directory.
There is also a --only-builtin option to use only builtin licenses, ignoring
any additional license plugins.
See Add support for "extra", e.g. private or local licenses #480 for more details.
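Scripts that used to pass --reindex-licenses to scancode now need to call the separate command. The sketch below only assembles the argument list from the option names given above; reindex_command() is a hypothetical helper, the directory name is a placeholder, and treating --only-builtin as an option of the same command is an assumption, since the notes do not say so explicitly.

```python
from typing import List, Optional


def reindex_command(
    all_languages: bool = False,
    additional_directory: Optional[str] = None,
    only_builtin: bool = False,
) -> List[str]:
    """Build the argument list for the scancode-reindex-licenses command."""
    command = ["scancode-reindex-licenses"]
    if all_languages:
        command.append("--all-languages")
    if additional_directory:
        command.extend(["--additional-directory", additional_directory])
    if only_builtin:
        # Assumption: this option belongs to the same command.
        command.append("--only-builtin")
    return command


print(" ".join(reindex_command(additional_directory="my-licenses")))
```

The resulting list can be handed to subprocess.run() on a machine where scancode-toolkit is installed.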
We combined the license data file and text file of each license in a single
file with a .LICENSE extension. The .yml data file is now included at the
top of each .LICENSE file as "YAML frontmatter". The same applies to license
rules and their .RULE and .yml files. This halves the number of data files
from about 60,000 to 30,000. Git line history is preserved for the combined
text + yml files.
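Reading one of the combined files could look like the sketch below. It assumes the frontmatter is delimited by lines containing only "---", which is the usual YAML frontmatter convention but is not spelled out in these notes; split_frontmatter() is a hypothetical helper and the YAML part is returned as an unparsed string.

```python
from typing import Tuple


def split_frontmatter(content: str) -> Tuple[str, str]:
    """Split file content into (yaml_frontmatter, text)."""
    # Assumption: the frontmatter sits between two lines holding only "---".
    lines = content.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", content
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "".join(lines[1:index])
            text = "".join(lines[index + 1:])
            return frontmatter, text
    # No closing delimiter: treat the whole content as text.
    return "", content
```

The first element holds what used to live in the .yml file, and the second holds the license or rule text.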
There is a new console script scancode-license-data to export license data in
JSON, YAML and HTML, with indexes and a static website for use in the licensedb
web site. This becomes the API way to get scancode license data.
See Add a command line option to dump the license data #2738
The deprecated "--is-license-text" option has been removed.
This is now built in with the --license-text option and --info,
and exposed with the "percentage_of_license_text" attribute.
All Changes
Fix issue 3155 by running scancode-reindex-licenses subcommand instead of
using --reindex-licenses flag by @abhi-kr-2100 in #3159
New Contributors
@abhi-kr-2100 made their first contribution in #3159
Full Changelog: v31.2.4...v32.0.0rc1
This discussion was created from the release v32.0.0rc1.