LicenseScout is a tool to identify third-party artifacts (libraries) and their licenses, in Java as well as JavaScript projects. The goal is to get an overview over the used licenses, and the artifacts for which no license could be detected.
Key features include:
-
Scanning Java and Javascript/NPM artifacts
-
Writing reports in different file formats
-
Writing results to a database
-
Defining licenses manually for certain artifacts
-
Exclude artifacts completely
-
You need a working Maven 3 installation
-
check out all LicenseScout projects
-
cd org.aposin.licensescout.quickstart
-
mvn clean install
-
The reports are written to the directory
org.aposin.licensescout.licensereport/target
-
HTML report:
licenses_org.aposin.licensescout.licensereport.html
-
CSV report:
licenses_org.aposin.licensescout.licensereport.csv
-
TXT report:
licenses_org.aposin.licensescout.licensereport.txt
-
artifactId | Purpose |
---|---|
|
Parent POM for all other LicenseScout projects |
|
Code for LicenseScout maven plugin |
|
Sample configuration files |
|
Sample for LicenseScout build integration, generates license reports for LicenseScout itself (using the sample configuration) |
|
Compiles the Maven plugin, creates the sample configuration bundle and generates a license report for LicenseScout itself. |
LicenseScout is implemented as a Maven plugin. It scans a given base directory for artifacts.
Initialisation includes:
-
Validity checks on the input parameters
-
Creation of output directory, if necessary
-
Reading of configuration files (licenses, mappings, filters, etc)
The configured base directory is scanned recursively, either for Java archives or for NPM packages. The archives found are collected in a list. If license information associated with an archive is found, the license is added to the list associated with the archive. During this step a mapping of license URLs to license names is done (so that licenses given as URL can be found from the internal list).
Processings are done:
-
Manually checked licenses for specific archives are added to the archives from the checked archives list.
-
a license status for each archive is calculated.
-
Global filters are applied: artifact names are matched against a global filter that applies regular expressions on the artifact path name. Artifacts that match are removed.
-
The vendor filter is applied, using the configured vendor names to filter out: artifacts with a certain vendor name in the
MANIFEST.MF
are removed (this can be used for own artifacts, which should not appear at all). -
Clean output filtering is applied, if configured
-
Statistics on the total number of archives, the total number of considered license files and the number of archives per license state are calculated.
From the collected, processed and filtered information reports are generated, depending on the output configuration. Either a HTML report, a CSV report, a TXT report, or a combination of these are generated.
The CSV Output is suitable for automatic comparison between versions.
The HTML output uses velocity templates to generate the output. The template used for this is org.aposin.licensescout/src/main/resources/templates/license_report.vm
.
The TXT output uses hard-coded text fragments.
LicenseScout is configured as a Maven plugin. Below you can see a typical configuration.
<plugin>
<groupId>org.aposin.licensescout</groupId>
<artifactId>licensescout-maven-plugin</artifactId>
<version>1.1.4</version>
<executions>
<execution>
<id>find-licenses</id>
<phase>verify</phase>
<goals>
<goal>scanJava</goal>
</goals>
<configuration>
<scanDirectory>${project.build.directory}/products/my.product/win32/win32/x86/plugins/</scanDirectory>
<outputDirectory>${licensescout.outputDirectory}</outputDirectory>
<outputs>
<output>
<type>HTML</type>
<filename>${licensescout.outputFilename.html}</filename>
<url>${licensereport.url.html}</url>
</output>
<output>
<type>CSV</type>
<filename>${licensescout.outputFilename.csv}</filename>
<url>${licensereport.url.csv}</url>
</output>
<output>
<type>TXT</type>
<filename>${licensescout.outputFilename.txt}</filename>
<url>${licensereport.url.txt}</url>
</output>
</outputs>
<licensesFilename>${licensescout-configuration.dir}/licenses.xml</licensesFilename>
<providersFilename>${licensescout-configuration.dir}/providers.xml</providersFilename>
<noticesFilename>${licensescout-configuration.dir}/notices.xml</noticesFilename>
<checkedArchivesFilename>${licensescout-configuration.dir}/checkedarchives.csv</checkedArchivesFilename>
<licenseUrlMappingsFilename>${licensescout-configuration.dir}/urlmappings.csv</licenseUrlMappingsFilename>
<licenseNameMappingsFilename>${licensescout-configuration.dir}/namemappings.csv</licenseNameMappingsFilename>
<globalFiltersFilename>${licensescout-configuration.dir}/globalfilters.csv</globalFiltersFilename>
<filteredVendorNamesFilename>${licensescout-configuration.dir}/filteredvendornames.csv</filteredVendorNamesFilename>
</configuration>
</execution>
</executions>
</plugin>
This section describes the configuration for the LicenseScout maven plugin that is done in the POM file where you want to execute the LicenseScout. For information on the format of configuration files see the next section.
In one execution, LicenseScout can either scan for Java artifacts or for Javascript/NPM artifacts.
-
For Java executions, the Maven goal
scanJava
is used. -
For Javascript/NPM executions the Maven goal
scanNpm
is used.
The base directory where archives are searched for (recursively and also inside JARs) is configured by the parameter scanDirectory
.
Output can be configured with the output directory (parameter outputDirectory
) and output types.
The actual output files to generate are configured by the output types. the available output types are 'HTML', 'CSV' and 'TXT'. One or multiple output types can be configured. If no output type is configured, no output files will be written.
File names of the output files are the result of appending the filename given by parameters filename
in the output
section to the output directory. An additional parameter url
can be used to specify the URL associated with the output that will be written to the database, if enabled.
Example:
<properties>
<licensescout.outputDirectory>...<licensescout.outputDirectory>
licensescout.outputFilename.html TODO
</properties>
...
<configuration>
...
<outputDirectory>${licensescout.outputDirectory}</outputDirectory>
<outputs>
<output>
<type>HTML</type>
<filename>${licensescout.outputFilename.html}</filename>
</output>
<output>
<type>CSV</type>
<filename>${licensescout.outputFilename.csv}</filename>
</output>
<output>
<type>TXT</type>
<filename>${licensescout.outputFilename.txt}</filename>
</output>
</outputs>
...
</configuration>
Vendor names that should be used to filter out archives from the result list can be configured either directly with the parameter filteredVendorNamesFilename
or with a configuration file (see below). Example:
<configuration>
...
<filteredVendorNames>
<filteredVendorName>My company</filteredVendorName>
<filteredVendorName>Another Company</filteredVendorName>
</filteredVendorNames>
...
</configuration>
If no vendor names are configured, no filtering by vendor name takes place.
Directory names that should be ignored in scanning for NPM modules can be configured using the parameter npmExcludedDirectoryNames
/ npmExcludedDirectoryName
. Example:
<configuration>
...
<npmExcludedDirectoryNames>
<npmExcludedDirectoryName>.bin</npmExcludedDirectoryName>
<npmExcludedDirectoryName>@angular</npmExcludedDirectoryName>
<npmExcludedDirectoryName>@ngtools</npmExcludedDirectoryName>
<npmExcludedDirectoryName>@types</npmExcludedDirectoryName>
</npmExcludedDirectoryNames>
...
</configuration>
If no excludes are given, no directories are excluded.
LicenseScout accesses an external Maven repository to download parent POM files if it is necessary to find out license information. The base URL used for this can be configured.
In an enterprise environment, this can be used to point to an artifact server like Nexus that mirrors the Maven central repository.
Example:
<configuration>
...
<nexusCentralBaseUrl>http://nexus.company.com:8081/nexus/content/repositories/central/</nexusCentralBaseUrl>
...
</configuration>
Note
|
If no Maven central URL is given, the default is to access Maven Central directly (value https://repo.maven.apache.org/maven2/ ).
|
The resulting output list of archives can be filtered to remove archives with certain legal state or certain licenses. A list of
legal states to filter out can be given with cleanOutputLegalStates
/ cleanOutputLegalState
. Any archive that has one of the states given will be filtered out from the result list. Also, a list of license identifiers can be given with cleanOutputLicenseSpdxIdentifiers
/ cleanOutputLicenseSpdxIdentifier
. These values are matched against the SPDX identifiers given as spdxIdentifier
in the license XML file (see below). Any archive that contains one of the licenses given will be filtered out.
The filtering can be activated and deactivated with a switch (cleanOutputActive
) with values true
or false
. Example:
<configuration>
...
<cleanOutputActive>true</cleanOutputActive>
<cleanOutputLegalStates>
<cleanOutputLegalState>NOT_ACCEPTED</cleanOutputLegalState>
<cleanOutputLegalState>CONFLICTING</cleanOutputLegalState>
</cleanOutputLegalStates>
<cleanOutputLicenseSpdxIdentifiers>
<cleanOutputLicenseSpdxIdentifier>WTFPL</cleanOutputLicenseSpdxIdentifier>
</cleanOutputLicenseSpdxIdentifiers>
...
</configuration>
If cleanOutputActive
is not configured or if no states or licenses to filter out are configured, no filtering takes place.
The resulting output files (HTML, CSV and TXT) can be configured to contain or not contain specific Information.
The documentation URL from the checked licenses list can be used in the output report. This can be activated with a switch (showDocumentationUrl
) with values true
or false
. Example:
<configuration> ... <showDocumentationUrl>true</showDocumentationUrl> ... </configuration>
If showDocumentationUrl
is not configured the documentation URL is included into the output.
LicenseScout can use a database to write core information of the report to.
With the parameter writeResultsToDatabase
writing to the database can be enabled or disabled.
The parameter writeResultsToDatabaseForSnapshotBuilds
determines if records should be written to the database also for snapshot versions. If the value is not true, version numbers (taken from the parameter buildVersion
) that contain -SNAPSHOT
are not processed further.
The record resultDatabaseConfiguration
with the parameter jdbcUrl
, username
and password
is used to configure the target database.
If writing to the result database is enabled, further parameters are used to obtain information to write to the database. There are parameters for the build name, the build version, the build URL and (inside output
) for the URLs of the output files.
Example:
<properties>
<licensescout.writeResultsToDatabase>true</licensescout.writeResultsToDatabase>
<licensescout.database.url>...</licensescout.database.url>
<licensescout.database.username>...</licensescout.database.username>
<licensescout.database.password>...</licensescout.database.password>
<licensescout.buildName>${project.artifactId}</licensescout.buildName>
<licensescout.buildVersion>${project.version}</licensescout.buildVersion>
<licensescout.buildUrl>...</licensescout.buildUrl>
</properties>
...
<configuration>
...
<writeResultsToDatabase>${licensescout.writeResultsToDatabase}</writeResultsToDatabase>
<writeResultsToDatabaseForSnapshotBuilds>false</writeResultsToDatabaseForSnapshotBuilds>
<resultDatabaseConfiguration>
<jdbcUrl>${licensescout.database.url}</jdbcUrl>
<username>${licensescout.database.username}</username>
<password>${licensescout.database.password}</password>
</resultDatabaseConfiguration>
<buildName>${licensescout.buildName}</buildName>
<buildVersion>${licensescout.buildVersion}</buildVersion>
<buildUrl>${licensescout.buildUrl}</buildUrl>
...
</configuration>
It is recommended to configure the values for username
and password
via settings.xml
.
LicenseScout can use eight configuration files for
-
licenses
-
providers
-
notices
-
manually checked archives
-
mappings names to licenses
-
mappings of URLs to licenses
-
global filters on archives
-
vendor names to filter out (vendor names can be configured both via XML or via configuration file)
The following sections describe the file Format and the effect of the configurations. The filenames of the files are configured using the following Maven parameters:
-
licensesFilename
-
providersFilename
-
noticesFilename
-
checkedArchivesFilename
-
licenseUrlMappingsFilename
-
licenseNameMappingsFilename
-
globalFiltersFilename
-
filteredVendorNamesFilename
(for an example see above)
Known licenses, their URLs and associated detection strings are configured using an XML file. The filename is configured using the Maven Parameter licensesFilename
.
Example of the file:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<licenses>
<license id='AFL-1.1'>
<spdxIdentifier>AFL-1.1</spdxIdentifier>
<name>Academic Free License</name>
<legalStatus>ACCEPTED</legalStatus>
<author>Lawrence E. Rosen</author>
<version>1.1</version>
<publicUrl>https://spdx.org/licenses/AFL-1.1.html</publicUrl>
</license>
<license id='AFL-1.2'>
<spdxIdentifier>AFL-1.2</spdxIdentifier>
<name>Academic Free License</name>
<legalStatus>ACCEPTED</legalStatus>
<author>Lawrence E. Rosen</author>
<version>1.2</version>
<publicUrl>https://spdx.org/licenses/AFL-1.2.html</publicUrl>
</license>
<license id='AFL-2.0'>
<spdxIdentifier>AFL-2.0</spdxIdentifier>
<name>Academic Free License</name>
<legalStatus>ACCEPTED</legalStatus>
<author>Lawrence E. Rosen</author>
<version>2.0</version>
<publicUrl>https://spdx.org/licenses/AFL-2.0.html</publicUrl>
</license>
<license id='AFL-2.1'>
<spdxIdentifier>AFL-2.1</spdxIdentifier>
<name>Academic Free License</name>
<legalStatus>ACCEPTED</legalStatus>
<author>Lawrence E. Rosen</author>
<version>2.1</version>
<publicUrl>https://spdx.org/licenses/AFL-2.1.html</publicUrl>
</license>
<license id='AFL-3.0'>
<spdxIdentifier>AFL-3.0</spdxIdentifier>
<name>Academic Free License</name>
<legalStatus>ACCEPTED</legalStatus>
<author>Lawrence E. Rosen</author>
<version>3.0</version>
<publicUrl>https://spdx.org/licenses/AFL-3.0.html</publicUrl>
<notice>AFL-Notice-3.0</notice>
</license>
<licenseSet>
<license idref='AFL-1.1' />
<license idref='AFL-1.2' />
<license idref='AFL-2.0' />
<license idref='AFL-2.1' />
<license idref='AFL-3.0' />
<detectionString>ACADEMIC FREE LICENSE</detectionString>
</licenseSet>
...
</licenses>
Each license should be given as a license
element. Also different versions of a license should be given as separate license
elements.
The id
attribute of license
is mandatory, it is used to refer to the license in licenseSet`s. Usually, the value of the `id
attribute should be identical to the SPDX identifier of the license. However, the id
attributes are only used for referencing in the XML file internally. So an id
attribute can be used even if the license has no SPDX identifier.
-
The
spdxIdentifier
element is optional. However, it is recommended to assign a value even if the license has no actual SPDX identifier. The reason for this is that in CSV output the licenses are given by their SPDX identifier. If a license has no identifier a blank field will appear. The values are -
The value of the
name
element is only used for displaying the license, not for automatic detection (for detectionlicenseSet
/detectionString
is used). -
The value of
name
should not be empty, as it is used to sort licenses in the output reports. -
The
legalStatus
reflects if a license is acceptable for the given project. The value can beACCEPTED
,NOT_ACCEPTED
orUNKNOWN
(see enumerationorg.aposin.licensescout.license.LegalStatus
). The value is mandatory. -
The
author
element gives the name of the person or organisation that published the license. The value may be empty. -
The
version
element gives the version of the license. The value may be empty. As the value ofversion
is appended to the name in the report output, usually the name should not contain a version number. The version number is also used in automatic detection to distinguish different versions of a license. -
The
publicUrl
should be an URL that leads to a readable license text, as this URL is used in the HTML output for links underlying the license. On the other hand,secondaryUrl
values are not required to be actually accessible. They are only used to associate licenses with that URL. -
A
notice
element contains an ID of a notice from the notices XML file. This is optional.
Tip
|
For automatic detection, different versions of the same licenses can be grouped to license sets. A license set has one or more associated detection strings. If a detection string of a license set is found in a text file that may be a potential license file, the mechanism tries to detect a version number from the file. If a version number is found and matches the version string of one of the licenses of the set, the file is recognized as that dedicated version of the license. Otherwise the first license of the set is recognized. Note that for special detection behaviour a license can be member of multiple license sets with different detection strings (though this case is not very common). Detection strings are matched case-insensitive against potential license text. |
Providers with their name and URL are configured using an XML file. The filename is configured using the Maven Parameter providersFilename
.
Example of the file:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<providers>
<provider id='EclipseFoundation'>
<name>Eclipse Foundation</name>
<url>https://www.eclipse.org/</url>
</provider>
</providers>
Notices are pieces that a license requires to be published with a software that uses a third-party software under this license. Example of the file:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<notices>
<notice id='EPL-1.0'>
<text>Notice for EPL 1</text>
</notice>
<notice id='EPL-2.0'>
<text>Notice for EPL 2</text>
</notice>
<notice id='MIT-1'>
<text>Notice for MIT 1</text>
</notice>
...
</notices>
Each notice should be given as a notice
element. Also different versions of a license should be given as separate notice
element.
The id
attribute of notice
is mandatory, it is used to refer to the notice from licenses (in the licenses XML file) and checked archives (in the checked archives CSV file).
Here, archives that have no license detected automatically can be assigned a licenses that has been checked manually.
It also can be used to decide between multiple detected licenses.
An archive can be identified by either:
-
an archive name (exact match) and a version number
-
an archive name (exact match) and a hash code
-
a regular expression that is matched against the archive name
-
a regular expression that is matched against the path of the archive
From the file, lines are split by the character ','.
First colum (type) - can be:
-
JAVA
for Java Jar archives (packed or unpacked) -
JAVASCRIPT
for JS/NPM packages
The second column is the name. The name is used as:
-
a regular expression on the archive’s path if it starts with
==
(which are not part of the regular expression) -
a regular expression on the Archive Name if it starts with
=
(which are not part of the regular expression) -
an archive name that is matched exactly otherwise
The third column is either a version number or a hash code. If the length of the field is exactly 64 characters, it is parsed as an SHA-256
hash value. Otherwise, it is taken as a version number.
The fourth column is string that is used as documentation URL in the output reports (if the Output configuration enables outputting this Information, see showDocumentationUrl
Maven parameter). The value may be empty.
The fifth column is an identifier of a provider. This is optional. The value may be empty.
The sixth column is an identifier of a notive. This is optional. The value may be empty.
The seventh and any further column are license identifiers.
An archive can have one multiple or no license assigned. If no license is assigned, it will get the status MANUALLY_NOT_DETECTED
.
Empty lines and lines starting with '#' are ignored. Examples:
JAVA, bcprov-ext-jdk15on-155.jar, 2FBFC48DA088C1223ADB84A928ABEA4083C2702F4C06CC9692736627DD50C59B,http://dummy,,, MIT JAVA, xpp3_min.jar, 8D60778CD5018E7A130B3FB6C96A57DD9E1877B9EFBF76B4B63A8DD395128EAEhttp://path/to/cpp3-license-documentation,, ExtremeLab-1.1.1, Apache-1.1,EclipseFoundation,EPL-Notice-1, PublicDomain JAVASCRIPT, indexof, 0.0.1,,,, MIT
Empty lines and lines starting with '#' are ignored.
Warning
|
Note that ',' is not an allowed character in regular expressions, since it is used as a separation character for the CSV parsing, and it cannot be quoted at the moment. |
In some places licenses are usually given by URL, not by license name (this can be the case in MANIFEST.MF
, pom.xml
and package.json
files). The URL mapping maps these URLs (and, actually, other fancy names used) to internal license names (SPDX identifiers). Examples:
https://javaee.github.io/javamail/LICENSE, CDDL-1.1 http://www.h2database.com/html/license.html, MPL-2.0, EPL-1.0 https://glassfish.java.net/public/CDDL+GPL_1_1.html, CDDL-1.1, GPL-2.0 http://repository.jboss.org/licenses/cddl.txt, CDDL-1.0 http://repository.jboss.org/licenses/gpl-2.0-ce.txt, GPL-2.0 http://www.antlr.org/license.html, BSD-3-Clause http://antlr.org/license.html, BSD-3-Clause http://treelayout.googlecode.com/files/LICENSE.TXT, BSD-3-Clause http://xstream.codehaus.com/license.html, BSD-3-Clause
Empty lines and lines starting with '#' are ignored.
Note
|
From the file, lines are split by the character ','. The first column is the URL that should be mapped. The second and any further columns are license identifiers. Note that this way, an URL can be mapped to multiple licenses. |
In some places licenses are given by their name. This includes pom.xml
files, NPM package.json
files, and in some cases MANIFEST.MF
files. The name mapping maps these names to internal license names (SPDX identifiers). Example file:
(MIT AND CC-BY-3.0), MIT, CC-BY-3.0 (MIT OR Apache-2.0), MIT, Apache-2.0 (WTFPL OR MIT), WTFPL, MIT (BSD-2-Clause OR MIT OR Apache-2.0), BSD-2-Clause, MIT, Apache-2.0 (MIT AND Zlib), MIT, Zlib AFLv2.1, AFL-2.1 Apache 2, Apache-2.0 Apache 2.0, Apache-2.0
Empty lines and lines starting with '#' are ignored.
From the file, lines are split by the character ','. The first column is the name that should be mapped. The second and any further columns are license identifiers. Note that this way, a name can be mapped to multiple licenses.
Archives matching a global filter are removed from the output list completely.
This Feature can be used to filter out inner JARs that have no license information.
Examples:
==/org\.eclipse\.[_\-a-z0-9\.]+jar!/ant_tasks/[_\-a-zA-Z0-9\.]+\.jar
Each line from the file is taken as one expression. It can be:
-
a regular expression on the archive’s path if it starts with '==' (which are not part of the regular expression)
-
a regular axpression on the archive name if it starts with '=' (which are not part of the regular expression) Empty lines and lines starting with '#' are ignored.
Note
|
unlike the checked archives file, here ',' is an allowed character in regular expressions, since here not splitting by that character is done. |
If vendor names are given, archives are checked if their vendor name (retrieved from MANIFEST.MF
Bundle-Vendor
, POM file or NPM package.json
Vendor
) matches exactly. If yes, the archive is removed from the result list.
If a configuration file is used for vendor names, each line in the file is one vendor name. No split operations are done on the line. So a vendorname.csv
can look like this:
Company Another company My fancy open source project
Empty lines and lines starting with #
are ignored.
The recommended way of maintaining the configuration files of LicenseScout is to bundle them in a Maven artifact. This approach is described here.
For use as a sample (both Java and Javascript), a separate Maven project is used that contains only the configuration files. They are packaged as a ZIP file GAV Parameters:
-
groupId:
org.aposin.licensescout
-
artifactId:
org.aposin.licensescout.configuration.sample
-
classifier:
configuration
-
type:
zip
It contains the following files:
-
checkedarchives.csv
-
filteredvendornames.csv
-
globalfilters.csv
-
licenses.xml
-
namemappings.csv
-
notices.xml
-
providers.xml
-
urlmappings.csv
It is created and uploaded using mvn install
or mvn deploy
.
In builds that use this configuration project it is downloaded and unpacked to a local directory using the maven-dependency-plugin
. A typical configuration for downloading looks like this:
<properties>
<licensescout-configuration.dir>${project.build.directory}/licensescout-configuration</licensescout-configuration.dir>
</properties>
...
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<id>unpack-licensescout-configuration</id>
<phase>process-resources</phase>
<goals>
<goal>unpack</goal>
</goals>
<configuration>
<artifactItems>
<artifactItem>
<groupId>org.aposin.licensescout</groupId>
<artifactId>org.aposin.licensescout.configuration.sample</artifactId>
<version>1.0.0-SNAPSHOT</version>
<classifier>configuration</classifier>
<type>zip</type>
<overWrite>true</overWrite>
<outputDirectory>${licensescout-configuration.dir}</outputDirectory>
</artifactItem>
</artifactItems>
<overWriteReleases>false</overWriteReleases>
<overWriteSnapshots>true</overWriteSnapshots>
</configuration>
</execution>
</executions>
</plugin>
LicenseScout can then reference the configuration files in the local file system like this:
<configuration>
...
<licensesFilename>${licensescout-configuration.dir}/licenses.xml</licensesFilename>
<checkedArchivesFilename>${licensescout-configuration.dir}/checkedarchives.csv</checkedArchivesFilename>
<licenseUrlMappingsFilename>${licensescout-configuration.dir}/urlmappings.csv</licenseUrlMappingsFilename>
<licenseNameMappingsFilename>${licensescout-configuration.dir}/namemappings.csv</licenseNameMappingsFilename>
<noticesFilename>${licensescout-configuration.dir}/notices.xml</noticesFilename>
<providersFilename>${licensescout-configuration.dir}/providers.xml</providersFilename>
<globalFiltersFilename>${licensescout-configuration.dir}/globalfilters.csv</globalFiltersFilename>
<filteredVendorNamesFilename>${licensescout-configuration.dir}/filteredvendornames.csv</filteredVendorNamesFilename>
...
</configuration>
The scanning process starts with the configured scan directory (parameter scanDirectory
). The entries of this directory are examined.
Directories are checked if they contain a subdirectory META-INF
containing a file MANIFEST.MF
. If so, they are assumed to be an unpacked JAR and processed further as an archive. Other directories are further processed recursively.
Files with a filename that Ends with .jar
are tried to open as a JAR file and processed further as a packed jar. Other files are ignored.
In packed as well as in unpacked JARS the entries (JAR entries or file system entries, respectively) are processed further. JAR files are processed as packed JARS. Other encountered files are considered as license files as described below. Directories are processed further recursively.
The scanning process starts with the configured scan Directory (parameter scanDirectory
). Entries of this directory are checked if they are a Directory and contain a file package.json
. If so, the directory is assumed to be an NPM package and is processed further. Other directories are processed recursively. If the name of a Directory matches and entry of the excluded dir names, it is ignored. Plain files are ignored, too. For directories representing an NPM package its package.json
is examined for name, version, vendor and license information. Then, the directory is scanned recursively for other files that may contain license Information in text.
Files are selected for automatic detection of licenses if their file name fulfills certain criteria. For Java, a filename meets the criteria if it ends with txt
, htm
or html
, or if the filename contains license
, licence
or notice
and does not end with .class
. For Javascript the filename meets the criteria if it contains license
or readme
. All file name comparisons are done case-insensitive.
Files selected for license detection are opened as text files and processed line by line. If the line contains a detection string (for the license name, see file licenses.xml
) the associated list of licenses is taken as the current candidate license list. If there is a candidate license list, the current line is also scanned for a version number. If a version number is found, the candidate license list is checked for a license that matches the version number. If no license matches, the first element of the list is taken as a detected license and added to a set of overall detected licenses.
If a detection string is found and there is already a candidate license list it is assumed that there will be no version number for the current candidate license list. Therefore, the default element of the current candidate list is used as a detected license. Then, the license list associated with the newly detected string becomes the new current candidate license list. All license lists that have been encountered are also stored in a set. If a new license list is found it is checked against the know processed lists. If a license list is found, it is assumed that it is already processed and will not be handled a second time.
If no version number is detected in the three lines following the line where the license name was detected, it is assumed that there is no version number and the default element of the candidate list is used as a detected license.
If the end of the file is met and there is still a candidate license list, also the default element of the list is used as a detected license.
MIT style licenses typically have the name of the author or organisation that is the license holder as part of the license text (this is required to be copied without modifications). To fulfill this requirement automatically, files containing license
in their file name (case insensitive) are considered as full license texts. If an artifact has MIT as a detected license and a full license text is available, the standard MIT license object is replaced in this artifacts license list by a license object that contains the actual license text. This way, the actual license text is used in the license text report.
Detection status is set according to the following rules:
-
If there are manually configured licenses and the list of automatically detected licenses is empty the status of the archive becomes
MANUALLY_DETECTED
. -
If there are manually configured licenses and the list of automatically detected licenses is not empty the status of the archive becomes
MANUALLY_SELECTED
. -
If there are no manually configured licenses and the list of automatically detected licenses contains more than one entry the status of the archive becomes
MULTIPLE_DETECTED
. -
If there are no manually configured licenses and the list of automatically detected licenses contains one entry the status of the archive becomes
DETECTED
. -
If there are no manually configured licenses and no automatically detected licenses the status of the archive becomes
NOT_DETECTED
.
Legal status is set according to the following rules:
-
If one of the licenses (of an archive) has status
UNKNOWN
, the status of the archive becomesUNKNOWN
. -
If there are only licenses with status
ACCEPTED
, the status of the archive becomesACCEPTED
. -
If there are only licenses with status
NOT_ACCEPTED
, the status of the archive becomesNOT_ACCEPTED
. -
If there are licenses with status
ACCEPTED
and licenses with statusNOT_ACCEPTED
, the status of the archive becomesCONFLICTING
.
LicenseScout can also be executed locally. Do the following:
-
Do a build of LicenseScout using the launch configuration
/org.aposin.licensescout.core/launch/org.aposin.licensescout.core_clean_install.launch
-
Do a build of the license finder sample configuration using the launch configuration
/org.aposin.licensescout.configuration.sample/launch/org.aposin.licensescout.configuration.sample_clean_install.launch
-
Make sure the current (SNAPSHOT) version numbers of LicenseScout (from
/org.aposin.licensescout.core/pom.xml
) and the license finder configuration (from/org.aposin.licensescout.configuration.sample/pom.xml
) are entered as property values in/org.aposin.licensescout.licensereport/pom.xml
-
Run LicenseScout:
/org.aposin.licensescout.licensereport/launch/org.aposin.licensescout.licensereport_scan.launch
The file org.aposin.licensescout.core/sql/licensescout.sql
contains an SQL script that creates the tables in the database needed for storing the LicenseScout results.
The LicenseScout uses the JDBC driver of MariaDB (https://mariadb.org/). This JDBC driver is compatible with the MySQL database engine. The version of the MariaDB driver used (2.4.1) is compatible with MySQL server 5.5.3 and later (see https://mariadb.com/kb/en/library/about-mariadb-connector-j/).
This JDBC driver is used because the standard JDBC driver of MySQL is licensed under the GNU General public license, which is incompatible with the Apache 2.0 license used for the LicenseScout.