LicenseScout

LicenseScout is a tool to identify third-party artifacts (libraries) and their licenses, in Java as well as JavaScript projects. The goal is to get an overview over the used licenses, and the artifacts for which no license could be detected.

Key features include:

Scanning Java and Javascript/NPM artifacts
Writing reports in different file formats
Writing results to a database
Defining licenses manually for certain artifacts
Exclude artifacts completely

Quick start

You need a working Maven 3 installation
check out all LicenseScout projects
cd org.aposin.licensescout.quickstart
mvn clean install
The reports are written to the directory org.aposin.licensescout.licensereport/target
- HTML report: licenses_org.aposin.licensescout.licensereport.html
- CSV report: licenses_org.aposin.licensescout.licensereport.csv
- TXT report: licenses_org.aposin.licensescout.licensereport.txt

Projects overview

Table 1. LicenseScout projects

artifactId	Purpose
`org.aposin.licensescout.parent`	Parent POM for all other LicenseScout projects
`org.aposin.licensescout.core`	Code for LicenseScout maven plugin
`org.aposin.licensescout.configuration.sample`	Sample configuration files
`org.aposin.licensescout.licensereport`	Sample for LicenseScout build integration, generates license reports for LicenseScout itself (using the sample configuration)
`org.aposin.licensescout.quickstart`	Compiles the Maven plugin, creates the sample configuration bundle and generates a license report for LicenseScout itself.

Design

LicenseScout is implemented as a Maven plugin. It scans a given base directory for artifacts.

Processing phases

Initialisation

Initialisation includes:

Validity checks on the input parameters
Creation of output directory, if necessary
Reading of configuration files (licenses, mappings, filters, etc)

Scan/license detection

The configured base directory is scanned recursively, either for Java archives or for NPM packages. The archives found are collected in a list. If license information associated with an archive is found, the license is added to the list associated with the archive. During this step a mapping of license URLs to license names is done (so that licenses given as URL can be found from the internal list).

License processing

Processings are done:

Manually checked licenses for specific archives are added to the archives from the checked archives list.
a license status for each archive is calculated.
Global filters are applied: artifact names are matched against a global filter that applies regular expressions on the artifact path name. Artifacts that match are removed.
The vendor filter is applied, using the configured vendor names to filter out: artifacts with a certain vendor name in the MANIFEST.MF are removed (this can be used for own artifacts, which should not appear at all).
Clean output filtering is applied, if configured
Statistics on the total number of archives, the total number of considered license files and the number of archives per license state are calculated.

Report Output

From the collected, processed and filtered information reports are generated, depending on the output configuration. Either a HTML report, a CSV report, a TXT report, or a combination of these are generated. The CSV Output is suitable for automatic comparison between versions. The HTML output uses velocity templates to generate the output. The template used for this is org.aposin.licensescout/src/main/resources/templates/license_report.vm. The TXT output uses hard-coded text fragments.

Database Output

If configured, important data are also written to a database.

The results are written into three tables:

Builds
LibraryData
DetectedLicenses

Note	For details on the table structure see the file `org.aposin.licensescout.core/sql/licensescout.sql`

Configuration

LicenseScout is configured as a Maven plugin. Below you can see a typical configuration.

<plugin>
    <groupId>org.aposin.licensescout</groupId>
    <artifactId>licensescout-maven-plugin</artifactId>
    <version>1.1.4</version>
    <executions>
        <execution>
            <id>find-licenses</id>
            <phase>verify</phase>
            <goals>
                <goal>scanJava</goal>
            </goals>
            <configuration>
                <scanDirectory>${project.build.directory}/products/my.product/win32/win32/x86/plugins/</scanDirectory>
                <outputDirectory>${licensescout.outputDirectory}</outputDirectory>
                <outputs>
                    <output>
                        <type>HTML</type>
                        <filename>${licensescout.outputFilename.html}</filename>
                        <url>${licensereport.url.html}</url>
                    </output>
                    <output>
                        <type>CSV</type>
                        <filename>${licensescout.outputFilename.csv}</filename>
                        <url>${licensereport.url.csv}</url>
                    </output>
                    <output>
                        <type>TXT</type>
                        <filename>${licensescout.outputFilename.txt}</filename>
                        <url>${licensereport.url.txt}</url>
                    </output>
                </outputs>
                <licensesFilename>${licensescout-configuration.dir}/licenses.xml</licensesFilename>
                <providersFilename>${licensescout-configuration.dir}/providers.xml</providersFilename>
                <noticesFilename>${licensescout-configuration.dir}/notices.xml</noticesFilename>
                <checkedArchivesFilename>${licensescout-configuration.dir}/checkedarchives.csv</checkedArchivesFilename>
                <licenseUrlMappingsFilename>${licensescout-configuration.dir}/urlmappings.csv</licenseUrlMappingsFilename>
                <licenseNameMappingsFilename>${licensescout-configuration.dir}/namemappings.csv</licenseNameMappingsFilename>
                <globalFiltersFilename>${licensescout-configuration.dir}/globalfilters.csv</globalFiltersFilename>
                <filteredVendorNamesFilename>${licensescout-configuration.dir}/filteredvendornames.csv</filteredVendorNamesFilename>
            </configuration>
        </execution>
    </executions>
</plugin>

XML configuration

This section describes the configuration for the LicenseScout maven plugin that is done in the POM file where you want to execute the LicenseScout. For information on the format of configuration files see the next section.

Goals

In one execution, LicenseScout can either scan for Java artifacts or for Javascript/NPM artifacts.

For Java executions, the Maven goal scanJava is used.
For Javascript/NPM executions the Maven goal scanNpm is used.

Scan Location

The base directory where archives are searched for (recursively and also inside JARs) is configured by the parameter scanDirectory.

Output types and files

Output can be configured with the output directory (parameter outputDirectory) and output types. The actual output files to generate are configured by the output types. the available output types are 'HTML', 'CSV' and 'TXT'. One or multiple output types can be configured. If no output type is configured, no output files will be written. File names of the output files are the result of appending the filename given by parameters filename in the output section to the output directory. An additional parameter url can be used to specify the URL associated with the output that will be written to the database, if enabled.

Example:

<properties>
    <licensescout.outputDirectory>...<licensescout.outputDirectory>
    licensescout.outputFilename.html TODO
</properties>

...

<configuration>
    ...
    <outputDirectory>${licensescout.outputDirectory}</outputDirectory>
    <outputs>
        <output>
            <type>HTML</type>
            <filename>${licensescout.outputFilename.html}</filename>
        </output>
        <output>
            <type>CSV</type>
            <filename>${licensescout.outputFilename.csv}</filename>
        </output>
        <output>
            <type>TXT</type>
            <filename>${licensescout.outputFilename.txt}</filename>
        </output>
    </outputs>
    ...
</configuration>

Vendor names to filter out

Vendor names that should be used to filter out archives from the result list can be configured either directly with the parameter filteredVendorNamesFilename or with a configuration file (see below). Example:

<configuration>
    ...
    <filteredVendorNames>
        <filteredVendorName>My company</filteredVendorName>
        <filteredVendorName>Another Company</filteredVendorName>
    </filteredVendorNames>
    ...
</configuration>

If no vendor names are configured, no filtering by vendor name takes place.

NPM excludes

Directory names that should be ignored in scanning for NPM modules can be configured using the parameter npmExcludedDirectoryNames / npmExcludedDirectoryName. Example:

<configuration>
    ...
    <npmExcludedDirectoryNames>
        <npmExcludedDirectoryName>.bin</npmExcludedDirectoryName>
        <npmExcludedDirectoryName>@angular</npmExcludedDirectoryName>
        <npmExcludedDirectoryName>@ngtools</npmExcludedDirectoryName>
        <npmExcludedDirectoryName>@types</npmExcludedDirectoryName>
    </npmExcludedDirectoryNames>
    ...
</configuration>

If no excludes are given, no directories are excluded.

Maven central configuration

LicenseScout accesses an external Maven repository to download parent POM files if it is necessary to find out license information. The base URL used for this can be configured.

In an enterprise environment, this can be used to point to an artifact server like Nexus that mirrors the Maven central repository.

Example:

<configuration>
    ...
    <nexusCentralBaseUrl>http://nexus.company.com:8081/nexus/content/repositories/central/</nexusCentralBaseUrl>
    ...
</configuration>

Note	If no Maven central URL is given, the default is to access Maven Central directly (value `https://repo.maven.apache.org/maven2/`).

Output filtering

The resulting output list of archives can be filtered to remove archives with certain legal state or certain licenses. A list of legal states to filter out can be given with cleanOutputLegalStates / cleanOutputLegalState. Any archive that has one of the states given will be filtered out from the result list. Also, a list of license identifiers can be given with cleanOutputLicenseSpdxIdentifiers / cleanOutputLicenseSpdxIdentifier. These values are matched against the SPDX identifiers given as spdxIdentifier in the license XML file (see below). Any archive that contains one of the licenses given will be filtered out. The filtering can be activated and deactivated with a switch (cleanOutputActive) with values true or false. Example:

<configuration>
    ...
        <cleanOutputActive>true</cleanOutputActive>
        <cleanOutputLegalStates>
            <cleanOutputLegalState>NOT_ACCEPTED</cleanOutputLegalState>
            <cleanOutputLegalState>CONFLICTING</cleanOutputLegalState>
        </cleanOutputLegalStates>
        <cleanOutputLicenseSpdxIdentifiers>
            <cleanOutputLicenseSpdxIdentifier>WTFPL</cleanOutputLicenseSpdxIdentifier>
        </cleanOutputLicenseSpdxIdentifiers>
    ...
</configuration>

If cleanOutputActive is not configured or if no states or licenses to filter out are configured, no filtering takes place.

Report output configuration

The resulting output files (HTML, CSV and TXT) can be configured to contain or not contain specific Information. The documentation URL from the checked licenses list can be used in the output report. This can be activated with a switch (showDocumentationUrl) with values true or false. Example:

<configuration>
    ...
        <showDocumentationUrl>true</showDocumentationUrl>
    ...
</configuration>

If showDocumentationUrl is not configured the documentation URL is included into the output.

Results database configuration

LicenseScout can use a database to write core information of the report to. With the parameter writeResultsToDatabase writing to the database can be enabled or disabled. The parameter writeResultsToDatabaseForSnapshotBuilds determines if records should be written to the database also for snapshot versions. If the value is not true, version numbers (taken from the parameter buildVersion) that contain -SNAPSHOT are not processed further.

The record resultDatabaseConfiguration with the parameter jdbcUrl, username and password is used to configure the target database.

If writing to the result database is enabled, further parameters are used to obtain information to write to the database. There are parameters for the build name, the build version, the build URL and (inside output) for the URLs of the output files.

Example:

<properties>
    <licensescout.writeResultsToDatabase>true</licensescout.writeResultsToDatabase>
    <licensescout.database.url>...</licensescout.database.url>
    <licensescout.database.username>...</licensescout.database.username>
    <licensescout.database.password>...</licensescout.database.password>

    <licensescout.buildName>${project.artifactId}</licensescout.buildName>
    <licensescout.buildVersion>${project.version}</licensescout.buildVersion>
    <licensescout.buildUrl>...</licensescout.buildUrl>
</properties>

...

<configuration>
	...
	<writeResultsToDatabase>${licensescout.writeResultsToDatabase}</writeResultsToDatabase>
	<writeResultsToDatabaseForSnapshotBuilds>false</writeResultsToDatabaseForSnapshotBuilds>
	<resultDatabaseConfiguration>
		<jdbcUrl>${licensescout.database.url}</jdbcUrl>
		<username>${licensescout.database.username}</username>
		<password>${licensescout.database.password}</password>
	</resultDatabaseConfiguration>
	<buildName>${licensescout.buildName}</buildName>
	<buildVersion>${licensescout.buildVersion}</buildVersion>
	<buildUrl>${licensescout.buildUrl}</buildUrl>
	...
</configuration>

It is recommended to configure the values for username and password via settings.xml.

Configuration files

LicenseScout can use eight configuration files for

licenses
providers
notices
manually checked archives
mappings names to licenses
mappings of URLs to licenses
global filters on archives
vendor names to filter out (vendor names can be configured both via XML or via configuration file)

The following sections describe the file Format and the effect of the configurations. The filenames of the files are configured using the following Maven parameters:

licensesFilename
providersFilename
noticesFilename
checkedArchivesFilename
licenseUrlMappingsFilename
licenseNameMappingsFilename
globalFiltersFilename
filteredVendorNamesFilename (for an example see above)

Licenses

Known licenses, their URLs and associated detection strings are configured using an XML file. The filename is configured using the Maven Parameter licensesFilename. Example of the file:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<licenses>
	<license id='AFL-1.1'>
		<spdxIdentifier>AFL-1.1</spdxIdentifier>
		<name>Academic Free License</name>
		<legalStatus>ACCEPTED</legalStatus>
		<author>Lawrence E. Rosen</author>
		<version>1.1</version>
		<publicUrl>https://spdx.org/licenses/AFL-1.1.html</publicUrl>
	</license>
	<license id='AFL-1.2'>
		<spdxIdentifier>AFL-1.2</spdxIdentifier>
		<name>Academic Free License</name>
		<legalStatus>ACCEPTED</legalStatus>
		<author>Lawrence E. Rosen</author>
		<version>1.2</version>
		<publicUrl>https://spdx.org/licenses/AFL-1.2.html</publicUrl>
	</license>
	<license id='AFL-2.0'>
		<spdxIdentifier>AFL-2.0</spdxIdentifier>
		<name>Academic Free License</name>
		<legalStatus>ACCEPTED</legalStatus>
		<author>Lawrence E. Rosen</author>
		<version>2.0</version>
		<publicUrl>https://spdx.org/licenses/AFL-2.0.html</publicUrl>
	</license>
	<license id='AFL-2.1'>
		<spdxIdentifier>AFL-2.1</spdxIdentifier>
		<name>Academic Free License</name>
		<legalStatus>ACCEPTED</legalStatus>
		<author>Lawrence E. Rosen</author>
		<version>2.1</version>
		<publicUrl>https://spdx.org/licenses/AFL-2.1.html</publicUrl>
	</license>
	<license id='AFL-3.0'>
		<spdxIdentifier>AFL-3.0</spdxIdentifier>
		<name>Academic Free License</name>
		<legalStatus>ACCEPTED</legalStatus>
		<author>Lawrence E. Rosen</author>
		<version>3.0</version>
		<publicUrl>https://spdx.org/licenses/AFL-3.0.html</publicUrl>
		<notice>AFL-Notice-3.0</notice>
	</license>
	<licenseSet>
		<license idref='AFL-1.1' />
		<license idref='AFL-1.2' />
		<license idref='AFL-2.0' />
		<license idref='AFL-2.1' />
		<license idref='AFL-3.0' />
		<detectionString>ACADEMIC FREE LICENSE</detectionString>
	</licenseSet>
	...
</licenses>

Each license should be given as a license element. Also different versions of a license should be given as separate license elements. The id attribute of license is mandatory, it is used to refer to the license in licenseSet`s. Usually, the value of the `id attribute should be identical to the SPDX identifier of the license. However, the id attributes are only used for referencing in the XML file internally. So an id attribute can be used even if the license has no SPDX identifier.

The spdxIdentifier element is optional. However, it is recommended to assign a value even if the license has no actual SPDX identifier. The reason for this is that in CSV output the licenses are given by their SPDX identifier. If a license has no identifier a blank field will appear. The values are
The value of the name element is only used for displaying the license, not for automatic detection (for detection licenseSet / detectionString is used).
The value of name should not be empty, as it is used to sort licenses in the output reports.
The legalStatus reflects if a license is acceptable for the given project. The value can be ACCEPTED, NOT_ACCEPTED or UNKNOWN (see enumeration org.aposin.licensescout.license.LegalStatus). The value is mandatory.
The author element gives the name of the person or organisation that published the license. The value may be empty.
The version element gives the version of the license. The value may be empty. As the value of version is appended to the name in the report output, usually the name should not contain a version number. The version number is also used in automatic detection to distinguish different versions of a license.
The publicUrl should be an URL that leads to a readable license text, as this URL is used in the HTML output for links underlying the license. On the other hand, secondaryUrl values are not required to be actually accessible. They are only used to associate licenses with that URL.
A notice element contains an ID of a notice from the notices XML file. This is optional.

Tip

For automatic detection, different versions of the same licenses can be grouped to license sets. A license set has one or more associated detection strings. If a detection string of a license set is found in a text file that may be a potential license file, the mechanism tries to detect a version number from the file. If a version number is found and matches the version string of one of the licenses of the set, the file is recognized as that dedicated version of the license. Otherwise the first license of the set is recognized. Note that for special detection behaviour a license can be member of multiple license sets with different detection strings (though this case is not very common). Detection strings are matched case-insensitive against potential license text.

Providers

Providers with their name and URL are configured using an XML file. The filename is configured using the Maven Parameter providersFilename. Example of the file:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<providers>
	<provider id='EclipseFoundation'>
		<name>Eclipse Foundation</name>
		<url>https://www.eclipse.org/</url>
	</provider>
</providers>

Notices

Notices are pieces that a license requires to be published with a software that uses a third-party software under this license. Example of the file:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<notices>
	<notice id='EPL-1.0'>
		<text>Notice for EPL 1</text>
	</notice>
	<notice id='EPL-2.0'>
		<text>Notice for EPL 2</text>
	</notice>
	<notice id='MIT-1'>
		<text>Notice for MIT 1</text>
	</notice>
	...
</notices>

Each notice should be given as a notice element. Also different versions of a license should be given as separate notice element. The id attribute of notice is mandatory, it is used to refer to the notice from licenses (in the licenses XML file) and checked archives (in the checked archives CSV file).

Checked archives

Here, archives that have no license detected automatically can be assigned a licenses that has been checked manually.

It also can be used to decide between multiple detected licenses.

An archive can be identified by either:

an archive name (exact match) and a version number
an archive name (exact match) and a hash code
a regular expression that is matched against the archive name
a regular expression that is matched against the path of the archive

From the file, lines are split by the character ','.

First colum (type) - can be:

JAVA for Java Jar archives (packed or unpacked)
JAVASCRIPT for JS/NPM packages

The second column is the name. The name is used as:

a regular expression on the archive’s path if it starts with == (which are not part of the regular expression)
a regular expression on the Archive Name if it starts with = (which are not part of the regular expression)
an archive name that is matched exactly otherwise

The third column is either a version number or a hash code. If the length of the field is exactly 64 characters, it is parsed as an SHA-256 hash value. Otherwise, it is taken as a version number.

The fourth column is string that is used as documentation URL in the output reports (if the Output configuration enables outputting this Information, see showDocumentationUrl Maven parameter). The value may be empty.

The fifth column is an identifier of a provider. This is optional. The value may be empty.

The sixth column is an identifier of a notive. This is optional. The value may be empty.

The seventh and any further column are license identifiers. An archive can have one multiple or no license assigned. If no license is assigned, it will get the status MANUALLY_NOT_DETECTED.

Empty lines and lines starting with '#' are ignored. Examples:

JAVA, bcprov-ext-jdk15on-155.jar, 2FBFC48DA088C1223ADB84A928ABEA4083C2702F4C06CC9692736627DD50C59B,http://dummy,,, MIT
JAVA, xpp3_min.jar, 8D60778CD5018E7A130B3FB6C96A57DD9E1877B9EFBF76B4B63A8DD395128EAEhttp://path/to/cpp3-license-documentation,, ExtremeLab-1.1.1, Apache-1.1,EclipseFoundation,EPL-Notice-1, PublicDomain
JAVASCRIPT, indexof, 0.0.1,,,, MIT

Empty lines and lines starting with '#' are ignored.

Warning

Note that ',' is not an allowed character in regular expressions, since it is used as a separation character for the CSV parsing, and it cannot be quoted at the moment.

License URL mapping

In some places licenses are usually given by URL, not by license name (this can be the case in MANIFEST.MF, pom.xml and package.json files). The URL mapping maps these URLs (and, actually, other fancy names used) to internal license names (SPDX identifiers). Examples:

https://javaee.github.io/javamail/LICENSE, CDDL-1.1
http://www.h2database.com/html/license.html, MPL-2.0, EPL-1.0
https://glassfish.java.net/public/CDDL+GPL_1_1.html, CDDL-1.1, GPL-2.0
http://repository.jboss.org/licenses/cddl.txt, CDDL-1.0
http://repository.jboss.org/licenses/gpl-2.0-ce.txt, GPL-2.0
http://www.antlr.org/license.html, BSD-3-Clause
http://antlr.org/license.html, BSD-3-Clause
http://treelayout.googlecode.com/files/LICENSE.TXT, BSD-3-Clause
http://xstream.codehaus.com/license.html, BSD-3-Clause

Empty lines and lines starting with '#' are ignored.

Note	From the file, lines are split by the character ','. The first column is the URL that should be mapped. The second and any further columns are license identifiers. Note that this way, an URL can be mapped to multiple licenses.

License name mapping

In some places licenses are given by their name. This includes pom.xml files, NPM package.json files, and in some cases MANIFEST.MF files. The name mapping maps these names to internal license names (SPDX identifiers). Example file:

(MIT AND CC-BY-3.0), MIT, CC-BY-3.0
(MIT OR Apache-2.0), MIT, Apache-2.0
(WTFPL OR MIT), WTFPL, MIT
(BSD-2-Clause OR MIT OR Apache-2.0), BSD-2-Clause, MIT, Apache-2.0
(MIT AND Zlib), MIT, Zlib
AFLv2.1, AFL-2.1
Apache 2, Apache-2.0
Apache 2.0, Apache-2.0

Empty lines and lines starting with '#' are ignored.

From the file, lines are split by the character ','. The first column is the name that should be mapped. The second and any further columns are license identifiers. Note that this way, a name can be mapped to multiple licenses.

Global filters

Archives matching a global filter are removed from the output list completely.

This Feature can be used to filter out inner JARs that have no license information.

Examples:

==/org\.eclipse\.[_\-a-z0-9\.]+jar!/ant_tasks/[_\-a-zA-Z0-9\.]+\.jar

Each line from the file is taken as one expression. It can be:

a regular expression on the archive’s path if it starts with '==' (which are not part of the regular expression)
a regular axpression on the archive name if it starts with '=' (which are not part of the regular expression) Empty lines and lines starting with '#' are ignored.

Note	unlike the checked archives file, here ',' is an allowed character in regular expressions, since here not splitting by that character is done.

Vendor names

If vendor names are given, archives are checked if their vendor name (retrieved from MANIFEST.MF Bundle-Vendor, POM file or NPM package.json Vendor) matches exactly. If yes, the archive is removed from the result list.

If a configuration file is used for vendor names, each line in the file is one vendor name. No split operations are done on the line. So a vendorname.csv can look like this:

Company
Another company
My fancy open source project

Empty lines and lines starting with # are ignored.

Sample Configuration Project

The recommended way of maintaining the configuration files of LicenseScout is to bundle them in a Maven artifact. This approach is described here.

For use as a sample (both Java and Javascript), a separate Maven project is used that contains only the configuration files. They are packaged as a ZIP file GAV Parameters:

groupId: org.aposin.licensescout
artifactId: org.aposin.licensescout.configuration.sample
classifier: configuration
type: zip

It contains the following files:

checkedarchives.csv
filteredvendornames.csv
globalfilters.csv
licenses.xml
namemappings.csv
notices.xml
providers.xml
urlmappings.csv

It is created and uploaded using mvn install or mvn deploy. In builds that use this configuration project it is downloaded and unpacked to a local directory using the maven-dependency-plugin. A typical configuration for downloading looks like this:

<properties>
    <licensescout-configuration.dir>${project.build.directory}/licensescout-configuration</licensescout-configuration.dir>
</properties>
...
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-dependency-plugin</artifactId>
    <executions>
        <execution>
            <id>unpack-licensescout-configuration</id>
            <phase>process-resources</phase>
            <goals>
                <goal>unpack</goal>
            </goals>
            <configuration>
                <artifactItems>
                    <artifactItem>
                        <groupId>org.aposin.licensescout</groupId>
                        <artifactId>org.aposin.licensescout.configuration.sample</artifactId>
                        <version>1.0.0-SNAPSHOT</version>
                        <classifier>configuration</classifier>
                        <type>zip</type>
                        <overWrite>true</overWrite>
                        <outputDirectory>${licensescout-configuration.dir}</outputDirectory>
                    </artifactItem>
                </artifactItems>
                <overWriteReleases>false</overWriteReleases>
                <overWriteSnapshots>true</overWriteSnapshots>
            </configuration>
        </execution>
    </executions>
</plugin>

LicenseScout can then reference the configuration files in the local file system like this:

    <configuration>
        ...
        <licensesFilename>${licensescout-configuration.dir}/licenses.xml</licensesFilename>
        <checkedArchivesFilename>${licensescout-configuration.dir}/checkedarchives.csv</checkedArchivesFilename>
        <licenseUrlMappingsFilename>${licensescout-configuration.dir}/urlmappings.csv</licenseUrlMappingsFilename>
        <licenseNameMappingsFilename>${licensescout-configuration.dir}/namemappings.csv</licenseNameMappingsFilename>
        <noticesFilename>${licensescout-configuration.dir}/notices.xml</noticesFilename>
        <providersFilename>${licensescout-configuration.dir}/providers.xml</providersFilename>
        <globalFiltersFilename>${licensescout-configuration.dir}/globalfilters.csv</globalFiltersFilename>
        <filteredVendorNamesFilename>${licensescout-configuration.dir}/filteredvendornames.csv</filteredVendorNamesFilename>
        ...
    </configuration>

Internals

Scan for JAVA (goal: `scanJava`)

The scanning process starts with the configured scan directory (parameter scanDirectory). The entries of this directory are examined. Directories are checked if they contain a subdirectory META-INF containing a file MANIFEST.MF. If so, they are assumed to be an unpacked JAR and processed further as an archive. Other directories are further processed recursively. Files with a filename that Ends with .jar are tried to open as a JAR file and processed further as a packed jar. Other files are ignored. In packed as well as in unpacked JARS the entries (JAR entries or file system entries, respectively) are processed further. JAR files are processed as packed JARS. Other encountered files are considered as license files as described below. Directories are processed further recursively.

Scan for JAVASCRIPT (goal: `scanNpm`)

The scanning process starts with the configured scan Directory (parameter scanDirectory). Entries of this directory are checked if they are a Directory and contain a file package.json. If so, the directory is assumed to be an NPM package and is processed further. Other directories are processed recursively. If the name of a Directory matches and entry of the excluded dir names, it is ignored. Plain files are ignored, too. For directories representing an NPM package its package.json is examined for name, version, vendor and license information. Then, the directory is scanned recursively for other files that may contain license Information in text.

License detection from text files

Files are selected for automatic detection of licenses if their file name fulfills certain criteria. For Java, a filename meets the criteria if it ends with txt, htm or html, or if the filename contains license, licence or notice and does not end with .class. For Javascript the filename meets the criteria if it contains license or readme. All file name comparisons are done case-insensitive.

Files selected for license detection are opened as text files and processed line by line. If the line contains a detection string (for the license name, see file licenses.xml) the associated list of licenses is taken as the current candidate license list. If there is a candidate license list, the current line is also scanned for a version number. If a version number is found, the candidate license list is checked for a license that matches the version number. If no license matches, the first element of the list is taken as a detected license and added to a set of overall detected licenses. If a detection string is found and there is already a candidate license list it is assumed that there will be no version number for the current candidate license list. Therefore, the default element of the current candidate list is used as a detected license. Then, the license list associated with the newly detected string becomes the new current candidate license list. All license lists that have been encountered are also stored in a set. If a new license list is found it is checked against the know processed lists. If a license list is found, it is assumed that it is already processed and will not be handled a second time. If no version number is detected in the three lines following the line where the license name was detected, it is assumed that there is no version number and the default element of the candidate list is used as a detected license. If the end of the file is met and there is still a candidate license list, also the default element of the list is used as a detected license.

Special handling for MIT licenses

MIT style licenses typically have the name of the author or organisation that is the license holder as part of the license text (this is required to be copied without modifications). To fulfill this requirement automatically, files containing license in their file name (case insensitive) are considered as full license texts. If an artifact has MIT as a detected license and a full license text is available, the standard MIT license object is replaced in this artifacts license list by a license object that contains the actual license text. This way, the actual license text is used in the license text report.

License processing details

Detection status is set according to the following rules:

If there are manually configured licenses and the list of automatically detected licenses is empty the status of the archive becomes MANUALLY_DETECTED.
If there are manually configured licenses and the list of automatically detected licenses is not empty the status of the archive becomes MANUALLY_SELECTED.
If there are no manually configured licenses and the list of automatically detected licenses contains more than one entry the status of the archive becomes MULTIPLE_DETECTED.
If there are no manually configured licenses and the list of automatically detected licenses contains one entry the status of the archive becomes DETECTED.
If there are no manually configured licenses and no automatically detected licenses the status of the archive becomes NOT_DETECTED.

Legal status is set according to the following rules:

If one of the licenses (of an archive) has status UNKNOWN, the status of the archive becomes UNKNOWN.
If there are only licenses with status ACCEPTED, the status of the archive becomes ACCEPTED.
If there are only licenses with status NOT_ACCEPTED, the status of the archive becomes NOT_ACCEPTED.
If there are licenses with status ACCEPTED and licenses with status NOT_ACCEPTED, the status of the archive becomes CONFLICTING.

Local execution

LicenseScout can also be executed locally. Do the following:

Do a build of LicenseScout using the launch configuration /org.aposin.licensescout.core/launch/org.aposin.licensescout.core_clean_install.launch
Do a build of the license finder sample configuration using the launch configuration /org.aposin.licensescout.configuration.sample/launch/org.aposin.licensescout.configuration.sample_clean_install.launch
Make sure the current (SNAPSHOT) version numbers of LicenseScout (from /org.aposin.licensescout.core/pom.xml) and the license finder configuration (from /org.aposin.licensescout.configuration.sample/pom.xml) are entered as property values in /org.aposin.licensescout.licensereport/pom.xml
Run LicenseScout: /org.aposin.licensescout.licensereport/launch/org.aposin.licensescout.licensereport_scan.launch

Setting up LicenseScout database

The file org.aposin.licensescout.core/sql/licensescout.sql contains an SQL script that creates the tables in the database needed for storing the LicenseScout results.

Database driver

The LicenseScout uses the JDBC driver of MariaDB (https://mariadb.org/). This JDBC driver is compatible with the MySQL database engine. The version of the MariaDB driver used (2.4.1) is compatible with MySQL server 5.5.3 and later (see https://mariadb.com/kb/en/library/about-mariadb-connector-j/).

This JDBC driver is used because the standard JDBC driver of MySQL is licensed under the GNU General public license, which is incompatible with the Apache 2.0 license used for the LicenseScout.

Files

documentation.adoc

Latest commit

History

documentation.adoc

File metadata and controls

LicenseScout

Quick start

Projects overview

Design

Processing phases

Initialisation

Scan/license detection

License processing

Report Output

Database Output

Configuration

XML configuration

Goals

Scan Location

Output types and files

Vendor names to filter out

NPM excludes

Maven central configuration

Output filtering

Report output configuration

Results database configuration

Configuration files

Licenses

Providers

Notices

Checked archives

License URL mapping

License name mapping

Global filters

Vendor names

Sample Configuration Project

Internals

Scan for JAVA (goal: scanJava)

Scan for JAVASCRIPT (goal: scanNpm)

License detection from text files

Special handling for MIT licenses

License processing details

Local execution

Setting up LicenseScout database

Database driver

Scan for JAVA (goal: `scanJava`)

Scan for JAVASCRIPT (goal: `scanNpm`)