diff --git a/docs/source/_static/images/tutorial_example_maven_app_report_dependencies.png b/docs/source/_static/images/tutorial_example_maven_app_report_dependencies.png index ee0c91775..0f8b8b135 100644 Binary files a/docs/source/_static/images/tutorial_example_maven_app_report_dependencies.png and b/docs/source/_static/images/tutorial_example_maven_app_report_dependencies.png differ diff --git a/docs/source/_static/images/tutorial_guava_infer_pipeline.png b/docs/source/_static/images/tutorial_guava_infer_pipeline.png deleted file mode 100644 index 3cee17c0c..000000000 Binary files a/docs/source/_static/images/tutorial_guava_infer_pipeline.png and /dev/null differ diff --git a/docs/source/_static/images/tutorial_log4j_find_pipeline.png b/docs/source/_static/images/tutorial_log4j_find_pipeline.png new file mode 100644 index 000000000..e8d8bc511 Binary files /dev/null and b/docs/source/_static/images/tutorial_log4j_find_pipeline.png differ diff --git a/docs/source/index.rst b/docs/source/index.rst index e1abecc58..94caf4ff0 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -77,7 +77,7 @@ the requirements that are currently supported by Macaron. * - ``mcn_build_as_code_1`` - **Build as code** - If a trusted builder is not present, this requirement determines that the build definition and configuration executed by the build service is verifiably derived from text file definitions stored in a version control system. - Identify and validate the CI service(s) used to build and deploy/publish an artifact. - * - ``mcn_infer_artifact_pipeline_1`` + * - ``mcn_find_artifact_pipeline_1`` - **Infer artifact publish pipeline** - When a provenance is not available, checks whether a CI workflow run has automatically published the artifact. - Identify a workflow run that has triggered the deploy step determined by the ``Build as code`` check. * - ``mcn_provenance_level_three_1`` diff --git a/docs/source/pages/tutorials/detect_malicious_java_dep.rst b/docs/source/pages/tutorials/detect_malicious_java_dep.rst index de2f3ac57..b73a79840 100644 --- a/docs/source/pages/tutorials/detect_malicious_java_dep.rst +++ b/docs/source/pages/tutorials/detect_malicious_java_dep.rst @@ -1,11 +1,11 @@ .. Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved. .. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/. -.. _detect-malicious-java-dep: +.. _detect-manual-upload-java-dep: ------------------------------------------------------------------------- -Detecting a malicious Java dependency uploaded manually to Maven Central ------------------------------------------------------------------------- +-------------------------------------------------------------- +Detecting Java dependencies manually uploaded to Maven Central +-------------------------------------------------------------- In this tutorial we show how Macaron can determine whether the dependencies of a Java project are built and published via transparent CI workflows or manually uploaded to Maven Central. 
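Each dependency in this tutorial is identified by a Package URL (PURL), listed in the table below. As a minimal sketch (using the ``packageurl`` library that Macaron itself depends on; the attribute names follow the PURL specification), such an identifier decomposes into Maven coordinates like so:

.. code-block:: python

    from packageurl import PackageURL

    # Parse one of the tutorial's dependency identifiers into its components.
    purl = PackageURL.from_string(
        "pkg:maven/org.apache.logging.log4j/log4j-core@3.0.0-beta2?type=jar"
    )

    # For Maven, namespace/name/version map to groupId/artifactId/version.
    print(purl.namespace, purl.name, purl.version)
    # The qualifiers carry the packaging type, e.g. {'type': 'jar'}.
    print(purl.qualifiers)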
You can also @@ -24,12 +24,12 @@ dependencies: * - Artifact name - `Package URL (PURL) `_ - * - `guava `_ - - ``pkg:maven/com.google.guava/guava@32.1.2-jre?type=jar`` + * - `log4j-core `_ + - ``pkg:maven/org.apache.logging.log4j/log4j-core@3.0.0-beta2?type=jar`` * - `jackson-databind `_ - ``pkg:maven/io.github.behnazh-w.demo/jackson-databind@1.0?type=jar`` -While the ``guava`` dependency follows best practices to publish artifacts automatically with minimal human +While the ``log4j-core`` dependency follows best practices to publish artifacts automatically with minimal human intervention, ``jackson-databind`` is a malicious dependency that pretends to provide data-binding functionalities like `the official jackson-databind `_ library (note that this artifact is created for demonstration purposes and is not actually malicious). @@ -70,7 +70,7 @@ First, we need to run the ``analyze`` command of Macaron to run a number of :ref .. code-block:: shell - ./run_macaron.sh analyze -purl pkg:maven/io.github.behnazh-w.demo/example-maven-app@1.0?type=jar -rp https://github.com/behnazh-w/example-maven-app + ./run_macaron.sh analyze -purl pkg:maven/io.github.behnazh-w.demo/example-maven-app@2.0?type=jar -rp https://github.com/behnazh-w/example-maven-app --deps-depth=1 .. note:: By default, Macaron clones the repositories and creates output files under the ``output`` directory. To understand the structure of this directory please see :ref:`Output Files Guide `. @@ -96,7 +96,7 @@ As you can see, some of the checks are passing and some are failing. In summary, * is not producing any :term:`SLSA` or :term:`Witness` provenances (``mcn_provenance_available_1``) * is using GitHub Actions to build and test using ``mvnw`` (``mcn_build_service_1``) * but it is not deploying any artifacts automatically (``mcn_build_as_code_1``) -* and no CI workflow runs are detected that automatically publish artifacts (``mcn_infer_artifact_pipeline_1``) +* and no CI workflow runs are detected that automatically publish artifacts (``mcn_find_artifact_pipeline_1``) As you scroll down in the HTML report, you will see a section for the dependencies that were automatically identified: @@ -110,25 +110,25 @@ As you scroll down in the HTML report, you will see a section for the dependenci | Macaron has found the two dependencies as expected: * ``io.github.behnazh-w.demo:jackson-databind:1.0`` -* ``com.google.guava:guava:32.1.2-jre`` +* ``org.apache.logging.log4j:log4j-core:3.0.0-beta2`` -When we open the reports for each dependency, we see that ``mcn_infer_artifact_pipeline_1`` is passed for ``com.google.guava:guava:32.1.2-jre`` -and a GitHub Actions workflow run is found for publishing version ``32.1.2-jre``. However, this check is failing for ``io.github.behnazh-w.demo:jackson-databind:1.0``. +When we open the reports for each dependency, we see that ``mcn_find_artifact_pipeline_1`` is passed for ``org.apache.logging.log4j:log4j-core:3.0.0-beta2`` +and a GitHub Actions workflow run is found for publishing version ``3.0.0-beta2``. However, this check is failing for ``io.github.behnazh-w.demo:jackson-databind:1.0``. This means that ``io.github.behnazh-w.demo:jackson-databind:1.0`` could have been built and published manually to Maven Central and could potentially be malicious. -.. _fig_infer_artifact_pipeline_guava: +.. _fig_find_artifact_pipeline_log4j: -.. figure:: ../../_static/images/tutorial_guava_infer_pipeline.png - :alt: mcn_infer_artifact_pipeline_1 for com.google.guava:guava:32.1.2-jre +.. 
figure:: ../../_static/images/tutorial_log4j_find_pipeline.png + :alt: mcn_find_artifact_pipeline_1 for org.apache.logging.log4j:log4j-core:3.0.0-beta2 :align: center - ``com.google.guava:guava:32.1.2-jre`` + ``org.apache.logging.log4j:log4j-core:3.0.0-beta2`` .. _fig_infer_artifact_pipeline_bh_jackson_databind: .. figure:: ../../_static/images/tutorial_bh_jackson_databind_infer_pipeline.png - :alt: mcn_infer_artifact_pipeline_1 for io.github.behnazh-w.demo:jackson-databind:1.0 + :alt: mcn_find_artifact_pipeline_1 for io.github.behnazh-w.demo:jackson-databind:1.0 :align: center ``io.github.behnazh-w.demo:jackson-databind:1.0`` @@ -154,7 +154,7 @@ The security requirement in this tutorial is to mandate dependencies of our proj transparent artifact publish CI workflows. To write a policy for this requirement, first we need to revisit the checks shown in the HTML report in the previous :ref:`step `. The result of each of the checks can be queried by the check ID in the first column. For the policy in this tutorial, -we are interested in the ``mcn_infer_artifact_pipeline_1`` and ``mcn_provenance_level_three_1`` checks: +we are interested in the ``mcn_find_artifact_pipeline_1`` and ``mcn_provenance_level_three_1`` checks: .. code-block:: prolog @@ -167,7 +167,7 @@ we are interested in the ``mcn_infer_artifact_pipeline_1`` and ``mcn_provenance_ .decl violating_dependencies(parent: number) violating_dependencies(parent) :- transitive_dependency(parent, dependency), - !check_passed(dependency, "mcn_infer_artifact_pipeline_1"), + !check_passed(dependency, "mcn_find_artifact_pipeline_1"), !check_passed(dependency, "mcn_provenance_level_three_1"). apply_policy_to("detect-malicious-upload", component_id) :- @@ -176,8 +176,8 @@ we are interested in the ``mcn_infer_artifact_pipeline_1`` and ``mcn_provenance_ This policy requires that all the dependencies of repository ``github.com/behnazh-w/example-maven-app`` either pass the ``mcn_provenance_level_three_1`` (have non-forgeable -:term:`SLSA` provenances) or ``mcn_infer_artifact_pipeline_1`` check. Note that if an artifact already has a non-forgeable provenance, it means it is produced -by a hosted build platform, such as GitHub Actions CI workflows. So, the ``mcn_infer_artifact_pipeline_1`` needs to pass +:term:`SLSA` provenances) or ``mcn_find_artifact_pipeline_1`` check. Note that if an artifact already has a non-forgeable provenance, it means it is produced +by a hosted build platform, such as GitHub Actions CI workflows. So, the ``mcn_find_artifact_pipeline_1`` needs to pass only if ``mcn_provenance_level_three_1`` fails. Let's take a closer look at this policy to understand what each line means. @@ -219,12 +219,12 @@ This rule populates the ``Policy`` relation if ``component_id`` exists in the da .decl violating_dependencies(parent: number) violating_dependencies(parent) :- transitive_dependency(parent, dependency), - !check_passed(dependency, "mcn_infer_artifact_pipeline_1"), + !check_passed(dependency, "mcn_find_artifact_pipeline_1"), !check_passed(dependency, "mcn_provenance_level_three_1"). This is the rule that the user needs to design to detect dependencies that violate a security requirement. Here we declare a relation called ``violating_dependencies`` and populate it if the dependencies in the -``transitive_dependency`` relation do not pass any of the ``mcn_infer_artifact_pipeline_1`` and +``transitive_dependency`` relation do not pass any of the ``mcn_find_artifact_pipeline_1`` and ``mcn_provenance_level_three_1`` checks. .. 
code-block:: prolog @@ -253,7 +253,7 @@ printed to the console will look like the following: failed_policies ['detect-malicious-upload'] component_violates_policy - ['1', 'pkg:github.com/behnazh-w/example-maven-app@34c06e8ae3811885c57f8bd42db61f37ac57eb6c', 'detect-malicious-upload'] + ['1', 'pkg:maven/io.github.behnazh-w.demo/example-maven-app@2.0?type=jar', 'detect-malicious-upload'] As you can see, the policy has failed because the ``io.github.behnazh-w.demo:jackson-databind:1.0`` dependency is manually uploaded to Maven Central and does not meet the security requirement. diff --git a/docs/source/pages/tutorials/exclude_include_checks.rst b/docs/source/pages/tutorials/exclude_include_checks.rst index c0c3d2faa..3cb766483 100644 --- a/docs/source/pages/tutorials/exclude_include_checks.rst +++ b/docs/source/pages/tutorials/exclude_include_checks.rst @@ -24,7 +24,7 @@ This tutorial will show how you can configure Macaron to: Prerequisites ------------- -* You are expected to have gone through :ref:`this tutorial `. +* You are expected to have gone through :ref:`this tutorial `. * This tutorial requires a high-level understanding of checks in Macaron and how they depend on each other. Please see this :ref:`page ` for more information. ------------------ diff --git a/src/macaron/config/defaults.ini b/src/macaron/config/defaults.ini index 8d7b2b1cd..ae0b72cb8 100644 --- a/src/macaron/config/defaults.ini +++ b/src/macaron/config/defaults.ini @@ -146,7 +146,12 @@ wrapper_files = mvnw [builder.maven.ci.build] -github_actions = actions/setup-java +github_actions = + actions/setup-java + # Parent project used in Maven-based projects of the Apache Logging Services. + apache/logging-parent/.github/workflows/build-reusable.yaml + # This action can be used to deploy artifacts to a JFrog artifactory server. + spring-io/artifactory-deploy-action travis_ci = jdk circle_ci = gitlab_ci = @@ -159,6 +164,8 @@ jenkins = [builder.maven.ci.deploy] github_actions = + # Parent project used in Maven-based projects of the Apache Logging Services. + apache/logging-parent/.github/workflows/deploy-release-reusable.yaml travis_ci = gpg:sign-and-deploy-file deploy:deploy @@ -237,6 +244,8 @@ jenkins = [builder.gradle.ci.deploy] github_actions = + # This action can be used to deploy artifacts to a JFrog artifactory server. + spring-io/artifactory-deploy-action travis_ci = artifactoryPublish ./gradlew publish @@ -495,7 +504,7 @@ artifact_extensions = # Package registries. [package_registry] # The allowed time range (in seconds) from a deploy workflow run start time to publish time. -publish_time_range = 3600 +publish_time_range = 7200 # [package_registry.jfrog.maven] # In this example, the Maven repo can be accessed at `https://internal.registry.org/repo-name`. @@ -505,9 +514,12 @@ publish_time_range = 3600 [package_registry.maven_central] # Maven Central host name. -hostname = search.maven.org +search_netloc = search.maven.org +search_scheme = https # The search REST API. See https://central.sonatype.org/search/rest-api-guide/ search_endpoint = solrsearch/select +registry_url_netloc = repo1.maven.org/maven2 +registry_url_scheme = https request_timeout = 20 [package_registry.npm] diff --git a/src/macaron/json_tools.py b/src/macaron/json_tools.py index 4b4aef98c..3cd7a7d37 100644 --- a/src/macaron/json_tools.py +++ b/src/macaron/json_tools.py @@ -31,28 +31,27 @@ def json_extract(entry: dict | list, keys: Sequence[str | int], type_: type[T]) T | None: The found value as the type of the type parameter. 
""" - target: JsonType = entry for key in keys: - if isinstance(target, dict) and isinstance(key, str): - if key not in target: - logger.debug("JSON key '%s' not found in dict target.", key) + if isinstance(entry, dict) and isinstance(key, str): + if key not in entry: + logger.debug("JSON key '%s' not found in dict entry.", key) return None - elif isinstance(target, list) and isinstance(key, int): - if key < 0 or key >= len(target): - logger.debug("JSON list index '%s' is outside of list bounds %s.", key, len(target)) + elif isinstance(entry, list) and isinstance(key, int): + if key < 0 or key >= len(entry): + logger.debug("JSON list index '%s' is outside of list bounds %s.", key, len(entry)) return None else: - logger.debug("Cannot index '%s' (type: %s) in target (type: %s).", key, type(key), type(target)) + logger.debug("Cannot index '%s' (type: %s) in entry (type: %s).", key, type(key), type(entry)) return None # If statement required for mypy to not complain. The else case can never happen because of the above if block. - if isinstance(target, dict) and isinstance(key, str): - target = target[key] - elif isinstance(target, list) and isinstance(key, int): - target = target[key] + if isinstance(entry, dict) and isinstance(key, str): + entry = entry[key] + elif isinstance(entry, list) and isinstance(key, int): + entry = entry[key] - if isinstance(target, type_): - return target + if isinstance(entry, type_): + return entry - logger.debug("Found value of incorrect type: %s instead of %s.", type(target), type(type_)) + logger.debug("Found value of incorrect type: %s instead of %s.", type(entry), type(type_)) return None diff --git a/src/macaron/repo_finder/provenance_extractor.py b/src/macaron/repo_finder/provenance_extractor.py index 42a8819d0..7b446c00e 100644 --- a/src/macaron/repo_finder/provenance_extractor.py +++ b/src/macaron/repo_finder/provenance_extractor.py @@ -4,6 +4,7 @@ """This module contains methods for extracting repository and commit metadata from provenance files.""" import logging import urllib.parse +from abc import ABC, abstractmethod from packageurl import PackageURL from pydriller import Git @@ -17,6 +18,8 @@ extract_commit_from_version, ) from macaron.slsa_analyzer.provenance.intoto import InTotoPayload, InTotoV1Payload, InTotoV01Payload +from macaron.slsa_analyzer.provenance.intoto.v01 import InTotoV01Statement +from macaron.slsa_analyzer.provenance.intoto.v1 import InTotoV1Statement logger: logging.Logger = logging.getLogger(__name__) @@ -355,3 +358,340 @@ def check_if_repository_purl_and_url_match(url: str, repo_purl: PackageURL) -> b purl_path = f"{repo_purl.namespace}/{purl_path}" # Note that the urllib method includes the "/" before path while the PURL method does not. return f"{parsed_url.hostname}{parsed_url.path}".lower() == f"{expanded_purl_type or repo_purl.type}/{purl_path}" + + +class ProvenanceBuildDefinition(ABC): + """Abstract base class for representing provenance build definitions. + + This class serves as a blueprint for various types of build definitions + in provenance data. It outlines the methods and properties that derived + classes must implement to handle specific build definition types. + """ + + #: Determines the expected ``buildType`` field in the provenance predicate. + expected_build_type: str + + @abstractmethod + def get_build_invocation(self, statement: InTotoV01Statement | InTotoV1Statement) -> tuple[str | None, str | None]: + """Retrieve the build invocation information from the given statement. 
+ + This method is intended to be implemented by subclasses to extract + specific invocation details from a provenance statement. + + Parameters + ---------- + statement : InTotoV1Statement | InTotoV01Statement + The provenance statement from which to extract the build invocation + details. This statement contains the metadata about the build process + and its associated artifacts. + + Returns + ------- + tuple[str | None, str | None] + A tuple containing two elements: + - The first element is the build invocation entry point (e.g., workflow name), or None if not found. + - The second element is the invocation URL or identifier (e.g., job URL), or None if not found. + + Raises + ------ + NotImplementedError + If the method is called directly without being overridden in a subclass. + """ + + +class SLSAGithubGenericBuildDefinitionV01(ProvenanceBuildDefinition): + """Class representing the SLSA GitHub Generic Build Definition (v0.1). + + This class implements the abstract methods defined in `ProvenanceBuildDefinition` + to extract build invocation details specific to the GitHub provenance generator's generic build type. + """ + + #: Determines the expected ``buildType`` field in the provenance predicate. + expected_build_type = "https://github.com/slsa-framework/slsa-github-generator/generic@v1" + + def get_build_invocation(self, statement: InTotoV01Statement | InTotoV1Statement) -> tuple[str | None, str | None]: + """Retrieve the build invocation information from the given statement. + + Parameters + ---------- + statement : InTotoV1Statement | InTotoV01Statement + The provenance statement from which to extract the build invocation + details. This statement contains the metadata about the build process + and its associated artifacts. + + Returns + ------- + tuple[str | None, str | None] + A tuple containing two elements: + - The first element is the build invocation entry point (e.g., workflow name), or None if not found. + - The second element is the invocation URL or identifier (e.g., job URL), or None if not found. + """ + if statement["predicate"] is None: + return None, None + gha_workflow = json_extract(statement["predicate"], ["invocation", "configSource", "entryPoint"], str) + gh_run_id = json_extract(statement["predicate"], ["invocation", "environment", "github_run_id"], str) + repo_uri = json_extract(statement["predicate"], ["invocation", "configSource", "uri"], str) + repo = None + if repo_uri: + repo = _clean_spdx(repo_uri) + if repo is None: + return gha_workflow, repo + invocation_url = f"{repo}/" f"actions/runs/{gh_run_id}" + return gha_workflow, invocation_url + + +class SLSAGithubActionsBuildDefinitionV1(ProvenanceBuildDefinition): + """Class representing the SLSA GitHub Actions Build Definition (v1). + + This class implements the abstract methods from the `ProvenanceBuildDefinition` + to extract build invocation details specific to the GitHub Actions build type. + """ + + #: Determines the expected ``buildType`` field in the provenance predicate. + expected_build_type = "https://slsa-framework.github.io/github-actions-buildtypes/workflow/v1" + + def get_build_invocation(self, statement: InTotoV01Statement | InTotoV1Statement) -> tuple[str | None, str | None]: + """Retrieve the build invocation information from the given statement. + + Parameters + ---------- + statement : InTotoV1Statement | InTotoV01Statement + The provenance statement from which to extract the build invocation + details. 
This statement contains the metadata about the build process
+            and its associated artifacts.
+
+        Returns
+        -------
+        tuple[str | None, str | None]
+            A tuple containing two elements:
+            - The first element is the build invocation entry point (e.g., workflow name), or None if not found.
+            - The second element is the invocation URL or identifier (e.g., job URL), or None if not found.
+        """
+        if statement["predicate"] is None:
+            return None, None
+
+        gha_workflow = json_extract(
+            statement["predicate"], ["buildDefinition", "externalParameters", "workflow", "path"], str
+        )
+        invocation_url = json_extract(statement["predicate"], ["runDetails", "metadata", "invocationId"], str)
+        return gha_workflow, invocation_url
+
+
+class SLSANPMCLIBuildDefinitionV2(ProvenanceBuildDefinition):
+    """Class representing the SLSA NPM CLI Build Definition (v2).
+
+    This class implements the abstract methods from the `ProvenanceBuildDefinition`
+    to extract build invocation details specific to the npm CLI build type.
+    """
+
+    #: Determines the expected ``buildType`` field in the provenance predicate.
+    expected_build_type = "https://github.com/npm/cli/gha/v2"
+
+    def get_build_invocation(self, statement: InTotoV01Statement | InTotoV1Statement) -> tuple[str | None, str | None]:
+        """Retrieve the build invocation information from the given statement.
+
+        Parameters
+        ----------
+        statement : InTotoV1Statement | InTotoV01Statement
+            The provenance statement from which to extract the build invocation
+            details. This statement contains the metadata about the build process
+            and its associated artifacts.
+
+        Returns
+        -------
+        tuple[str | None, str | None]
+            A tuple containing two elements:
+            - The first element is the build invocation entry point (e.g., workflow name), or None if not found.
+            - The second element is the invocation URL or identifier (e.g., job URL), or None if not found.
+        """
+        if statement["predicate"] is None:
+            return None, None
+        gha_workflow = json_extract(statement["predicate"], ["invocation", "configSource", "entryPoint"], str)
+        gh_run_id = json_extract(statement["predicate"], ["invocation", "environment", "GITHUB_RUN_ID"], str)
+        repo_uri = json_extract(statement["predicate"], ["invocation", "configSource", "uri"], str)
+        repo = None
+        if repo_uri:
+            repo = _clean_spdx(repo_uri)
+        if repo is None:
+            return gha_workflow, repo
+        invocation_url = f"{repo}/" f"actions/runs/{gh_run_id}"
+        return gha_workflow, invocation_url
+
+
+class SLSAGCBBuildDefinitionV1(ProvenanceBuildDefinition):
+    """Class representing the SLSA Google Cloud Build (GCB) Build Definition (v1).
+
+    This class implements the abstract methods from `ProvenanceBuildDefinition`
+    to extract build invocation details specific to the Google Cloud Build (GCB).
+    """
+
+    #: Determines the expected ``buildType`` field in the provenance predicate.
+    expected_build_type = "https://slsa-framework.github.io/gcb-buildtypes/triggered-build/v1"
+
+    def get_build_invocation(self, statement: InTotoV01Statement | InTotoV1Statement) -> tuple[str | None, str | None]:
+        """Retrieve the build invocation information from the given statement.
+
+        Parameters
+        ----------
+        statement : InTotoV1Statement | InTotoV01Statement
+            The provenance statement from which to extract the build invocation
+            details. This statement contains the metadata about the build process
+            and its associated artifacts.
+ + Returns + ------- + tuple[str | None, str | None] + A tuple containing two elements: + - The first element is the build invocation entry point (e.g., workflow name), or None if not found. + - The second element is the invocation URL or identifier (e.g., job URL), or None if not found. + """ + # TODO implement this method. + return None, None + + +class SLSAOCIBuildDefinitionV1(ProvenanceBuildDefinition): + """Class representing the SLSA Oracle Cloud Infrastructure (OCI) Build Definition (v1). + + This class implements the abstract methods from `ProvenanceBuildDefinition` + to extract build invocation details specific to OCI builds. + """ + + #: Determines the expected ``buildType`` field in the provenance predicate. + expected_build_type = ( + "https://github.com/oracle/macaron/tree/main/src/macaron/resources/provenance-buildtypes/oci/v1" + ) + + def get_build_invocation(self, statement: InTotoV01Statement | InTotoV1Statement) -> tuple[str | None, str | None]: + """Retrieve the build invocation information from the given statement. + + Parameters + ---------- + statement : InTotoV1Statement | InTotoV01Statement + The provenance statement from which to extract the build invocation + details. This statement contains the metadata about the build process + and its associated artifacts. + + Returns + ------- + tuple[str | None, str | None] + A tuple containing two elements: + - The first element is the build invocation entry point (e.g., workflow name), or None if not found. + - The second element is the invocation URL or identifier (e.g., job URL), or None if not found. + """ + # TODO implement this method. + return None, None + + +class WitnessGitLabBuildDefinitionV01(ProvenanceBuildDefinition): + """Class representing the Witness GitLab Build Definition (v0.1). + + This class implements the abstract methods from `ProvenanceBuildDefinition` + to extract build invocation details specific to GitLab. + """ + + #: Determines the expected ``buildType`` field in the provenance predicate. + expected_build_type = "https://witness.testifysec.com/attestation-collection/v0.1" + + #: Determines the expected ``attestations.type`` field in the Witness provenance predicate. + expected_attestation_type = "https://witness.dev/attestations/gitlab/v0.1" + + def get_build_invocation(self, statement: InTotoV01Statement | InTotoV1Statement) -> tuple[str | None, str | None]: + """Retrieve the build invocation information from the given statement. + + Parameters + ---------- + statement : InTotoV1Statement | InTotoV01Statement + The provenance statement from which to extract the build invocation + details. This statement contains the metadata about the build process + and its associated artifacts. + + Returns + ------- + tuple[str | None, str | None] + A tuple containing two elements: + - The first element is the build invocation entry point (e.g., workflow name), or None if not found. + - The second element is the invocation URL or identifier (e.g., job URL), or None if not found. 
+ """ + if statement["predicate"] is None: + return None, None + + attestation_type = json_extract(statement["predicate"], ["attestations", "type"], str) + if self.expected_attestation_type != attestation_type: + return None, None + gl_workflow = json_extract(statement["predicate"], ["attestations", "attestation", "ciconfigpath"], str) + gl_job_url = json_extract(statement["predicate"], ["attestations", "attestation", "joburl"], str) + return gl_workflow, gl_job_url + + +class ProvenancePredicate: + """Class providing utility methods for handling provenance predicates. + + This class contains static methods for extracting information from predicates in + provenance statements related to various build definitions. It serves as a helper + for identifying build types and finding the appropriate build definitions based on the extracted data. + """ + + @staticmethod + def get_build_type(statement: InTotoV1Statement | InTotoV01Statement) -> str | None: + """Extract the build type from the provided provenance statement. + + Parameters + ---------- + statement : InTotoV1Statement | InTotoV01Statement + The provenance statement from which to extract the build type. + + Returns + ------- + str | None + The build type if found; otherwise, None. + """ + if statement["predicate"] is None: + return None + + # Different build provenances might store the buildType field in different sections. + if build_type := json_extract(statement["predicate"], ["buildType"], str): + return build_type + + return json_extract(statement["predicate"], ["buildDefinition", "buildType"], str) + + @staticmethod + def find_build_def(statement: InTotoV01Statement | InTotoV1Statement) -> ProvenanceBuildDefinition: + """Find the appropriate build definition class based on the extracted build type. + + This method checks the provided provenance statement for its build type + and returns the corresponding `ProvenanceBuildDefinition` subclass. + + Parameters + ---------- + statement : InTotoV01Statement | InTotoV1Statement + The provenance statement containing the build type information. + + Returns + ------- + ProvenanceBuildDefinition + An instance of the appropriate build definition class that matches the + extracted build type. + + Raises + ------ + ProvenanceError + Raised when the build definition cannot be found in the provenance statement. + """ + build_type = ProvenancePredicate.get_build_type(statement) + if build_type is None: + raise ProvenanceError("Unable to find buildType in the provenance statement.") + + build_defs: list[ProvenanceBuildDefinition] = [ + SLSAGithubGenericBuildDefinitionV01(), + SLSAGithubActionsBuildDefinitionV1(), + SLSANPMCLIBuildDefinitionV2(), + SLSAGCBBuildDefinitionV1(), + SLSAOCIBuildDefinitionV1(), + WitnessGitLabBuildDefinitionV01(), + ] + + for build_def in build_defs: + if build_def.expected_build_type == build_type: + return build_def + + raise ProvenanceError("Unable to find build definition in the provenance statement.") diff --git a/src/macaron/repo_finder/repo_finder_deps_dev.py b/src/macaron/repo_finder/repo_finder_deps_dev.py index 468bf472e..4696caa27 100644 --- a/src/macaron/repo_finder/repo_finder_deps_dev.py +++ b/src/macaron/repo_finder/repo_finder_deps_dev.py @@ -125,7 +125,7 @@ def _create_urls(self, purl: PackageURL) -> list[str]: The list of created URLs. 
""" # See https://docs.deps.dev/api/v3alpha/ - base_url = f"https://api.deps.dev/v3alpha/purl/{encode(str(purl)).replace('/', '%2F')}" + base_url = f"https://api.deps.dev/v3alpha/purl/{encode(str(purl), safe='')}" if not base_url: return [] diff --git a/src/macaron/slsa_analyzer/analyze_context.py b/src/macaron/slsa_analyzer/analyze_context.py index 1f00df010..f6c8fd22a 100644 --- a/src/macaron/slsa_analyzer/analyze_context.py +++ b/src/macaron/slsa_analyzer/analyze_context.py @@ -313,7 +313,7 @@ def __str__(self) -> str: return output -def store_inferred_provenance( +def store_inferred_build_info_results( ctx: AnalyzeContext, ci_info: CIInfo, ci_service: BaseCIService, @@ -321,8 +321,9 @@ def store_inferred_provenance( job_id: str | None = None, step_id: str | None = None, step_name: str | None = None, + callee_node_type: str | None = None, ) -> None: - """Store the data related to the build provenance when the project does not generate provenances. + """Store the data related to the build. Parameters ---------- @@ -340,15 +341,13 @@ def store_inferred_provenance( The CI step ID. step_name: str | None The CI step name. + callee_node_type: str | None + The callee node type in the call graph. """ # TODO: This data is potentially duplicated in the check result tables. Instead of storing the data # in the context object, retrieve it from the result tables and remove this function. - if ( - ctx.dynamic_data["is_inferred_prov"] - and ci_info["provenances"] - and isinstance(ci_info["provenances"][0].payload, InTotoV01Payload) - ): - predicate: Any = ci_info["provenances"][0].payload.statement["predicate"] + if isinstance(ci_info["build_info_results"], InTotoV01Payload): + predicate: Any = ci_info["build_info_results"].statement["predicate"] predicate["buildType"] = f"Custom {ci_service.name}" predicate["builder"]["id"] = trigger_link predicate["invocation"]["configSource"]["uri"] = ( @@ -359,3 +358,4 @@ def store_inferred_provenance( predicate["buildConfig"]["jobID"] = job_id or "" predicate["buildConfig"]["stepID"] = step_id or "" predicate["buildConfig"]["stepName"] = step_name or "" + predicate["buildConfig"]["calleeType"] = callee_node_type diff --git a/src/macaron/slsa_analyzer/analyzer.py b/src/macaron/slsa_analyzer/analyzer.py index 6f809894a..8190f87fd 100644 --- a/src/macaron/slsa_analyzer/analyzer.py +++ b/src/macaron/slsa_analyzer/analyzer.py @@ -927,6 +927,7 @@ def _determine_ci_services(self, analyze_ctx: AnalyzeContext, git_service: BaseG asset=VirtualReleaseAsset(name="No_ASSET", url="NO_URL", size_in_bytes=0), ) ], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) ) diff --git a/src/macaron/slsa_analyzer/build_tool/base_build_tool.py b/src/macaron/slsa_analyzer/build_tool/base_build_tool.py index 13a25cde3..db0fff3cb 100644 --- a/src/macaron/slsa_analyzer/build_tool/base_build_tool.py +++ b/src/macaron/slsa_analyzer/build_tool/base_build_tool.py @@ -307,7 +307,54 @@ def match_cmd_args(self, cmd: list[str], tools: list[str], args: list[str]) -> b return False - def infer_confidence_deploy_command(self, cmd: BuildToolCommand) -> Confidence: + def infer_confidence_deploy_workflow(self, ci_path: str, provenance_workflow: str | None = None) -> Confidence: + """ + Infer the confidence level for the deploy CI workflow. + + Parameters + ---------- + ci_path: str + The path to the CI workflow. + provenance_workflow: str | None + The relative path to the root CI file that is captured in a provenance or None if provenance is not found. 
+
+        Returns
+        -------
+        Confidence
+            The confidence level for the deploy command.
+        """
+        # Apply heuristics and assign weights and scores for the discovered evidence.
+        evidence_weight_map = EvidenceWeightMap(
+            [
+                Evidence(name="ci_workflow_deploy", found=False, weight=2),
+            ]
+        )
+
+        # Check if the CI workflow path for the build command is captured in a provenance file.
+        if provenance_workflow and ci_path.endswith(provenance_workflow):
+            # We add this evidence only if a provenance is found to make sure we pick the right triggering
+            # workflow in the call graph. Otherwise, lack of provenance would have always lowered the
+            # confidence score, making the rest of the heuristics less effective.
+            evidence_weight_map.add(
+                Evidence(name="workflow_in_provenance", found=True, weight=5),
+            )
+
+        # Check workflow names.
+        deploy_keywords = ["release", "deploy", "publish"]
+        test_keywords = ["test", "snapshot"]
+        for deploy_kw in deploy_keywords:
+            if deploy_kw in os.path.basename(ci_path.lower()):
+                is_test = (test_kw for test_kw in test_keywords if test_kw in os.path.basename(ci_path.lower()))
+                if any(is_test):
+                    continue
+                evidence_weight_map.update_result(name="ci_workflow_deploy", found=True)
+                break
+
+        return Confidence.normalize(evidence_weight_map=evidence_weight_map)
+
+    def infer_confidence_deploy_command(
+        self, cmd: BuildToolCommand, provenance_workflow: str | None = None
+    ) -> Confidence:
         """
         Infer the confidence level for the deploy command.
 
@@ -315,6 +362,8 @@ def infer_confidence_deploy_command(self, cmd: BuildToolCommand) -> Confidence:
         ----------
         cmd: BuildToolCommand
             The build tool command object.
+        provenance_workflow: str | None
+            The relative path to the root CI file that is captured in a provenance or None if provenance is not found.
 
         Returns
         -------
@@ -332,6 +381,15 @@ def infer_confidence_deploy_command(self, cmd: BuildToolCommand) -> Confidence:
             ]
         )
 
+        # Check if the CI workflow path for the build command is captured in a provenance file.
+        if provenance_workflow and cmd["ci_path"].endswith(provenance_workflow):
+            # We add this evidence only if a provenance is found to make sure we pick the right triggering
+            # workflow in the call graph. Otherwise, lack of provenance would have always lowered the
+            # confidence score, making the rest of the heuristics less effective.
+            evidence_weight_map.add(
+                Evidence(name="workflow_in_provenance", found=True, weight=5),
+            )
+
         # Check if secrets are present in the caller job.
         if cmd["reachable_secrets"]:
             evidence_weight_map.update_result(name="reachable_secrets", found=True)
@@ -350,7 +408,7 @@ def infer_confidence_deploy_command(self, cmd: BuildToolCommand) -> Confidence:
         return Confidence.normalize(evidence_weight_map=evidence_weight_map)
 
     def is_deploy_command(
-        self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None
+        self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None, provenance_workflow: str | None = None
    ) -> tuple[bool, Confidence]:
         """
         Determine if the command is a deploy command.
@@ -364,6 +422,8 @@ def is_deploy_command(
             The build tool command object.
         excluded_configs: list[str] | None
             Build tool commands that are called from these configuration files are excluded.
+        provenance_workflow: str | None
+            The relative path to the root CI file that is captured in a provenance or None if provenance is not found.
 
Returns ------- @@ -383,7 +443,7 @@ def is_deploy_command( if excluded_configs and os.path.basename(cmd["ci_path"]) in excluded_configs: return False, Confidence.HIGH - return True, self.infer_confidence_deploy_command(cmd=cmd) + return True, self.infer_confidence_deploy_command(cmd=cmd, provenance_workflow=provenance_workflow) def is_package_command( self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None diff --git a/src/macaron/slsa_analyzer/build_tool/npm.py b/src/macaron/slsa_analyzer/build_tool/npm.py index 27c7e2de3..5f575b899 100644 --- a/src/macaron/slsa_analyzer/build_tool/npm.py +++ b/src/macaron/slsa_analyzer/build_tool/npm.py @@ -90,7 +90,7 @@ def get_dep_analyzer(self) -> DependencyAnalyzer: return NoneDependencyAnalyzer() def is_deploy_command( - self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None + self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None, provenance_workflow: str | None = None ) -> tuple[bool, Confidence]: """ Determine if the command is a deploy command. @@ -104,6 +104,8 @@ def is_deploy_command( The build tool command object. excluded_configs: list[str] | None Build tool commands that are called from these configuration files are excluded. + provenance_workflow: str | None + The relative path to the root CI file that is captured in a provenance or None if provenance is not found. Returns ------- @@ -134,7 +136,7 @@ def is_deploy_command( if excluded_configs and os.path.basename(cmd["ci_path"]) in excluded_configs: return False, Confidence.HIGH - return True, self.infer_confidence_deploy_command(cmd) + return True, self.infer_confidence_deploy_command(cmd, provenance_workflow) def is_package_command( self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None diff --git a/src/macaron/slsa_analyzer/build_tool/pip.py b/src/macaron/slsa_analyzer/build_tool/pip.py index da3f980cd..5abf0c0ba 100644 --- a/src/macaron/slsa_analyzer/build_tool/pip.py +++ b/src/macaron/slsa_analyzer/build_tool/pip.py @@ -98,7 +98,7 @@ def get_dep_analyzer(self) -> DependencyAnalyzer: ) def is_deploy_command( - self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None + self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None, provenance_workflow: str | None = None ) -> tuple[bool, Confidence]: """ Determine if the command is a deploy command. @@ -112,6 +112,8 @@ def is_deploy_command( The build tool command object. excluded_configs: list[str] | None Build tool commands that are called from these configuration files are excluded. + provenance_workflow: str | None + The relative path to the root CI file that is captured in a provenance or None if provenance is not found. 
Returns ------- @@ -141,7 +143,7 @@ def is_deploy_command( if excluded_configs and os.path.basename(cmd["ci_path"]) in excluded_configs: return False, Confidence.HIGH - return True, self.infer_confidence_deploy_command(cmd) + return True, self.infer_confidence_deploy_command(cmd, provenance_workflow) def is_package_command( self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None diff --git a/src/macaron/slsa_analyzer/build_tool/poetry.py b/src/macaron/slsa_analyzer/build_tool/poetry.py index bd538a5ea..eeb54216b 100644 --- a/src/macaron/slsa_analyzer/build_tool/poetry.py +++ b/src/macaron/slsa_analyzer/build_tool/poetry.py @@ -136,7 +136,7 @@ def get_dep_analyzer(self) -> DependencyAnalyzer: ) def is_deploy_command( - self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None + self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None, provenance_workflow: str | None = None ) -> tuple[bool, Confidence]: """ Determine if the command is a deploy command. @@ -150,6 +150,8 @@ def is_deploy_command( The build tool command object. excluded_configs: list[str] | None Build tool commands that are called from these configuration files are excluded. + provenance_workflow: str | None + The relative path to the root CI file that is captured in a provenance or None if provenance is not found. Returns ------- @@ -179,7 +181,7 @@ def is_deploy_command( if excluded_configs and os.path.basename(cmd["ci_path"]) in excluded_configs: return False, Confidence.HIGH - return True, self.infer_confidence_deploy_command(cmd) + return True, self.infer_confidence_deploy_command(cmd, provenance_workflow) def is_package_command( self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None diff --git a/src/macaron/slsa_analyzer/build_tool/yarn.py b/src/macaron/slsa_analyzer/build_tool/yarn.py index 2856dc4ee..90c424035 100644 --- a/src/macaron/slsa_analyzer/build_tool/yarn.py +++ b/src/macaron/slsa_analyzer/build_tool/yarn.py @@ -88,7 +88,7 @@ def get_dep_analyzer(self) -> DependencyAnalyzer: return NoneDependencyAnalyzer() def is_deploy_command( - self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None + self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None, provenance_workflow: str | None = None ) -> tuple[bool, Confidence]: """ Determine if the command is a deploy command. @@ -102,6 +102,8 @@ def is_deploy_command( The build tool command object. excluded_configs: list[str] | None Build tool commands that are called from these configuration files are excluded. + provenance_workflow: str | None + The relative path to the root CI file that is captured in a provenance or None if provenance is not found. 
Returns ------- @@ -132,7 +134,7 @@ def is_deploy_command( if excluded_configs and os.path.basename(cmd["ci_path"]) in excluded_configs: return False, Confidence.HIGH - return True, self.infer_confidence_deploy_command(cmd) + return True, self.infer_confidence_deploy_command(cmd, provenance_workflow) def is_package_command( self, cmd: BuildToolCommand, excluded_configs: list[str] | None = None diff --git a/src/macaron/slsa_analyzer/checks/build_as_code_check.py b/src/macaron/slsa_analyzer/checks/build_as_code_check.py index 0a0f95c48..df00ef2b3 100644 --- a/src/macaron/slsa_analyzer/checks/build_as_code_check.py +++ b/src/macaron/slsa_analyzer/checks/build_as_code_check.py @@ -12,10 +12,11 @@ from sqlalchemy.sql.sqltypes import String from macaron.database.table_definitions import CheckFacts -from macaron.errors import CallGraphError +from macaron.errors import CallGraphError, ProvenanceError from macaron.parsers.bashparser import BashNode -from macaron.parsers.github_workflow_model import ActionStep, Identified, ReusableWorkflowCallJob -from macaron.slsa_analyzer.analyze_context import AnalyzeContext, store_inferred_provenance +from macaron.parsers.github_workflow_model import ActionStep +from macaron.repo_finder.provenance_extractor import ProvenancePredicate +from macaron.slsa_analyzer.analyze_context import AnalyzeContext, store_inferred_build_info_results from macaron.slsa_analyzer.checks.base_check import BaseCheck from macaron.slsa_analyzer.checks.check_result import CheckResultData, CheckResultType, Confidence, JustificationType from macaron.slsa_analyzer.ci_service.base_ci_service import BaseCIService, NoneCIService @@ -78,9 +79,9 @@ class BuildAsCodeFacts(CheckFacts): class BuildAsCodeCheck(BaseCheck): - """This class checks the build as code requirement. + """This check analyzes the CI configurations to determine if the software component is published automatically. - See https://slsa.dev/spec/v0.1/requirements#build-as-code. + As a requirement of this check, the software component should be published using a hosted build service. """ def __init__(self) -> None: @@ -121,6 +122,17 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData: if not build_tools: return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED) + # If a provenance is found, obtain the workflow that has triggered the artifact release. + prov_workflow = None + prov_payload = ctx.dynamic_data["provenance"] + if not ctx.dynamic_data["is_inferred_prov"] and prov_payload: + try: + build_def = ProvenancePredicate.find_build_def(prov_payload.statement) + except ProvenanceError as error: + logger.error(error) + return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED) + prov_workflow, _ = build_def.get_build_invocation(prov_payload.statement) + ci_services = ctx.dynamic_data["ci_services"] # Check if "build as code" holds for each build tool. @@ -150,27 +162,30 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData: logger.debug("Workflow %s is not relevant. Skipping...", callee.name) continue if workflow_name in trusted_deploy_actions: - job_id = "" - step_id = "" - step_name = "" + job_id = None + step_id = None + step_name = None caller_path = "" job = callee.caller - if isinstance(job, GitHubJobNode): - job_id = job.parsed_obj.id - caller_path = job.source_path + # We always expect the caller of the node that calls a third-party + # or Reusable GitHub Action to be a GitHubJobNode. 
+ if not isinstance(job, GitHubJobNode): + continue + + job_id = job.parsed_obj.id + caller_path = job.source_path + + # Only third-party Actions can be called from a step. + # Reusable workflows have to be directly called from the job. + # See https://docs.github.com/en/actions/sharing-automations/ \ + # reusing-workflows#calling-a-reusable-workflow if callee.node_type == GitHubWorkflowType.EXTERNAL: callee_step_obj = cast(ActionStep, callee.parsed_obj) if "id" in callee_step_obj: step_id = callee_step_obj["id"] if "name" in callee_step_obj: step_name = callee_step_obj["name"] - else: - callee_reusable = cast(Identified[ReusableWorkflowCallJob], callee.parsed_obj) - step_id = callee_reusable.id - callee_reusable_job = callee_reusable.obj - if "name" in callee_reusable_job: - step_name = callee_reusable_job["name"] trigger_link = ci_service.api_client.get_file_link( ctx.component.repository.full_name, @@ -183,15 +198,27 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData: else "" ), ) - store_inferred_provenance( - ctx=ctx, - ci_info=ci_info, - ci_service=ci_service, - trigger_link=trigger_link, - job_id=job_id, - step_id=step_id, - step_name=step_name, + + trusted_workflow_confidence = tool.infer_confidence_deploy_workflow( + ci_path=caller_path, provenance_workflow=prov_workflow ) + # Store or update the inferred build information if the confidence + # for the current check fact is bigger than the maximum score. + if ( + not result_tables + or trusted_workflow_confidence + > max(result_tables, key=lambda item: item.confidence).confidence + ): + store_inferred_build_info_results( + ctx=ctx, + ci_info=ci_info, + ci_service=ci_service, + trigger_link=trigger_link, + job_id=job_id, + step_id=step_id, + step_name=step_name, + callee_node_type=callee.node_type.value, + ) result_tables.append( BuildAsCodeFacts( build_tool_name=tool.name, @@ -199,7 +226,7 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData: build_trigger=trigger_link, language=tool.language.value, deploy_command=workflow_name, - confidence=Confidence.HIGH, + confidence=trusted_workflow_confidence, ) ) overall_res = CheckResultType.PASSED @@ -207,9 +234,12 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData: for build_command in ci_service.get_build_tool_commands( callgraph=ci_info["callgraph"], build_tool=tool ): + # Yes or no with a confidence score. result, confidence = tool.is_deploy_command( - build_command, ci_service.get_third_party_configurations() + build_command, + ci_service.get_third_party_configurations(), + provenance_workflow=prov_workflow, ) if result: trigger_link = ci_service.api_client.get_file_link( @@ -219,13 +249,13 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData: os.path.basename(build_command["ci_path"]) ), ) - # Store or update the inferred provenance if the confidence + # Store or update the inferred build information if the confidence # for the current check fact is bigger than the maximum score. 
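The guard that follows this comment appears twice in this check; in isolation it reduces to a small, testable predicate. A standalone sketch, where ``Fact`` is a hypothetical stand-in for the ``CheckFacts`` rows used here:

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class Fact:
        confidence: float


    def should_store(result_tables: list[Fact], confidence: float) -> bool:
        # Store when nothing is stored yet, or when the new fact beats the
        # current maximum; otherwise keep the existing inferred build info.
        return not result_tables or confidence > max(result_tables, key=lambda item: item.confidence).confidence


    assert should_store([], 0.5)
    assert should_store([Fact(0.5)], 0.9)
    assert not should_store([Fact(0.9)], 0.5)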
if ( not result_tables or confidence > max(result_tables, key=lambda item: item.confidence).confidence ): - store_inferred_provenance( + store_inferred_build_info_results( ctx=ctx, ci_info=ci_info, ci_service=ci_service, @@ -280,7 +310,7 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData: if not config_name: break - store_inferred_provenance( + store_inferred_build_info_results( ctx=ctx, ci_info=ci_info, ci_service=ci_service, trigger_link=config_name ) result_tables.append( diff --git a/src/macaron/slsa_analyzer/checks/build_service_check.py b/src/macaron/slsa_analyzer/checks/build_service_check.py index c5f43c484..abbef2f35 100644 --- a/src/macaron/slsa_analyzer/checks/build_service_check.py +++ b/src/macaron/slsa_analyzer/checks/build_service_check.py @@ -12,7 +12,7 @@ from macaron.database.table_definitions import CheckFacts from macaron.errors import CallGraphError -from macaron.slsa_analyzer.analyze_context import AnalyzeContext, store_inferred_provenance +from macaron.slsa_analyzer.analyze_context import AnalyzeContext, store_inferred_build_info_results from macaron.slsa_analyzer.checks.base_check import BaseCheck from macaron.slsa_analyzer.checks.check_result import CheckResultData, CheckResultType, Confidence, JustificationType from macaron.slsa_analyzer.ci_service.base_ci_service import BaseCIService, NoneCIService @@ -140,7 +140,7 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData: not result_tables or confidence > max(result_tables, key=lambda item: item.confidence).confidence ): - store_inferred_provenance( + store_inferred_build_info_results( ctx=ctx, ci_info=ci_info, ci_service=ci_service, trigger_link=trigger_link ) result_tables.append( @@ -181,7 +181,7 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData: if not config_name: break - store_inferred_provenance( + store_inferred_build_info_results( ctx=ctx, ci_info=ci_info, ci_service=ci_service, trigger_link=config_name ) result_tables.append( diff --git a/src/macaron/slsa_analyzer/checks/check_result.py b/src/macaron/slsa_analyzer/checks/check_result.py index 5e7193099..f9d5c1ad0 100644 --- a/src/macaron/slsa_analyzer/checks/check_result.py +++ b/src/macaron/slsa_analyzer/checks/check_result.py @@ -201,7 +201,7 @@ def justification_report(self) -> list[tuple[Confidence, list]]: # Look for columns that are have "justification" metadata. 
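The switch from a truthiness test to ``is not None`` in the loop below matters once boolean facts exist (several are added later in this diff): ``False`` is meaningful evidence but falsy, so the old guard silently dropped it. A minimal illustration:

.. code-block:: python

    # A stored boolean fact such as run_deleted=False is real evidence.
    column_value = False

    old_guard = bool(column_value)        # False: the justification was skipped.
    new_guard = column_value is not None  # True: the justification is reported.

    assert not old_guard
    assert new_guard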
for col in result.__table__.columns: column_value = getattr(result, col.name) - if col.info.get("justification") and column_value: + if col.info.get("justification") and column_value is not None: if col.info.get("justification") == JustificationType.HREF: dict_elements[col.name] = column_value elif col.info.get("justification") == JustificationType.TEXT: diff --git a/src/macaron/slsa_analyzer/checks/infer_artifact_pipeline_check.py b/src/macaron/slsa_analyzer/checks/infer_artifact_pipeline_check.py index 82afc9720..594c5c467 100644 --- a/src/macaron/slsa_analyzer/checks/infer_artifact_pipeline_check.py +++ b/src/macaron/slsa_analyzer/checks/infer_artifact_pipeline_check.py @@ -4,58 +4,78 @@ """This module contains the InferArtifactPipelineCheck class to check if an artifact is published from a pipeline automatically.""" import logging +from datetime import datetime -from sqlalchemy import ForeignKey +from sqlalchemy import Boolean, ForeignKey from sqlalchemy.orm import Mapped, mapped_column from sqlalchemy.sql.sqltypes import String from macaron.config.defaults import defaults from macaron.database.table_definitions import CheckFacts -from macaron.errors import InvalidHTTPResponseError +from macaron.errors import InvalidHTTPResponseError, ProvenanceError +from macaron.json_tools import json_extract +from macaron.repo_finder.provenance_extractor import ProvenancePredicate from macaron.slsa_analyzer.analyze_context import AnalyzeContext -from macaron.slsa_analyzer.build_tool.gradle import Gradle -from macaron.slsa_analyzer.build_tool.maven import Maven from macaron.slsa_analyzer.checks.base_check import BaseCheck from macaron.slsa_analyzer.checks.check_result import CheckResultData, CheckResultType, Confidence, JustificationType from macaron.slsa_analyzer.ci_service.base_ci_service import NoneCIService -from macaron.slsa_analyzer.package_registry.maven_central_registry import MavenCentralRegistry -from macaron.slsa_analyzer.provenance.intoto import InTotoV01Payload from macaron.slsa_analyzer.registry import registry from macaron.slsa_analyzer.slsa_req import ReqName -from macaron.slsa_analyzer.specs.package_registry_spec import PackageRegistryInfo logger: logging.Logger = logging.getLogger(__name__) -class InferArtifactPipelineFacts(CheckFacts): +class ArtifactPipelineFacts(CheckFacts): """The ORM mapping for justifications of the infer_artifact_pipeline check.""" - __tablename__ = "_infer_artifact_pipeline_check" + __tablename__ = "_artifact_pipeline_check" #: The primary key. id: Mapped[int] = mapped_column(ForeignKey("_check_facts.id"), primary_key=True) # noqa: A003 + #: The URL of the workflow file that triggered deploy. + deploy_workflow: Mapped[str] = mapped_column(String, nullable=True, info={"justification": JustificationType.HREF}) + #: The workflow job that triggered deploy. - deploy_job: Mapped[str] = mapped_column(String, nullable=False, info={"justification": JustificationType.TEXT}) + deploy_job: Mapped[str] = mapped_column(String, nullable=True, info={"justification": JustificationType.TEXT}) #: The workflow step that triggered deploy. - deploy_step: Mapped[str] = mapped_column(String, nullable=False, info={"justification": JustificationType.TEXT}) + deploy_step: Mapped[str | None] = mapped_column( + String, nullable=True, info={"justification": JustificationType.TEXT} + ) #: The workflow run URL. 
-    run_url: Mapped[str] = mapped_column(String, nullable=False, info={"justification": JustificationType.HREF})
+    run_url: Mapped[str | None] = mapped_column(String, nullable=True, info={"justification": JustificationType.HREF})
+
+    #: The triggering workflow is found from a provenance.
+    from_provenance: Mapped[bool] = mapped_column(
+        Boolean, nullable=False, info={"justification": JustificationType.TEXT}
+    )
+
+    #: The CI pipeline data is deleted.
+    run_deleted: Mapped[bool] = mapped_column(Boolean, nullable=False, info={"justification": JustificationType.TEXT})
+
+    #: The artifact has been published before the code was committed to the source-code repository.
+    published_before_commit: Mapped[bool] = mapped_column(
+        Boolean, nullable=False, info={"justification": JustificationType.TEXT}
+    )
 
     __mapper_args__ = {
         "polymorphic_identity": "_infer_artifact_pipeline_check",
     }
 
 
-class InferArtifactPipelineCheck(BaseCheck):
-    """This check detects a potential pipeline from which an artifact is published.
+class ArtifactPipelineCheck(BaseCheck):
+    """This check detects a pipeline from which an artifact is published.
+
+    This check depends on the deploy command identified by the ``mcn_build_as_code_1`` check.
+    If a deploy command is detected, this check will attempt to locate a successful CI
+    pipeline that triggered the step containing the deploy command.
 
-    When a verifiable provenance is found for an artifact, the result of this check can be discarded.
-    Otherwise, we check whether a CI workflow run has automatically published the artifact.
+    When a verifiable provenance is found for an artifact, we use it to obtain the pipeline trigger.
+    Otherwise, we use heuristics to find the triggering pipeline.
 
-    We use several heuristics in this check:
+    We use several heuristics in this check for inference:
 
     * The workflow run should have started before the artifact is published.
     * The workflow step that calls a deploy command should have run successfully.
@@ -68,16 +88,19 @@ class InferArtifactPipelineCheck(BaseCheck):
 
     def __init__(self) -> None:
         """Initialize the InferArtifactPipeline instance."""
-        check_id = "mcn_infer_artifact_pipeline_1"
-        description = "Detects potential pipelines from which an artifact is published."
+        check_id = "mcn_find_artifact_pipeline_1"
+        description = """
+        Detects pipelines from which an artifact is published.
+
+        When a verifiable provenance is found for an artifact, we use it to obtain the pipeline trigger.
+        """
         depends_on: list[tuple[str, CheckResultType]] = [("mcn_build_as_code_1", CheckResultType.PASSED)]
-        eval_reqs = [ReqName.BUILD_AS_CODE]
+        eval_reqs: list[ReqName] = []
         super().__init__(
             check_id=check_id,
             description=description,
             depends_on=depends_on,
             eval_reqs=eval_reqs,
-            result_on_skip=CheckResultType.FAILED,
         )
 
     def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
@@ -93,35 +116,55 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
             CheckResultData
                 The result type of the check.
         """
-        # This check requires the build_as_code check to pass and a repository to be available.
+        # This check requires a repository to be available.
         if not ctx.component.repository:
             return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED)
 
         # Look for the artifact in the corresponding registry and find the publish timestamp.
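The registry lookup below delegates to ``find_publish_timestamp``, which this diff does not show. Purely as a hedged sketch (not Macaron's implementation), a lookup against the Maven Central search API configured in ``defaults.ini`` earlier in this diff could look like the following; the endpoint layout and the millisecond ``timestamp`` field follow Sonatype's public search REST API:

.. code-block:: python

    from datetime import datetime, timezone

    import requests


    def find_publish_timestamp(group: str, artifact: str, version: str) -> datetime:
        """Query search.maven.org (see [package_registry.maven_central] above) for a publish time."""
        response = requests.get(
            "https://search.maven.org/solrsearch/select",
            params={"q": f'g:"{group}" AND a:"{artifact}" AND v:"{version}"', "core": "gav", "rows": "1", "wt": "json"},
            timeout=20,
        )
        response.raise_for_status()
        docs = response.json()["response"]["docs"]
        if not docs:
            raise ValueError(f"{group}:{artifact}:{version} not found on Maven Central.")
        # Maven Central reports the publish time in milliseconds since the epoch.
        return datetime.fromtimestamp(docs[0]["timestamp"] / 1000, tz=timezone.utc)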
         artifact_published_date = None
-        package_registry_info_entries = ctx.dynamic_data["package_registries"]
-        for package_registry_info_entry in package_registry_info_entries:
-            match package_registry_info_entry:
-                # TODO: add package registries for other ecosystems.
-                case PackageRegistryInfo(
-                    build_tool=Gradle() | Maven(),
-                    package_registry=MavenCentralRegistry() as mvn_central_registry,
-                ):
-                    group_id = ctx.component.namespace
-                    artifact_id = ctx.component.name
-                    version = ctx.component.version
-                    try:
-                        artifact_published_date = mvn_central_registry.find_publish_timestamp(
-                            group_id, artifact_id, version
-                        )
-                    except InvalidHTTPResponseError as error:
-                        logger.debug(error)
+        for registry_info in ctx.dynamic_data["package_registries"]:
+            if registry_info.build_tool.purl_type == ctx.component.type:
+                try:
+                    artifact_published_date = registry_info.package_registry.find_publish_timestamp(ctx.component.purl)
+                    break
+                except InvalidHTTPResponseError as error:
+                    logger.debug(error)
+                except NotImplementedError:
+                    continue
+
+        # This check requires the timestamps of the published artifact and its source-code commit to proceed.
+        # If the timestamps are not found, we return with a fail result.
+        try:
+            commit_date = datetime.strptime(ctx.component.repository.commit_date, "%Y-%m-%dT%H:%M:%S%z")
+        except ValueError as error:
+            logger.debug("Failed to parse date string '%s': %s", ctx.component.repository.commit_date, error)
+            return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED)

-        # This check requires the artifact publish artifact to proceed. If the timestamp is not
-        # found, we return with a fail result.
         if not artifact_published_date:
+            logger.debug("Unable to find a publish date for the artifact.")
             return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED)

+        # If an artifact is published before the corresponding code is committed, there cannot be
+        # a CI pipeline that triggered the publishing.
+        if published_before_commit := artifact_published_date < commit_date:
+            logger.debug("Publish date %s is earlier than commit date %s.", artifact_published_date, commit_date)
+        else:
+            # Found an acceptable publish timestamp to proceed.
+            logger.debug("Publish date %s is later than commit date %s.", artifact_published_date, commit_date)
+
+        # If a provenance is found, obtain the workflow and the pipeline that has triggered the artifact release.
+        prov_workflow = None
+        prov_trigger_run = None
+        prov_payload = ctx.dynamic_data["provenance"]
+        if not ctx.dynamic_data["is_inferred_prov"] and prov_payload:
+            # Obtain the build-related fields from the provenance.
+            try:
+                build_def = ProvenancePredicate.find_build_def(prov_payload.statement)
+            except ProvenanceError as error:
+                logger.error(error)
+                return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED)
+            prov_workflow, prov_trigger_run = build_def.get_build_invocation(prov_payload.statement)
+
         # Obtain the metadata inferred by the build_as_code check, which is stored in the `provenances`
         # attribute of the corresponding CI service.
         ci_services = ctx.dynamic_data["ci_services"]
@@ -131,66 +174,147 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
             if isinstance(ci_service, NoneCIService):
                 continue

-            if ctx.dynamic_data["is_inferred_prov"] and ci_info["provenances"]:
-                for inferred_prov in ci_info["provenances"]:
-                    # Skip processing the inferred provenance if it does not conform with the in-toto v0.1 specification.
- if not isinstance(inferred_prov.payload, InTotoV01Payload): - continue - - # This check requires the job and step calling the deploy command. - # Validate the content of inferred_prov. - predicate = inferred_prov.payload.statement["predicate"] - if ( - not predicate - or not isinstance(predicate["invocation"], dict) - or "configSource" not in predicate["invocation"] - or not isinstance(predicate["invocation"]["configSource"], dict) - or "entryPoint" not in predicate["invocation"]["configSource"] - or not isinstance(predicate["invocation"]["configSource"]["entryPoint"], str) - ): - continue - if ( - not isinstance(predicate["buildConfig"], dict) - or "jobID" not in predicate["buildConfig"] - or not isinstance(predicate["buildConfig"]["jobID"], str) - or "stepID" not in predicate["buildConfig"] - or not isinstance(predicate["buildConfig"]["stepID"], str) - or "stepName" not in predicate["buildConfig"] - or not isinstance(predicate["buildConfig"]["stepName"], str) - ): - continue - try: - publish_time_range = defaults.getint("package_registries", "publish_time_range", fallback=3600) - except ValueError as error: - logger.error( - "Configuration error: publish_time_range in section of package_registries is not a valid integer %s.", - error, + # Different CI services have different retention policies for the workflow runs. + # Make sure the artifact is not older than the retention date. + ci_run_deleted = ci_service.workflow_run_deleted(artifact_published_date) + + # If the artifact is published before the source code is committed, the check should fail. + if published_before_commit: + return CheckResultData( + result_tables=[ + ArtifactPipelineFacts( + from_provenance=bool(prov_workflow), + run_deleted=ci_run_deleted, + published_before_commit=published_before_commit, + confidence=Confidence.HIGH, ) - return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED) - - # Find the potential workflow runs. - if html_urls := ci_service.workflow_run_in_date_time_range( - repo_full_name=ctx.component.repository.full_name, - workflow=predicate["invocation"]["configSource"]["entryPoint"], - date_time=artifact_published_date, - step_name=predicate["buildConfig"]["stepName"], - step_id=predicate["buildConfig"]["stepID"], - time_range=publish_time_range, - ): - result_tables: list[CheckFacts] = [] - for html_url in html_urls: - result_tables.append( - InferArtifactPipelineFacts( - deploy_job=predicate["buildConfig"]["jobID"], - deploy_step=predicate["buildConfig"]["stepID"] - or predicate["buildConfig"]["stepName"], - run_url=html_url, - confidence=Confidence.MEDIUM, - ) - ) - return CheckResultData(result_tables=result_tables, result_type=CheckResultType.PASSED) - - return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED) - - -registry.register(InferArtifactPipelineCheck()) + ], + result_type=CheckResultType.FAILED, + ) + # Obtain the job and step calling the deploy command. + # This data must have been found already by the build-as-code check. + build_predicate = ci_info["build_info_results"].statement["predicate"] + if build_predicate is None: + continue + build_entry_point = json_extract(build_predicate, ["invocation", "configSource", "entryPoint"], str) + + # If provenance exists check that the entry point extracted from the build-as-code check matches. 
+            if build_entry_point is None or (prov_workflow and not build_entry_point.endswith(prov_workflow)):
+                continue
+
+            if not (job_id := json_extract(build_predicate, ["buildConfig", "jobID"], str)):
+                continue
+
+            step_id = json_extract(build_predicate, ["buildConfig", "stepID"], str)
+            step_name = json_extract(build_predicate, ["buildConfig", "stepName"], str)
+            callee_node_type = json_extract(build_predicate, ["buildConfig", "calleeType"], str)
+
+            try:
+                publish_time_range = defaults.getint("package_registry", "publish_time_range", fallback=7200)
+            except ValueError as error:
+                logger.error(
+                    "Configuration error: publish_time_range in the [package_registry] section is not a valid integer: %s.",
+                    error,
+                )
+                return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED)
+
+            # Find the workflow runs that have potentially triggered the artifact publishing.
+            html_urls = ci_service.workflow_run_in_date_time_range(
+                repo_full_name=ctx.component.repository.full_name,
+                workflow=build_entry_point,
+                publish_date_time=artifact_published_date,
+                commit_date_time=commit_date,
+                job_id=job_id,
+                step_name=step_name,
+                step_id=step_id,
+                time_range=publish_time_range,
+                callee_node_type=callee_node_type,
+            )
+
+            # If provenance exists, we expect the reported triggering run to have started within an
+            # acceptable time range, succeeded, and called the deploy command.
+            if prov_trigger_run:
+                result_type = CheckResultType.FAILED
+                # If the triggering run in the provenance does not satisfy all of the requirements above,
+                # set the confidence as medium because the build-as-code results might be imprecise.
+                confidence = Confidence.MEDIUM
+                if prov_trigger_run in html_urls:
+                    # The workflow's deploy step has been successful. In this case, the check can pass with a
+                    # high confidence.
+                    confidence = Confidence.HIGH
+                    result_type = CheckResultType.PASSED
+                elif ci_run_deleted:
+                    # The workflow run data has been deleted and we cannot analyze any further.
+                    confidence = Confidence.LOW
+                    result_type = CheckResultType.UNKNOWN
+
+                return CheckResultData(
+                    result_tables=[
+                        ArtifactPipelineFacts(
+                            deploy_workflow=build_entry_point,
+                            deploy_job=job_id,
+                            deploy_step=step_id or step_name,
+                            run_url=prov_trigger_run,
+                            from_provenance=True,
+                            run_deleted=ci_run_deleted,
+                            published_before_commit=published_before_commit,
+                            confidence=confidence,
+                        )
+                    ],
+                    result_type=result_type,
+                )
+
+            # Logic for artifacts that do not have a provenance.
+            result_tables: list[CheckFacts] = []
+            for html_url in html_urls:
+                result_tables.append(
+                    ArtifactPipelineFacts(
+                        deploy_workflow=build_entry_point,
+                        deploy_job=job_id,
+                        deploy_step=step_id or step_name,
+                        run_url=html_url,
+                        from_provenance=False,
+                        run_deleted=ci_run_deleted,
+                        published_before_commit=published_before_commit,
+                        confidence=Confidence.MEDIUM,
+                    )
+                )
+            if html_urls:
+                return CheckResultData(result_tables=result_tables, result_type=CheckResultType.PASSED)
+            if ci_run_deleted:
+                # We set the confidence as low because the analysis could not be performed due to missing
+                # CI run data.
+                return CheckResultData(
+                    result_tables=[
+                        ArtifactPipelineFacts(
+                            deploy_workflow=build_entry_point,
+                            deploy_job=job_id,
+                            deploy_step=step_id or step_name,
+                            run_url=None,
+                            from_provenance=False,
+                            run_deleted=ci_run_deleted,
+                            published_before_commit=published_before_commit,
+                            confidence=Confidence.LOW,
+                        )
+                    ],
+                    result_type=CheckResultType.UNKNOWN,
+                )
+
+        if ci_run_deleted or published_before_commit:
+            # If the CI run data is deleted or the artifact is older than the source-code commit,
+            # the check should have failed earlier and we should not reach here.
+            logger.debug("An unexpected error has occurred.")
+            return CheckResultData(
+                result_tables=[],
+                result_type=CheckResultType.FAILED,
+            )
+
+        # We should reach here when the analysis has failed to detect any successful deploy step in a
+        # CI run. In this case the check fails with a medium confidence.
+        return CheckResultData(
+            result_tables=[],
+            result_type=CheckResultType.FAILED,
+        )
+
+
+registry.register(ArtifactPipelineCheck())
diff --git a/src/macaron/slsa_analyzer/checks/trusted_builder_l3_check.py b/src/macaron/slsa_analyzer/checks/trusted_builder_l3_check.py
index ebd632e50..e9f629447 100644
--- a/src/macaron/slsa_analyzer/checks/trusted_builder_l3_check.py
+++ b/src/macaron/slsa_analyzer/checks/trusted_builder_l3_check.py
@@ -13,7 +13,7 @@

 from macaron.config.defaults import defaults
 from macaron.database.table_definitions import CheckFacts
-from macaron.slsa_analyzer.analyze_context import AnalyzeContext, store_inferred_provenance
+from macaron.slsa_analyzer.analyze_context import AnalyzeContext, store_inferred_build_info_results
 from macaron.slsa_analyzer.checks.base_check import BaseCheck
 from macaron.slsa_analyzer.checks.check_result import CheckResultData, CheckResultType, Confidence, JustificationType
 from macaron.slsa_analyzer.ci_service.github_actions.analyzer import (
@@ -133,7 +133,7 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
                 ci_service.api_client.get_relative_path_of_workflow(os.path.basename(caller_path)),
             )

-            store_inferred_provenance(
+            store_inferred_build_info_results(
                 ctx=ctx, ci_info=ci_info, ci_service=ci_service, trigger_link=caller_link
             )
diff --git a/src/macaron/slsa_analyzer/ci_service/base_ci_service.py b/src/macaron/slsa_analyzer/ci_service/base_ci_service.py
index 6089d9aff..4a2b69e19 100644
--- a/src/macaron/slsa_analyzer/ci_service/base_ci_service.py
+++ b/src/macaron/slsa_analyzer/ci_service/base_ci_service.py
@@ -187,10 +187,13 @@ def workflow_run_in_date_time_range(
         self,
         repo_full_name: str,
         workflow: str,
-        date_time: datetime,
+        publish_date_time: datetime,
+        commit_date_time: datetime,
+        job_id: str,
         step_name: str | None,
         step_id: str | None,
         time_range: int = 0,
+        callee_node_type: str | None = None,
     ) -> set[str]:
         """Check if the repository has a workflow run started before the date_time timestamp within the time_range.

@@ -205,8 +208,12 @@
             The target repo's full name.
         workflow : str
             The workflow URL.
-        date_time: datetime
-            The datetime object to query.
+        publish_date_time: datetime
+            The artifact publishing datetime object.
+        commit_date_time: datetime
+            The artifact's source-code commit datetime object.
+        job_id: str
+            The job that triggers the run.
         step_name: str
             The step in the GitHub Action workflow that needs to be checked.
         time_range: int
             The date-time range in seconds. The default value is 0.
@@ -220,6 +227,22 @@
         """
         return set()

+    def workflow_run_deleted(self, timestamp: datetime) -> bool:
+        """
+        Check if the CI run data is deleted based on a retention policy.
+
+        Parameters
+        ----------
+        timestamp: datetime
+            The timestamp of the CI run.
+
+        Returns
+        -------
+        bool
+            True if the CI run data is deleted.
+        """
+        return False
+
     def get_build_tool_commands(self, callgraph: CallGraph, build_tool: BaseBuildTool) -> Iterable[BuildToolCommand]:
         """
         Traverse the callgraph and find all the reachable build tool commands.
diff --git a/src/macaron/slsa_analyzer/ci_service/github_actions/analyzer.py b/src/macaron/slsa_analyzer/ci_service/github_actions/analyzer.py
index 2009d97ff..2f0e49888 100644
--- a/src/macaron/slsa_analyzer/ci_service/github_actions/analyzer.py
+++ b/src/macaron/slsa_analyzer/ci_service/github_actions/analyzer.py
@@ -45,7 +45,7 @@ class ThirdPartyAction:
     action_version: str | None


-class GitHubWorkflowType(Enum):
+class GitHubWorkflowType(str, Enum):
     """This class represents different GitHub Actions workflow types."""

     INTERNAL = "internal"  # Workflows declared in the repo.
diff --git a/src/macaron/slsa_analyzer/ci_service/github_actions/github_actions_ci.py b/src/macaron/slsa_analyzer/ci_service/github_actions/github_actions_ci.py
index dec8221dc..d3f820ade 100644
--- a/src/macaron/slsa_analyzer/ci_service/github_actions/github_actions_ci.py
+++ b/src/macaron/slsa_analyzer/ci_service/github_actions/github_actions_ci.py
@@ -20,6 +20,7 @@
 from macaron.slsa_analyzer.ci_service.github_actions.analyzer import (
     GitHubJobNode,
     GitHubWorkflowNode,
+    GitHubWorkflowType,
     build_call_graph_from_path,
     find_language_setup_action,
     get_ci_events,
@@ -243,14 +244,68 @@ def has_latest_run_passed(

         return ""

+    def check_publish_start_commit_timestamps(
+        self, started_at: datetime, publish_date_time: datetime, commit_date_time: datetime, time_range: int
+    ) -> bool:
+        """
+        Check that the CI run, artifact publish, and commit timestamps are valid and within the acceptable time range.
+
+        This function checks that the CI run has happened before the artifact publishing timestamp.
+
+        This function also verifies whether the commit date is within an acceptable time range
+        from the publish start time. The acceptable range is defined as half of the provided
+        time range parameter.
+
+        Parameters
+        ----------
+        started_at : datetime
+            The timestamp indicating when the GitHub Actions workflow started.
+        publish_date_time : datetime
+            The timestamp indicating when the artifact is published.
+        commit_date_time : datetime
+            The timestamp of the source code commit.
+        time_range : int
+            The total acceptable time range in seconds.
+
+        Returns
+        -------
+        bool
+            True if the commit date is within the acceptable range from the publish start time,
+            False otherwise. Returns False in case of any errors during timestamp comparisons.
+        """
+        # Make sure the source-code commit date is also within the acceptable range.
+        acceptable_range = time_range / 2
+        try:
+            if started_at < publish_date_time:
+                if abs(started_at - commit_date_time).total_seconds() > acceptable_range:
+                    logger.debug(
+                        (
+                            "The difference between GitHub Actions starting time %s and source commit time %s"
+                            " is not within %s seconds."
+ ), + started_at, + commit_date_time, + acceptable_range, + ) + return False + return True + + except (ValueError, OverflowError, TypeError) as error: + logger.debug(error) + + return False + def workflow_run_in_date_time_range( self, repo_full_name: str, workflow: str, - date_time: datetime, + publish_date_time: datetime, + commit_date_time: datetime, + job_id: str, step_name: str | None, step_id: str | None, time_range: int = 0, + callee_node_type: str | None = None, ) -> set[str]: """Check if the repository has a workflow run started before the date_time timestamp within the time_range. @@ -273,7 +328,6 @@ def workflow_run_in_date_time_range( The ID of the step in the GitHub Action workflow that needs to be checked. time_range: int The date-time range in seconds. The default value is 0. - For example a 30 seconds range for 2022-11-05T20:30 is 2022-11-05T20:15..2022-11-05T20:45. Returns ------- @@ -281,15 +335,16 @@ def workflow_run_in_date_time_range( The set of URLs found for the workflow within the time range. """ logger.debug( - "Getting the latest workflow run of %s at %s within time range %s", + "Getting the latest workflow run of %s at publishing time %s and source commit date %s within time range %s.", workflow, - str(date_time), + str(publish_date_time), + str(commit_date_time), str(time_range), ) html_urls: set[str] = set() try: - datetime_from = date_time - timedelta(seconds=time_range) + datetime_from = publish_date_time - timedelta(seconds=time_range) except (OverflowError, OSError, TypeError) as error: logger.debug(error) return html_urls @@ -298,7 +353,7 @@ def workflow_run_in_date_time_range( logger.debug("Search for the workflow runs within the range.") try: run_data = self.api_client.get_workflow_run_for_date_time_range( - repo_full_name, f"{datetime_from.isoformat()}..{date_time.isoformat()}" + repo_full_name, f"{datetime_from.isoformat()}..{publish_date_time.isoformat()}" ) except ValueError as error: logger.debug(error) @@ -321,33 +376,48 @@ def workflow_run_in_date_time_range( continue # Find the matching step and check its `conclusion` and `started_at` attributes. + html_url = None for job in run_jobs["jobs"]: + # If the deploy step is a Reusable Workflow, there won't be any steps in the caller job. + if callee_node_type == GitHubWorkflowType.REUSABLE.value: + if not job["name"].startswith(job_id) or job["conclusion"] != "success": + continue + started_at = datetime.fromisoformat(job["started_at"]) + if self.check_publish_start_commit_timestamps( + started_at=started_at, + publish_date_time=publish_date_time, + commit_date_time=commit_date_time, + time_range=time_range, + ): + run_id = item["id"] + html_url = item["html_url"] + break + for step in job["steps"]: - if (step["name"] not in [step_name, step_id]) or step["conclusion"] != "success": + if step["name"] not in [step_name, step_id] or step["conclusion"] != "success": continue - try: - if datetime.fromisoformat(step["started_at"]) < date_time: - run_id: str = item["id"] - html_url: str = item["html_url"] - logger.info( - "The workflow run status of %s (id = %s, url = %s, step = %s) is %s.", - workflow, - run_id, - html_url, - step["name"], - step["conclusion"], - ) - html_urls.add(html_url) - else: - logger.debug( - "The workflow start run %s happened after %s with status %s.", - datetime.fromisoformat(step["started_at"]), - date_time, - step["conclusion"], - ) - # Handle errors for calls to `fromisoformat()` and the time comparison. 
-                        except (ValueError, OverflowError, OSError, TypeError) as error:
-                            logger.debug(error)
+                        started_at = datetime.fromisoformat(step["started_at"])
+                        if self.check_publish_start_commit_timestamps(
+                            started_at=started_at,
+                            publish_date_time=publish_date_time,
+                            commit_date_time=commit_date_time,
+                            time_range=time_range,
+                        ):
+                            run_id = item["id"]
+                            html_url = item["html_url"]
+                            logger.info(
+                                "The workflow run status of %s (id = %s, url = %s, step = %s) is %s.",
+                                workflow,
+                                run_id,
+                                html_url,
+                                step["name"],
+                                step["conclusion"],
+                            )
+                            break
+
+                if html_url:
+                    html_urls.add(html_url)
+
         except KeyError as key_error:
             logger.debug(
                 "Unable to read data of %s from the GitHub API result. Error: %s",
@@ -357,6 +427,35 @@

         return html_urls

+    def workflow_run_deleted(self, timestamp: datetime) -> bool:
+        """
+        Check if the CI run data is deleted based on a retention policy.
+
+        Parameters
+        ----------
+        timestamp: datetime
+            The timestamp of the CI run.
+
+        Returns
+        -------
+        bool
+            True if the CI run data is deleted.
+        """
+        # Setting the timezone to UTC because the date format we are using for GitHub
+        # Actions is in ISO format, which contains the offset from the UTC timezone.
+        # For example: 2022-04-10T14:10:01+07:00
+        # GitHub retains GitHub Actions pipeline data for 400 days. So, we cannot analyze the
+        # pipelines if artifacts are older than 400 days.
+        # https://docs.github.com/en/rest/guides/using-the-rest-api-to-interact-with-checks?
+        # apiVersion=2022-11-28#retention-of-checks-data
+        # TODO: change this check if this issue is resolved:
+        # https://github.com/orgs/community/discussions/138249
+        if datetime.now(timezone.utc) - timedelta(days=400) > timestamp:
+            logger.debug("Artifact published at %s is older than 400 days.", timestamp)
+            return True
+
+        return False
+
     def search_for_workflow_run(
         self,
         workflow_id: str,
diff --git a/src/macaron/slsa_analyzer/package_registry/jfrog_maven_registry.py b/src/macaron/slsa_analyzer/package_registry/jfrog_maven_registry.py
index 62ae09c06..65987d1e2 100644
--- a/src/macaron/slsa_analyzer/package_registry/jfrog_maven_registry.py
+++ b/src/macaron/slsa_analyzer/package_registry/jfrog_maven_registry.py
@@ -7,6 +7,7 @@

 import json
 import logging
+from datetime import datetime
 from typing import NamedTuple
 from urllib.parse import SplitResult, urlunsplit

@@ -851,3 +852,35 @@ def download_asset(self, url: str, dest: str) -> bool:
             return False

         return True
+
+    def find_publish_timestamp(self, purl: str, registry_url: str | None = None) -> datetime:
+        """Find the publish timestamp of an artifact.
+
+        Fetching timestamps is not currently supported for JFrog registries, so this
+        method always raises ``NotImplementedError``.
+
+        Parameters
+        ----------
+        purl: str
+            The Package URL (purl) of the package whose publication timestamp is to be retrieved.
+            This should conform to the PURL specification.
+        registry_url: str | None
+            The registry URL that can be set for testing.
+
+        Returns
+        -------
+        datetime
+            A timezone-aware datetime object representing the publication timestamp
+            of the specified package.
+
+        Raises
+        ------
+        InvalidHTTPResponseError
+            If the URL construction fails, the HTTP response is invalid, or if the response
+            cannot be parsed correctly, or if the expected timestamp is missing or invalid.
+        NotImplementedError
+            If not implemented for a registry.
+        """
+        raise NotImplementedError("Fetching timestamps for artifacts on JFrog is not currently supported.")
diff --git a/src/macaron/slsa_analyzer/package_registry/maven_central_registry.py b/src/macaron/slsa_analyzer/package_registry/maven_central_registry.py
index 67a2b100b..92a52efd3 100644
--- a/src/macaron/slsa_analyzer/package_registry/maven_central_registry.py
+++ b/src/macaron/slsa_analyzer/package_registry/maven_central_registry.py
@@ -4,10 +4,11 @@
 """The module provides abstractions for the Maven Central package registry."""

 import logging
+import urllib.parse
 from datetime import datetime, timezone
-from urllib.parse import SplitResult, urlunsplit

 import requests
+from packageurl import PackageURL

 from macaron.config.defaults import defaults
 from macaron.errors import ConfigurationError, InvalidHTTPResponseError
@@ -75,8 +76,11 @@ class MavenCentralRegistry(PackageRegistry):

     def __init__(
         self,
-        hostname: str | None = None,
+        search_netloc: str | None = None,
+        search_scheme: str | None = None,
         search_endpoint: str | None = None,
+        registry_url_netloc: str | None = None,
+        registry_url_scheme: str | None = None,
         request_timeout: int | None = None,
     ) -> None:
         """
@@ -84,15 +88,25 @@

         Parameters
         ----------
-        hostname : str
-            The hostname of the Maven Central service.
+        search_netloc : str | None
+            The netloc of the Maven Central search URL.
+        search_scheme : str | None
+            The scheme of the Maven Central search URL.
         search_endpoint : str | None
             The search REST API to find artifacts.
+        registry_url_netloc : str | None
+            The netloc of the Maven Central registry URL.
+        registry_url_scheme : str | None
+            The scheme of the Maven Central registry URL.
         request_timeout : int | None
             The timeout (in seconds) for requests made to the package registry.
         """
-        self.hostname = hostname or ""
+        self.search_netloc = search_netloc or ""
+        self.search_scheme = search_scheme or ""
         self.search_endpoint = search_endpoint or ""
+        self.registry_url_netloc = registry_url_netloc or ""
+        self.registry_url_scheme = registry_url_scheme or ""
+        self.registry_url = ""  # Created from the registry_url_scheme and registry_url_netloc.
         self.request_timeout = request_timeout or 10
         super().__init__("Maven Central Registry")

@@ -109,18 +123,34 @@ def load_defaults(self) -> None:
             return
         section = defaults[section_name]

-        self.hostname = section.get("hostname")
-        if not self.hostname:
+        self.search_netloc = section.get("search_netloc")
+        if not self.search_netloc:
             raise ConfigurationError(
-                f'The "hostname" key is missing in section [{section_name}] of the .ini configuration file.'
+                f'The "search_netloc" key is missing in section [{section_name}] of the .ini configuration file.'
             )

+        self.search_scheme = section.get("search_scheme", "https")
         self.search_endpoint = section.get("search_endpoint")
         if not self.search_endpoint:
             raise ConfigurationError(
                 f'The "search_endpoint" key is missing in section [{section_name}] of the .ini configuration file.'
             )

+        self.registry_url_netloc = section.get("registry_url_netloc")
+        if not self.registry_url_netloc:
+            raise ConfigurationError(
+                f'The "registry_url_netloc" key is missing in section [{section_name}] of the .ini configuration file.'
+            )
+        self.registry_url_scheme = section.get("registry_url_scheme", "https")
+        self.registry_url = urllib.parse.ParseResult(
+            scheme=self.registry_url_scheme,
+            netloc=self.registry_url_netloc,
+            path="",
+            params="",
+            query="",
+            fragment="",
+        ).geturl()
+
         try:
             self.request_timeout = section.getint("request_timeout", fallback=10)
         except ValueError as error:
@@ -152,41 +182,49 @@ def is_detected(self, build_tool: BaseBuildTool) -> bool:
         compatible_build_tool_classes = [Maven, Gradle]
         return any(isinstance(build_tool, build_tool_class) for build_tool_class in compatible_build_tool_classes)

-    def find_publish_timestamp(self, group_id: str, artifact_id: str, version: str | None = None) -> datetime:
+    def find_publish_timestamp(self, purl: str, registry_url: str | None = None) -> datetime:
         """Make a search request to Maven Central to find the publishing timestamp of an artifact.

-        If version is not provided, the timestamp of the latest version will be returned.
+        The reason for directly fetching timestamps from Maven Central is that deps.dev occasionally
+        misses timestamps for Maven artifacts, making it unreliable for this purpose.

         To see the search API syntax see: https://central.sonatype.org/search/rest-api-guide/

         Parameters
         ----------
-        group_id : str
-            The group id of the artifact.
-        artifact_id: str
-            The artifact id of the artifact.
-        version: str | None
-            The version of the artifact.
+        purl: str
+            The Package URL (purl) of the package whose publication timestamp is to be retrieved.
+            This should conform to the PURL specification.
+        registry_url: str | None
+            The registry URL that can be set for testing.

         Returns
         -------
         datetime
-            The artifact publish timestamp as a timezone-aware datetime object.
+            A timezone-aware datetime object representing the publication timestamp
+            of the specified package.

         Raises
         ------
         InvalidHTTPResponseError
-            If the HTTP response is invalid or unexpected.
+            If the URL construction fails, the HTTP response is invalid, or if the response
+            cannot be parsed correctly, or if the expected timestamp is missing or invalid.
         """
-        query_params = [f"q=g:{group_id}", f"a:{artifact_id}"]
-        if version:
-            query_params.append(f"v:{version}")
+        try:
+            purl_object = PackageURL.from_string(purl)
+        except ValueError as error:
+            logger.debug("Could not parse PURL: %s", error)
+            raise InvalidHTTPResponseError(f"The PURL string {purl} is invalid.") from error
+
+        if not purl_object.version:
+            raise InvalidHTTPResponseError("The PackageURL of the software component is missing the version.")
+
+        query_params = [f"q=g:{purl_object.namespace}", f"a:{purl_object.name}", f"v:{purl_object.version}"]

         try:
-            url = urlunsplit(
-                SplitResult(
-                    scheme="https",
-                    netloc=self.hostname,
+            url = urllib.parse.urlunsplit(
+                urllib.parse.SplitResult(
+                    scheme=self.search_scheme,
+                    netloc=self.search_netloc,
                     path=f"/{self.search_endpoint}",
                     query="&".join(["+AND+".join(query_params), "core=gav", "rows=1", "wt=json"]),
                     fragment="",
@@ -196,7 +234,7 @@
             raise InvalidHTTPResponseError("Failed to construct the search URL for Maven Central.") from error

         response = send_get_http_raw(url, headers=None, timeout=self.request_timeout)
-        if response and response.status_code == 200:
+        if response:
             try:
                 res_obj = response.json()
             except requests.exceptions.JSONDecodeError as error:
@@ -221,7 +259,7 @@
             # The timestamp published in Maven Central is in milliseconds and needs to be divided by 1000.
# Unfortunately, this is not documented in the API docs. try: - return datetime.fromtimestamp(timestamp / 1000, tz=timezone.utc) + return datetime.fromtimestamp(round(timestamp / 1000), tz=timezone.utc) except (OverflowError, OSError) as error: raise InvalidHTTPResponseError(f"The timestamp returned by {url} is invalid") from error diff --git a/src/macaron/slsa_analyzer/package_registry/package_registry.py b/src/macaron/slsa_analyzer/package_registry/package_registry.py index e7e68f8c5..55ae778b7 100644 --- a/src/macaron/slsa_analyzer/package_registry/package_registry.py +++ b/src/macaron/slsa_analyzer/package_registry/package_registry.py @@ -1,12 +1,21 @@ -# Copyright (c) 2023 - 2023, Oracle and/or its affiliates. All rights reserved. +# Copyright (c) 2023 - 2024, Oracle and/or its affiliates. All rights reserved. # Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/. """This module defines package registries.""" +import json import logging +import urllib.parse from abc import ABC, abstractmethod +from datetime import datetime +from urllib.parse import quote as encode +import requests + +from macaron.errors import InvalidHTTPResponseError +from macaron.json_tools import json_extract from macaron.slsa_analyzer.build_tool.base_build_tool import BaseBuildTool +from macaron.util import send_get_http_raw logger: logging.Logger = logging.getLogger(__name__) @@ -40,3 +49,77 @@ def is_detected(self, build_tool: BaseBuildTool) -> bool: ``True`` if the repo under analysis can be published to this package registry, based on the given build tool. """ + + def find_publish_timestamp(self, purl: str, registry_url: str | None = None) -> datetime: + """Retrieve the publication timestamp for a package specified by its purl from the deps.dev repository by default. + + This method constructs a request URL based on the provided purl, sends an HTTP GET + request to fetch metadata about the package, and extracts the publication timestamp + from the response. + + Note: The method expects the response to include a ``version`` field with a ``publishedAt`` + subfield containing an ISO 8601 formatted timestamp. + + Parameters + ---------- + purl: str + The Package URL (purl) of the package whose publication timestamp is to be retrieved. + This should conform to the PURL specification. + registry_url: str | None + The registry URL that can be set for testing. + + Returns + ------- + datetime + A timezone-aware datetime object representing the publication timestamp + of the specified package. + + Raises + ------ + InvalidHTTPResponseError + If the URL construction fails, the HTTP response is invalid, or if the response + cannot be parsed correctly, or if the expected timestamp is missing or invalid. + NotImplementedError + If not implemented for a registry. + """ + # TODO: To reduce redundant calls to deps.dev, store relevant parts of the response + # in the AnalyzeContext object retrieved by the Repo Finder. This step should be + # implemented at the beginning of the analyze command to ensure that the data + # is available for subsequent processing. 
+
+        base_url_parsed = urllib.parse.urlparse(registry_url or "https://api.deps.dev")
+        path_params = "/".join(["v3alpha", "purl", encode(purl, safe="")])
+        try:
+            url = urllib.parse.urlunsplit(
+                urllib.parse.SplitResult(
+                    scheme=base_url_parsed.scheme,
+                    netloc=base_url_parsed.netloc,
+                    path=path_params,
+                    query="",
+                    fragment="",
+                )
+            )
+        except ValueError as error:
+            raise InvalidHTTPResponseError("Failed to construct the API URL.") from error
+
+        response = send_get_http_raw(url)
+        if response and response.text:
+            try:
+                metadata: dict = json.loads(response.text)
+            except requests.exceptions.JSONDecodeError as error:
+                raise InvalidHTTPResponseError(f"Failed to process response from deps.dev for {url}.") from error
+            if not metadata:
+                raise InvalidHTTPResponseError(f"Empty response returned by {url}.")
+
+            timestamp = json_extract(metadata, ["version", "publishedAt"], str)
+            if not timestamp:
+                raise InvalidHTTPResponseError(f"The timestamp is missing in the response returned by {url}.")
+
+            logger.debug("Found timestamp: %s.", timestamp)
+
+            try:
+                return datetime.fromisoformat(timestamp)
+            except ValueError as error:
+                raise InvalidHTTPResponseError(f"The timestamp returned by {url} is invalid.") from error
+
+        raise InvalidHTTPResponseError(f"Invalid response from deps.dev for {url}.")
diff --git a/src/macaron/slsa_analyzer/provenance/slsa/__init__.py b/src/macaron/slsa_analyzer/provenance/slsa/__init__.py
index b3418946f..cf9a9cfb7 100644
--- a/src/macaron/slsa_analyzer/provenance/slsa/__init__.py
+++ b/src/macaron/slsa_analyzer/provenance/slsa/__init__.py
@@ -2,6 +2,7 @@
 # Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

 """This module implements SLSA provenance abstractions."""
+
 from typing import NamedTuple

 from macaron.slsa_analyzer.asset import AssetLocator
diff --git a/src/macaron/slsa_analyzer/specs/ci_spec.py b/src/macaron/slsa_analyzer/specs/ci_spec.py
index 7f4bef2a3..0f00e5bdb 100644
--- a/src/macaron/slsa_analyzer/specs/ci_spec.py
+++ b/src/macaron/slsa_analyzer/specs/ci_spec.py
@@ -9,6 +9,7 @@
 from macaron.code_analyzer.call_graph import CallGraph
 from macaron.slsa_analyzer.asset import AssetLocator
 from macaron.slsa_analyzer.ci_service.base_ci_service import BaseCIService
+from macaron.slsa_analyzer.provenance.intoto import InTotoV01Payload
 from macaron.slsa_analyzer.provenance.provenance import DownloadedProvenanceData

@@ -36,3 +37,6 @@ class CIInfo(TypedDict):

     provenances: Sequence[DownloadedProvenanceData]
     """The provenances data."""
+
+    build_info_results: InTotoV01Payload
+    """The build information results computed for a build step.
We use the in-toto 0.1 as the spec.""" diff --git a/tests/integration/cases/apache_maven_local_path_with_branch_name_digest_deps_cyclonedx_maven/maven.dl b/tests/integration/cases/apache_maven_local_path_with_branch_name_digest_deps_cyclonedx_maven/maven.dl index afd7a54de..2c750872a 100644 --- a/tests/integration/cases/apache_maven_local_path_with_branch_name_digest_deps_cyclonedx_maven/maven.dl +++ b/tests/integration/cases/apache_maven_local_path_with_branch_name_digest_deps_cyclonedx_maven/maven.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), check_failed(component_id, "mcn_build_as_code_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/apache_maven_local_paths_without_dep_resolution/guava.dl b/tests/integration/cases/apache_maven_local_paths_without_dep_resolution/guava.dl index 5f5927982..fdf03032b 100644 --- a/tests/integration/cases/apache_maven_local_paths_without_dep_resolution/guava.dl +++ b/tests/integration/cases/apache_maven_local_paths_without_dep_resolution/guava.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_script_1"), check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/apache_maven_local_paths_without_dep_resolution/maven.dl b/tests/integration/cases/apache_maven_local_paths_without_dep_resolution/maven.dl index ef16459c9..708676471 100644 --- a/tests/integration/cases/apache_maven_local_paths_without_dep_resolution/maven.dl +++ b/tests/integration/cases/apache_maven_local_paths_without_dep_resolution/maven.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), check_failed(component_id, "mcn_build_as_code_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/apache_maven_local_paths_without_dep_resolution/mockito.dl b/tests/integration/cases/apache_maven_local_paths_without_dep_resolution/mockito.dl index f754eb3e5..92e0e16c8 100644 --- a/tests/integration/cases/apache_maven_local_paths_without_dep_resolution/mockito.dl +++ b/tests/integration/cases/apache_maven_local_paths_without_dep_resolution/mockito.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_script_1"), check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + 
check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/apache_maven_local_repo/policy.dl b/tests/integration/cases/apache_maven_local_repo/policy.dl index ef16459c9..708676471 100644 --- a/tests/integration/cases/apache_maven_local_repo/policy.dl +++ b/tests/integration/cases/apache_maven_local_repo/policy.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), check_failed(component_id, "mcn_build_as_code_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/apache_maven_purl_repo_path/policy.dl b/tests/integration/cases/apache_maven_purl_repo_path/policy.dl index ecc9383f6..92d3b8d7b 100644 --- a/tests/integration/cases/apache_maven_purl_repo_path/policy.dl +++ b/tests/integration/cases/apache_maven_purl_repo_path/policy.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), check_failed(component_id, "mcn_build_as_code_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/apache_maven_repo_path_branch_digest_with_deps_cyclonedx_maven/maven.dl b/tests/integration/cases/apache_maven_repo_path_branch_digest_with_deps_cyclonedx_maven/maven.dl index afd7a54de..2c750872a 100644 --- a/tests/integration/cases/apache_maven_repo_path_branch_digest_with_deps_cyclonedx_maven/maven.dl +++ b/tests/integration/cases/apache_maven_repo_path_branch_digest_with_deps_cyclonedx_maven/maven.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), check_failed(component_id, "mcn_build_as_code_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/apache_maven_using_default_template_file_as_input_template/maven.dl b/tests/integration/cases/apache_maven_using_default_template_file_as_input_template/maven.dl index ef16459c9..708676471 100644 --- a/tests/integration/cases/apache_maven_using_default_template_file_as_input_template/maven.dl +++ b/tests/integration/cases/apache_maven_using_default_template_file_as_input_template/maven.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), check_failed(component_id, "mcn_build_as_code_1"), - check_failed(component_id, 
"mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/apache_maven_yaml_input_skip_deps/guava.dl b/tests/integration/cases/apache_maven_yaml_input_skip_deps/guava.dl index 5f5927982..fdf03032b 100644 --- a/tests/integration/cases/apache_maven_yaml_input_skip_deps/guava.dl +++ b/tests/integration/cases/apache_maven_yaml_input_skip_deps/guava.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_script_1"), check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/apache_maven_yaml_input_skip_deps/maven.dl b/tests/integration/cases/apache_maven_yaml_input_skip_deps/maven.dl index ef16459c9..708676471 100644 --- a/tests/integration/cases/apache_maven_yaml_input_skip_deps/maven.dl +++ b/tests/integration/cases/apache_maven_yaml_input_skip_deps/maven.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), check_failed(component_id, "mcn_build_as_code_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/apache_maven_yaml_input_skip_deps/mockito.dl b/tests/integration/cases/apache_maven_yaml_input_skip_deps/mockito.dl index f754eb3e5..92e0e16c8 100644 --- a/tests/integration/cases/apache_maven_yaml_input_skip_deps/mockito.dl +++ b/tests/integration/cases/apache_maven_yaml_input_skip_deps/mockito.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_script_1"), check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/behnazh-w_example-maven-app-tutorial/policy.dl b/tests/integration/cases/behnazh-w_example-maven-app-tutorial/policy.dl new file mode 100644 index 000000000..e5dba0031 --- /dev/null +++ b/tests/integration/cases/behnazh-w_example-maven-app-tutorial/policy.dl @@ -0,0 +1,17 @@ +/* Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved. */ +/* Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/. */ + + #include "prelude.dl" + + Policy("detect-malicious-upload", component_id, "") :- + is_component(component_id, _), + !violating_dependencies(component_id). 
+
+.decl violating_dependencies(parent: number)
+violating_dependencies(parent) :-
+    transitive_dependency(parent, dependency),
+    !check_passed(dependency, "mcn_find_artifact_pipeline_1"),
+    !check_passed(dependency, "mcn_provenance_level_three_1").
+
+apply_policy_to("detect-malicious-upload", component_id) :-
+    is_repo(_, "github.com/behnazh-w/example-maven-app", component_id).
diff --git a/tests/integration/cases/behnazh-w_example-maven-app-tutorial/test.yaml b/tests/integration/cases/behnazh-w_example-maven-app-tutorial/test.yaml
new file mode 100644
index 000000000..9e760d683
--- /dev/null
+++ b/tests/integration/cases/behnazh-w_example-maven-app-tutorial/test.yaml
@@ -0,0 +1,24 @@
+# Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved.
+# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
+
+description: |
+  Test the example-maven-app detect-manual-upload-java-dep tutorial scenario.
+
+tags:
+- macaron-python-package
+- tutorial
+steps:
+- name: Run macaron analyze on the remote repository and resolve dependencies.
+  kind: analyze
+  options:
+    command_args:
+    - --package-url
+    - pkg:maven/io.github.behnazh-w.demo/example-maven-app@2.0?type=jar
+    - -rp
+    - https://github.com/behnazh-w/example-maven-app
+    - --deps-depth=1
+- name: Run macaron verify-policy and expect it to fail because some deps do not pass the policy.
+  kind: verify
+  options:
+    policy: policy.dl
+    expect_fail: true
diff --git a/tests/integration/cases/facebook_yoga_yarn_classic/policy.dl b/tests/integration/cases/facebook_yoga_yarn_classic/policy.dl
index 0d652339e..75706c7e6 100644
--- a/tests/integration/cases/facebook_yoga_yarn_classic/policy.dl
+++ b/tests/integration/cases/facebook_yoga_yarn_classic/policy.dl
@@ -11,7 +11,7 @@ Policy("test_policy", component_id, "") :-
     check_passed(component_id, "mcn_build_tool_1"),
     build_tool_check(yarn_id, "yarn", "javascript"),
     check_facts(yarn_id, _, component_id,_,_),
-    check_failed(component_id, "mcn_infer_artifact_pipeline_1"),
+    check_failed(component_id, "mcn_find_artifact_pipeline_1"),
    check_failed(component_id, "mcn_provenance_available_1"),
     check_failed(component_id, "mcn_provenance_derived_commit_1"),
     check_failed(component_id, "mcn_provenance_derived_repo_1"),
diff --git a/tests/integration/cases/gitlab_tinyMediaManager/policy.dl b/tests/integration/cases/gitlab_tinyMediaManager/policy.dl
index 1a1bd419c..2e6676a22 100644
--- a/tests/integration/cases/gitlab_tinyMediaManager/policy.dl
+++ b/tests/integration/cases/gitlab_tinyMediaManager/policy.dl
@@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :-
     check_passed(component_id, "mcn_version_control_system_1"),
     check_failed(component_id, "mcn_build_as_code_1"),
     check_failed(component_id, "mcn_build_service_1"),
-    check_failed(component_id, "mcn_infer_artifact_pipeline_1"),
+    check_failed(component_id, "mcn_find_artifact_pipeline_1"),
     check_failed(component_id, "mcn_provenance_available_1"),
     check_failed(component_id, "mcn_provenance_derived_commit_1"),
     check_failed(component_id, "mcn_provenance_derived_repo_1"),
diff --git a/tests/integration/cases/gitlab_tinyMediaManager_purl/policy.dl b/tests/integration/cases/gitlab_tinyMediaManager_purl/policy.dl
index 11e3f4b73..c2bb7761c 100644
--- a/tests/integration/cases/gitlab_tinyMediaManager_purl/policy.dl
+++ b/tests/integration/cases/gitlab_tinyMediaManager_purl/policy.dl
@@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :-
     check_passed(component_id, "mcn_version_control_system_1"),
check_failed(component_id, "mcn_build_as_code_1"), check_failed(component_id, "mcn_build_service_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/google_guava/policy.dl b/tests/integration/cases/google_guava/policy.dl index dddcdea35..e872e43db 100644 --- a/tests/integration/cases/google_guava/policy.dl +++ b/tests/integration/cases/google_guava/policy.dl @@ -7,10 +7,6 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_as_code_1"), check_passed(component_id, "mcn_build_script_1"), check_passed(component_id, "mcn_build_service_1"), - // TODO: The GitHub API is no longer returning the required information about the workflow run - // steps for this version of Guava. So, we need to disable this check for now and adjust - // the logic in the mcn_infer_artifact_pipeline_1 check. - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), check_passed(component_id, "mcn_version_control_system_1"), check_passed(component_id, "mcn_build_tool_1"), build_tool_check(maven_id, "maven", "java"), @@ -22,7 +18,19 @@ Policy("test_policy", component_id, "") :- check_failed(component_id, "mcn_provenance_level_three_1"), check_failed(component_id, "mcn_provenance_witness_level_one_1"), check_failed(component_id, "mcn_trusted_builder_level_three_1"), - is_repo_url(component_id, "https://github.com/google/guava"). + is_repo_url(component_id, "https://github.com/google/guava"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), + artifact_pipeline_check( + apc_check_id, + _, + _, + _, + _, + 0, // From provenance. + 1, // Run deleted. + 0 // Published before the code was committed. + ), + check_facts(apc_check_id, _, component_id,_,_). apply_policy_to("test_policy", component_id) :- is_component(component_id, "pkg:maven/com.google.guava/guava@32.1.2-jre?type=jar"). 
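[Editorial note] The google_guava policy above matches the three new boolean columns of the `artifact_pipeline_check` relation positionally: from_provenance, run_deleted, and published_before_commit. A minimal Python sketch of how those 0/1 flags are derived, assuming illustrative dates and the 400-day GitHub retention window used by `workflow_run_deleted`; the helper name `pipeline_flags` is hypothetical, not Macaron API:

from datetime import datetime, timedelta, timezone

def pipeline_flags(publish_date: datetime, commit_date: datetime, from_provenance: bool) -> tuple[int, int, int]:
    """Derive (from_provenance, run_deleted, published_before_commit) as 0/1 flags."""
    # GitHub Actions run data is retained for roughly 400 days; older runs cannot be analyzed.
    run_deleted = datetime.now(timezone.utc) - timedelta(days=400) > publish_date
    # A publish date earlier than the commit date rules out a triggering CI pipeline.
    published_before_commit = publish_date < commit_date
    return int(from_provenance), int(run_deleted), int(published_before_commit)

# Illustrative dates only: an artifact with no provenance, older than 400 days,
# published after its commit, yields (0, 1, 0) -- the row the policy above expects.
print(pipeline_flags(
    publish_date=datetime(2023, 8, 1, tzinfo=timezone.utc),
    commit_date=datetime(2023, 7, 28, tzinfo=timezone.utc),
    from_provenance=False,
))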
diff --git a/tests/integration/cases/jackson_databind_with_purl_and_no_deps/jackson-databind.dl b/tests/integration/cases/jackson_databind_with_purl_and_no_deps/jackson-databind.dl index d045d955b..c722e0298 100644 --- a/tests/integration/cases/jackson_databind_with_purl_and_no_deps/jackson-databind.dl +++ b/tests/integration/cases/jackson_databind_with_purl_and_no_deps/jackson-databind.dl @@ -11,7 +11,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_tool_1"), build_tool_check(maven_id, "maven", "java"), check_facts(maven_id, _, component_id,_,_), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/jenkinsci_plotplugin/policy.dl b/tests/integration/cases/jenkinsci_plotplugin/policy.dl index a3d674888..355ee5e08 100644 --- a/tests/integration/cases/jenkinsci_plotplugin/policy.dl +++ b/tests/integration/cases/jenkinsci_plotplugin/policy.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), check_failed(component_id, "mcn_build_as_code_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/log4j_release_pipeline/policy.dl b/tests/integration/cases/log4j_release_pipeline/policy.dl new file mode 100644 index 000000000..3044be45a --- /dev/null +++ b/tests/integration/cases/log4j_release_pipeline/policy.dl @@ -0,0 +1,23 @@ +/* Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved. */ +/* Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/. */ + +#include "prelude.dl" + +Policy("test_policy", component_id, "") :- + check_passed(component_id, "mcn_build_as_code_1"), + check_passed(component_id, "mcn_build_script_1"), + check_passed(component_id, "mcn_build_service_1"), + check_passed_with_confidence(component_id, "mcn_find_artifact_pipeline_1", confidence), + confidence = 0.7, // Medium confidence because the pipeline was not found from a provenance. + check_passed(component_id, "mcn_version_control_system_1"), + check_failed(component_id, "mcn_provenance_available_1"), + check_failed(component_id, "mcn_provenance_derived_commit_1"), + check_failed(component_id, "mcn_provenance_derived_repo_1"), + check_failed(component_id, "mcn_provenance_expectation_1"), + check_failed(component_id, "mcn_provenance_level_three_1"), + check_failed(component_id, "mcn_provenance_witness_level_one_1"), + check_failed(component_id, "mcn_trusted_builder_level_three_1"), + is_repo_url(component_id, "https://github.com/apache/logging-log4j2"). + +apply_policy_to("test_policy", component_id) :- + is_component(component_id, "pkg:maven/org.apache.logging.log4j/log4j-core@3.0.0-beta2"). 
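[Editorial note] The log4j_release_pipeline policy above is the first to assert on check confidence via check_passed_with_confidence. The numeric scale is implied by the assertions in these tests (0.7 for a heuristically found pipeline, 0.4 for deleted CI run data in the policies that follow); a hedged sketch of that mapping, where the HIGH value and the float-enum shape are assumptions rather than taken from Macaron's check_result module:

from enum import Enum

class Confidence(float, Enum):
    """Confidence scores implied by the Datalog policies in this diff."""
    HIGH = 1.0    # assumed: e.g. the triggering run is confirmed from a provenance
    MEDIUM = 0.7  # asserted above: pipeline found heuristically, without a provenance
    LOW = 0.4     # asserted below: CI run data deleted, so the check returns UNKNOWN

assert Confidence.MEDIUM == 0.7 and Confidence.LOW == 0.4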
diff --git a/tests/integration/cases/log4j_release_pipeline/test.yaml b/tests/integration/cases/log4j_release_pipeline/test.yaml
new file mode 100644
index 000000000..54600056c
--- /dev/null
+++ b/tests/integration/cases/log4j_release_pipeline/test.yaml
@@ -0,0 +1,21 @@
+# Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved.
+# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
+
+description: |
+  Analyzing with PURL without dependency resolution.
+
+tags:
+- macaron-python-package
+- tutorial
+
+steps:
+- name: Run macaron analyze
+  kind: analyze
+  options:
+    command_args:
+    - -purl
+    - pkg:maven/org.apache.logging.log4j/log4j-core@3.0.0-beta2
+- name: Run macaron verify-policy to verify passed/failed checks
+  kind: verify
+  options:
+    policy: policy.dl
diff --git a/tests/integration/cases/log4j_release_pipeline_deleted_run/policy.dl b/tests/integration/cases/log4j_release_pipeline_deleted_run/policy.dl
new file mode 100644
index 000000000..2a2a68caf
--- /dev/null
+++ b/tests/integration/cases/log4j_release_pipeline_deleted_run/policy.dl
@@ -0,0 +1,38 @@
+/* Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved. */
+/* Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/. */
+
+#include "prelude.dl"
+
+Policy("test_policy", component_id, "") :-
+    check_passed(component_id, "mcn_build_as_code_1"),
+    check_passed(component_id, "mcn_build_script_1"),
+    check_passed(component_id, "mcn_build_service_1"),
+    check_passed(component_id, "mcn_version_control_system_1"),
+    check_failed(component_id, "mcn_provenance_available_1"),
+    check_failed(component_id, "mcn_provenance_derived_commit_1"),
+    check_failed(component_id, "mcn_provenance_derived_repo_1"),
+    check_failed(component_id, "mcn_provenance_expectation_1"),
+    check_failed(component_id, "mcn_provenance_level_three_1"),
+    check_failed(component_id, "mcn_provenance_witness_level_one_1"),
+    check_failed(component_id, "mcn_trusted_builder_level_three_1"),
+    is_repo_url(component_id, "https://github.com/apache/logging-log4j2"),
+    // The GitHub API has a retention policy of removing CI run data after 400 days.
+    // Note that mcn_find_artifact_pipeline_1 fails because it returns UNKNOWN, in this case with low confidence.
+    // That's why we cannot rely on the check failure alone and also need to check the data gathered by
+    // the artifact_pipeline_check.
+    check_failed_with_confidence(component_id, "mcn_find_artifact_pipeline_1", confidence),
+    confidence = 0.4,
+    artifact_pipeline_check(
+        apc_check_id,
+        "https://github.com/apache/logging-log4j2/blob/5a5d3aefdc75045bb66f55a16c40a9a07a463738/.github/workflows/build.yml",
+        "deploy",
+        "Maven \"deploy\"",
+        _,
+        0, // From provenance.
+        1, // Run deleted.
+        0  // Published before the code was committed.
+    ),
+    check_facts(apc_check_id, confidence, component_id,_,_).
+
+apply_policy_to("test_policy", component_id) :-
+    is_component(component_id, "pkg:maven/org.apache.logging.log4j/log4j-core@2.19.0").
diff --git a/tests/integration/cases/log4j_release_pipeline_deleted_run/test.yaml b/tests/integration/cases/log4j_release_pipeline_deleted_run/test.yaml
new file mode 100644
index 000000000..2b33a0a24
--- /dev/null
+++ b/tests/integration/cases/log4j_release_pipeline_deleted_run/test.yaml
@@ -0,0 +1,20 @@
+# Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved.
+# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
+
+description: |
+  Analyzing with PURL without dependency resolution.
+
+tags:
+- macaron-python-package
+
+steps:
+- name: Run macaron analyze
+  kind: analyze
+  options:
+    command_args:
+    - -purl
+    - pkg:maven/org.apache.logging.log4j/log4j-core@2.19.0
+- name: Run macaron verify-policy to verify passed/failed checks
+  kind: verify
+  options:
+    policy: policy.dl
diff --git a/tests/integration/cases/micronaut-projects_micronaut-core/config.ini b/tests/integration/cases/micronaut-projects_micronaut-core/config.ini
index e42c7cdd4..5f5a55c82 100644
--- a/tests/integration/cases/micronaut-projects_micronaut-core/config.ini
+++ b/tests/integration/cases/micronaut-projects_micronaut-core/config.ini
@@ -7,7 +7,4 @@ exclude =
     # temporarily because provenances have failed to publish due to an issue in `generator_generic_slsa3.yml@v1.9.0`:
     # https://github.com/slsa-framework/slsa-github-generator/issues/3350
     mcn_provenance_available_1
-    # Exclude `mcn_infer_artifact_pipeline_1`, due to a non-deterministic behavior in deploy command detection,
-    # which will be fixed in PR #673.
-    mcn_infer_artifact_pipeline_1
 include = *
diff --git a/tests/integration/cases/micronaut-projects_micronaut-test/micronaut-test.dl b/tests/integration/cases/micronaut-projects_micronaut-test/micronaut-test.dl
index 35e35f7ee..2e6da73d8 100644
--- a/tests/integration/cases/micronaut-projects_micronaut-test/micronaut-test.dl
+++ b/tests/integration/cases/micronaut-projects_micronaut-test/micronaut-test.dl
@@ -14,7 +14,7 @@ Policy("test_policy", component_id, "") :-
     build_tool_check(gradle_id, "gradle", "java"),
     check_facts(gradle_id, _, component_id,_,_),
     check_passed(component_id, "mcn_provenance_level_three_1"),
-    check_failed(component_id, "mcn_infer_artifact_pipeline_1"),
+    check_passed(component_id, "mcn_find_artifact_pipeline_1"),
     check_failed(component_id, "mcn_provenance_derived_commit_1"),
     check_failed(component_id, "mcn_provenance_witness_level_one_1"),
     check_failed(component_id, "mcn_trusted_builder_level_three_1"),
diff --git a/tests/integration/cases/onu-ui_onu-ui_pnpm/policy.dl b/tests/integration/cases/onu-ui_onu-ui_pnpm/policy.dl
index 37005b017..56b09f46a 100644
--- a/tests/integration/cases/onu-ui_onu-ui_pnpm/policy.dl
+++ b/tests/integration/cases/onu-ui_onu-ui_pnpm/policy.dl
@@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :-
     check_passed(component_id, "mcn_build_script_1"),
     check_passed(component_id, "mcn_build_service_1"),
     check_passed(component_id, "mcn_version_control_system_1"),
-    check_failed(component_id, "mcn_infer_artifact_pipeline_1"),
+    check_failed(component_id, "mcn_find_artifact_pipeline_1"),
     check_failed(component_id, "mcn_provenance_available_1"),
     check_failed(component_id, "mcn_provenance_derived_commit_1"),
     check_failed(component_id, "mcn_provenance_derived_repo_1"),
diff --git a/tests/integration/cases/purl_of_nonexistent_artifact/policy.dl b/tests/integration/cases/purl_of_nonexistent_artifact/policy.dl
index a7a8fc53b..e0ae8d6c7 100644
--- a/tests/integration/cases/purl_of_nonexistent_artifact/policy.dl
+++ b/tests/integration/cases/purl_of_nonexistent_artifact/policy.dl
@@ -7,7 +7,7 @@ Policy("test_policy", component_id, "") :-
     check_failed(component_id, "mcn_build_as_code_1"),
     check_failed(component_id, "mcn_build_script_1"),
     check_failed(component_id, "mcn_build_service_1"),
-    check_failed(component_id, "mcn_infer_artifact_pipeline_1"),
+    check_failed(component_id, "mcn_find_artifact_pipeline_1"),
     check_failed(component_id, "mcn_provenance_available_1"),
     check_failed(component_id, "mcn_provenance_derived_commit_1"),
     check_failed(component_id, "mcn_provenance_derived_repo_1"),
diff --git a/tests/integration/cases/semver/policy.dl b/tests/integration/cases/semver/policy.dl
index 9412c57f6..717062b48 100644
--- a/tests/integration/cases/semver/policy.dl
+++ b/tests/integration/cases/semver/policy.dl
@@ -14,7 +14,11 @@ Policy("test_policy", component_id, "") :-
     check_passed(component_id, "mcn_provenance_verified_1"),
     provenance_verified_check(_, build_level, _),
     build_level = 2,
-    check_failed(component_id, "mcn_infer_artifact_pipeline_1"),
+    // The build_as_code check reports the integration_release.yaml workflow,
+    // which is not the same as the workflow in the provenance. Therefore, the
+    // mcn_find_artifact_pipeline_1 check fails, which is a false negative.
+    // TODO: improve the build_as_code check analysis.
+    check_failed(component_id, "mcn_find_artifact_pipeline_1"),
     check_failed(component_id, "mcn_provenance_level_three_1"),
     check_failed(component_id, "mcn_provenance_witness_level_one_1"),
     check_failed(component_id, "mcn_trusted_builder_level_three_1"),
diff --git a/tests/integration/cases/sigstore_mock/policy.dl b/tests/integration/cases/sigstore_mock/policy.dl
index c16d43f7c..b35d2bb4d 100644
--- a/tests/integration/cases/sigstore_mock/policy.dl
+++ b/tests/integration/cases/sigstore_mock/policy.dl
@@ -12,11 +12,28 @@ Policy("test_policy", component_id, "") :-
     check_passed(component_id, "mcn_provenance_derived_commit_1"),
     check_passed(component_id, "mcn_provenance_derived_repo_1"),
     check_passed(component_id, "mcn_provenance_verified_1"),
-    check_failed(component_id, "mcn_infer_artifact_pipeline_1"),
+    check_failed(component_id, "mcn_find_artifact_pipeline_1"),
     check_failed(component_id, "mcn_provenance_level_three_1"),
     check_failed(component_id, "mcn_provenance_witness_level_one_1"),
     check_failed(component_id, "mcn_trusted_builder_level_three_1"),
-    is_repo_url(component_id, "https://github.com/sigstore/sigstore-js").
+    is_repo_url(component_id, "https://github.com/sigstore/sigstore-js"),
+    // The GitHub API has a retention policy of removing CI run data after 400 days.
+    // Note that mcn_find_artifact_pipeline_1 fails because it returns UNKNOWN, in this case with low confidence.
+    // That's why we cannot rely on the check failure alone and also need to check the data gathered by
+    // the artifact_pipeline_check.
+    check_failed_with_confidence(component_id, "mcn_find_artifact_pipeline_1", confidence),
+    confidence = 0.4,
+    artifact_pipeline_check(
+        apc_check_id,
+        "https://github.com/sigstore/sigstore-js/blob/ebdcfdfbdfeb9c9aeee6df53674ef230613629f5/.github/workflows/release.yml",
+        "release",
+        "Create Release Pull Request",
+        _,
+        1, // From provenance.
+        1, // Run deleted.
+        0  // Published before the code was committed.
+    ),
+    check_facts(apc_check_id, confidence, component_id,_,_).
 
 apply_policy_to("test_policy", component_id) :-
     is_component(component_id, "pkg:npm/%40sigstore/mock@0.1.0").
diff --git a/tests/integration/cases/sigstore_sget/policy.dl b/tests/integration/cases/sigstore_sget/policy.dl index e7bfb3344..df5c2a294 100644 --- a/tests/integration/cases/sigstore_sget/policy.dl +++ b/tests/integration/cases/sigstore_sget/policy.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_script_1"), check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/slsa-framework_slsa-verifier/policy.dl b/tests/integration/cases/slsa-framework_slsa-verifier/policy.dl index d9ab6910c..51a2ecb7a 100644 --- a/tests/integration/cases/slsa-framework_slsa-verifier/policy.dl +++ b/tests/integration/cases/slsa-framework_slsa-verifier/policy.dl @@ -16,7 +16,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_provenance_derived_commit_1"), check_passed(component_id, "mcn_provenance_derived_repo_1"), check_passed(component_id, "mcn_provenance_expectation_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_witness_level_one_1"), is_repo_url(component_id, "https://github.com/slsa-framework/slsa-verifier"). diff --git a/tests/integration/cases/slsa-framework_slsa-verifier_explicit_provenance_provided/policy.dl b/tests/integration/cases/slsa-framework_slsa-verifier_explicit_provenance_provided/policy.dl index 253100908..f9579225c 100644 --- a/tests/integration/cases/slsa-framework_slsa-verifier_explicit_provenance_provided/policy.dl +++ b/tests/integration/cases/slsa-framework_slsa-verifier_explicit_provenance_provided/policy.dl @@ -13,7 +13,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_provenance_expectation_1"), check_passed(component_id, "mcn_trusted_builder_level_three_1"), check_passed(component_id, "mcn_version_control_system_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_witness_level_one_1"), is_repo_url(component_id, "https://github.com/slsa-framework/slsa-verifier"). 
diff --git a/tests/integration/cases/snakeyaml_unsupported_git_service/policy.dl b/tests/integration/cases/snakeyaml_unsupported_git_service/policy.dl index 3940f2b8a..dd3b5d280 100644 --- a/tests/integration/cases/snakeyaml_unsupported_git_service/policy.dl +++ b/tests/integration/cases/snakeyaml_unsupported_git_service/policy.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_version_control_system_1"), check_failed(component_id, "mcn_build_as_code_1"), check_failed(component_id, "mcn_build_service_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/timyarkov_docker_test/policy.dl b/tests/integration/cases/timyarkov_docker_test/policy.dl index 1d8efaec1..599dcc138 100644 --- a/tests/integration/cases/timyarkov_docker_test/policy.dl +++ b/tests/integration/cases/timyarkov_docker_test/policy.dl @@ -11,7 +11,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_tool_1"), build_tool_check(docker_id, "docker", "docker"), check_facts(docker_id, _, component_id,_,_), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/timyarkov_multibuild_test_maven/policy.dl b/tests/integration/cases/timyarkov_multibuild_test_maven/policy.dl index d70078002..90c4f2339 100644 --- a/tests/integration/cases/timyarkov_multibuild_test_maven/policy.dl +++ b/tests/integration/cases/timyarkov_multibuild_test_maven/policy.dl @@ -13,7 +13,7 @@ Policy("test_policy", component_id, "") :- check_facts(gradle_id, _, component_id,_,_), build_tool_check(maven_id, "maven", "java"), check_facts(maven_id, _, component_id,_,_), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/uiv-lib_uiv/policy.dl b/tests/integration/cases/uiv-lib_uiv/policy.dl index 3823d052d..ae20ef440 100644 --- a/tests/integration/cases/uiv-lib_uiv/policy.dl +++ b/tests/integration/cases/uiv-lib_uiv/policy.dl @@ -11,7 +11,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_tool_1"), build_tool_check(npm_id, "npm", "javascript"), check_facts(npm_id, _, component_id,_,_), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/integration/cases/urllib3_expectation_dir/policy.dl b/tests/integration/cases/urllib3_expectation_dir/policy.dl index dfa3d0d4a..2ba5f9dbe 100644 --- a/tests/integration/cases/urllib3_expectation_dir/policy.dl +++ b/tests/integration/cases/urllib3_expectation_dir/policy.dl @@ -16,10 +16,27 @@ 
Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_tool_1"), build_tool_check(pip_id, "pip", "python"), check_facts(pip_id, _, component_id,_,_), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_witness_level_one_1"), check_failed(component_id, "mcn_trusted_builder_level_three_1"), - is_repo_url(component_id, "https://github.com/urllib3/urllib3"). + is_repo_url(component_id, "https://github.com/urllib3/urllib3"), + // The GitHub API has a retention policy of removing CI run data after 400 days. + // Note that mcn_find_artifact_pipeline_1 fails because it returns UNKNOWN, in this case with low confidence. + // That's why we cannot rely on the check fail here only and need to also check the data gathered by + // the artifact_pipeline_check. + check_failed_with_confidence(component_id, "mcn_find_artifact_pipeline_1", confidence), + confidence = 0.4, + artifact_pipeline_check( + apc_check_id, + "https://github.com/urllib3/urllib3/blob/612cead3f9704716f4ab2a1334a16e0f05fce942/.github/workflows/publish.yml", + "publish", + "Publish dists to PyPI", + _, + 1, // From provenance. + 1, // Run deleted. + 0 // Published before the code was committed. + ), + check_facts(apc_check_id, confidence, component_id,_,_). apply_policy_to("test_policy", component_id) :- is_component(component_id, "pkg:pypi/urllib3@2.0.0a1"). diff --git a/tests/integration/cases/urllib3_expectation_file/policy.dl b/tests/integration/cases/urllib3_expectation_file/policy.dl index 79bfae7ee..3fea375f2 100644 --- a/tests/integration/cases/urllib3_expectation_file/policy.dl +++ b/tests/integration/cases/urllib3_expectation_file/policy.dl @@ -13,7 +13,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_provenance_derived_repo_1"), check_passed(component_id, "mcn_provenance_expectation_1"), check_passed(component_id, "mcn_provenance_level_three_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_witness_level_one_1"), check_failed(component_id, "mcn_trusted_builder_level_three_1"), is_repo_url(component_id, "https://github.com/urllib3/urllib3"). 
diff --git a/tests/integration/cases/urllib3_invalid_expectation/policy.dl b/tests/integration/cases/urllib3_invalid_expectation/policy.dl index e8a017826..48dd5adc2 100644 --- a/tests/integration/cases/urllib3_invalid_expectation/policy.dl +++ b/tests/integration/cases/urllib3_invalid_expectation/policy.dl @@ -12,7 +12,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_provenance_derived_commit_1"), check_passed(component_id, "mcn_provenance_derived_repo_1"), check_passed(component_id, "mcn_provenance_level_three_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_witness_level_one_1"), check_failed(component_id, "mcn_trusted_builder_level_three_1"), check_failed(component_id, "mcn_provenance_expectation_1"), diff --git a/tests/integration/cases/wojtekmaj_reactpdf_yarn_modern/policy.dl b/tests/integration/cases/wojtekmaj_reactpdf_yarn_modern/policy.dl index f24ebd492..0ac7956cb 100644 --- a/tests/integration/cases/wojtekmaj_reactpdf_yarn_modern/policy.dl +++ b/tests/integration/cases/wojtekmaj_reactpdf_yarn_modern/policy.dl @@ -8,7 +8,7 @@ Policy("test_policy", component_id, "") :- check_passed(component_id, "mcn_build_script_1"), check_passed(component_id, "mcn_build_service_1"), check_passed(component_id, "mcn_version_control_system_1"), - check_failed(component_id, "mcn_infer_artifact_pipeline_1"), + check_failed(component_id, "mcn_find_artifact_pipeline_1"), check_failed(component_id, "mcn_provenance_available_1"), check_failed(component_id, "mcn_provenance_derived_commit_1"), check_failed(component_id, "mcn_provenance_derived_repo_1"), diff --git a/tests/repo_finder/test_provenance_finder.py b/tests/repo_finder/test_provenance_finder.py index 3426fed0d..20f2c0ad9 100644 --- a/tests/repo_finder/test_provenance_finder.py +++ b/tests/repo_finder/test_provenance_finder.py @@ -19,7 +19,9 @@ from macaron.slsa_analyzer.git_service.api_client import GhAPIClient from macaron.slsa_analyzer.package_registry import JFrogMavenRegistry, NPMRegistry from macaron.slsa_analyzer.package_registry.jfrog_maven_registry import JFrogMavenAsset, JFrogMavenAssetMetadata +from macaron.slsa_analyzer.provenance.intoto import InTotoV01Payload from macaron.slsa_analyzer.specs.ci_spec import CIInfo +from macaron.slsa_analyzer.specs.inferred_provenance import Provenance from tests.conftest import MockAnalyzeContext @@ -159,6 +161,7 @@ def test_provenance_on_unsupported_ci(macaron_path: Path, service: BaseCIService provenance_assets=[], release={}, provenances=[], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) # Set up the context object with provenances. @@ -182,6 +185,7 @@ def test_provenance_on_supported_ci(macaron_path: Path, test_dir: Path) -> None: provenance_assets=[], release={}, provenances=[], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) # Set up the context object with provenances. 
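A pattern worth noting before the unit-test hunks that follow: every `CIInfo` constructed in these tests now carries a `build_info_results` field seeded with an empty inferred provenance wrapped as an in-toto v0.1 payload. A minimal construction sketch using exactly the imports added in the hunks (the remaining `CIInfo` fields are elided):

# The new default seeded throughout these tests: an empty inferred provenance
# wrapped in an in-toto v0.1 payload that checks can later populate.
from macaron.slsa_analyzer.provenance.intoto import InTotoV01Payload
from macaron.slsa_analyzer.specs.inferred_provenance import Provenance

build_info_results = InTotoV01Payload(statement=Provenance().payload)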
diff --git a/tests/slsa_analyzer/checks/test_build_as_code_check.py b/tests/slsa_analyzer/checks/test_build_as_code_check.py index 2cd4dd0eb..99aba2af1 100644 --- a/tests/slsa_analyzer/checks/test_build_as_code_check.py +++ b/tests/slsa_analyzer/checks/test_build_as_code_check.py @@ -24,7 +24,9 @@ ) from macaron.slsa_analyzer.ci_service.github_actions.github_actions_ci import GitHubActions from macaron.slsa_analyzer.ci_service.jenkins import Jenkins +from macaron.slsa_analyzer.provenance.intoto import InTotoV01Payload from macaron.slsa_analyzer.specs.ci_spec import CIInfo +from macaron.slsa_analyzer.specs.inferred_provenance import Provenance from tests.conftest import MockAnalyzeContext, build_github_actions_call_graph_for_commands @@ -56,6 +58,7 @@ def test_build_as_code_check_no_callgraph( provenance_assets=[], release={}, provenances=[], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) use_build_tool = MockAnalyzeContext(macaron_path=macaron_path, output_dir="") use_build_tool.dynamic_data["build_spec"]["tools"] = [build_tools[build_tool_name]] @@ -106,6 +109,7 @@ def test_deploy_commands( provenance_assets=[], release={}, provenances=[], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) ci_info["service"] = github_actions_service deploy_ctx.dynamic_data["ci_services"] = [ci_info] @@ -143,6 +147,7 @@ def test_gha_workflow_deployment( provenance_assets=[], release={}, provenances=[], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) workflows_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "resources", "github", "workflow_files") @@ -188,6 +193,7 @@ def test_travis_ci_deploy( provenance_assets=[], release={}, provenances=[], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) gradle_deploy = MockAnalyzeContext(macaron_path=macaron_path, output_dir="") gradle_deploy.component.repository.fs_path = str(repo_path.absolute()) @@ -208,6 +214,7 @@ def test_multibuild_facts_saved( provenance_assets=[], release={}, provenances=[], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) multi_build = MockAnalyzeContext(macaron_path=macaron_path, output_dir="") diff --git a/tests/slsa_analyzer/checks/test_build_service_check.py b/tests/slsa_analyzer/checks/test_build_service_check.py index 74ee6e933..0f5d9aeb0 100644 --- a/tests/slsa_analyzer/checks/test_build_service_check.py +++ b/tests/slsa_analyzer/checks/test_build_service_check.py @@ -14,7 +14,9 @@ from macaron.slsa_analyzer.checks.check_result import CheckResultType from macaron.slsa_analyzer.ci_service.base_ci_service import BaseCIService from macaron.slsa_analyzer.ci_service.github_actions.github_actions_ci import GitHubActions +from macaron.slsa_analyzer.provenance.intoto import InTotoV01Payload from macaron.slsa_analyzer.specs.ci_spec import CIInfo +from macaron.slsa_analyzer.specs.inferred_provenance import Provenance from tests.conftest import MockAnalyzeContext, build_github_actions_call_graph_for_commands @@ -46,6 +48,7 @@ def test_build_service_check_no_callgraph( provenance_assets=[], release={}, provenances=[], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) use_build_tool = MockAnalyzeContext(macaron_path=macaron_path, output_dir="") use_build_tool.dynamic_data["build_spec"]["tools"] = [build_tools[build_tool_name]] @@ -96,6 +99,7 @@ def test_packaging_commands( provenance_assets=[], release={}, provenances=[], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) 
ci_info["service"] = github_actions_service package_ctx.dynamic_data["ci_services"] = [ci_info] @@ -114,6 +118,7 @@ def test_multibuild_facts_saved( provenance_assets=[], release={}, provenances=[], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) multi_build = MockAnalyzeContext(macaron_path=macaron_path, output_dir="") diff --git a/tests/slsa_analyzer/checks/test_infer_artifact_pipeline.py b/tests/slsa_analyzer/checks/test_infer_artifact_pipeline.py index f38874f1a..65f8a6042 100644 --- a/tests/slsa_analyzer/checks/test_infer_artifact_pipeline.py +++ b/tests/slsa_analyzer/checks/test_infer_artifact_pipeline.py @@ -1,4 +1,4 @@ -# Copyright (c) 2023 - 2023, Oracle and/or its affiliates. All rights reserved. +# Copyright (c) 2023 - 2024, Oracle and/or its affiliates. All rights reserved. # Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/. """This module contains tests for the Infer Artifact Pipeline check.""" @@ -9,20 +9,25 @@ from macaron.database.table_definitions import Repository from macaron.slsa_analyzer.checks.check_result import CheckResultType -from macaron.slsa_analyzer.checks.infer_artifact_pipeline_check import InferArtifactPipelineCheck +from macaron.slsa_analyzer.checks.infer_artifact_pipeline_check import ArtifactPipelineCheck from tests.conftest import MockAnalyzeContext +RESOURCE_PATH = Path(__file__).parent.joinpath("resources") + @pytest.mark.parametrize( ("repository", "expected"), [ (None, CheckResultType.FAILED), - (Repository(complete_name="github.com/package-url/purl-spec"), CheckResultType.FAILED), + ( + Repository(complete_name="github.com/package-url/purl-spec", commit_date="2024-01-01T01:01:01+00:00"), + CheckResultType.FAILED, + ), ], ) -def test_infer_artifact_pipeline(macaron_path: Path, repository: Repository, expected: str) -> None: +def test_artifact_pipeline_errors(macaron_path: Path, repository: Repository, expected: str) -> None: """Test that the check handles repositories correctly.""" - check = InferArtifactPipelineCheck() + check = ArtifactPipelineCheck() # Set up the context object with provenances. ctx = MockAnalyzeContext(macaron_path=macaron_path, output_dir="") diff --git a/tests/slsa_analyzer/checks/test_provenance_l3_check.py b/tests/slsa_analyzer/checks/test_provenance_l3_check.py index 6f6220051..0715d33c1 100644 --- a/tests/slsa_analyzer/checks/test_provenance_l3_check.py +++ b/tests/slsa_analyzer/checks/test_provenance_l3_check.py @@ -13,7 +13,9 @@ from macaron.slsa_analyzer.ci_service.jenkins import Jenkins from macaron.slsa_analyzer.ci_service.travis import Travis from macaron.slsa_analyzer.git_service.api_client import GhAPIClient, GitHubReleaseAsset +from macaron.slsa_analyzer.provenance.intoto import InTotoV01Payload from macaron.slsa_analyzer.specs.ci_spec import CIInfo +from macaron.slsa_analyzer.specs.inferred_provenance import Provenance from tests.conftest import MockAnalyzeContext from ...macaron_testcase import MacaronTestCase @@ -72,6 +74,7 @@ def test_provenance_l3_check(self) -> None: provenance_assets=[], release={}, provenances=[], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) # Repo has provenances but no downloaded files. 
diff --git a/tests/slsa_analyzer/checks/test_provenance_l3_content_check.py b/tests/slsa_analyzer/checks/test_provenance_l3_content_check.py index c04eaf3fe..5decbd16e 100644 --- a/tests/slsa_analyzer/checks/test_provenance_l3_content_check.py +++ b/tests/slsa_analyzer/checks/test_provenance_l3_content_check.py @@ -16,9 +16,11 @@ from macaron.slsa_analyzer.ci_service.travis import Travis from macaron.slsa_analyzer.git_service.api_client import GhAPIClient from macaron.slsa_analyzer.provenance.expectations.cue import CUEExpectation +from macaron.slsa_analyzer.provenance.intoto import InTotoV01Payload from macaron.slsa_analyzer.provenance.loader import load_provenance_payload from macaron.slsa_analyzer.provenance.slsa import SLSAProvenanceData from macaron.slsa_analyzer.specs.ci_spec import CIInfo +from macaron.slsa_analyzer.specs.inferred_provenance import Provenance from tests.conftest import MockAnalyzeContext from ...macaron_testcase import MacaronTestCase @@ -84,6 +86,7 @@ def test_expectation_check(self) -> None: provenance_assets=[], release={}, provenances=[], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) ctx.dynamic_data["ci_services"] = [ci_info] diff --git a/tests/slsa_analyzer/checks/test_trusted_builder_l3_check.py b/tests/slsa_analyzer/checks/test_trusted_builder_l3_check.py index 88f2ec841..936a98957 100644 --- a/tests/slsa_analyzer/checks/test_trusted_builder_l3_check.py +++ b/tests/slsa_analyzer/checks/test_trusted_builder_l3_check.py @@ -18,7 +18,9 @@ build_call_graph_from_node, ) from macaron.slsa_analyzer.ci_service.github_actions.github_actions_ci import GitHubActions +from macaron.slsa_analyzer.provenance.intoto import InTotoV01Payload from macaron.slsa_analyzer.specs.ci_spec import CIInfo +from macaron.slsa_analyzer.specs.inferred_provenance import Provenance from tests.conftest import MockAnalyzeContext @@ -49,6 +51,7 @@ def test_trusted_builder_l3_check( provenance_assets=[], release={}, provenances=[], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) ctx = MockAnalyzeContext(macaron_path=macaron_path, output_dir="") diff --git a/tests/slsa_analyzer/ci_service/test_github_actions.py b/tests/slsa_analyzer/ci_service/test_github_actions.py index 59f5a021e..1995c3705 100644 --- a/tests/slsa_analyzer/ci_service/test_github_actions.py +++ b/tests/slsa_analyzer/ci_service/test_github_actions.py @@ -4,6 +4,7 @@ """This module tests GitHub Actions CI service.""" import os +from datetime import datetime, timedelta from pathlib import Path import pytest @@ -110,3 +111,63 @@ def test_gh_get_workflows(github_actions: GitHubActions, mock_repo: Path) -> Non def test_gh_get_workflows_fail_on_jenkins(github_actions: GitHubActions) -> None: """Assert GitHubActions workflow detection not working on Jenkins CI configuration files.""" assert not github_actions.get_workflows(str(jenkins_build)) + + +@pytest.mark.parametrize( + ("started_at", "publish_date_time", "commit_date_time", "time_range", "expected"), + [ + pytest.param( + datetime.now(), + datetime.now() - timedelta(hours=1), + datetime.now() + timedelta(minutes=10), + 3600, + False, + id="Publish time before CI start time.", + ), + pytest.param( + datetime.now(), + datetime.now() + timedelta(hours=1), + datetime.now() + timedelta(minutes=10), + 3600, + True, + id="Publish time 1h after CI run and source commit happened after CI trigger within acceptable range.", + ), + pytest.param( + datetime.now() - timedelta(hours=1), + datetime.now(), + datetime.now() + 
timedelta(minutes=10), + 3600, + False, + id="Source commit occurred after the CI run and outside the acceptable time range.", + ), + ], +) +def test_check_publish_start_commit_timestamps( + github_actions: GitHubActions, + started_at: datetime, + publish_date_time: datetime, + commit_date_time: datetime, + time_range: int, + expected: bool, +) -> None: + """Check that a CI run that has happened before the artifact publishing timestamp can be correctly identified.""" + assert ( + github_actions.check_publish_start_commit_timestamps( + started_at, publish_date_time, commit_date_time, time_range + ) + == expected + ) + + +@pytest.mark.parametrize( + ("timestamp", "expected"), + [ + ("2023-02-17T18:50:09+00:00", True), + ("2000-02-17T18:50:09+00:00", True), + ("3000-02-17T18:50:09+00:00", False), + ], +) +def test_workflow_run_deleted(github_actions: GitHubActions, timestamp: str, expected: bool) -> None: + """Test that deleted workflows can be detected.""" + timestamp_obj = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S%z") + assert github_actions.workflow_run_deleted(timestamp=timestamp_obj) == expected diff --git a/tests/slsa_analyzer/package_registry/resources/maven_central_files/empty_log4j-core@3.0.0-beta2-select.json b/tests/slsa_analyzer/package_registry/resources/maven_central_files/empty_log4j-core@3.0.0-beta2-select.json new file mode 100644 index 000000000..0967ef424 --- /dev/null +++ b/tests/slsa_analyzer/package_registry/resources/maven_central_files/empty_log4j-core@3.0.0-beta2-select.json @@ -0,0 +1 @@ +{} diff --git a/tests/slsa_analyzer/package_registry/resources/maven_central_files/invalid_log4j-core@3.0.0-beta2-select.json b/tests/slsa_analyzer/package_registry/resources/maven_central_files/invalid_log4j-core@3.0.0-beta2-select.json new file mode 100644 index 000000000..6fbd853a4 --- /dev/null +++ b/tests/slsa_analyzer/package_registry/resources/maven_central_files/invalid_log4j-core@3.0.0-beta2-select.json @@ -0,0 +1 @@ +{"responseHeader":{"status":0,"QTime":4,"params":{"q":"g:org.apache.logging.log4j AND a:log4j-core AND v:3.0.0-beta2","core":"gav","indent":"off","fl":"id,g,a,v,p,ec,timestamp,tags","start":"","sort":"score desc,timestamp desc,g asc,a asc,v desc","rows":"1","wt":"json","version":"2.2"}},"response":{"numFound":1,"start":0,"docs":[]}} diff --git a/tests/slsa_analyzer/package_registry/resources/maven_central_files/jackson-annotations@2.16.1-select.json b/tests/slsa_analyzer/package_registry/resources/maven_central_files/jackson-annotations@2.16.1-select.json new file mode 100644 index 000000000..0acb881f0 --- /dev/null +++ b/tests/slsa_analyzer/package_registry/resources/maven_central_files/jackson-annotations@2.16.1-select.json @@ -0,0 +1 @@ +{"responseHeader":{"status":0,"QTime":2,"params":{"q":"g:com.fasterxml.jackson.core AND a:jackson-annotations AND v:2.16.1","core":"gav","indent":"off","fl":"id,g,a,v,p,ec,timestamp,tags","start":"","sort":"score desc,timestamp desc,g asc,a asc,v desc","rows":"1","wt":"json","version":"2.2"}},"response":{"numFound":1,"start":0,"docs":[{"id":"com.fasterxml.jackson.core:jackson-annotations:2.16.1","g":"com.fasterxml.jackson.core","a":"jackson-annotations","v":"2.16.1","p":"jar","timestamp":1703390559843,"ec":["-sources.jar",".module",".pom","-javadoc.jar",".jar"],"tags":["core","types","jackson","package","data","annotations","binding","used","value"]}]}} diff --git a/tests/slsa_analyzer/package_registry/resources/maven_central_files/log4j-core@3.0.0-beta2-select.json 
b/tests/slsa_analyzer/package_registry/resources/maven_central_files/log4j-core@3.0.0-beta2-select.json new file mode 100644 index 000000000..5623a1276 --- /dev/null +++ b/tests/slsa_analyzer/package_registry/resources/maven_central_files/log4j-core@3.0.0-beta2-select.json @@ -0,0 +1 @@ +{"responseHeader":{"status":0,"QTime":4,"params":{"q":"g:org.apache.logging.log4j AND a:log4j-core AND v:3.0.0-beta2","core":"gav","indent":"off","fl":"id,g,a,v,p,ec,timestamp,tags","start":"","sort":"score desc,timestamp desc,g asc,a asc,v desc","rows":"1","wt":"json","version":"2.2"}},"response":{"numFound":1,"start":0,"docs":[{"id":"org.apache.logging.log4j:log4j-core:3.0.0-beta2","g":"org.apache.logging.log4j","a":"log4j-core","v":"3.0.0-beta2","p":"jar","timestamp":1708195809000,"ec":["-sources.jar","-cyclonedx.xml",".pom",".jar"],"tags":["apache","implementation","log4j"]}]}} diff --git a/tests/slsa_analyzer/package_registry/resources/npm_registry_files/_sigstore.mock@0.7.5.json b/tests/slsa_analyzer/package_registry/resources/npm_registry_files/_sigstore.mock@0.7.5.json new file mode 100644 index 000000000..6ceaee958 --- /dev/null +++ b/tests/slsa_analyzer/package_registry/resources/npm_registry_files/_sigstore.mock@0.7.5.json @@ -0,0 +1 @@ +{"version":{"versionKey":{"system":"NPM","name":"@sigstore/mock","version":"0.7.5"},"purl":"pkg:npm/%40sigstore/mock@0.7.5","publishedAt":"2024-06-11T23:49:17Z","isDefault":true,"isDeprecated":false,"licenses":["Apache-2.0"],"licenseDetails":[{"license":"Apache-2.0","spdx":"Apache-2.0"}],"advisoryKeys":[],"links":[{"label":"HOMEPAGE","url":"https://github.com/sigstore/sigstore-js/tree/main/packages/mock#readme"},{"label":"ISSUE_TRACKER","url":"https://github.com/sigstore/sigstore-js/issues"},{"label":"ATTESTATION","url":"https://registry.npmjs.org/-/npm/v1/attestations/@sigstore%2fmock@0.7.5"},{"label":"ORIGIN","url":"https://registry.npmjs.org/@sigstore%2Fmock/0.7.5"},{"label":"SOURCE_REPO","url":"git+https://github.com/sigstore/sigstore-js.git"}],"slsaProvenances":[{"sourceRepository":"https://github.com/sigstore/sigstore-js","commit":"426540e2142edc2aa438e5390b64bdeb3c8f507d","url":"https://registry.npmjs.org/-/npm/v1/attestations/@sigstore%2fmock@0.7.5","verified":true}],"registries":["https://registry.npmjs.org/"],"relatedProjects":[{"projectKey":{"id":"github.com/sigstore/sigstore-js"},"relationProvenance":"UNVERIFIED_METADATA","relationType":"ISSUE_TRACKER"},{"projectKey":{"id":"github.com/sigstore/sigstore-js"},"relationProvenance":"UNVERIFIED_METADATA","relationType":"SOURCE_REPO"},{"projectKey":{"id":"github.com/sigstore/sigstore-js"},"relationProvenance":"SLSA_ATTESTATION","relationType":"SOURCE_REPO"}],"upstreamIdentifiers":[{"packageName":"@sigstore/mock","versionString":"0.7.5","source":"NPM_NPMJS_ORG"}]}} diff --git a/tests/slsa_analyzer/package_registry/resources/npm_registry_files/empty_sigstore.mock@0.7.5.json b/tests/slsa_analyzer/package_registry/resources/npm_registry_files/empty_sigstore.mock@0.7.5.json new file mode 100644 index 000000000..e69de29bb diff --git a/tests/slsa_analyzer/package_registry/resources/npm_registry_files/invalid_sigstore.mock@0.7.5.json b/tests/slsa_analyzer/package_registry/resources/npm_registry_files/invalid_sigstore.mock@0.7.5.json new file mode 100644 index 000000000..9e53589f5 --- /dev/null +++ b/tests/slsa_analyzer/package_registry/resources/npm_registry_files/invalid_sigstore.mock@0.7.5.json @@ -0,0 +1 @@ 
+{"version":{"versionKey":{"system":"NPM","name":"@sigstore/mock","version":"0.7.5"},"purl":"pkg:npm/%40sigstore/mock@0.7.5","isDefault":true,"isDeprecated":false,"licenses":["Apache-2.0"],"licenseDetails":[{"license":"Apache-2.0","spdx":"Apache-2.0"}],"advisoryKeys":[],"links":[{"label":"HOMEPAGE","url":"https://github.com/sigstore/sigstore-js/tree/main/packages/mock#readme"},{"label":"ISSUE_TRACKER","url":"https://github.com/sigstore/sigstore-js/issues"},{"label":"ATTESTATION","url":"https://registry.npmjs.org/-/npm/v1/attestations/@sigstore%2fmock@0.7.5"},{"label":"ORIGIN","url":"https://registry.npmjs.org/@sigstore%2Fmock/0.7.5"},{"label":"SOURCE_REPO","url":"git+https://github.com/sigstore/sigstore-js.git"}],"slsaProvenances":[{"sourceRepository":"https://github.com/sigstore/sigstore-js","commit":"426540e2142edc2aa438e5390b64bdeb3c8f507d","url":"https://registry.npmjs.org/-/npm/v1/attestations/@sigstore%2fmock@0.7.5","verified":true}],"registries":["https://registry.npmjs.org/"],"relatedProjects":[{"projectKey":{"id":"github.com/sigstore/sigstore-js"},"relationProvenance":"UNVERIFIED_METADATA","relationType":"ISSUE_TRACKER"},{"projectKey":{"id":"github.com/sigstore/sigstore-js"},"relationProvenance":"UNVERIFIED_METADATA","relationType":"SOURCE_REPO"},{"projectKey":{"id":"github.com/sigstore/sigstore-js"},"relationProvenance":"SLSA_ATTESTATION","relationType":"SOURCE_REPO"}],"upstreamIdentifiers":[{"packageName":"@sigstore/mock","versionString":"0.7.5","source":"NPM_NPMJS_ORG"}]}} diff --git a/tests/slsa_analyzer/package_registry/test_maven_central_registry.py b/tests/slsa_analyzer/package_registry/test_maven_central_registry.py new file mode 100644 index 000000000..8a0287b36 --- /dev/null +++ b/tests/slsa_analyzer/package_registry/test_maven_central_registry.py @@ -0,0 +1,249 @@ +# Copyright (c) 2023 - 2024, Oracle and/or its affiliates. All rights reserved. +# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/. 
+ +"""Tests for the Maven Central registry.""" + +import json +import os +import urllib.parse +from datetime import datetime +from pathlib import Path + +import pytest +from pytest_httpserver import HTTPServer + +from macaron.config.defaults import load_defaults +from macaron.errors import ConfigurationError, InvalidHTTPResponseError +from macaron.slsa_analyzer.build_tool.base_build_tool import BaseBuildTool +from macaron.slsa_analyzer.package_registry.maven_central_registry import MavenCentralRegistry + + +@pytest.fixture(name="resources_path") +def resources() -> Path: + """Create the resources path.""" + return Path(__file__).parent.joinpath("resources") + + +@pytest.fixture(name="maven_central") +def maven_central_instance() -> MavenCentralRegistry: + """Create a ``MavenCentralRegistry`` object for the following tests.""" + return MavenCentralRegistry( + search_netloc="search.maven.org", + search_scheme="https", + search_endpoint="solrsearch/select", + registry_url_netloc="repo1.maven.org/maven2", + registry_url_scheme="https", + ) + + +def test_load_defaults(tmp_path: Path) -> None: + """Test the ``load_defaults`` method.""" + user_config_path = os.path.join(tmp_path, "config.ini") + user_config_input = """ + [package_registry.maven_central] + search_netloc = search.maven.test + search_scheme = http + search_endpoint = test + registry_url_netloc = test.repo1.maven.org/maven2 + registry_url_scheme = http + request_timeout = 5 + """ + with open(user_config_path, "w", encoding="utf-8") as user_config_file: + user_config_file.write(user_config_input) + + # We don't have to worry about modifying the ``defaults`` object causing test + # pollution here, since we reload the ``defaults`` object before every test with the + # ``setup_test`` fixture. + load_defaults(user_config_path) + + maven_central = MavenCentralRegistry() + maven_central.load_defaults() + assert maven_central.search_netloc == "search.maven.test" + assert maven_central.search_scheme == "http" + assert maven_central.search_endpoint == "test" + assert maven_central.registry_url_netloc == "test.repo1.maven.org/maven2" + assert maven_central.registry_url_scheme == "http" + + +def test_load_defaults_without_maven_central_config() -> None: + """Test the ``load_defaults`` method in trivial case when no config is given.""" + maven_central = MavenCentralRegistry() + maven_central.load_defaults() + + +@pytest.mark.parametrize( + ("user_config_input"), + [ + pytest.param( + """ + [package_registry.maven_central] + search_netloc = + """, + id="Missing search netloc", + ), + pytest.param( + """ + [package_registry.maven_central] + search_endpoint = + """, + id="Missing search endpoint", + ), + pytest.param( + """ + [package_registry.maven_central] + request_timeout = foo + """, + id="Invalid value for request_timeout", + ), + ], +) +def test_load_defaults_with_invalid_config(tmp_path: Path, user_config_input: str) -> None: + """Test the ``load_defaults`` method in case the config is invalid.""" + user_config_path = os.path.join(tmp_path, "config.ini") + with open(user_config_path, "w", encoding="utf-8") as user_config_file: + user_config_file.write(user_config_input) + + # We don't have to worry about modifying the ``defaults`` object causing test + # pollution here, since we reload the ``defaults`` object before every test with the + # ``setup_test`` fixture. 
+ load_defaults(user_config_path) + + maven_central = MavenCentralRegistry() + with pytest.raises(ConfigurationError): + maven_central.load_defaults() + + +@pytest.mark.parametrize( + ("build_tool_name", "expected_result"), + [ + ("maven", True), + ("gradle", True), + ("pip", False), + ("poetry", False), + ], +) +def test_is_detected( + maven_central: MavenCentralRegistry, + build_tools: dict[str, BaseBuildTool], + build_tool_name: str, + expected_result: bool, +) -> None: + """Test the ``is_detected`` method.""" + assert maven_central.is_detected(build_tools[build_tool_name]) == expected_result + + +@pytest.mark.parametrize( + ("purl", "mc_json_path", "query_string", "expected_timestamp"), + [ + ( + "pkg:maven/org.apache.logging.log4j/log4j-core@3.0.0-beta2", + "log4j-core@3.0.0-beta2-select.json", + "q=g:org.apache.logging.log4j+AND+a:log4j-core+AND+v:3.0.0-beta2&core=gav&rows=1&wt=json", + "2024-02-17T18:50:09+00:00", + ), + ( + "pkg:maven/com.fasterxml.jackson.core/jackson-annotations@2.16.1", + "jackson-annotations@2.16.1-select.json", + "q=g:com.fasterxml.jackson.core+AND+a:jackson-annotations+AND+v:2.16.1&core=gav&rows=1&wt=json", + "2023-12-24T04:02:40+00:00", + ), + ], +) +def test_find_publish_timestamp( + resources_path: Path, + httpserver: HTTPServer, + tmp_path: Path, + purl: str, + mc_json_path: str, + query_string: str, + expected_timestamp: str, +) -> None: + """Test that the function finds the timestamp correctly.""" + base_url_parsed = urllib.parse.urlparse(httpserver.url_for("")) + + maven_central = MavenCentralRegistry() + + # Set up responses of solrsearch endpoints using the httpserver plugin. + user_config_input = f""" + [package_registry.maven_central] + request_timeout = 20 + search_netloc = {base_url_parsed.netloc} + search_scheme = {base_url_parsed.scheme} + """ + user_config_path = os.path.join(tmp_path, "config.ini") + with open(user_config_path, "w", encoding="utf-8") as user_config_file: + user_config_file.write(user_config_input) + # We don't have to worry about modifying the ``defaults`` object causing test + # pollution here, since we reload the ``defaults`` object before every test with the + # ``setup_test`` fixture. 
+ load_defaults(user_config_path) + maven_central.load_defaults() + + with open(os.path.join(resources_path, "maven_central_files", mc_json_path), encoding="utf8") as page: + mc_json_response = json.load(page) + + httpserver.expect_request( + "/solrsearch/select", + query_string=query_string, + ).respond_with_json(mc_json_response) + + publish_time_obj = maven_central.find_publish_timestamp(purl=purl) + expected_time_obj = datetime.strptime(expected_timestamp, "%Y-%m-%dT%H:%M:%S%z") + assert publish_time_obj == expected_time_obj + + +@pytest.mark.parametrize( + ("purl", "mc_json_path", "expected_msg"), + [ + ( + "pkg:maven/org.apache.logging.log4j/log4j-core@3.0.0-beta2", + "empty_log4j-core@3.0.0-beta2-select.json", + "Empty response returned by (.)*", + ), + ( + "pkg:maven/org.apache.logging.log4j/log4j-core@3.0.0-beta2", + "invalid_log4j-core@3.0.0-beta2-select.json", + "The response returned by (.)* misses `response.docs` attribute or it is empty", + ), + ], +) +def test_find_publish_timestamp_errors( + resources_path: Path, + httpserver: HTTPServer, + tmp_path: Path, + purl: str, + mc_json_path: str, + expected_msg: str, +) -> None: + """Test that the function handles errors correctly.""" + base_url_parsed = urllib.parse.urlparse(httpserver.url_for("")) + + maven_central = MavenCentralRegistry() + + # Set up responses of solrsearch endpoints using the httpserver plugin. + user_config_input = f""" + [package_registry.maven_central] + request_timeout = 20 + search_netloc = {base_url_parsed.netloc} + search_scheme = {base_url_parsed.scheme} + """ + user_config_path = os.path.join(tmp_path, "config.ini") + with open(user_config_path, "w", encoding="utf-8") as user_config_file: + user_config_file.write(user_config_input) + # We don't have to worry about modifying the ``defaults`` object causing test + # pollution here, since we reload the ``defaults`` object before every test with the + # ``setup_test`` fixture. 
+ load_defaults(user_config_path) + maven_central.load_defaults() + + with open(os.path.join(resources_path, "maven_central_files", mc_json_path), encoding="utf8") as page: + mc_json_response = json.load(page) + + httpserver.expect_request( + "/solrsearch/select", + query_string="q=g:org.apache.logging.log4j+AND+a:log4j-core+AND+v:3.0.0-beta2&core=gav&rows=1&wt=json", + ).respond_with_json(mc_json_response) + + pat = f"^{expected_msg}" + with pytest.raises(InvalidHTTPResponseError, match=pat): + maven_central.find_publish_timestamp(purl=purl) diff --git a/tests/slsa_analyzer/package_registry/test_npm_registry.py b/tests/slsa_analyzer/package_registry/test_npm_registry.py index b35d423cf..ef4ed893e 100644 --- a/tests/slsa_analyzer/package_registry/test_npm_registry.py +++ b/tests/slsa_analyzer/package_registry/test_npm_registry.py @@ -4,17 +4,25 @@ """Tests for the npm registry.""" import os +from datetime import datetime from pathlib import Path import pytest +from pytest_httpserver import HTTPServer from macaron.config.defaults import load_defaults -from macaron.errors import ConfigurationError +from macaron.errors import ConfigurationError, InvalidHTTPResponseError from macaron.slsa_analyzer.build_tool.base_build_tool import BaseBuildTool from macaron.slsa_analyzer.build_tool.npm import NPM from macaron.slsa_analyzer.package_registry.npm_registry import NPMAttestationAsset, NPMRegistry +@pytest.fixture(name="resources_path") +def resources() -> Path: + """Create the resources path.""" + return Path(__file__).parent.joinpath("resources") + + @pytest.fixture(name="npm_registry") def create_npm_registry() -> NPMRegistry: """Create an npm registry instance.""" @@ -123,3 +131,72 @@ def test_npm_attestation_asset_url( ) assert asset.name == artifact_id assert asset.url == f"https://{npm_registry.hostname}/{npm_registry.attestation_endpoint}/{expected}" + + +@pytest.mark.parametrize( + ("purl", "npm_json_path", "expected_timestamp"), + [ + ( + "pkg:npm/@sigstore/mock@0.7.5", + "_sigstore.mock@0.7.5.json", + "2024-06-11T23:49:17Z", + ), + ], +) +def test_find_publish_timestamp( + resources_path: Path, + httpserver: HTTPServer, + purl: str, + npm_json_path: str, + expected_timestamp: str, +) -> None: + """Test that the function finds the timestamp correctly.""" + registry = NPMRegistry() + + with open(os.path.join(resources_path, "npm_registry_files", npm_json_path), encoding="utf8") as page: + response = page.read() + + httpserver.expect_request( + "/".join(["/v3alpha", "purl", purl]), + ).respond_with_data(response) + + publish_time_obj = registry.find_publish_timestamp(purl=purl, registry_url=httpserver.url_for("")) + expected_time_obj = datetime.strptime(expected_timestamp, "%Y-%m-%dT%H:%M:%S%z") + assert publish_time_obj == expected_time_obj + + +@pytest.mark.parametrize( + ("purl", "npm_json_path", "expected_msg"), + [ + ( + "pkg:npm/@sigstore/mock@0.7.5", + "empty_sigstore.mock@0.7.5.json", + "Invalid response from deps.dev for (.)*", + ), + ( + "pkg:npm/@sigstore/mock@0.7.5", + "invalid_sigstore.mock@0.7.5.json", + "The timestamp is missing in the response returned by", + ), + ], +) +def test_find_publish_timestamp_errors( + resources_path: Path, + httpserver: HTTPServer, + purl: str, + npm_json_path: str, + expected_msg: str, +) -> None: + """Test that the function handles errors correctly.""" + registry = NPMRegistry() + + with open(os.path.join(resources_path, "npm_registry_files", npm_json_path), encoding="utf8") as page: + response = page.read() + + httpserver.expect_request( + 
"/".join(["/v3alpha", "purl", purl]), + ).respond_with_data(response) + + pat = f"^{expected_msg}" + with pytest.raises(InvalidHTTPResponseError, match=pat): + registry.find_publish_timestamp(purl=purl, registry_url=httpserver.url_for("")) diff --git a/tests/slsa_analyzer/test_analyze_context.py b/tests/slsa_analyzer/test_analyze_context.py index 7328b862e..c9bb23c4e 100644 --- a/tests/slsa_analyzer/test_analyze_context.py +++ b/tests/slsa_analyzer/test_analyze_context.py @@ -10,10 +10,11 @@ from macaron.json_tools import JsonType from macaron.slsa_analyzer.asset import VirtualReleaseAsset from macaron.slsa_analyzer.ci_service.github_actions.github_actions_ci import GitHubActions -from macaron.slsa_analyzer.provenance.intoto import validate_intoto_payload +from macaron.slsa_analyzer.provenance.intoto import InTotoV01Payload, validate_intoto_payload from macaron.slsa_analyzer.provenance.slsa import SLSAProvenanceData from macaron.slsa_analyzer.slsa_req import ReqName, SLSAReqStatus from macaron.slsa_analyzer.specs.ci_spec import CIInfo +from macaron.slsa_analyzer.specs.inferred_provenance import Provenance from tests.conftest import MockAnalyzeContext @@ -100,6 +101,7 @@ def test_provenances(self) -> None: payload=expected_payload, asset=VirtualReleaseAsset(name="No_ASSET", url="NO_URL", size_in_bytes=0) ), ], + build_info_results=InTotoV01Payload(statement=Provenance().payload), ) self.analyze_ctx.dynamic_data["ci_services"].append(gh_actions_ci_info)