feat!: introduce confidence scores for check facts #620

Merged
merged 6 commits into from
Feb 15, 2024
649 changes: 385 additions & 264 deletions docs/source/assets/er-diagram.svg
209 changes: 208 additions & 1 deletion docs/source/pages/developers_guide/index.rst
@@ -1,4 +1,4 @@
.. Copyright (c) 2023 - 2023, Oracle and/or its affiliates. All rights reserved.
.. Copyright (c) 2023 - 2024, Oracle and/or its affiliates. All rights reserved.
.. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

=========================
@@ -11,6 +11,213 @@ To follow the project's code style, see the :doc:`Macaron Style Guide </pages/de

For API reference, see the :doc:`API Reference </pages/developers_guide/apidoc/index>` page.

-------------------
Writing a New Check
-------------------

Contributors to Macaron are very likely to need to write a new check or modify an existing one at some point. In this
section, we will explain how Macaron checks work. We will also show how to develop a new check.

+++++++++++++++++
High-level Design
+++++++++++++++++

Before jumping into coding, it is useful to understand how Macaron as a framework works. Macaron is an extensible
framework designed to make writing new supply chain security analyses easy. It provides an interface
that you can leverage to access existing models and abstractions instead of implementing everything from scratch. For
instance, many security checks require traversing through the code in GitHub Actions configurations. Normally,
you would need to find the right repository and commit, clone it, find the workflows, and parse them. With Macaron,
you don't need to do any of that and can simply write your security check by using the parsed shell scripts that are
triggered in the CI.

Another important aspect of our design is that all the check results are automatically mapped and stored in a local database.
By performing this mapping, we make it possible to enforce use case-specific policies on the results of the checks. While storing
the check results in the database happens automatically in Macaron's backend, the developer needs to add a brief specification
to make that possible as we will see later.

Once you get familiar with writing a basic check, you can explore the check dependency feature in Macaron. The checks
in our framework can be customized to only run if another check has run and returned a specific
:class:`result type <macaron.slsa_analyzer.checks.check_result.CheckResultType>`. This feature can be used when checks
have an ordering and a parent-child relationship, i.e., one check implements a weaker or stronger version of a
security property in a parent check. Therefore, it might make sense to skip running the check and report a
:class:`result type <macaron.slsa_analyzer.checks.check_result.CheckResultType>` based on the result of the parent check.
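The dependency rule described above can be sketched in isolation. This is a minimal illustration, not Macaron's scheduler: the ``CheckResultType`` enum below is a stand-in with assumed members, and ``should_run`` and ``child_depends_on`` are hypothetical names introduced only for this sketch.

```python
from enum import Enum


class CheckResultType(str, Enum):
    """Stand-in for Macaron's CheckResultType; the members and values are assumptions."""

    PASSED = "PASSED"
    FAILED = "FAILED"


def should_run(
    depends_on: list[tuple[str, CheckResultType]],
    finished: dict[str, CheckResultType],
) -> bool:
    """Return True only if every parent check has run and returned the expected result type."""
    return all(finished.get(parent_id) == expected for parent_id, expected in depends_on)


# Hypothetical child check that should only run when its parent check has failed.
child_depends_on = [("mcn_repo_exists_1", CheckResultType.FAILED)]

print(should_run(child_depends_on, {"mcn_repo_exists_1": CheckResultType.FAILED}))
print(should_run(child_depends_on, {"mcn_repo_exists_1": CheckResultType.PASSED}))
```

A check with an empty dependency list always runs in this sketch, and a check whose parent has not run yet does not.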

+++++++++++++++++++
The Check Interface
+++++++++++++++++++

Each check needs to be implemented as a Python class in a Python module under ``src/macaron/slsa_analyzer/checks``.
A check class should subclass the :class:`BaseCheck <macaron.slsa_analyzer.checks.base_check.BaseCheck>` class.

The main logic of a check should be implemented in the :func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>` abstract method. It is important to understand the input
parameters and output objects computed by this method.

.. code-block:: python

    def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
        ...

''''''''''''''''
Input Parameters
''''''''''''''''

The :func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>` method is a callback invoked by our checker framework. The framework pre-computes a context object,
:class:`ctx: AnalyzeContext <macaron.slsa_analyzer.analyze_context.AnalyzeContext>`, and passes it as the input
parameter to the method. The ``ctx`` object contains various intermediate representations and models of the target software component.
Most likely, you will need to use the following properties:

* :attr:`component <macaron.slsa_analyzer.analyze_context.AnalyzeContext.component>`
* :attr:`dynamic_data <macaron.slsa_analyzer.analyze_context.AnalyzeContext.dynamic_data>`

The :attr:`component <macaron.slsa_analyzer.analyze_context.AnalyzeContext.component>`
object acts as a representation of a software component and contains data, such as its
corresponding :class:`Repository <macaron.database.table_definitions.Repository>` and
:data:`dependencies <macaron.database.table_definitions.components_association_table>`.
Note that :attr:`component <macaron.slsa_analyzer.analyze_context.AnalyzeContext.component>` will also be stored
in the database, and its attributes, such as :attr:`repository <macaron.database.table_definitions.Component.repository>`,
are established as database relationships. You can see the existing tables and their relationships
in our :mod:`data model <macaron.database.table_definitions>`.

The :attr:`dynamic_data <macaron.slsa_analyzer.analyze_context.AnalyzeContext.dynamic_data>` property is particularly useful because it contains
data about the CI service, artifact registry, and build tool used for building the software component.
Note that this object is shared state among checks: if a check runs before another check, any changes
it makes to this object are visible to the checks that run afterwards.
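The following sketch illustrates why the execution order matters for shared state. It is not Macaron code: ``dynamic_data`` is modelled as a plain dictionary, which is an assumption, and the ``example_note`` key and both functions are hypothetical.

```python
from typing import Any, Optional

# Stand-in for ctx.dynamic_data; "example_note" is a hypothetical key, not a real Macaron key.
dynamic_data: dict[str, Any] = {}


def first_check(data: dict[str, Any]) -> None:
    """Write a value into the shared state."""
    data["example_note"] = "set by the first check"


def second_check(data: dict[str, Any]) -> Optional[str]:
    """Read the value, which is present only if first_check has already run."""
    return data.get("example_note")


before = second_check(dynamic_data)  # Nothing has been written yet.
first_check(dynamic_data)
after = second_check(dynamic_data)  # The earlier check's change is now visible.
```

A check that reads shared state should therefore handle the case where the value is missing, or declare a dependency on the check that writes it.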

''''''
Output
''''''

The :func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>` method returns a :class:`CheckResultData <macaron.slsa_analyzer.checks.check_result.CheckResultData>` object.
This object consists of :attr:`result_tables <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_tables>` and
:attr:`result_type <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_type>`.
The :attr:`result_tables <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_tables>` object is the list of facts generated from the check. The :attr:`result_type <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_type>`
value shows the final result type of the check.

+++++++
Example
+++++++

In this example, we show how to add a check that determines whether a software component has a source-code repository.
The example is deliberately simple and only demonstrates how to add a check from scratch.
Feel free to explore the existing checks under ``src/macaron/slsa_analyzer/checks`` for more examples.

As discussed earlier, each check needs to be implemented as a Python class in a Python module under ``src/macaron/slsa_analyzer/checks``.
A check class should subclass the :class:`BaseCheck <macaron.slsa_analyzer.checks.base_check.BaseCheck>` class.

'''''''''''''''
Create a module
'''''''''''''''
First, create a module called ``repo_check.py`` under ``src/macaron/slsa_analyzer/checks``.


''''''''''''''''''''''''''''
Add a class for the database
''''''''''''''''''''''''''''

* Add a class that subclasses :class:`CheckFacts <macaron.database.table_definitions.CheckFacts>` to map your outputs to a table in the database. The class name should follow the ``<MyCheck>Facts`` pattern.
* Specify the table name in the ``__tablename__`` class variable. Note that the table name should start with ``_`` and it should not have been used by other checks.
* Add the ``id`` column as the primary key where the foreign key is ``_check_facts.id``.
* Add columns for the check outputs that you would like to store in the database. If a column needs to appear as a justification in the HTML/JSON report, pass ``info={"justification": JustificationType.<TEXT or HREF>}`` to the column mapper.
* Add the ``__mapper_args__`` class variable and set its ``"polymorphic_identity"`` key to the table name.

.. code-block:: python

    import logging

    from sqlalchemy import ForeignKey, String
    from sqlalchemy.orm import Mapped, mapped_column

    from macaron.database.table_definitions import CheckFacts
    from macaron.slsa_analyzer.checks.check_result import JustificationType

    # Create the module-level logger at the top of the file if you plan to use it.
    logger: logging.Logger = logging.getLogger(__name__)


    class RepoCheckFacts(CheckFacts):
        """The ORM mapping for justifications in the repository check."""

        __tablename__ = "_repo_check"

        #: The primary key.
        id: Mapped[int] = mapped_column(ForeignKey("_check_facts.id"), primary_key=True)

        #: The Git repository path.
        git_repo: Mapped[str] = mapped_column(String, nullable=True, info={"justification": JustificationType.HREF})

        __mapper_args__ = {
            "polymorphic_identity": "_repo_check",
        }

'''''''''''''''''''
Add the check class
'''''''''''''''''''

Add a class for your check that subclasses :class:`BaseCheck <macaron.slsa_analyzer.checks.base_check.BaseCheck>`,
provide the check details in the initializer method, and implement the logic of the check in
:func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>`.

A ``check_id`` should match the ``^mcn_([a-z]+_)+([0-9]+)$`` regular expression, which means it should meet the following requirements:

- The general format: ``mcn_<name>_<digits>``.
- Use lowercase alphabetical letters in ``name``. If ``name`` contains multiple words, they must be separated by underscores.
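
The naming rule can be tried out with Python's standard ``re`` module. This is a small sketch for experimenting with candidate identifiers: the regular expression is the one quoted above, while ``CHECK_ID_PATTERN`` and ``is_valid_check_id`` are hypothetical names and not part of Macaron's API.

```python
import re

# The regular expression quoted in the text above.
CHECK_ID_PATTERN = re.compile(r"^mcn_([a-z]+_)+([0-9]+)$")


def is_valid_check_id(check_id: str) -> bool:
    """Return True if check_id follows the mcn_<name>_<digits> format."""
    return CHECK_ID_PATTERN.match(check_id) is not None


for candidate in ("mcn_repo_exists_1", "mcn_RepoExists_1", "repo_exists_1", "mcn_repo_exists"):
    print(candidate, is_valid_check_id(candidate))
```

Only the first candidate is accepted: the others contain uppercase letters, lack the ``mcn_`` prefix, or lack the trailing digits.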

You can set the ``depends_on`` attribute in the initializer method to declare dependencies on other checks. In this example, we leave this list empty.

.. code-block:: python

    from macaron.slsa_analyzer.analyze_context import AnalyzeContext
    from macaron.slsa_analyzer.checks.base_check import BaseCheck
    from macaron.slsa_analyzer.checks.check_result import CheckResultData, CheckResultType, Confidence
    from macaron.slsa_analyzer.slsa_req import ReqName


    class RepoCheck(BaseCheck):
        """This check determines whether the target software component has a source-code repository."""

        def __init__(self) -> None:
            """Initialize instance."""
            check_id = "mcn_repo_exists_1"
            description = "Check whether the target software component has a source-code repository."
            depends_on: list[tuple[str, CheckResultType]] = []  # This check doesn't depend on any other checks.
            eval_reqs = [
                ReqName.VCS
            ]  # Choose a SLSA requirement that roughly matches this check from the ReqName enum class.
            super().__init__(check_id=check_id, description=description, depends_on=depends_on, eval_reqs=eval_reqs)

        def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
            """Implement the check in this method.

            Parameters
            ----------
            ctx : AnalyzeContext
                The object containing processed data for the target software component.

            Returns
            -------
            CheckResultData
                The result of the check.
            """
            if not ctx.component.repository:
                logger.info("Unable to find a Git repository for %s", ctx.component.purl)
                # We do not store any results in the database if a check fails. So, just leave result_tables empty.
                return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED)

            return CheckResultData(
                result_tables=[RepoCheckFacts(git_repo=ctx.component.repository.remote_path, confidence=Confidence.HIGH)],
                result_type=CheckResultType.PASSED,
            )

As you can see, the result of the check is returned via the :class:`CheckResultData <macaron.slsa_analyzer.checks.check_result.CheckResultData>` object.
Each fact should carry a :class:`Confidence <macaron.slsa_analyzer.checks.check_result.Confidence>`
score: choose one of the :class:`Confidence <macaron.slsa_analyzer.checks.check_result.Confidence>` enum values,
e.g., :class:`Confidence.HIGH <macaron.slsa_analyzer.checks.check_result.Confidence.HIGH>`, and pass it via the keyword
argument :attr:`confidence <macaron.database.table_definitions.CheckFacts.confidence>`. Choose the score based on how
accurate you expect your check's analysis to be.
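
To make the role of the confidence score more concrete, here is a minimal sketch of ranking facts by confidence. It is an illustration under assumptions, not Macaron's reporting logic: the ``Confidence`` enum below is a stand-in whose numeric values are assumed, and ``Fact`` and ``most_confident`` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Confidence(float, Enum):
    """Stand-in for Macaron's Confidence enum; the numeric values are assumptions."""

    HIGH = 1.0
    MEDIUM = 0.7
    LOW = 0.4


@dataclass
class Fact:
    """Hypothetical stand-in for a row derived from CheckFacts."""

    description: str
    confidence: Confidence


def most_confident(facts: list[Fact]) -> Optional[Fact]:
    """Return the fact with the highest confidence, or None if there are no facts."""
    return max(facts, key=lambda fact: fact.confidence, default=None)


facts = [
    Fact("repository found via a heuristic", Confidence.LOW),
    Fact("repository declared in metadata", Confidence.HIGH),
]
best = most_confident(facts)
```

Because the stand-in enum derives from ``float``, its members compare like ordinary numbers and stay within the ``[0.0, 1.0]`` range.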

'''''''''''''''''''
Register your check
'''''''''''''''''''

Next, register your check with the :mod:`registry module <macaron.slsa_analyzer.registry>` by adding the following call at the end of your check module:

.. code-block:: python

registry.register(RepoCheck())


'''''''''''''''
Test your check
'''''''''''''''

Finally, you can add tests for your check by adding the ``tests/slsa_analyzer/checks/test_repo_check.py`` module. Macaron
uses `pytest <https://docs.pytest.org>`_ and `hypothesis <https://hypothesis.readthedocs.io>`_ for testing. Take a look
at other tests for inspiration!

.. toctree::
:maxdepth: 1

88 changes: 25 additions & 63 deletions src/macaron/database/table_definitions.py
@@ -1,4 +1,4 @@
# Copyright (c) 2023 - 2023, Oracle and/or its affiliates. All rights reserved.
# Copyright (c) 2023 - 2024, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

"""
@@ -10,23 +10,31 @@

For table associated with a check see the check module.
"""
import hashlib
import logging
import os
import string
from datetime import datetime
from pathlib import Path
from typing import Any, Self
from typing import Any

from packageurl import PackageURL
from sqlalchemy import Boolean, Column, Enum, ForeignKey, Integer, String, Table, UniqueConstraint
from sqlalchemy import (
    Boolean,
    CheckConstraint,
    Column,
    Enum,
    Float,
    ForeignKey,
    Integer,
    String,
    Table,
    UniqueConstraint,
)
from sqlalchemy.orm import Mapped, mapped_column, relationship

from macaron.database.database_manager import ORMBase
from macaron.database.rfc3339_datetime import RFC3339DateTime
from macaron.errors import CUEExpectationError, CUERuntimeError, InvalidPURLError
from macaron.slsa_analyzer.provenance.expectations.cue import cue_validator
from macaron.slsa_analyzer.provenance.expectations.expectation import Expectation
from macaron.errors import InvalidPURLError
from macaron.slsa_analyzer.slsa_req import ReqName

logger: logging.Logger = logging.getLogger(__name__)
@@ -415,6 +423,16 @@ class CheckFacts(ORMBase):
    #: The primary key.
    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)  # noqa: A003

    #: The confidence score to estimate the accuracy of the check fact. This value should be in the range [0.0, 1.0] with
    #: a lower value depicting a lower confidence. Because some analyses used in checks may use
    #: heuristics, the results can be inaccurate in certain cases.
    #: We use the confidence score to enable the check designer to assign a confidence estimate.
    #: This confidence is stored in the database to be used by the policy. This confidence score is
    #: also used to decide which evidence should be shown to the user in the HTML/JSON report.
    confidence: Mapped[float] = mapped_column(
        Float, CheckConstraint("confidence>=0.0 AND confidence<=1.0"), nullable=False
    )

    #: The foreign key to the software component.
    component_id: Mapped[int] = mapped_column(Integer, ForeignKey("_component.id"), nullable=False)
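
A note on the ``CheckConstraint`` above: the effect of such a range constraint can be reproduced with the standard-library ``sqlite3`` module. This is only a sketch under assumptions; the ``demo_check_facts`` table and the ``try_insert`` helper are hypothetical, and only the ``CHECK`` expression is copied from the column definition.

```python
import sqlite3

# Hypothetical stand-in table; only the CHECK expression mirrors the column definition above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_check_facts ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " confidence FLOAT NOT NULL CHECK (confidence>=0.0 AND confidence<=1.0)"
    ")"
)


def try_insert(confidence):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # Commits on success and rolls back on error.
            conn.execute("INSERT INTO demo_check_facts (confidence) VALUES (?)", (confidence,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(0.7)   # Within [0.0, 1.0].
too_high = try_insert(1.5)   # Violates the CHECK constraint.
missing = try_insert(None)   # Violates NOT NULL.
```

With the constraint in place, a fact created without a valid confidence is rejected by the database instead of being stored silently.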

@@ -437,62 +455,6 @@ class CheckFacts(ORMBase):
}


class CUEExpectation(Expectation, CheckFacts):
    """ORM Class for an expectation."""

    # TODO: provenance content check should store the expectation, its evaluation result,
    # and which PROVENANCE it was applied to rather than only linking to the repository.

    __tablename__ = "_expectation"

    #: The primary key, which is also a foreign key to the base check table.
    id: Mapped[int] = mapped_column(ForeignKey("_check_facts.id"), primary_key=True)  # noqa: A003

    #: The polymorphic inheritance configuration.
    __mapper_args__ = {
        "polymorphic_identity": "_expectation",
    }

    @classmethod
    def make_expectation(cls, expectation_path: str) -> Self | None:
        """Construct a CUE expectation from a CUE file.

        Note: we require the CUE expectation file to have a "target" field.

        Parameters
        ----------
        expectation_path: str
            The path to the expectation file.

        Returns
        -------
        Self
            The instantiated expectation object.
        """
        logger.info("Generating an expectation from file %s", expectation_path)
        expectation: CUEExpectation = CUEExpectation(
            description="CUE expectation",
            path=expectation_path,
            target="",
            expectation_type="CUE",
        )

        try:
            with open(expectation_path, encoding="utf-8") as expectation_file:
                expectation.text = expectation_file.read()
                expectation.sha = str(hashlib.sha256(expectation.text.encode("utf-8")).hexdigest())
                expectation.target = cue_validator.get_target(expectation.text)
                expectation._validator = (  # pylint: disable=protected-access
                    lambda provenance: cue_validator.validate_expectation(expectation.text, provenance)
                )
        except (OSError, CUERuntimeError, CUEExpectationError) as error:
            logger.error("CUE expectation error: %s", error)
            return None

        # TODO remove type ignore once mypy adds support for Self.
        return expectation  # type: ignore


class Provenance(ORMBase):
"""ORM class for a provenance document."""

6 changes: 4 additions & 2 deletions src/macaron/policy_engine/souffle_code_generator.py
@@ -1,12 +1,12 @@
# Copyright (c) 2023 - 2023, Oracle and/or its affiliates. All rights reserved.
# Copyright (c) 2023 - 2024, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

"""Generate souffle datalog for policy prelude."""

import logging
import os

from sqlalchemy import Column, MetaData, Table
from sqlalchemy import Column, Float, MetaData, Table
from sqlalchemy.sql.sqltypes import Boolean, Integer, String, Text

logger: logging.Logger = logging.getLogger(__name__)
@@ -81,6 +81,8 @@ def column_to_souffle_type(column: Column) -> str:
        souffle_type = "symbol"
    elif isinstance(sql_type, Integer):
        souffle_type = "number"
    elif isinstance(sql_type, Float):
        # Souffle's "number" type is a signed integer, so floating-point columns map to "float".
        souffle_type = "float"
    elif isinstance(sql_type, Text):
        souffle_type = "symbol"
    elif isinstance(sql_type, Boolean):