Merged
Changes from 3 commits
649 changes: 385 additions & 264 deletions docs/source/assets/er-diagram.svg
158 changes: 157 additions & 1 deletion docs/source/pages/developers_guide/index.rst
@@ -1,4 +1,4 @@
.. Copyright (c) 2023 - 2023, Oracle and/or its affiliates. All rights reserved.
.. Copyright (c) 2023 - 2024, Oracle and/or its affiliates. All rights reserved.
.. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

=========================
@@ -11,6 +11,162 @@ To follow the project's code style, see the :doc:`Macaron Style Guide </pages/de

For API reference, see the :doc:`API Reference </pages/developers_guide/apidoc/index>` page.

-------------------
Writing a New Check
-------------------

As a contributor to Macaron, you will very likely need to write a new check or modify an existing one at some point. This
section explains how Macaron checks work and what you need to do to develop one.

+++++++++++++++++
High-level Design
+++++++++++++++++

Before jumping into coding, it is useful to understand how Macaron as a framework works. Macaron is an extensible
framework designed to make writing new supply chain security analyses easy. It provides an interface
that you can leverage to access existing models and abstractions instead of implementing everything from scratch. For
instance, many security checks need to traverse the code in GitHub Actions configurations. Normally,
you would need to find the right repository and commit, clone it, find the workflows, and parse them. With Macaron,
you don't need to do any of that and can simply write your security check using the parsed shell scripts that are
triggered in the CI.

Another important aspect of our design is that all check results are automatically mapped to and stored in a local database.
This mapping makes it possible to enforce flexible policies on the results of the checks. While Macaron's backend stores
the check results in the database automatically, the developer needs to add a brief specification
to make that possible, as we will see later.

+++++++++++++++++++
The Check Interface
+++++++++++++++++++

Each check needs to be implemented as a Python class in a Python module under ``src/macaron/slsa_analyzer/checks``.
A check class should subclass the ``BaseCheck`` class in :ref:`base_check module <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.base\\_check module>`.

You need to set the name, description, and other details of your new check in the ``__init__`` method. After implementing
the initializer, you need to implement the ``run_check`` abstract method. This method receives the context object
:ref:`AnalyzeContext <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.analyze\\_context module>`, which contains various
intermediate representations and models. The ``dynamic_data`` property is particularly useful because it contains
data about the CI service, artifact registry, and build tool used for building the software component.

``component`` is another useful attribute in the :ref:`AnalyzeContext <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.analyze\\_context module>` object
that you should know about. This attribute contains information about a software component, such
as its corresponding ``repository`` and ``dependencies``. Note that ``component`` is also stored in the database, and its attributes
such as ``repository`` are established as database relationships. You can see the existing tables and their
relationships in our :ref:`data model <pages/developers_guide/apidoc/macaron.database:macaron.database.table\\_definitions module>`.

Once you implement the logic of your check in the ``run_check`` method, you need to add a class to help
Macaron handle your check's output:

* Add a class that subclasses ``CheckFacts`` to map your outputs to a table in the database. The class name should follow the ``<MyCheck>Facts`` pattern.
* Specify the table name in the ``__tablename__ = "_my_check"`` class variable. Note that the table name should start with ``_`` and it should not have been used by other checks.
* Add the ``id`` column as the primary key where the foreign key is ``_check_facts.id``.
* Add columns for the check outputs that you would like to store into the database. If a column needs to appear as a justification in the HTML/JSON report, pass ``info={"justification": JustificationType.<TEXT or HREF>}`` to the column mapper.
* Add the ``__mapper_args__`` class variable and set its ``"polymorphic_identity"`` key to the table name.

Next, you need to create a ``result_tables`` list and append check facts to it as part of the ``run_check`` implementation.
You should also specify a :ref:`Confidence <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.check\\_result module>`
score by choosing one of the ``Confidence`` enum values, e.g., ``Confidence.HIGH``, and passing it via the ``confidence``
keyword argument. Choose a confidence score that reflects the accuracy
of your check's analysis.

.. code-block:: python

    result_tables.append(MyCheckFacts(col_foo=foo, col_bar=bar, confidence=Confidence.HIGH))

This list as well as the check result status should be stored in a :ref:`CheckResultData <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.check\\_result module>`
object and returned by ``run_check``.
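
To make this flow concrete, here is a minimal, self-contained sketch of how a confidence value and a ``result_tables`` list fit together. The numeric ``Confidence`` values and the simplified ``MyCheckFacts`` and ``CheckResultData`` stand-ins are assumptions made for illustration only; the real classes are defined in Macaron and may differ.

```python
from dataclasses import dataclass, field
from enum import Enum


class Confidence(float, Enum):
    """Hypothetical confidence levels; the numeric values are assumptions."""

    HIGH = 1.0
    MEDIUM = 0.7
    LOW = 0.4


@dataclass
class MyCheckFacts:
    """Stand-in for a ``CheckFacts`` subclass (no database mapping here)."""

    col_foo: str
    col_bar: str
    confidence: float


@dataclass
class CheckResultData:
    """Stand-in that bundles the collected facts with the result status."""

    result_tables: list[MyCheckFacts] = field(default_factory=list)
    result_type: str = "FAILED"


# Collect the facts first, then return them together with the status.
result_tables: list[MyCheckFacts] = []
result_tables.append(MyCheckFacts(col_foo="foo", col_bar="bar", confidence=Confidence.HIGH))
result = CheckResultData(result_tables=result_tables, result_type="PASSED")
```

Because the enum in this sketch mixes in ``float``, each member can be compared against the ``[0.0, 1.0]`` range directly.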

Finally, you need to register your check by adding it to the :ref:`registry module <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.registry module>`:

.. code-block:: python

    registry.register(MyCheck())

And of course, make sure to add tests for your check by adding a module under ``tests/slsa_analyzer/checks/``.

+++++++
Example
+++++++

In this example, we show how to add a check to determine if a software component has a source-code repository.
Feel free to explore other existing checks under ``src/macaron/slsa_analyzer/checks`` for more examples.

1. First create a module called ``repo_check.py`` under ``src/macaron/slsa_analyzer/checks``.

2. Add a class and specify the columns that you want to store for the check outputs to the database.

.. code-block:: python

    # Add this line at the top of the file to create the logger object if you plan to use it.
    logger: logging.Logger = logging.getLogger(__name__)


    class RepoCheckFacts(CheckFacts):
        """The ORM mapping for justifications in the repository check."""

        __tablename__ = "_repo_check"

        #: The primary key.
        id: Mapped[int] = mapped_column(ForeignKey("_check_facts.id"), primary_key=True)

        #: The Git repository path.
        git_repo: Mapped[str] = mapped_column(String, nullable=True, info={"justification": JustificationType.HREF})

        # The polymorphic identity is set to the table name.
        __mapper_args__ = {
            "polymorphic_identity": "_repo_check",
        }

3. Add a class for your check, provide the check details in the initializer method, and implement the logic of the check in ``run_check``.

.. code-block:: python

    class RepoCheck(BaseCheck):
        """This Check checks whether the target software component has a source-code repository."""

        def __init__(self) -> None:
            """Initialize instance."""
            check_id = "mcn_repo_exists_1"
            description = "Check whether the target software component has a source-code repository."
            depends_on: list[tuple[str, CheckResultType]] = []  # This check doesn't depend on any other checks.
            eval_reqs = [
                ReqName.VCS
            ]  # Choose a SLSA requirement that roughly matches this check from the ReqName enum class.
            super().__init__(check_id=check_id, description=description, depends_on=depends_on, eval_reqs=eval_reqs)

        def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
            """Implement the check in this method.

            Parameters
            ----------
            ctx : AnalyzeContext
                The object containing processed data for the target software component.

            Returns
            -------
            CheckResultData
                The result of the check.
            """
            if not ctx.component.repository:
                logger.info("Unable to find a Git repository for %s", ctx.component.purl)
                # We do not store any results in the database if a check fails. So, just leave result_tables empty.
                return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED)

            return CheckResultData(
                result_tables=[RepoCheckFacts(git_repo=ctx.component.repository.remote_path, confidence=Confidence.HIGH)],
                result_type=CheckResultType.PASSED,
            )

4. Register your check.

.. code-block:: python

    registry.register(RepoCheck())


Finally, you can add tests for your check by adding the ``tests/slsa_analyzer/checks/test_repo_check.py`` module. Macaron
uses `pytest <https://docs.pytest.org>`_ and `hypothesis <https://hypothesis.readthedocs.io>`_ for testing. Take a look
at other tests for inspiration!
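
As a rough illustration of what such a test module might cover, the sketch below exercises the pass and fail branches of the repository check. To stay runnable without Macaron installed, it uses a hypothetical ``run_repo_check`` function and a fake context built with ``SimpleNamespace`` in place of ``RepoCheck`` and ``AnalyzeContext``; the repository URL and PURL are made up. A real test would import the actual check and use the project's own fixtures.

```python
from types import SimpleNamespace


def run_repo_check(ctx):
    """Stand-in for ``RepoCheck.run_check``: pass only if a repository exists."""
    if not ctx.component.repository:
        return {"result_type": "FAILED", "git_repo": None}
    return {"result_type": "PASSED", "git_repo": ctx.component.repository.remote_path}


def make_ctx(remote_path):
    """Build a fake context exposing only the attributes the check reads."""
    repository = SimpleNamespace(remote_path=remote_path) if remote_path else None
    component = SimpleNamespace(repository=repository, purl="pkg:github/example/project")
    return SimpleNamespace(component=component)


def test_repo_check_passes_with_repository():
    result = run_repo_check(make_ctx("https://github.com/example/project"))
    assert result["result_type"] == "PASSED"
    assert result["git_repo"] == "https://github.com/example/project"


def test_repo_check_fails_without_repository():
    result = run_repo_check(make_ctx(None))
    assert result["result_type"] == "FAILED"
    assert result["git_repo"] is None
```

The functions follow the ``test_`` naming convention, so pytest would collect them automatically when the file is placed in a test directory.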

.. toctree::
   :maxdepth: 1

104 changes: 45 additions & 59 deletions src/macaron/database/table_definitions.py
@@ -1,4 +1,4 @@
# Copyright (c) 2023 - 2023, Oracle and/or its affiliates. All rights reserved.
# Copyright (c) 2023 - 2024, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

"""
@@ -10,7 +10,6 @@

For tables associated with a check, see the check module.
"""
import hashlib
import logging
import os
import string
@@ -19,14 +18,23 @@
from typing import Any, Self

from packageurl import PackageURL
from sqlalchemy import Boolean, Column, Enum, ForeignKey, Integer, String, Table, UniqueConstraint
from sqlalchemy import (
    Boolean,
    CheckConstraint,
    Column,
    Enum,
    Float,
    ForeignKey,
    Integer,
    String,
    Table,
    UniqueConstraint,
)
from sqlalchemy.orm import Mapped, mapped_column, relationship

from macaron.database.database_manager import ORMBase
from macaron.database.rfc3339_datetime import RFC3339DateTime
from macaron.errors import CUEExpectationError, CUERuntimeError, InvalidPURLError
from macaron.slsa_analyzer.provenance.expectations.cue import cue_validator
from macaron.slsa_analyzer.provenance.expectations.expectation import Expectation
from macaron.errors import InvalidPURLError
from macaron.slsa_analyzer.slsa_req import ReqName

logger: logging.Logger = logging.getLogger(__name__)
@@ -415,6 +423,16 @@ class CheckFacts(ORMBase):
#: The primary key.
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True) # noqa: A003

#: The confidence score to estimate the accuracy of the check fact. This value should be in [0.0, 1.0] with
#: a lower value depicting a lower confidence. Because some analyses used in checks may use
#: heuristics, the results can be inaccurate in certain cases.
#: We use the confidence score to enable the check designer to assign a confidence estimate.
#: This confidence is stored in the database to be used by the policy. This confidence score is
#: also used to decide which evidence should be shown to the user in the HTML/JSON report.
confidence: Mapped[float] = mapped_column(
    Float, CheckConstraint("confidence>=0.0 AND confidence<=1.0"), nullable=False
)

#: The foreign key to the software component.
component_id: Mapped[int] = mapped_column(Integer, ForeignKey("_component.id"), nullable=False)
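
The effect of the ``CheckConstraint`` on ``confidence`` can be sketched with Python's built-in ``sqlite3`` module. This is only an illustration under assumptions: the ``demo_check_facts`` table is hypothetical, and SQLite is used here as a stand-in backend rather than as a statement about how Macaron creates its tables.

```python
import sqlite3

# An in-memory database with a hypothetical table that mirrors the constraint.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_check_facts ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " confidence REAL NOT NULL CHECK (confidence>=0.0 AND confidence<=1.0))"
)

# A value inside [0.0, 1.0] is accepted.
conn.execute("INSERT INTO demo_check_facts (confidence) VALUES (?)", (0.7,))

# A value outside the range violates the CHECK constraint and is rejected.
try:
    conn.execute("INSERT INTO demo_check_facts (confidence) VALUES (?)", (1.5,))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM demo_check_facts").fetchone()[0]
```

Enforcing the range in the schema means an out-of-range confidence fails at insert time instead of silently reaching the policy engine.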

@@ -430,68 +448,36 @@ class CheckFacts(ORMBase):
#: A many-to-one relationship with check results.
checkresult: Mapped["MappedCheckResult"] = relationship(back_populates="checkfacts")

#: The polymorphic inheritance configuration.
__mapper_args__ = {
"polymorphic_identity": "CheckFacts",
"polymorphic_on": "check_type",
}

def __lt__(self, other: Self) -> bool:
"""Compare two check facts using their confidence values.

class CUEExpectation(Expectation, CheckFacts):
"""ORM Class for an expectation."""
This comparison function is intended to be used by a heapq, which is a Min-Heap data structure.
The root element in a heapq is the minimum element in the queue and each `confidence` value is in [0, 1].
Therefore, we need to reverse the comparison function to make sure the fact with the highest confidence is stored
in the root element. This implementation compares `1 - confidence` to return True if the confidence of
`fact_a` is greater than the confidence of `fact_b`.

# TODO: provenance content check should store the expectation, its evaluation result,
# and which PROVENANCE it was applied to rather than only linking to the repository.
.. code-block:: pycon

__tablename__ = "_expectation"
>>> fact_a = CheckFacts()
>>> fact_b = CheckFacts()
>>> fact_a.confidence = 0.2
>>> fact_b.confidence = 0.7
>>> fact_b < fact_a
True

#: The primary key, which is also a foreign key to the base check table.
id: Mapped[int] = mapped_column(ForeignKey("_check_facts.id"), primary_key=True) # noqa: A003
Returns
-------
bool
"""
return (1 - self.confidence) < (1 - other.confidence)

#: The polymorphic inheritance configuration.
__mapper_args__ = {
"polymorphic_identity": "_expectation",
"polymorphic_identity": "CheckFacts",
"polymorphic_on": "check_type",
}

@classmethod
def make_expectation(cls, expectation_path: str) -> Self | None:
"""Construct a CUE expectation from a CUE file.

Note: we require the CUE expectation file to have a "target" field.

Parameters
----------
expectation_path: str
The path to the expectation file.

Returns
-------
Self
The instantiated expectation object.
"""
logger.info("Generating an expectation from file %s", expectation_path)
expectation: CUEExpectation = CUEExpectation(
description="CUE expectation",
path=expectation_path,
target="",
expectation_type="CUE",
)

try:
with open(expectation_path, encoding="utf-8") as expectation_file:
expectation.text = expectation_file.read()
expectation.sha = str(hashlib.sha256(expectation.text.encode("utf-8")).hexdigest())
expectation.target = cue_validator.get_target(expectation.text)
expectation._validator = ( # pylint: disable=protected-access
lambda provenance: cue_validator.validate_expectation(expectation.text, provenance)
)
except (OSError, CUERuntimeError, CUEExpectationError) as error:
logger.error("CUE expectation error: %s", error)
return None

# TODO remove type ignore once mypy adds support for Self.
return expectation # type: ignore


class Provenance(ORMBase):
"""ORM class for a provenance document."""
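
The reversed ``__lt__`` added to ``CheckFacts`` in this file can be illustrated with a small, self-contained sketch. The ``Fact`` dataclass below is a hypothetical stand-in that keeps only the ``confidence`` field and the comparison shown in the diff; it is not the ORM class itself.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Fact:
    """Stand-in for ``CheckFacts`` that keeps only what the comparison needs."""

    name: str
    confidence: float

    def __lt__(self, other: "Fact") -> bool:
        # Reversed on purpose: a higher confidence sorts as "smaller",
        # so the min-heap root holds the most confident fact.
        return (1 - self.confidence) < (1 - other.confidence)


heap: list[Fact] = []
for fact in (Fact("a", 0.2), Fact("b", 0.7), Fact("c", 0.4)):
    heapq.heappush(heap, fact)

# Popping returns facts in order of decreasing confidence.
best = heapq.heappop(heap)
second = heapq.heappop(heap)
```

Defining the ordering on the class keeps the call sites simple: code that pushes facts onto a ``heapq`` does not need to negate or wrap the confidence values itself.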
6 changes: 4 additions & 2 deletions src/macaron/policy_engine/souffle_code_generator.py
@@ -1,12 +1,12 @@
# Copyright (c) 2023 - 2023, Oracle and/or its affiliates. All rights reserved.
# Copyright (c) 2023 - 2024, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

"""Generate souffle datalog for policy prelude."""

import logging
import os

from sqlalchemy import Column, MetaData, Table
from sqlalchemy import Column, Float, MetaData, Table
from sqlalchemy.sql.sqltypes import Boolean, Integer, String, Text

logger: logging.Logger = logging.getLogger(__name__)
@@ -81,6 +81,8 @@ def column_to_souffle_type(column: Column) -> str:
souffle_type = "symbol"
elif isinstance(sql_type, Integer):
souffle_type = "number"
elif isinstance(sql_type, Float):
souffle_type = "number"
elif isinstance(sql_type, Text):
souffle_type = "symbol"
elif isinstance(sql_type, Boolean):