Merged
Changes from 1 commit
139 changes: 96 additions & 43 deletions docs/source/pages/developers_guide/index.rst
@@ -25,75 +25,98 @@ High-level Design
Before jumping into coding, it is useful to understand how Macaron as a framework works. Macaron is an extensible
framework designed to make writing new supply chain security analyses easy. It provides an interface
that you can leverage to access existing models and abstractions instead of implementing everything from scratch. For
instance, many security checks require to traverse through the code in GitHub Actions configurations. Normally,
instance, many security checks require traversing through the code in GitHub Actions configurations. Normally,
you would need to find the right repository and commit, clone it, find the workflows, and parse them. With Macaron,
you don't need to do any of that and can simply write your security check by using the parsed shell scripts that are
triggered in the CI.

Another important aspect of our design is that all the check results are automatically mapped and stored in a local database.
By performing this mapping, we make it possible to enforce flexible policies on the results of the checks. While storing
the check results to the database happens automatically by Macaron's backend, the developer needs to add a brief specification
By performing this mapping, we make it possible to enforce use case-specific policies on the results of the checks. While storing
the check results in the database happens automatically in Macaron's backend, the developer needs to add a brief specification
to make that possible as we will see later.

Once you get familiar with writing a basic check, you can explore the check dependency feature in Macaron. The checks
in our framework can be customized to only run if another check has run and returned a specific
:class:`result type <macaron.slsa_analyzer.checks.check_result.CheckResultType>`. This feature can be used when some checks
can be ordered and have a parent-child relationship, i.e., one check implements a weaker or stronger version of a
security property in a parent check. Therefore, it might make sense to skip running the check and report a
:class:`result type <macaron.slsa_analyzer.checks.check_result.CheckResultType>` based on the result of the parent check.

+++++++++++++++++++
The Check Interface
+++++++++++++++++++

Each check needs to be implemented as a Python class in a Python module under ``src/macaron/slsa_analyzer/checks``.
A check class should subclass the ``BaseCheck`` class in :ref:`base_check module <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.base\\_check module>`.

You need to set the name, description, and other details of your new check in the ``__init__`` method. After implementing
the initializer, you need to implement the ``run_check`` abstract method. This method provides the context object
:ref:`AnalyzeContext <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.analyze\\_context module>`, which contains various
intermediate representations and models. The ``dynamic_data`` property would be particularly useful as it contains
data about the CI service, artifact registry, and build tool used for building the software component.

``component`` is another useful attribute in the :ref:`AnalyzeContext <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.analyze\\_context module>` object
that you should know about. This attribute contains the information about a software component, such
as it's corresponding ``repository`` and ``dependencies``. Note that ``component`` will also be stored into the database and its attributes
such as ``repository`` are established as database relationships. You can see the existing tables and their
relationships in our :ref:`data model <pages/developers_guide/apidoc/macaron.database:macaron.database.table\\_definitions module>`.

Once you implement the logic of your check in the ``run_check`` method, you need to add a class to help
Macaron handle your check's output:
A check class should subclass the :class:`BaseCheck <macaron.slsa_analyzer.checks.base_check.BaseCheck>` class.

* Add a class that subclasses ``CheckFacts`` to map your outputs to a table in the database. The class name should follow the ``<MyCheck>Facts`` pattern.
* Specify the table name in the ``__tablename__ = "_my_check"`` class variable. Note that the table name should start with ``_`` and it should not have been used by other checks.
* Add the ``id`` column as the primary key where the foreign key is ``_check_facts.id``.
* Add columns for the check outputs that you would like to store into the database. If a column needs to appear as a justification in the HTML/JSON report, pass ``info={"justification": JustificationType.<TEXT or HREF>}`` to the column mapper.
* Add ``__mapper_args__`` class variable and set ``"polymorphic_identity"`` key to the table name.
The main logic of a check should be implemented in the :func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>` abstract method. It is important to understand the input
parameters and output objects computed by this method.

Next, you need to create a ``result_tables`` list and append check facts as part of the ``run_check`` implementation.
You should also specify a :ref:`Confidence <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.check\\_result module>`
score choosing one of the ``Confidence`` enum values, e.g., ``Confidence.HIGH`` and pass it via keyword
argument ``confidence``. You should choose a suitable confidence score based on the accuracy
of your check analysis.
.. code-block:: python

    def run_check(self, ctx: AnalyzeContext) -> CheckResultData:

.. code-block:: python
''''''''''''''''
Input Parameters
''''''''''''''''

result_tables.append(MyCheckFacts(col_foo=foo, col_bar=bar, confidence=Confidence.HIGH))
The :func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>` method is a callback called by our checker framework. The framework pre-computes a context object,
:class:`ctx: AnalyzeContext <macaron.slsa_analyzer.analyze_context.AnalyzeContext>`, and passes it to the method as its input
parameter. The ``ctx`` object contains various intermediate representations and models.
Most likely, you will need to use the following properties:

This list as well as the check result status should be stored in a :ref:`CheckResultData <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.check\\_result module>`
object and returned by ``run_check``.
* :attr:`component <macaron.slsa_analyzer.analyze_context.AnalyzeContext.component>`
* :attr:`dynamic_data <macaron.slsa_analyzer.analyze_context.AnalyzeContext.dynamic_data>`

Finally, you need to register your check by adding it to the :ref:`registry module <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.registry module>`:
The :attr:`component <macaron.slsa_analyzer.analyze_context.AnalyzeContext.component>`
object represents a software component and contains data such as its
corresponding :class:`Repository <macaron.database.table_definitions.Repository>` and
:data:`dependencies <macaron.database.table_definitions.components_association_table>`.
Note that :attr:`component <macaron.slsa_analyzer.analyze_context.AnalyzeContext.component>` is also stored
in the database, and its attributes, such as :attr:`repository <macaron.database.table_definitions.Component.repository>`,
are established as database relationships. You can see the existing tables and their relationships
in our :mod:`data model <macaron.database.table_definitions>`.

.. code-block:: python
The :attr:`dynamic_data <macaron.slsa_analyzer.analyze_context.AnalyzeContext.dynamic_data>` property is particularly useful as it contains
data about the CI service, artifact registry, and build tool used for building the software component.
Note that this object is state shared among checks: if a check runs before another check, any
changes it makes to this object are visible to the checks that run subsequently.
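
The shared-state behavior can be illustrated with a minimal sketch. Everything here is a stand-in: ``SharedContext``, the two check functions, and the ``build_tool`` key are hypothetical names chosen for illustration and are not part of Macaron's API.

```python
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """Hypothetical stand-in for a context object that carries shared state."""

    dynamic_data: dict[str, object] = field(default_factory=dict)


def first_check(ctx: SharedContext) -> None:
    # An earlier check records something it discovered (the key is an assumption).
    ctx.dynamic_data["build_tool"] = "maven"


def second_check(ctx: SharedContext) -> object:
    # A later check reads whatever earlier checks left behind, or None if nothing was set.
    return ctx.dynamic_data.get("build_tool")


ctx = SharedContext()
before = second_check(ctx)  # Nothing has been written yet.
first_check(ctx)
after = second_check(ctx)  # The change made by first_check is now visible.
```

Because the result depends on execution order, a check that relies on data written by another check should not assume that data is always present.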

registry.register(MyCheck())
''''''
Output
''''''

And of course, make sure to add tests for you check by adding a module under ``tests/slsa_analyzer/checks/``.
The :func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>` method returns a :class:`CheckResultData <macaron.slsa_analyzer.checks.check_result.CheckResultData>` object.
This object consists of :attr:`result_tables <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_tables>` and
:attr:`result_type <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_type>`.
The :attr:`result_tables <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_tables>` object is the list of facts generated from the check. The :attr:`result_type <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_type>`
value shows the final result type of the check.
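
As a rough illustration of the shape of this return value, the sketch below uses simplified stand-ins rather than Macaron's real classes. ``RepoFacts``, the ``FAILED`` member, the float confidence, and ``run_check_sketch`` are assumptions made for the example; only the ``result_tables`` and ``result_type`` field names and ``CheckResultType.PASSED`` come from the text and diff above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CheckResultType(str, Enum):
    """Stand-in for the real result type enum; FAILED is assumed here."""

    PASSED = "PASSED"
    FAILED = "FAILED"


@dataclass
class RepoFacts:
    """Stand-in for a CheckFacts subclass; the real one is a database-mapped class."""

    git_repo: str
    confidence: float


@dataclass(frozen=True)
class CheckResultData:
    """Stand-in with the two fields described above."""

    result_tables: list
    result_type: CheckResultType


def run_check_sketch(git_repo: Optional[str]) -> CheckResultData:
    # No repository: report a failure with no facts.
    if not git_repo:
        return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED)
    # Repository found: attach one fact and report success.
    facts = RepoFacts(git_repo=git_repo, confidence=1.0)
    return CheckResultData(result_tables=[facts], result_type=CheckResultType.PASSED)


passed = run_check_sketch("https://github.com/example/project")
failed = run_check_sketch(None)
```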

+++++++
Example
+++++++

In this example, we show how to add a check determine if a software component has a source-code repository.
In this example, we show how to add a check to determine if a software component has a source-code repository.
Note that this is a simple example to just demonstrate how to add a check from scratch.
Feel free to explore other existing checks under ``src/macaron/slsa_analyzer/checks`` for more examples.

1. First create a module called ``repo_check.py`` under ``src/macaron/slsa_analyzer/checks``.
As discussed earlier, each check needs to be implemented as a Python class in a Python module under ``src/macaron/slsa_analyzer/checks``.
A check class should subclass the :class:`BaseCheck <macaron.slsa_analyzer.checks.base_check.BaseCheck>` class.

'''''''''''''''
Create a module
'''''''''''''''
First, create a module called ``repo_check.py`` under ``src/macaron/slsa_analyzer/checks``.


2. Add a class and specify the columns that you want to store for the check outputs to the database.
''''''''''''''''''''''''''''
Add a class for the database
''''''''''''''''''''''''''''

* Add a class that subclasses :class:`CheckFacts <macaron.database.table_definitions.CheckFacts>` to map your outputs to a table in the database. The class name should follow the ``<MyCheck>Facts`` pattern.
* Specify the table name in the ``__tablename__`` class variable. Note that the table name should start with ``_`` and it should not have been used by other checks.
* Add the ``id`` column as the primary key where the foreign key is ``_check_facts.id``.
* Add columns for the check outputs that you would like to store in the database. If a column needs to appear as a justification in the HTML/JSON report, pass ``info={"justification": JustificationType.<TEXT or HREF>}`` to the column mapper.
* Add the ``__mapper_args__`` class variable and set its ``"polymorphic_identity"`` key to the table name.

.. code-block:: python

@@ -113,10 +136,25 @@ Feel free to explore other existing checks under ``src/macaron/slsa_analyzer/che
git_repo: Mapped[str] = mapped_column(String, nullable=True, info={"justification": JustificationType.HREF})

__mapper_args__ = {
"polymorphic_identity": "__repo_check",
"polymorphic_identity": "_repo_check",
}

3. Add a class for your check, provide the check details in the initializer method, and implement the logic of the check in ``run_check``.
'''''''''''''''''''
Add the check class
'''''''''''''''''''

Add a class for your check that subclasses :class:`BaseCheck <macaron.slsa_analyzer.checks.base_check.BaseCheck>`,
provide the check details in the initializer method, and implement the logic of the check in
:func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>`.

A ``check_id`` should meet the following requirements:

- The general format: ``mcn_<name>_<digits>``
- In ``name``, only lowercase alphabetical letters are allowed. If ``name`` contains multiple \
words, they must be separated by underscores.
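
The ``check_id`` requirements above can be expressed as a regular expression. This is a minimal sketch, not Macaron's own validation code: the helper name ``is_valid_check_id`` and the sample IDs are hypothetical, and the pattern encodes only the two rules listed above.

```python
import re

# mcn_<name>_<digits>, where <name> is one or more lowercase words joined by underscores.
CHECK_ID_PATTERN = re.compile(r"mcn_[a-z]+(?:_[a-z]+)*_[0-9]+")


def is_valid_check_id(check_id: str) -> bool:
    """Return True if check_id follows the mcn_<name>_<digits> format (hypothetical helper)."""
    return CHECK_ID_PATTERN.fullmatch(check_id) is not None


valid = is_valid_check_id("mcn_repo_check_1")
uppercase_name = is_valid_check_id("mcn_RepoCheck_1")
missing_digits = is_valid_check_id("mcn_repo_check")
```

``fullmatch`` is used so that leading or trailing characters, including a trailing newline, cause the ID to be rejected.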


You can set the ``depends_on`` attribute in the initializer method to declare dependencies on other checks, as described in the high-level design above. In this example, we leave this list empty.

.. code-block:: python

@@ -156,13 +194,28 @@ Feel free to explore other existing checks under ``src/macaron/slsa_analyzer/che
result_type=CheckResultType.PASSED,
)

4. Register your check.
As you can see, the result of the check is returned via the :class:`CheckResultData <macaron.slsa_analyzer.checks.check_result.CheckResultData>` object.
When creating the check facts, specify a confidence score by passing one of the
:class:`Confidence <macaron.slsa_analyzer.checks.check_result.Confidence>` enum values,
e.g., :class:`Confidence.HIGH <macaron.slsa_analyzer.checks.check_result.Confidence.HIGH>`, via the keyword
argument :attr:`confidence <macaron.database.table_definitions.CheckFacts.confidence>`. Choose the
confidence score based on the accuracy of your check analysis.

'''''''''''''''''''
Register your check
'''''''''''''''''''

Finally, you need to register your check by adding it to the :mod:`registry module <macaron.slsa_analyzer.registry>` at the end of your check module:

.. code-block:: python

registry.register(RepoCheck())


'''''''''''''''
Test your check
'''''''''''''''

You can add tests for your check by adding a ``tests/slsa_analyzer/checks/test_repo_check.py`` module. Macaron
uses `pytest <https://docs.pytest.org>`_ and `hypothesis <https://hypothesis.readthedocs.io>`_ for testing. Take a look
at other tests for inspiration!
26 changes: 1 addition & 25 deletions src/macaron/database/table_definitions.py
@@ -15,7 +15,7 @@
import string
from datetime import datetime
from pathlib import Path
from typing import Any, Self
from typing import Any

from packageurl import PackageURL
from sqlalchemy import (
@@ -448,30 +448,6 @@ class CheckFacts(ORMBase):
#: A many-to-one relationship with check results.
checkresult: Mapped["MappedCheckResult"] = relationship(back_populates="checkfacts")

def __lt__(self, other: Self) -> bool:
"""Compare two check facts using their confidence values.

This comparison function is intended to be used by a heapq, which is a Min-Heap data structure.
The root element in a heapq is the minimum element in the queue and each `confidence` value is in [0, 1].
Therefore, we need reverse the comparison function to make sure the fact with highest confidence is stored
in the root element. This implementation compares `1 - confidence` to return True if the confidence of
`fact_a` is greater than the confidence of `fact_b`.

.. code-block:: pycon

>>> fact_a = CheckFacts()
>>> fact_b = CheckFacts()
>>> fact_a.confidence = 0.2
>>> fact_b.confidence = 0.7
>>> fact_b < fact_a
True

Return
------
bool
"""
return (1 - self.confidence) < (1 - other.confidence)

#: The polymorphic inheritance configuration.
__mapper_args__ = {
"polymorphic_identity": "CheckFacts",
22 changes: 19 additions & 3 deletions src/macaron/slsa_analyzer/analyze_context.py
@@ -64,7 +64,9 @@ def __init__(
output_dir : str
The output dir.
"""
self.component = component
# The component attribute should be accessed via the `component` property.
self._component = component

self.ctx_data: dict[ReqName, SLSAReqStatus] = create_requirement_status_dict()

self.slsa_level = SLSALevels.LEVEL0
@@ -92,6 +94,20 @@ def __init__(
expectation=None,
)

@property
def component(self) -> Component:
"""Return the object associated with a target software component.

This property contains the information about a software component, such as its
corresponding repository and dependencies.


Returns
-------
Component
"""
return self._component

@property
def dynamic_data(self) -> ChecksOutputs:
"""Return the `dynamic_data` object that contains various intermediate representations.
@@ -104,8 +120,8 @@ def dynamic_data(self) -> ChecksOutputs:
are that what you try to implement is already implemented and the results are available in the
`dynamic_data` object.

Return
------
Returns
-------
ChecksOutputs
"""
return self._dynamic_data
20 changes: 9 additions & 11 deletions src/macaron/slsa_analyzer/checks/check_result.py
@@ -4,7 +4,6 @@
"""This module contains the CheckResult class for storing the result of a check."""
from dataclasses import dataclass
from enum import Enum
from heapq import heappush
from typing import TypedDict

from macaron.database.table_definitions import CheckFacts
@@ -78,19 +77,18 @@ class CheckResultData:
@property
def justification_report(self) -> list[tuple[Confidence, list]]:
"""
Return the list of justifications for the check result generated from the tables in the database.
Return a sorted list of justifications based on confidence scores in descending order.

Note that the elements in the justification will be rendered different based on their types:
These justifications are generated from the tables in the database.
Note that the elements in the justification will be rendered differently based on their types:

* a :class:`JustificationType.TEXT` element is displayed in plain text in the HTML report.
* a :class:`JustificationType.HREF` element is rendered as a hyperlink in the HTML report.

Return
------
Returns
-------
list[tuple[Confidence, list]]
"""
# Interestingly, mypy cannot infer the type of elements later at `heappush` if we specify
# list[tuple[Confidence, list]]. But still, it insists on specifying the `list` type here.
justification_list: list = []
for result in self.result_tables:
# The HTML report generator requires the justification elements that need to be rendered in HTML
@@ -112,15 +110,15 @@ def justification_report(self) -> list[tuple[Confidence, list]]:
if dict_elements:
list_elements.append(dict_elements)

# Use heapq to always keep the justification with the highest confidence score in the first element.
if list_elements:
heappush(justification_list, (result.confidence, list_elements))
justification_list.append((result.confidence, list_elements))

# If there are no justifications available, return a default "Not Available" one.
if not justification_list:
return [(Confidence.HIGH, ["Not Available."])]

return justification_list
# Sort the justification list based on the confidence score in descending order.
return sorted(justification_list, key=lambda item: item[0], reverse=True)
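
The effect of the new ``sorted`` call can be checked in isolation. The sketch below is self-contained and uses a stand-in ``Confidence`` enum: the float mixin and the numeric values are assumptions made so the members are orderable, and the justification strings are invented for illustration.

```python
from enum import Enum


class Confidence(float, Enum):
    """Stand-in for the real Confidence enum; the numeric values are assumptions."""

    HIGH = 1.0
    MEDIUM = 0.7
    LOW = 0.4


def sort_justifications(items: list) -> list:
    # Same call as in the diff: order by the confidence score, highest first.
    return sorted(items, key=lambda item: item[0], reverse=True)


justifications = [
    (Confidence.LOW, ["weak signal"]),
    (Confidence.HIGH, ["repository URL found"]),
    (Confidence.MEDIUM, ["inferred from metadata"]),
]
report = sort_justifications(justifications)
top_justification = report[0][1]  # What get_summary reads via justification_report[0][1].
```

Sorting on ``item[0]`` alone means the justification lists themselves are never compared, and ``sorted`` keeps entries with equal confidence in their original order.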


@dataclass(frozen=True)
@@ -147,7 +145,7 @@ def get_summary(self) -> dict:
"check_id": self.check.check_id,
"check_description": self.check.check_description,
"slsa_requirements": [str(BUILD_REQ_DESC.get(req)) for req in self.check.eval_reqs],
# The justification report is stored in a heapq where the first element has the highest confidence score.
# The justification report is sorted and the first element has the highest confidence score.
"justification": self.result.justification_report[0][1],
"result_tables": self.result.result_tables,
"result_type": self.result.result_type,