oracle · behnazh-w · Feb 15, 2024 · Jan 31, 2024 · Feb 1, 2024 · Feb 1, 2024
@@ -25,75 +25,98 @@ High-level Design
 Before jumping into coding, it is useful to understand how Macaron as a framework works. Macaron is an extensible
 framework designed to make writing new supply chain security analyses easy. It provides an interface
 that you can leverage to access existing models and abstractions instead of implementing everything from scratch. For
-instance, many security checks require to traverse through the code in GitHub Actions configurations. Normally,
+instance, many security checks require traversing through the code in GitHub Actions configurations. Normally,
 you would need to find the right repository and commit, clone it, find the workflows, and parse them. With Macaron,
 you don't need to do any of that and can simply write your security check by using the parsed shell scripts that are
 triggered in the CI.
 
 Another important aspect of our design is that all the check results are automatically mapped and stored in a local database.
-By performing this mapping, we make it possible to enforce flexible policies on the results of the checks. While storing
-the check results to the database happens automatically by Macaron's backend, the developer needs to add a brief specification
+By performing this mapping, we make it possible to enforce use-case-specific policies on the results of the checks. While storing
+the check results in the database happens automatically in Macaron's backend, the developer needs to add a brief specification
 to make that possible as we will see later.
 
+Once you are familiar with writing a basic check, you can explore the check dependency feature in Macaron. A check
+in our framework can be configured to run only if another check has run and returned a specific
+:class:`result type <macaron.slsa_analyzer.checks.check_result.CheckResultType>`. This feature is useful when checks
+have a parent-child relationship, i.e., one check implements a weaker or stronger version of a
+security property in its parent check. In such cases, it might make sense to skip running the child check and report a
+:class:`result type <macaron.slsa_analyzer.checks.check_result.CheckResultType>` based on the result of the parent check.
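The gating behavior described above can be modeled in a few lines of plain Python. This is a simplified sketch, not Macaron's implementation: ``ResultType`` stands in for ``CheckResultType``, and the ``run_gated`` helper, the ``(parent_check_id, expected_result_type)`` tuple shape of ``depends_on``, the ``result_on_skip`` parameter, and the check IDs are all assumptions made for illustration.

```python
from collections.abc import Callable
from enum import Enum


class ResultType(Enum):
    """Hypothetical stand-in for Macaron's CheckResultType."""

    PASSED = "PASSED"
    FAILED = "FAILED"
    SKIPPED = "SKIPPED"


def run_gated(
    check_id: str,
    depends_on: list[tuple[str, ResultType]],
    run: Callable[[], ResultType],
    results: dict[str, ResultType],
    result_on_skip: ResultType = ResultType.SKIPPED,
) -> ResultType:
    """Run a check only if every parent check has run and returned the expected result type.

    Otherwise, skip the check and record ``result_on_skip`` as its result.
    """
    # A parent that has not run yet is missing from `results`, so `.get()` returns None and the gate fails.
    if all(results.get(parent_id) == expected for parent_id, expected in depends_on):
        results[check_id] = run()
    else:
        results[check_id] = result_on_skip
    return results[check_id]


results: dict[str, ResultType] = {}

# The weaker parent check runs first and fails.
run_gated("mcn_parent_check_1", [], lambda: ResultType.FAILED, results)

# The stronger child check requires the parent to pass; since it did not,
# the child is skipped and reported as FAILED without running.
run_gated(
    "mcn_child_check_1",
    [("mcn_parent_check_1", ResultType.PASSED)],
    lambda: ResultType.PASSED,
    results,
    result_on_skip=ResultType.FAILED,
)
```

Here the child models a stronger property than its parent, so a failing parent lets the framework report the child's result without doing the child's work.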
+
 +++++++++++++++++++
 The Check Interface
 +++++++++++++++++++
 
 Each check needs to be implemented as a Python class in a Python module under ``src/macaron/slsa_analyzer/checks``.
-A check class should subclass the ``BaseCheck`` class in :ref:`base_check module <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.base\\_check module>`.
-
-You need to set the name, description, and other details of your new check in the ``__init__`` method. After implementing
-the initializer, you need to implement the ``run_check`` abstract method. This method provides the context object
-:ref:`AnalyzeContext <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.analyze\\_context module>`, which contains various
-intermediate representations and models. The ``dynamic_data`` property would be particularly useful as it contains
-data about the CI service, artifact registry, and build tool used for building the software component.
-
-``component`` is another useful attribute in the :ref:`AnalyzeContext <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.analyze\\_context module>` object
-that you should know about. This attribute contains the information about a software component, such
-as it's corresponding ``repository`` and ``dependencies``. Note that ``component`` will also be stored into the database and its attributes
-such as ``repository`` are established as database relationships. You can see the existing tables and their
-relationships in our :ref:`data model <pages/developers_guide/apidoc/macaron.database:macaron.database.table\\_definitions module>`.
-
-Once you implement the logic of your check in the ``run_check`` method, you need to add a class to help
-Macaron handle your check's output:
+A check class should subclass the :class:`BaseCheck <macaron.slsa_analyzer.checks.base_check.BaseCheck>` class.
 
-   * Add a class that subclasses ``CheckFacts`` to map your outputs to a table in the database. The class name should follow the ``<MyCheck>Facts`` pattern.
-   * Specify the table name in the ``__tablename__ = "_my_check"`` class variable. Note that the table name should start with ``_`` and it should not have been used by other checks.
-   * Add the ``id`` column as the primary key where the foreign key is ``_check_facts.id``.
-   * Add columns for the check outputs that you would like to store into the database. If a column needs to appear as a justification in the HTML/JSON report, pass ``info={"justification": JustificationType.<TEXT or HREF>}`` to the column mapper.
-   * Add ``__mapper_args__`` class variable and set ``"polymorphic_identity"`` key to the table name.
+The main logic of a check should be implemented in the :func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>` abstract method. It is important to understand the input
+parameters of this method and the output object it returns.
 
-Next, you need to create a ``result_tables`` list and append check facts as part of the ``run_check`` implementation.
-You should also specify a :ref:`Confidence <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.check\\_result module>`
-score choosing one of  the ``Confidence`` enum values, e.g., ``Confidence.HIGH`` and pass it via keyword
-argument ``confidence``. You should choose a suitable confidence score based on the accuracy
-of your check analysis.
+.. code-block:: python
+
+    def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
 
-.. code-block:: python
+''''''''''''''''
+Input Parameters
+''''''''''''''''
 
-   result_tables.append(MyCheckFacts(col_foo=foo, col_bar=bar, confidence=Confidence.HIGH))
+The :func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>` method is a callback called by our checker framework. The framework pre-computes a context object,
+:class:`ctx: AnalyzeContext <macaron.slsa_analyzer.analyze_context.AnalyzeContext>`, and makes it available as the input
+parameter to the function. The ``ctx`` object contains various intermediate representations and models.
+Most likely, you will need to use the following properties:
 
-This list as well as the check result status should be stored in a :ref:`CheckResultData <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.check\\_result module>`
-object and returned by ``run_check``.
+* :attr:`component <macaron.slsa_analyzer.analyze_context.AnalyzeContext.component>`
+* :attr:`dynamic_data <macaron.slsa_analyzer.analyze_context.AnalyzeContext.dynamic_data>`
 
-Finally, you need to register your check by adding it to the :ref:`registry module <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.registry module>`:
+The :attr:`component <macaron.slsa_analyzer.analyze_context.AnalyzeContext.component>`
+object represents a software component and contains data such as its
+corresponding :class:`Repository <macaron.database.table_definitions.Repository>` and
+:data:`dependencies <macaron.database.table_definitions.components_association_table>`.
+Note that :attr:`component <macaron.slsa_analyzer.analyze_context.AnalyzeContext.component>` will also be stored
+in the database and its attributes, such as :attr:`repository <macaron.database.table_definitions.Component.repository>`,
+are established as database relationships. You can see the existing tables and their relationships
+in our :mod:`data model <macaron.database.table_definitions>`.
 
-.. code-block:: python
+The :attr:`dynamic_data <macaron.slsa_analyzer.analyze_context.AnalyzeContext.dynamic_data>` property is particularly useful as it contains
+data about the CI service, artifact registry, and build tool used for building the software component.
+Note that this object is shared state among checks: if a check runs before another check, any changes it makes
+to this object will be visible to the checks that run after it.
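To make the ordering effect concrete, the sketch below uses a plain dictionary as a hypothetical stand-in for ``dynamic_data``. The two toy check functions and the ``build_tools`` key are assumptions for illustration and do not correspond to fields of Macaron's ``ChecksOutputs``.

```python
from typing import Any


def detect_build_tool_check(dynamic_data: dict[str, Any]) -> bool:
    """Toy check that records the build tool it detects in the shared state."""
    # Hypothetical detection result; a real check would inspect the repository.
    dynamic_data.setdefault("build_tools", []).append("maven")
    return True


def uses_maven_check(dynamic_data: dict[str, Any]) -> bool:
    """Toy check that reads what an earlier check stored in the shared state."""
    return "maven" in dynamic_data.get("build_tools", [])


shared_state: dict[str, Any] = {}

# The reader sees nothing if it runs before the writer...
before = uses_maven_check(shared_state)
detect_build_tool_check(shared_state)
# ...but sees the writer's change if it runs afterwards.
after = uses_maven_check(shared_state)
print(before, after)  # False True
```

Because the reader's result depends on whether the writer has already run, a check should only rely on data written by another check when it can be sure that check runs first.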
 
-   registry.register(MyCheck())
+''''''
+Output
+''''''
 
-And of course, make sure to add tests for you check by adding a module under ``tests/slsa_analyzer/checks/``.
+The :func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>` method returns a :class:`CheckResultData <macaron.slsa_analyzer.checks.check_result.CheckResultData>` object.
+This object consists of :attr:`result_tables <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_tables>` and
+:attr:`result_type <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_type>`.
+The :attr:`result_tables <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_tables>` attribute is the list of facts generated by the check. The :attr:`result_type <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_type>`
+value is the final result type of the check.
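The shape of the returned object can be sketched with simplified dataclass stand-ins. ``RepoFacts``, ``ResultData``, ``ResultType``, and ``run_toy_check`` below are hypothetical and only mirror the two attributes described above; in Macaron, facts are SQLAlchemy-mapped subclasses of ``CheckFacts`` and the result type comes from ``CheckResultType``.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ResultType(Enum):
    """Hypothetical stand-in for CheckResultType."""

    PASSED = "PASSED"
    FAILED = "FAILED"


@dataclass
class RepoFacts:
    """Hypothetical stand-in for a CheckFacts subclass: one row of check output."""

    git_repo: str | None
    confidence: float


@dataclass(frozen=True)
class ResultData:
    """Hypothetical stand-in for CheckResultData."""

    result_type: ResultType
    result_tables: list[RepoFacts] = field(default_factory=list)


def run_toy_check(repo_url: str | None) -> ResultData:
    """Collect facts, then derive the final result type of the toy check."""
    result_tables: list[RepoFacts] = []
    if repo_url:
        # Record the evidence as a fact, with a confidence score in [0, 1].
        result_tables.append(RepoFacts(git_repo=repo_url, confidence=1.0))
        return ResultData(result_type=ResultType.PASSED, result_tables=result_tables)
    return ResultData(result_type=ResultType.FAILED, result_tables=result_tables)
```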
 
 +++++++
 Example
 +++++++
 
-In this example, we show how to add a check determine if a software component has a source-code repository.
+In this example, we show how to add a check to determine if a software component has a source-code repository.
+Note that this is a deliberately simple example that only demonstrates how to add a check from scratch.
 Feel free to explore other existing checks under ``src/macaron/slsa_analyzer/checks`` for more examples.
 
-1. First create a module called ``repo_check.py`` under ``src/macaron/slsa_analyzer/checks``.
+As discussed earlier, each check needs to be implemented as a Python class in a Python module under ``src/macaron/slsa_analyzer/checks``.
+A check class should subclass the :class:`BaseCheck <macaron.slsa_analyzer.checks.base_check.BaseCheck>` class.
+
+'''''''''''''''
+Create a module
+'''''''''''''''
+
+First, create a module called ``repo_check.py`` under ``src/macaron/slsa_analyzer/checks``.
+
 
-2. Add a class and specify the columns that you want to store for the check outputs to the database.
+''''''''''''''''''''''''''''
+Add a class for the database
+''''''''''''''''''''''''''''
+
+* Add a class that subclasses :class:`CheckFacts <macaron.database.table_definitions.CheckFacts>` to map your outputs to a table in the database. The class name should follow the ``<MyCheck>Facts`` pattern.
+* Specify the table name in the ``__tablename__`` class variable. Note that the table name should start with ``_`` and must not already be used by another check.
+* Add an ``id`` column as the primary key, with a foreign key to ``_check_facts.id``.
+* Add columns for the check outputs that you would like to store in the database. If a column needs to appear as a justification in the HTML/JSON report, pass ``info={"justification": JustificationType.<TEXT or HREF>}`` to the column mapper.
+* Add the ``__mapper_args__`` class variable and set its ``"polymorphic_identity"`` key to the table name.
 
 .. code-block:: python
 
@@ -113,10 +136,25 @@ Feel free to explore other existing checks under ``src/macaron/slsa_analyzer/che
        git_repo: Mapped[str] = mapped_column(String, nullable=True, info={"justification": JustificationType.HREF})
 
        __mapper_args__ = {
-           "polymorphic_identity": "__repo_check",
+           "polymorphic_identity": "_repo_check",
        }
 
-3. Add a class for your check, provide the check details in the initializer method, and implement the logic of the check in ``run_check``.
+'''''''''''''''''''
+Add the check class
+'''''''''''''''''''
+
+Add a class for your check that subclasses :class:`BaseCheck <macaron.slsa_analyzer.checks.base_check.BaseCheck>`,
+provide the check details in the initializer method, and implement the logic of the check in
+:func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>`.
+
+A ``check_id`` should meet the following requirements:
+
+* The general format is ``mcn_<name>_<digits>``.
+* In ``name``, only lowercase alphabetical letters are allowed. If ``name`` contains multiple
+  words, they must be separated by underscores.
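The documented ``check_id`` rules can be expressed as a regular expression. The ``is_valid_check_id`` helper below is a hypothetical sketch for sanity-checking an ID before you pick one, not a validator that Macaron ships; it assumes ``<digits>`` means one or more ASCII digits.

```python
import re

# Hypothetical pattern encoding the documented rules: the "mcn_" prefix, a name made of
# lowercase words separated by single underscores, and a trailing "_<digits>" suffix.
CHECK_ID_PATTERN = re.compile(r"mcn_[a-z]+(?:_[a-z]+)*_[0-9]+")


def is_valid_check_id(check_id: str) -> bool:
    """Return True if ``check_id`` matches the documented format in full."""
    return CHECK_ID_PATTERN.fullmatch(check_id) is not None


print(is_valid_check_id("mcn_repo_exists_1"))  # True
print(is_valid_check_id("mcn_RepoExists_1"))  # False
```

Using ``fullmatch`` rather than ``match`` ensures that trailing characters such as ``mcn_repo_1a`` are rejected.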
+
+
+You can set the ``depends_on`` attribute in the initializer method to declare dependencies on other checks. In this example, we leave this list empty.
 
 .. code-block:: python
 
@@ -156,13 +194,28 @@ Feel free to explore other existing checks under ``src/macaron/slsa_analyzer/che
                result_type=CheckResultType.PASSED,
            )
 
-4. Register your check.
+As you can see, the result of the check is returned via the :class:`CheckResultData <macaron.slsa_analyzer.checks.check_result.CheckResultData>` object.
+You should specify a :class:`Confidence <macaron.slsa_analyzer.checks.check_result.Confidence>`
+score by choosing one of the :class:`Confidence <macaron.slsa_analyzer.checks.check_result.Confidence>` enum values,
+e.g., :attr:`Confidence.HIGH <macaron.slsa_analyzer.checks.check_result.Confidence.HIGH>`, and pass it via the
+:attr:`confidence <macaron.database.table_definitions.CheckFacts.confidence>` keyword argument. You should choose a suitable
+confidence score based on the accuracy of your check analysis.
+
+'''''''''''''''''''
+Register your check
+'''''''''''''''''''
+
+Finally, you need to register your check with the :mod:`registry module <macaron.slsa_analyzer.registry>` by adding the following line at the end of your check module:
 
 .. code-block:: python
 
    registry.register(RepoCheck())
 
 
+'''''''''''''''
+Test your check
+'''''''''''''''
+
 You can also add tests for your check by adding the ``tests/slsa_analyzer/checks/test_repo_check.py`` module. Macaron
 uses `pytest <https://docs.pytest.org>`_ and `hypothesis <https://hypothesis.readthedocs.io>`_ for testing. Take a look
 at other tests for inspiration!
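As a starting point, a test module can follow pytest's convention of plain ``test_*`` functions with bare ``assert`` statements. The sketch below tests a hypothetical pure helper, ``has_repository``, instead of ``RepoCheck`` itself, because exercising the real check needs an ``AnalyzeContext`` and the fixtures used by Macaron's existing tests, which are not reproduced here.

```python
from __future__ import annotations


def has_repository(repo_url: str | None) -> bool:
    """Hypothetical helper: return True if a non-blank repository URL was found."""
    return bool(repo_url and repo_url.strip())


# pytest collects module-level functions whose names start with ``test_``.
def test_has_repository_with_url() -> None:
    assert has_repository("https://github.com/oracle/macaron")


def test_has_repository_without_url() -> None:
    assert not has_repository(None)
    assert not has_repository("")
    assert not has_repository("   ")
```

With hypothesis, such example-based assertions can be generalized into properties over generated inputs, for instance by decorating a test with ``@given(st.text())`` after ``from hypothesis import given, strategies as st``.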

@@ -15,7 +15,7 @@
 import string
 from datetime import datetime
 from pathlib import Path
-from typing import Any, Self
+from typing import Any
 
 from packageurl import PackageURL
 from sqlalchemy import (
@@ -448,30 +448,6 @@ class CheckFacts(ORMBase):
     #: A many-to-one relationship with check results.
     checkresult: Mapped["MappedCheckResult"] = relationship(back_populates="checkfacts")
 
-    def __lt__(self, other: Self) -> bool:
-        """Compare two check facts using their confidence values.
-
-        This comparison function is intended to be used by a heapq, which is a Min-Heap data structure.
-        The root element in a heapq is the minimum element in the queue and each `confidence` value is in [0, 1].
-        Therefore, we need reverse the comparison function to make sure the fact with highest confidence is stored
-        in the root element. This implementation compares `1 - confidence` to return True if the confidence of
-        `fact_a` is greater than the confidence of `fact_b`.
-
-        .. code-block:: pycon
-
-            >>> fact_a = CheckFacts()
-            >>> fact_b = CheckFacts()
-            >>> fact_a.confidence = 0.2
-            >>> fact_b.confidence = 0.7
-            >>> fact_b < fact_a
-            True
-
-        Return
-        ------
-        bool
-        """
-        return (1 - self.confidence) < (1 - other.confidence)
-
     #: The polymorphic inheritance configuration.
     __mapper_args__ = {
         "polymorphic_identity": "CheckFacts",

@@ -64,7 +64,9 @@ def __init__(
         output_dir : str
             The output dir.
         """
-        self.component = component
+        # The component attribute should be accessed via the `component` property.
+        self._component = component
+
         self.ctx_data: dict[ReqName, SLSAReqStatus] = create_requirement_status_dict()
 
         self.slsa_level = SLSALevels.LEVEL0
@@ -92,6 +94,20 @@ def __init__(
             expectation=None,
         )
 
+    @property
+    def component(self) -> Component:
+        """Return the object associated with a target software component.
+
+        This property contains information about a software component, such as its
+        corresponding repository and dependencies.
+
+        Returns
+        -------
+        Component
+        """
+        return self._component
+
     @property
     def dynamic_data(self) -> ChecksOutputs:
         """Return the `dynamic_data` object that contains various intermediate representations.
@@ -104,8 +120,8 @@ def dynamic_data(self) -> ChecksOutputs:
         are that what you try to implement is already implemented and the results are available in the
         `dynamic_data` object.
 
-        Return
-        ------
+        Returns
+        -------
         ChecksOutputs
         """
         return self._dynamic_data

@@ -4,7 +4,6 @@
 """This module contains the CheckResult class for storing the result of a check."""
 from dataclasses import dataclass
 from enum import Enum
-from heapq import heappush
 from typing import TypedDict
 
 from macaron.database.table_definitions import CheckFacts
@@ -78,19 +77,18 @@ class CheckResultData:
     @property
     def justification_report(self) -> list[tuple[Confidence, list]]:
         """
-        Return the list of justifications for the check result generated from the tables in the database.
+        Return the list of justifications sorted by confidence score in descending order.
 
-        Note that the elements in the justification will be rendered different based on their types:
+        These justifications are generated from the tables in the database.
+        Note that the elements in the justification will be rendered differently based on their types:
 
         * a :class:`JustificationType.TEXT` element is displayed in plain text in the HTML report.
         * a :class:`JustificationType.HREF` element is rendered as a hyperlink in the HTML report.
 
-        Return
-        ------
+        Returns
+        -------
         list[tuple[Confidence, list]]
         """
-        # Interestingly, mypy cannot infer the type of elements later at `heappush` if we specify
-        # list[tuple[Confidence, list]]. But still, it insists on specifying the `list` type here.
         justification_list: list = []
         for result in self.result_tables:
             # The HTML report generator requires the justification elements that need to be rendered in HTML
@@ -112,15 +110,15 @@ def justification_report(self) -> list[tuple[Confidence, list]]:
             if dict_elements:
                 list_elements.append(dict_elements)
 
-            # Use heapq to always keep the justification with the highest confidence score in the first element.
             if list_elements:
-                heappush(justification_list, (result.confidence, list_elements))
+                justification_list.append((result.confidence, list_elements))
 
         # If there are no justifications available, return a default "Not Available" one.
         if not justification_list:
             return [(Confidence.HIGH, ["Not Available."])]
 
-        return justification_list
+        # Sort the justification list based on the confidence score in descending order.
+        return sorted(justification_list, key=lambda item: item[0], reverse=True)
 
 
 @dataclass(frozen=True)
@@ -147,7 +145,7 @@ def get_summary(self) -> dict:
             "check_id": self.check.check_id,
             "check_description": self.check.check_description,
             "slsa_requirements": [str(BUILD_REQ_DESC.get(req)) for req in self.check.eval_reqs],
-            # The justification report is stored in a heapq where the first element has the highest confidence score.
+            # The justification report is sorted and the first element has the highest confidence score.
             "justification": self.result.justification_report[0][1],
             "result_tables": self.result.result_tables,
             "result_type": self.result.result_type,
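The switch from ``heapq`` to ``sorted`` can be illustrated in isolation. In the sketch below, which uses made-up values rather than real check facts, pushing ``(confidence, elements)`` tuples onto a heap puts the lowest confidence at index 0, because tuples compare by their first item and ``heapq`` maintains a min-heap. ``sorted(..., key=..., reverse=True)`` instead orders every item from highest to lowest confidence.

```python
import heapq

# Made-up (confidence, justification elements) pairs, in the order the facts were produced.
items = [(0.4, ["low"]), (1.0, ["high"]), (0.7, ["medium"])]

# Pushing plain tuples builds a min-heap: the root is the LOWEST confidence,
# and only the root position is guaranteed; the rest of the list is not fully ordered.
heap: list = []
for item in items:
    heapq.heappush(heap, item)
print(heap[0][0])  # 0.4

# Sorting on the confidence key alone yields a full descending order.
ranked = sorted(items, key=lambda item: item[0], reverse=True)
print([confidence for confidence, _ in ranked])  # [1.0, 0.7, 0.4]

# With equal confidences, the key avoids comparing the element lists, which may hold dicts
# that do not support "<"; reverse=True still keeps equal items in their original order.
tied = [(0.7, [{"href": "a"}]), (0.7, [{"href": "b"}])]
ranked_tied = sorted(tied, key=lambda item: item[0], reverse=True)
```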