Commit d25bf57

chore: address PR comments
Signed-off-by: behnazh-w <[email protected]>
1 parent aa93b0c commit d25bf57

9 files changed: +199 -115 lines changed

docs/source/pages/developers_guide/index.rst

Lines changed: 96 additions & 43 deletions
@@ -25,75 +25,98 @@ High-level Design
 Before jumping into coding, it is useful to understand how Macaron as a framework works. Macaron is an extensible
 framework designed to make writing new supply chain security analyses easy. It provides an interface
 that you can leverage to access existing models and abstractions instead of implementing everything from scratch. For
-instance, many security checks require to traverse through the code in GitHub Actions configurations. Normally,
+instance, many security checks require traversing through the code in GitHub Actions configurations. Normally,
 you would need to find the right repository and commit, clone it, find the workflows, and parse them. With Macaron,
 you don't need to do any of that and can simply write your security check by using the parsed shell scripts that are
 triggered in the CI.

 Another important aspect of our design is that all the check results are automatically mapped and stored in a local database.
-By performing this mapping, we make it possible to enforce flexible policies on the results of the checks. While storing
-the check results to the database happens automatically by Macaron's backend, the developer needs to add a brief specification
+By performing this mapping, we make it possible to enforce use case-specific policies on the results of the checks. While storing
+the check results in the database happens automatically in Macaron's backend, the developer needs to add a brief specification
 to make that possible as we will see later.

+Once you get familiar with writing a basic check, you can explore the check dependency feature in Macaron. The checks
+in our framework can be customized to only run if another check has run and returned a specific
+:class:`result type <macaron.slsa_analyzer.checks.check_result.CheckResultType>`. This feature can be used when some checks
+can be ordered and have a parent-child relationship, i.e., one check implements a weaker or stronger version of a
+security property in a parent check. Therefore, it might make sense to skip running the check and report a
+:class:`result type <macaron.slsa_analyzer.checks.check_result.CheckResultType>` based on the result of the parent check.
+
 +++++++++++++++++++
 The Check Interface
 +++++++++++++++++++

 Each check needs to be implemented as a Python class in a Python module under ``src/macaron/slsa_analyzer/checks``.
-A check class should subclass the ``BaseCheck`` class in :ref:`base_check module <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.base\\_check module>`.
-
-You need to set the name, description, and other details of your new check in the ``__init__`` method. After implementing
-the initializer, you need to implement the ``run_check`` abstract method. This method provides the context object
-:ref:`AnalyzeContext <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.analyze\\_context module>`, which contains various
-intermediate representations and models. The ``dynamic_data`` property would be particularly useful as it contains
-data about the CI service, artifact registry, and build tool used for building the software component.
-
-``component`` is another useful attribute in the :ref:`AnalyzeContext <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.analyze\\_context module>` object
-that you should know about. This attribute contains the information about a software component, such
-as it's corresponding ``repository`` and ``dependencies``. Note that ``component`` will also be stored into the database and its attributes
-such as ``repository`` are established as database relationships. You can see the existing tables and their
-relationships in our :ref:`data model <pages/developers_guide/apidoc/macaron.database:macaron.database.table\\_definitions module>`.
-
-Once you implement the logic of your check in the ``run_check`` method, you need to add a class to help
-Macaron handle your check's output:
+A check class should subclass the :class:`BaseCheck <macaron.slsa_analyzer.checks.base_check.BaseCheck>` class.

-* Add a class that subclasses ``CheckFacts`` to map your outputs to a table in the database. The class name should follow the ``<MyCheck>Facts`` pattern.
-* Specify the table name in the ``__tablename__ = "_my_check"`` class variable. Note that the table name should start with ``_`` and it should not have been used by other checks.
-* Add the ``id`` column as the primary key where the foreign key is ``_check_facts.id``.
-* Add columns for the check outputs that you would like to store into the database. If a column needs to appear as a justification in the HTML/JSON report, pass ``info={"justification": JustificationType.<TEXT or HREF>}`` to the column mapper.
-* Add ``__mapper_args__`` class variable and set ``"polymorphic_identity"`` key to the table name.
+The main logic of a check should be implemented in the :func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>` abstract method. It is important to understand the input
+parameters and output objects computed by this method.

-Next, you need to create a ``result_tables`` list and append check facts as part of the ``run_check`` implementation.
-You should also specify a :ref:`Confidence <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.check\\_result module>`
-score choosing one of the ``Confidence`` enum values, e.g., ``Confidence.HIGH`` and pass it via keyword
-argument ``confidence``. You should choose a suitable confidence score based on the accuracy
-of your check analysis.
+.. code-block: python
+    def run_check(self, ctx: AnalyzeContext) -> CheckResultData:

-.. code-block:: python
+''''''''''''''''
+Input Parameters
+''''''''''''''''

-    result_tables.append(MyCheckFacts(col_foo=foo, col_bar=bar, confidence=Confidence.HIGH))
+The :func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>` method is a callback called by our checker framework. The framework pre-computes a context object,
+:class:`ctx: AnalyzeContext <macaron.slsa_analyzer.analyze_context.AnalyzeContext>` and makes it available as the input
+parameter to the function. The ``ctx`` object contains various intermediate representations and models as the input parameter.
+Most likely, you will need to use the following properties:

-This list as well as the check result status should be stored in a :ref:`CheckResultData <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.check\\_result module>`
-object and returned by ``run_check``.
+* :attr:`component <macaron.slsa_analyzer.analyze_context.AnalyzeContext.component>`
+* :attr:`dynamic_data <macaron.slsa_analyzer.analyze_context.AnalyzeContext.dynamic_data>`

-Finally, you need to register your check by adding it to the :ref:`registry module <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.registry module>`:
+The :attr:`component <macaron.slsa_analyzer.analyze_context.AnalyzeContext.component>`
+object acts as a representation of a software component and contains data, such as it's
+corresponding :class:`Repository <macaron.database.table_definitions.Repository>` and
+:data:`dependencies <macaron.database.table_definitions.components_association_table>`.
+Note that :attr:`component <macaron.slsa_analyzer.analyze_context.AnalyzeContext.component>` will also be stored
+in the database and its attributes, such as :attr:`repository <macaron.database.table_definitions.Component.repository>`
+are established as database relationships. You can see the existing tables and their relationships
+in our :mod:`data model <macaron.database.table_definitions>`.

-.. code-block:: python
+The :attr:`dynamic_data <macaron.slsa_analyzer.analyze_context.AnalyzeContext.dynamic_data>` property would be particularly useful as it contains
+data about the CI service, artifact registry, and build tool used for building the software component.
+Note that this object is a shared state among checks. If a check runs before another check, it can
+make changes to this object, which will be accessible to the checks run subsequently.

-    registry.register(MyCheck())
+''''''
+Output
+''''''

-And of course, make sure to add tests for you check by adding a module under ``tests/slsa_analyzer/checks/``.
+The :func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>` method returns a :class:`CheckResultData <macaron.slsa_analyzer.checks.check_result.CheckResultData>` object.
+This object consists of :attr:`result_tables <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_tables>` and
+:attr:`result_type <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_type>`.
+The :attr:`result_tables <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_tables>` object is the list of facts generated from the check. The :attr:`result_type <macaron.slsa_analyzer.checks.check_result.CheckResultData.result_type>`
+value shows the final result type of the check.

 +++++++
 Example
 +++++++

-In this example, we show how to add a check determine if a software component has a source-code repository.
+In this example, we show how to add a check to determine if a software component has a source-code repository.
+Note that this is a simple example to just demonstrate how to add a check from scratch.
 Feel free to explore other existing checks under ``src/macaron/slsa_analyzer/checks`` for more examples.

-1. First create a module called ``repo_check.py`` under ``src/macaron/slsa_analyzer/checks``.
+As discussed earlier, each check needs to be implemented as a Python class in a Python module under ``src/macaron/slsa_analyzer/checks``.
+A check class should subclass the :class:`BaseCheck <macaron.slsa_analyzer.checks.base_check.BaseCheck>` class.
+
+'''''''''''''''
+Create a module
+'''''''''''''''
+First create a module called ``repo_check.py`` under ``src/macaron/slsa_analyzer/checks``.
+

-2. Add a class and specify the columns that you want to store for the check outputs to the database.
+''''''''''''''''''''''''''''
+Add a class for the database
+''''''''''''''''''''''''''''
+
+* Add a class that subclasses :class:`CheckFacts <macaron.database.table_definitions.CheckFacts>` to map your outputs to a table in the database. The class name should follow the ``<MyCheck>Facts`` pattern.
+* Specify the table name in the ``__tablename__`` class variable. Note that the table name should start with ``_`` and it should not have been used by other checks.
+* Add the ``id`` column as the primary key where the foreign key is ``_check_facts.id``.
+* Add columns for the check outputs that you would like to store in the database. If a column needs to appear as a justification in the HTML/JSON report, pass ``info={"justification": JustificationType.<TEXT or HREF>}`` to the column mapper.
+* Add ``__mapper_args__`` class variable and set ``"polymorphic_identity"`` key to the table name.

 .. code-block:: python

@@ -113,10 +136,25 @@ Feel free to explore other existing checks under ``src/macaron/slsa_analyzer/che
         git_repo: Mapped[str] = mapped_column(String, nullable=True, info={"justification": JustificationType.HREF})

         __mapper_args__ = {
-            "polymorphic_identity": "__repo_check",
+            "polymorphic_identity": "_repo_check",
         }

-3. Add a class for your check, provide the check details in the initializer method, and implement the logic of the check in ``run_check``.
+'''''''''''''''''''
+Add the check class
+'''''''''''''''''''
+
+Add a class for your check that subclasses :class:`BaseCheck <macaron.slsa_analyzer.checks.base_check.BaseCheck>`,
+provide the check details in the initializer method, and implement the logic of the check in
+:func:`run_check <macaron.slsa_analyzer.checks.base_check.BaseCheck.run_check>`.
+
+A ``check_id`` should meet the following requirements:
+
+- The general format: ``mcn_<name>_<digits>``
+- In ``name``, only lowercase alphabetical letters are allowed. If ``name`` contains multiple \
+  words, they must be separated by underscores.
+
+
+You can set the ``depends_on`` attribute in the initializer method to declare such dependencies. In this example, we leave this list empty.

 .. code-block:: python

@@ -156,13 +194,28 @@ Feel free to explore other existing checks under ``src/macaron/slsa_analyzer/che
                 result_type=CheckResultType.PASSED,
             )

-4. Register your check.
+As you can see, the result of the check is returned via the :class:`CheckResultData <macaron.slsa_analyzer.checks.check_result.CheckResultData>` object.
+You should specify a :class:`Confidence <macaron.slsa_analyzer.checks.check_result.Confidence>`
+score choosing one of the :class:`Confidence <macaron.slsa_analyzer.checks.check_result.Confidence>` enum values,
+e.g., :class:`Confidence.HIGH <macaron.slsa_analyzer.checks.check_result.Confidence.HIGH>` and pass it via keyword
+argument :attr:`confidence <macaron.database.table_definitions.CheckFacts.confidence>`. You should choose a suitable
+confidence score based on the accuracy of your check analysis.
+
+'''''''''''''''''''
+Register your check
+'''''''''''''''''''
+
+Finally, you need to register your check by adding it to the :mod:`registry module <macaron.slsa_analyzer.registry>` at the end of your check module:

 .. code-block:: python

     registry.register(RepoCheck())


+'''''''''''''''
+Test your check
+'''''''''''''''
+
 Finally, you can add tests for you check by adding ``tests/slsa_analyzer/checks/test_repo_check.py`` module. Macaron
 uses `pytest <https://docs.pytest.org>`_ and `hypothesis <https://hypothesis.readthedocs.io>`_ for testing. Take a look
 at other tests for inspiration!
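
Note on the ``check_id`` rules added in this file: the format ``mcn_<name>_<digits>``, with ``name`` restricted to lowercase words separated by underscores, can be sketched as a regular expression. This is only an illustration of the rule as worded in the documentation. The pattern, the ``is_valid_check_id`` helper, and the sample IDs in the usage note are assumptions and are not taken from Macaron's code.

```python
import re

# Assumed reading of ``mcn_<name>_<digits>``: one or more lowercase words
# joined by underscores, followed by an underscore and at least one digit.
CHECK_ID_PATTERN = re.compile(r"mcn_[a-z]+(?:_[a-z]+)*_[0-9]+")


def is_valid_check_id(check_id: str) -> bool:
    """Return True if the whole string matches the assumed check ID format."""
    return CHECK_ID_PATTERN.fullmatch(check_id) is not None
```

Under these assumptions, a hypothetical ID such as ``mcn_repo_check_1`` is accepted, while ``mcn_RepoCheck_1`` (uppercase letters) and ``mcn_repo_check`` (no trailing digits) are rejected.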

src/macaron/database/table_definitions.py

Lines changed: 1 addition & 25 deletions
@@ -15,7 +15,7 @@
 import string
 from datetime import datetime
 from pathlib import Path
-from typing import Any, Self
+from typing import Any

 from packageurl import PackageURL
 from sqlalchemy import (
@@ -448,30 +448,6 @@ class CheckFacts(ORMBase):
     #: A many-to-one relationship with check results.
     checkresult: Mapped["MappedCheckResult"] = relationship(back_populates="checkfacts")

-    def __lt__(self, other: Self) -> bool:
-        """Compare two check facts using their confidence values.
-
-        This comparison function is intended to be used by a heapq, which is a Min-Heap data structure.
-        The root element in a heapq is the minimum element in the queue and each `confidence` value is in [0, 1].
-        Therefore, we need reverse the comparison function to make sure the fact with highest confidence is stored
-        in the root element. This implementation compares `1 - confidence` to return True if the confidence of
-        `fact_a` is greater than the confidence of `fact_b`.
-
-        .. code-block:: pycon
-
-            >>> fact_a = CheckFacts()
-            >>> fact_b = CheckFacts()
-            >>> fact_a.confidence = 0.2
-            >>> fact_b.confidence = 0.7
-            >>> fact_b < fact_a
-            True
-
-        Return
-        ------
-        bool
-        """
-        return (1 - self.confidence) < (1 - other.confidence)
-
     #: The polymorphic inheritance configuration.
     __mapper_args__ = {
         "polymorphic_identity": "CheckFacts",

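Note on the removed ``__lt__``: its docstring describes reversing the comparison so that a ``heapq`` min-heap keeps the highest-confidence fact at the root. The standalone sketch below reproduces that idea for reference. ``Fact`` is a hypothetical stand-in for ``CheckFacts`` that carries only a float ``confidence``, and the three sample values are made up.

```python
from dataclasses import dataclass
from heapq import heappush


@dataclass
class Fact:
    """Hypothetical stand-in for CheckFacts with a confidence in [0, 1]."""

    confidence: float

    def __lt__(self, other: "Fact") -> bool:
        # Reversed comparison, mirroring the removed method:
        # a fact with higher confidence compares as "smaller".
        return (1 - self.confidence) < (1 - other.confidence)


heap: list[Fact] = []
for value in (0.2, 0.7, 0.5):
    heappush(heap, Fact(value))

# The root of the min-heap is the fact with the highest confidence.
top = heap[0]
```

Only the root position is guaranteed by the heap invariant. The rest of the list is heap-ordered, not fully sorted.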
src/macaron/slsa_analyzer/analyze_context.py

Lines changed: 19 additions & 3 deletions
@@ -64,7 +64,9 @@ def __init__(
         output_dir : str
             The output dir.
         """
-        self.component = component
+        # The component attribute should be accessed via the `component` property.
+        self._component = component
+
         self.ctx_data: dict[ReqName, SLSAReqStatus] = create_requirement_status_dict()

         self.slsa_level = SLSALevels.LEVEL0
@@ -92,6 +94,20 @@ def __init__(
             expectation=None,
         )

+    @property
+    def component(self) -> Component:
+        """Return the object associated with a target software component.
+
+        This property contains the information about a software component, such as it's
+        corresponding repository and dependencies.
+
+
+        Returns
+        -------
+        Component
+        """
+        return self._component
+
     @property
     def dynamic_data(self) -> ChecksOutputs:
         """Return the `dynamic_data` object that contains various intermediate representations.
@@ -104,8 +120,8 @@ def dynamic_data(self) -> ChecksOutputs:
         are that what you try to implement is already implemented and the results are available in the
         `dynamic_data` object.

-        Return
-        ------
+        Returns
+        -------
         ChecksOutputs
         """
         return self._dynamic_data
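
Note on the ``component`` change: moving the value to ``_component`` and exposing it through a ``@property`` without a setter makes the attribute read-only from the caller's side. The sketch below shows that general Python behavior. ``Context`` is a hypothetical, reduced stand-in for ``AnalyzeContext``, and a plain string stands in for the real ``Component`` object.

```python
class Context:
    """Hypothetical stand-in for AnalyzeContext, reduced to one attribute."""

    def __init__(self, component: str) -> None:
        # Stored privately; callers are expected to read it via the property.
        self._component = component

    @property
    def component(self) -> str:
        """Return the stored component."""
        return self._component


def try_reassign(ctx: Context) -> bool:
    """Return True if assigning to the read-only property is rejected."""
    try:
        ctx.component = "other"  # type: ignore[misc]
    except AttributeError:
        return True
    return False
```

Reads such as ``ctx.component`` keep working unchanged, so existing checks that only read the attribute should not need edits under this pattern.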

src/macaron/slsa_analyzer/checks/check_result.py

Lines changed: 9 additions & 11 deletions
@@ -4,7 +4,6 @@
 """This module contains the CheckResult class for storing the result of a check."""
 from dataclasses import dataclass
 from enum import Enum
-from heapq import heappush
 from typing import TypedDict

 from macaron.database.table_definitions import CheckFacts
@@ -78,19 +77,18 @@ class CheckResultData:
     @property
     def justification_report(self) -> list[tuple[Confidence, list]]:
         """
-        Return the list of justifications for the check result generated from the tables in the database.
+        Return a sorted list of justifications based on confidence scores in descending order.

-        Note that the elements in the justification will be rendered different based on their types:
+        These justifications are generated from the tables in the database.
+        Note that the elements in the justification will be rendered differently based on their types:

         * a :class:`JustificationType.TEXT` element is displayed in plain text in the HTML report.
         * a :class:`JustificationType.HREF` element is rendered as a hyperlink in the HTML report.

-        Return
-        ------
+        Returns
+        -------
         list[tuple[Confidence, list]]
         """
-        # Interestingly, mypy cannot infer the type of elements later at `heappush` if we specify
-        # list[tuple[Confidence, list]]. But still, it insists on specifying the `list` type here.
         justification_list: list = []
         for result in self.result_tables:
             # The HTML report generator requires the justification elements that need to be rendered in HTML
@@ -112,15 +110,15 @@ def justification_report(self) -> list[tuple[Confidence, list]]:
             if dict_elements:
                 list_elements.append(dict_elements)

-            # Use heapq to always keep the justification with the highest confidence score in the first element.
             if list_elements:
-                heappush(justification_list, (result.confidence, list_elements))
+                justification_list.append((result.confidence, list_elements))

         # If there are no justifications available, return a default "Not Available" one.
         if not justification_list:
             return [(Confidence.HIGH, ["Not Available."])]

-        return justification_list
+        # Sort the justification list based on the confidence score in descending order.
+        return sorted(justification_list, key=lambda item: item[0], reverse=True)


 @dataclass(frozen=True)
@@ -147,7 +145,7 @@ def get_summary(self) -> dict:
             "check_id": self.check.check_id,
             "check_description": self.check.check_description,
             "slsa_requirements": [str(BUILD_REQ_DESC.get(req)) for req in self.check.eval_reqs],
-            # The justification report is stored in a heapq where the first element has the highest confidence score.
+            # The justification report is sorted and the first element has the highest confidence score.
             "justification": self.result.justification_report[0][1],
             "result_tables": self.result.result_tables,
             "result_type": self.result.result_type,

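Note on the switch from ``heappush`` to ``sorted(..., reverse=True)``: the commit does not state its motivation, but the behavioral difference can be illustrated in isolation. In the sketch below, plain floats stand in for ``Confidence`` values and the justification payloads are made up. Both are assumptions for illustration only.

```python
from heapq import heappush

# Hypothetical (confidence, justification elements) pairs.
entries = [(0.4, ["a"]), (1.0, ["b"]), (0.7, ["c"]), (0.1, ["d"])]

# Pushing plain tuples onto a min-heap keeps the LOWEST confidence at index 0.
heap: list = []
for entry in entries:
    heappush(heap, entry)

# Sorting on the confidence alone yields a fully ordered list, highest first.
ranked = sorted(entries, key=lambda item: item[0], reverse=True)


def push_tied_dicts() -> bool:
    """Return True if heappush fails on tied confidences with dict payloads."""
    tied: list = []
    heappush(tied, (0.5, [{"k": "v1"}]))
    try:
        # Equal first elements force a comparison of the lists, and dicts
        # do not support "<", so the tuple comparison raises TypeError.
        heappush(tied, (0.5, [{"k": "v2"}]))
    except TypeError:
        return True
    return False
```

Because the diff shows ``list_elements.append(dict_elements)``, justification lists can contain dictionaries. Sorting with ``key=lambda item: item[0]`` never compares those payloads, even when two confidence values tie.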