Commit bf118b3

feat: enable repo finder to support more languages via Open Source Insights (#388)
This feature modifies the Repo Finder, so that it can: be usable from anywhere within Macaron; accept PURL strings as input; and, support more languages via Google's Open Source Insights (deps.dev).

This enables Macaron to accept artifact PURLs as input, whereby the Repo Finder will be used to attempt to retrieve the related repository.

Additional languages include those supported by deps.dev: Python, NodeJS, .Net, and Rust. Note that currently these will only work when specifying an artifact PURL as input, or providing an SBOM. Full support for these extra languages will require the addition of new dependency analyzers.

A new config option is also provided to disable API calls to Google's Open Source Insights, if desired.

Signed-off-by: Ben Selwyn-Smith <[email protected]>
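To make the artifact PURL input described above more concrete, here is a minimal sketch of how a string of the form `pkg:<type>/<namespace>/<name>@<version>` breaks down into parts. It uses only the standard library and is a simplification, not the parser Macaron uses (the diff imports `PackageURL` from the `packageurl` module for that). The names `SimplePurl` and `split_purl` and the example PURLs are hypothetical, and qualifiers and subpaths are deliberately discarded.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class SimplePurl(NamedTuple):
    """Hypothetical container for the PURL parts this sketch understands."""

    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def split_purl(purl: str) -> SimplePurl:
    """Split 'pkg:<type>/<namespace>/<name>@<version>' into its parts.

    Simplification: qualifiers ('?...') and subpaths ('#...') are dropped,
    and no type-specific normalisation is applied.
    """
    scheme, sep, remainder = purl.partition(":")
    if not sep or scheme != "pkg":
        raise ValueError(f"Not a PURL: {purl!r}")

    # Discard the optional subpath and qualifiers, then any stray slashes.
    remainder = remainder.split("#", 1)[0].split("?", 1)[0].strip("/")

    package_type, sep, path = remainder.partition("/")
    if not sep or not path:
        raise ValueError(f"PURL has no name component: {purl!r}")

    # The version follows the last '@'. An '@' inside a namespace is
    # percent-encoded as '%40', so it cannot be confused with this separator.
    version: Optional[str] = None
    if "@" in path:
        path, raw_version = path.rsplit("@", 1)
        version = unquote(raw_version) or None

    # Everything before the last '/' is the namespace (it may be absent).
    raw_namespace, _, raw_name = path.rpartition("/")
    if not raw_name:
        raise ValueError(f"PURL has no name component: {purl!r}")

    return SimplePurl(
        type=package_type.lower(),
        namespace=unquote(raw_namespace) or None,
        name=unquote(raw_name),
        version=version,
    )


if __name__ == "__main__":
    # Hypothetical example PURLs, not taken from the Macaron documentation.
    for example in (
        "pkg:maven/com.example/demo-lib@1.0.0",
        "pkg:npm/%40example/widget@2.0.0",
        "pkg:pypi/demo-package",
    ):
        print(example, "->", split_purl(example))
```

The NPM case shows why the scoped package name is written with `%40`: decoding happens only after the version separator has been located.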
1 parent 7350b55 commit bf118b3

30 files changed: +2785 -1418 lines changed


docs/source/pages/developers_guide/apidoc/macaron.dependency_analyzer.rst

Lines changed: 0 additions & 8 deletions
@@ -40,11 +40,3 @@ macaron.dependency\_analyzer.dependency\_resolver module
    :members:
    :undoc-members:
    :show-inheritance:
-
-macaron.dependency\_analyzer.java\_repo\_finder module
-------------------------------------------------------
-
-.. automodule:: macaron.dependency_analyzer.java_repo_finder
-   :members:
-   :undoc-members:
-   :show-inheritance:
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+macaron.repo\_finder package
+============================
+
+.. automodule:: macaron.repo_finder
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Submodules
+----------
+
+macaron.repo\_finder.repo\_finder module
+----------------------------------------
+
+.. automodule:: macaron.repo_finder.repo_finder
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+macaron.repo\_finder.repo\_finder\_base module
+----------------------------------------------
+
+.. automodule:: macaron.repo_finder.repo_finder_base
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+macaron.repo\_finder.repo\_finder\_deps\_dev module
+---------------------------------------------------
+
+.. automodule:: macaron.repo_finder.repo_finder_deps_dev
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+macaron.repo\_finder.repo\_finder\_java module
+----------------------------------------------
+
+.. automodule:: macaron.repo_finder.repo_finder_java
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+macaron.repo\_finder.repo\_validator module
+-------------------------------------------
+
+.. automodule:: macaron.repo_finder.repo_validator
+   :members:
+   :undoc-members:
+   :show-inheritance:

docs/source/pages/developers_guide/apidoc/macaron.rst

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ Subpackages
    macaron.output_reporter
    macaron.parsers
    macaron.policy_engine
+   macaron.repo_finder
    macaron.slsa_analyzer
 
 Submodules

docs/source/pages/developers_guide/apidoc/macaron.slsa_analyzer.build_tool.rst

Lines changed: 8 additions & 0 deletions
@@ -17,6 +17,14 @@ macaron.slsa\_analyzer.build\_tool.base\_build\_tool module
    :undoc-members:
    :show-inheritance:
 
+macaron.slsa\_analyzer.build\_tool.docker module
+------------------------------------------------
+
+.. automodule:: macaron.slsa_analyzer.build_tool.docker
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
 macaron.slsa\_analyzer.build\_tool.gradle module
 ------------------------------------------------
 
2230

docs/source/pages/using.rst

Lines changed: 50 additions & 7 deletions
@@ -104,7 +104,7 @@ To simplify the examples, we use the same configurations as above if needed (e.g
 
 The list bellow shows examples for the corresponding PURL strings for different git repositories:
 
-.. list-table:: Example of PURL strings for git repositories.
+.. list-table:: Examples of PURL strings for git repositories.
    :widths: 50 50
    :header-rows: 1
 
@@ -133,6 +133,39 @@ You can also provide the PURL string together with the repository path. In this
 
 .. note:: When providing the PURL and the repository path, both the branch name and commit digest must be provided as well.
 
+''''''''''''''''''''''''''''''''''''''
+Providing an artifact as a PURL string
+''''''''''''''''''''''''''''''''''''''
+
+The PURL format supports artifacts as well as repositories, and Macaron supports (some of) these too.
+
+.. code-block::
+
+    pkg:<package_type>/<artifact_details>
+
+Where ``artifact_details`` varies based on the provided ``package_type``. Examples for those currently supported by Macaron are as follows:
+
+.. list-table:: Examples of PURL strings for artifacts.
+   :widths: 50 50
+   :header-rows: 1
+
+   * - Package Type
+     - PURL String
+   * - Maven (Java)
+     - ``pkg:maven/org.apache.xmlgraphics/[email protected]``
+   * - PyPi (Python)
+     - ``pkg:pypi/[email protected]``
+   * - Cargo (Rust)
+     - ``pkg:cargo/[email protected]``
+   * - NuGet (.Net)
+     - ``pkg:nuget/[email protected]``
+   * - NPM (NodeJS)
+     - ``pkg:npm/%40angular/[email protected]``
+
+For more detailed information on converting a given artifact into a PURL, see `PURL Specification <https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst>`_ and `PURL Types <https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst>`_
+
+.. note:: If a repository is not also provided, Macaron will try to discover it based on the artifact purl. For this to work, ``find_repos`` in the configuration file **must be enabled**\. See `Analyzing more dependencies <#more-deps>`_ for more information about the configuration options of the Repository Finding feature.
+
 -------------------------------------------------
 Verifying provenance expectations in CUE language
 -------------------------------------------------
@@ -191,6 +224,8 @@ With the example above, the generated output reports can be seen here:
 - `micronaut-core.html <../_static/examples/micronaut-projects/micronaut-core/analyze_with_sbom/micronaut-core.html>`__
 - `micronaut-core.json <../_static/examples/micronaut-projects/micronaut-core/analyze_with_sbom/micronaut-core.json>`__
 
+.. _more-deps:
+
 '''''''''''''''''''''''''''
 Analyzing more dependencies
 '''''''''''''''''''''''''''
@@ -203,30 +238,38 @@ This feature is enabled by default. To disable, or configure its behaviour in ot
 
 See :ref:`dump-defaults <action_dump_defaults>`, the CLI command to dump the default configurations in ``defaults.ini``. After making changes, see :ref:`analyze <analyze-action-cli>` CLI command for the option to pass the modified ``defaults.ini`` file.
 
-Within the configuration file under the ``repofinder.java`` header, five options exist: ``find_repos``, ``artifact_repositories``, ``repo_pom_paths``, ``find_parents``, ``artifact_ignore_list``. These options behave as follows:
+Within the configuration file under the ``repofinder.java`` header, three options exist: ``artifact_repositories``, ``repo_pom_paths``, ``find_parents``. These options behave as follows:
 
-- ``find_repos`` (Values: True or False) - Enables or disables the Repository Finding feature.
 - ``artifact_repositories`` (Values: List of URLs) - Determines the remote artifact repositories to attempt to retrieve dependency information from.
 - ``repo_pom_paths`` (Values: List of POM tags) - Determines where to search for repository information in the POM files. E.g. scm.url.
 - ``find_parents`` (Values: True or False) - When enabled, the Repository Finding feature will also search for repository URLs in parents POM files of the current dependency.
-- ``artifact_ignore_list`` (Values: List of GAs) - The Repository Finding feature will skip any artifact in this list. Format is "GroupId":"ArtifactId". E.g. org.apache.maven:maven
+
+Under the related header ``repofinder``, two more options exist: ``find_repos``, and ``use_open_source_insights``:
+
+- ``find_repos`` (Values: True or False) - Enables or disables the Repository Finding feature.
+- ``use_open_source_insights`` (Values: True or False) - Enables or disables use of Google's Open Source Insights API.
 
 .. note:: Finding repositories requires at least one remote call, adding some additional overhead to an analysis run.
 
+.. note:: Google's Open Source Insights API is currently used to find repositories for: Python, Rust, .Net, NodeJS
+
 An example configuration file for utilising this feature:
 
 .. code-block:: ini
 
-    [repofinder.java]
+    [repofinder]
     find_repos = True
+    use_open_source_insights = True
+
+    [repofinder.java]
     artifact_repositories = https://repo.maven.apache.org/maven2
     repo_pom_paths =
         scm.url
         scm.connection
         scm.developerConnection
     find_parents = True
-    artifact_ignore_list =
-        org.apache.maven:maven
+
+
 
 -------------------------------------
 Analyzing a locally cloned repository
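The new `[repofinder]` and `[repofinder.java]` sections documented in this file can be read with Python's standard `configparser`. The sketch below is an illustration only: the function name `load_repo_finder_settings` and the returned dictionary layout are assumptions, not Macaron's own configuration loader (Macaron reads these values through its `defaults` object). The fallback of `True` mirrors the values shown in `defaults.ini`.

```python
import configparser

# The example configuration from the documentation change above.
EXAMPLE_INI = """\
[repofinder]
find_repos = True
use_open_source_insights = True

[repofinder.java]
artifact_repositories = https://repo.maven.apache.org/maven2
repo_pom_paths =
    scm.url
    scm.connection
    scm.developerConnection
find_parents = True
"""


def load_repo_finder_settings(ini_text: str) -> dict:
    """Parse the repo finder options from INI text (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)

    # Multi-line values arrive as one newline-separated string.
    raw_paths = parser.get("repofinder.java", "repo_pom_paths", fallback="")
    pom_paths = [line.strip() for line in raw_paths.splitlines() if line.strip()]

    return {
        # Options that moved to (or were added under) the general header.
        "find_repos": parser.getboolean("repofinder", "find_repos", fallback=True),
        "use_open_source_insights": parser.getboolean(
            "repofinder", "use_open_source_insights", fallback=True
        ),
        # Options that remain Java-specific.
        "repo_pom_paths": pom_paths,
        "find_parents": parser.getboolean(
            "repofinder.java", "find_parents", fallback=True
        ),
    }


if __name__ == "__main__":
    print(load_repo_finder_settings(EXAMPLE_INI))
```

Setting `use_open_source_insights = False` in the `[repofinder]` section is the switch the commit message refers to for disabling calls to deps.dev.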

scripts/dev_scripts/integration_tests.sh

Lines changed: 13 additions & 0 deletions
@@ -9,6 +9,7 @@ HOMEDIR=$2
 RESOURCES=$WORKSPACE/src/macaron/resources
 COMPARE_DEPS=$WORKSPACE/tests/dependency_analyzer/compare_dependencies.py
 COMPARE_JSON_OUT=$WORKSPACE/tests/e2e/compare_e2e_result.py
+TEST_REPO_FINDER=$WORKSPACE/tests/e2e/repo_finder/repo_finder.py
 RUN_MACARON="python -m macaron -o $WORKSPACE/output"
 RESULT_CODE=0
 
@@ -532,3 +533,15 @@ then
     echo -e "Expected zero status code but got $RESULT_CODE."
     exit 1
 fi
+
+# Testing the Repo Finder's remote calls.
+# This requires the 'packageurl' Python module
+echo -e "\n----------------------------------------------------------------------------------"
+echo "Testing Repo Finder functionality."
+echo -e "----------------------------------------------------------------------------------\n"
+python $TEST_REPO_FINDER || log_fail
+if [ $? -ne 0 ];
+then
+    echo -e "Expect zero status code but got $?."
+    log_fail
+fi

src/macaron/__main__.py

Lines changed: 3 additions & 3 deletions
@@ -31,9 +31,9 @@ def analyze_slsa_levels_single(analyzer_single_args: argparse.Namespace) -> None
         # We don't mention --config-path as a possible option in this log message as it going to be move soon.
         # See: https://github.com/oracle/macaron/issues/417
         logger.error(
-            "Analysis target missing. Please provide a package url (PURL) and/or repo path. "
-            + "Examples of a PURL can be seen at https://github.com/package-url/purl-spec: "
-            + "pkg:github/micronaut-projects/micronaut-core."
+            """Analysis target missing. Please provide a package url (PURL) and/or repo path.
+            Examples of a PURL can be seen at https://github.com/package-url/purl-spec:
+            pkg:github/micronaut-projects/micronaut-core."""
         )
         sys.exit(os.EX_USAGE)
 

src/macaron/config/defaults.ini

Lines changed: 4 additions & 4 deletions
@@ -44,19 +44,19 @@ timeout = 2400
 recursive = False
 
 # This is the repo finder script.
+[repofinder]
+find_repos = True
+use_open_source_insights = True
+
 [repofinder.java]
 # The list of maven-like repositories to attempt to retrieve artifact POMs from.
 artifact_repositories = https://repo.maven.apache.org/maven2
-find_repos = True
 repo_pom_paths =
     scm.url
     scm.connection
     scm.developerConnection
 find_parents = True
 parent_limit = 10
-# Disables repo finding for specific artifacts based on their group and artifact IDs. Format: {groupId}:{artifactId}
-# E.g. com.oracle.coherence.ce:coherence
-artifact_ignore_list =
 
 # Git services that Macaron has access to clone repositories.
 # For security purposes, Macaron will only clone repositories from the hostnames specified.

src/macaron/config/global_config.py

Lines changed: 0 additions & 1 deletion
@@ -21,7 +21,6 @@ class GlobalConfig:
     gh_token: str = ""
     debug_level: int = logging.DEBUG
     resources_path: str = ""
-    find_repos: bool = True
 
     def load(
         self,

src/macaron/dependency_analyzer/cyclonedx.py

Lines changed: 24 additions & 10 deletions
@@ -9,11 +9,14 @@
 from collections.abc import Iterable
 from pathlib import Path
 
+from packageurl import PackageURL
+
 from macaron.config.defaults import defaults
 from macaron.config.global_config import global_config
 from macaron.dependency_analyzer.dependency_resolver import DependencyAnalyzer, DependencyInfo
 from macaron.errors import MacaronError
 from macaron.output_reporter.scm import SCMStatus
+from macaron.repo_finder.repo_validator import find_valid_repository_url
 
 logger: logging.Logger = logging.getLogger(__name__)
 
@@ -160,21 +163,32 @@ def convert_components_to_artifacts(
     Returns
     -------
     dict
-        A dictionary where dependency artifacts are grouped based on "artifactId:groupId".
+        A dictionary where dependency artifacts are grouped based on "groupId:artifactId".
     """
     all_versions: dict[str, list[DependencyInfo]] = {}  # Stores all the versions of dependencies for debugging.
     latest_deps: dict[str, DependencyInfo] = {}  # Stores the latest version of dependencies.
     url_to_artifact: dict[str, set] = {}  # Used to detect artifacts that have similar repos.
     for component in components:
         try:
+            # TODO make this function language agnostic when CycloneDX SBOM processing also is.
+            # See https://github.com/oracle/macaron/issues/464
             key = f"{component.get('group')}:{component.get('name')}"
+            if component.get("purl"):
+                purl = PackageURL.from_string(str(component.get("purl")))
+            else:
+                # TODO remove maven assumption when optional non-existence of the component's purl is handled
+                # See https://github.com/oracle/macaron/issues/464
+                purl = PackageURL(
+                    type="maven",
+                    namespace=component.get("group"),
+                    name=component.get("name"),
+                    version=component.get("version") or None,
+                )
+
             # According to PEP-0589 all keys must be present in a TypedDict.
             # See https://peps.python.org/pep-0589/#totality
             item = DependencyInfo(
-                version=component.get("version") or "",
-                group=component.get("group") or "",
-                name=component.get("name") or "",
-                purl=component.get("purl") or "",
+                purl=purl,
                 url="",
                 note="",
                 available=SCMStatus.AVAILABLE,
@@ -187,10 +201,10 @@ def convert_components_to_artifacts(
                 # IN case of a build error, we use this as a heuristic to avoid analyzing
                 # submodules that produce development artifacts in the same repo.
                 if (
-                    "snapshot"
-                    in (item.get("version") or "").lower()  # or "" is not necessary but mypy produces a FP otherwise.
+                    "snapshot" in (purl.version or "").lower()
+                    # or "" is not necessary but mypy produces a FP otherwise.
                     and root_component
-                    and item.get("group") == root_component.get("group")
+                    and purl.namespace == root_component.get("group")
                 ):
                     continue
                 logger.debug(
@@ -199,7 +213,7 @@ def convert_components_to_artifacts(
                 )
             else:
                 # Find a valid URL.
-                item["url"] = DependencyAnalyzer.find_valid_url(
+                item["url"] = find_valid_repository_url(
                     link.get("url") for link in component.get("externalReferences")  # type: ignore
                 )
 
@@ -228,7 +242,7 @@ def get_deps_from_sbom(sbom_path: str | Path) -> dict[str, DependencyInfo]:
 
     Returns
     -------
-    A dictionary where dependency artifacts are grouped based on "artifactId:groupId".
+    A dictionary where dependency artifacts are grouped based on "groupId:artifactId".
     """
     return convert_components_to_artifacts(
         get_dep_components(