refactor: accept provenance data in artifact pipeline check #872

Merged: behnazh-w merged 4 commits into staging from behnazh/refactor-infer-publish-check on Nov 18, 2024

@behnazh-w (Member) commented Sep 27, 2024

Refactoring the artifact pipeline detection check

  • Renames mcn_infer_artifact_pipeline_1 to mcn_find_artifact_pipeline_1.
  • This check now supports all package registries.
  • Modifies the check fact table schema by adding new columns and making some existing columns nullable. This lets us store the reasons for check failures, for example when a GitHub workflow run has been deleted and some of the previous columns therefore have no values.
  • Improves the heuristics. For example, if an artifact was published before the corresponding code was committed, no CI pipeline can have triggered the publishing.
  • This check depends on the deploy command identified by the mcn_build_as_code_1 check. If a deploy command is detected, this check will attempt to locate a successful CI pipeline that triggered the step containing the deploy command.
  • When a verifiable provenance is found for an artifact, we use it to obtain the pipeline trigger. Otherwise, we use heuristics to find the triggering pipeline.
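The decision flow described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Macaron's implementation: `PipelineResult`, `find_triggering_pipeline`, and its flat inputs are hypothetical, and the real check works on richer CI and provenance objects and applies more heuristics. The optional fields mirror the nullable fact-table columns, so a failure reason can be stored even when run details are missing.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PipelineResult:
    """Hypothetical check result; Optional fields mirror the nullable fact-table columns."""

    passed: bool
    run_url: Optional[str] = None
    failure_reason: Optional[str] = None


def find_triggering_pipeline(
    deploy_command_found: bool,
    provenance_run_url: Optional[str],
    published_at: datetime,
    committed_at: datetime,
    candidate_run_url: Optional[str],
) -> PipelineResult:
    """Try verifiable provenance first, then fall back to heuristics."""
    # The check depends on a deploy command found by mcn_build_as_code_1.
    if not deploy_command_found:
        return PipelineResult(False, failure_reason="no deploy command detected")

    # Verifiable provenance identifies the triggering pipeline directly.
    if provenance_run_url is not None:
        return PipelineResult(True, run_url=provenance_run_url)

    # Heuristic: an artifact published before its code was committed
    # cannot have been published by a CI run triggered by that commit.
    if published_at < committed_at:
        return PipelineResult(False, failure_reason="artifact published before commit")

    # The candidate run can be missing, e.g. when the workflow run was deleted;
    # the reason is still recorded even though run_url stays empty (NULL).
    if candidate_run_url is None:
        return PipelineResult(False, failure_reason="workflow run not found")

    return PipelineResult(True, run_url=candidate_run_url)


commit_time = datetime(2024, 9, 1, tzinfo=timezone.utc)
publish_time = datetime(2024, 8, 31, tzinfo=timezone.utc)
print(find_triggering_pipeline(True, None, publish_time, commit_time, "https://example.com/run/1"))
# PipelineResult(passed=False, run_url=None, failure_reason='artifact published before commit')
```

Checking provenance before the timestamp heuristic reflects the ordering in the bullets: heuristics are only a fallback when no verifiable provenance exists.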

Improvements to mcn_build_as_code_1

  • If provenance is found, we use it to obtain the workflow that triggered the artifact release.
  • Adds support for reusable GitHub Actions that perform automatic deployment. Since we do not analyze external reusable Actions, we use an allow list of approved Actions.
  • Adds a new function, infer_confidence_deploy_workflow, to BaseBuildTool to infer the confidence for such reusable workflows.
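The allow-list idea can be illustrated with a small sketch. It does not reproduce the signature or behavior of `infer_confidence_deploy_workflow`: the `Confidence` levels, the allow-list entries, the confidence assigned to approved workflows, and `reusable_workflow_confidence` are all hypothetical. It only assumes the standard job-level `uses: owner/repo/.github/workflows/file.yml@ref` form for reusable workflows.

```python
from enum import Enum
from typing import Optional


class Confidence(float, Enum):
    """Hypothetical confidence levels; the real values in Macaron may differ."""

    HIGH = 1.0
    MEDIUM = 0.7
    LOW = 0.4


# Hypothetical allow list of approved reusable deployment workflows, without the "@ref" part.
APPROVED_REUSABLE_WORKFLOWS = frozenset(
    {
        "example-org/release-tools/.github/workflows/publish-maven.yml",
        "example-org/release-tools/.github/workflows/publish-pypi.yml",
    }
)


def reusable_workflow_confidence(uses: str) -> Optional[Confidence]:
    """Return a confidence for a job-level `uses:` value, or None if it is not approved."""
    # Drop the "@ref" suffix (tag, branch, or commit SHA) before comparing.
    target = uses.split("@", 1)[0]
    if target in APPROVED_REUSABLE_WORKFLOWS:
        # The external workflow itself is not analyzed, so trust comes only from the
        # allow list; MEDIUM is an assumed level chosen for illustration.
        return Confidence.MEDIUM
    return None
```

Comparing without the ref keeps the allow list short, at the cost of trusting any ref of an approved workflow; a stricter variant could also require a pinned commit SHA.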

The store_inferred_build_info_results function

  • Renamed store_inferred_provenance to store_inferred_build_info_results.
  • To avoid confusion, we no longer use the term "inferred provenance" here and instead simply store build-related information in the context object provided to checks.
  • Instead of using CIInfo["provenances"] for inferred build command analysis results, we use a new field: CIInfo["build_info_results"].
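A rough sketch of the separation between real and inferred data, assuming a simplified CIInfo with only the two fields mentioned above. The real CIInfo has more fields and `build_info_results` holds a richer object; `store_build_info` and the dictionary shape it writes are hypothetical.

```python
from typing import Any, TypedDict


class CIInfo(TypedDict):
    """Simplified, hypothetical subset of CIInfo."""

    # Provenances actually found for the artifact.
    provenances: list[dict[str, Any]]
    # Build information inferred from analyzing the CI configuration.
    build_info_results: dict[str, Any]


def store_build_info(ci_info: CIInfo, build_command: list[str], workflow: str) -> None:
    """Hypothetical helper: record inferred build info without touching real provenances."""
    ci_info["build_info_results"] = {"build_command": build_command, "workflow": workflow}


info: CIInfo = {"provenances": [], "build_info_results": {}}
store_build_info(info, ["mvn", "deploy"], ".github/workflows/release.yml")
# Inferred results no longer end up in the provenances list.
assert info["provenances"] == []
```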

Provenance Extractor

  • Adds new abstractions, such as ProvenanceBuildDefinition and ProvenancePredicate, to the provenance extractor so that the logic for extracting information can be reused. With these abstractions, we no longer need to hardcode the expected buildType value when processing a provenance.
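The dispatch-on-buildType idea can be sketched as below. The classes are simplified stand-ins, deliberately not named after Macaron's, and their methods are hypothetical; the buildType URIs are placeholders. The field paths follow the SLSA v0.2 (`invocation.configSource.entryPoint`) and SLSA v1 (`buildDefinition.externalParameters`) layouts as we understand them.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional

Statement = dict[str, Any]


def predicate_build_type(statement: Statement) -> Optional[str]:
    """Read buildType from either an SLSA v0.2 or an SLSA v1 predicate."""
    predicate = statement.get("predicate", {})
    if "buildDefinition" in predicate:  # SLSA v1 layout
        return predicate["buildDefinition"].get("buildType")
    return predicate.get("buildType")  # SLSA v0.2 layout


class BuildDefinition(ABC):
    """Hypothetical stand-in: one subclass per supported buildType."""

    build_type: str

    @abstractmethod
    def workflow_path(self, statement: Statement) -> Optional[str]:
        """Return the workflow that produced the artifact, if recorded."""


class ExampleV1Definition(BuildDefinition):
    build_type = "https://example.com/buildtypes/workflow/v1"  # placeholder URI

    def workflow_path(self, statement: Statement) -> Optional[str]:
        params = statement["predicate"]["buildDefinition"].get("externalParameters", {})
        return params.get("workflow", {}).get("path")


class ExampleV02Definition(BuildDefinition):
    build_type = "https://example.com/buildtypes/generic/v02"  # placeholder URI

    def workflow_path(self, statement: Statement) -> Optional[str]:
        invocation = statement["predicate"].get("invocation", {})
        return invocation.get("configSource", {}).get("entryPoint")


REGISTERED: list[BuildDefinition] = [ExampleV1Definition(), ExampleV02Definition()]


def find_definition(statement: Statement) -> Optional[BuildDefinition]:
    """Dispatch on the statement's own buildType rather than a hardcoded value."""
    build_type = predicate_build_type(statement)
    return next((d for d in REGISTERED if d.build_type == build_type), None)
```

Supporting a new provenance format then means registering one more subclass, while callers keep calling `find_definition` unchanged.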

find_publish_timestamp

  • Added an API that obtains the artifact publish timestamp for all supported package registries.
  • By default, we use deps.dev to obtain the timestamp, except for Maven artifacts, because we have observed that Maven Central returns more accurate results.
  • Decoupled the Maven Central search API from the repository, making the hostname fully configurable to enable offline testing with a localhost server.
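The source selection and configurable hostnames can be sketched as URL construction only, with no network calls. `publish_timestamp_url`, `from_epoch_millis`, and the purl-type mapping are hypothetical, and the endpoint paths (the Maven Central `solrsearch/select` query and the deps.dev v3 `systems/.../packages/.../versions/...` path), as well as the epoch-millisecond timestamp format, are assumptions to verify against the respective API docs.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode

# Assumed mapping from purl types to deps.dev system names.
DEPS_DEV_SYSTEMS = {"npm": "npm", "pypi": "pypi", "golang": "go", "cargo": "cargo", "nuget": "nuget"}


def publish_timestamp_url(
    purl_type: str,
    namespace: str,
    name: str,
    version: str,
    deps_dev_host: str = "api.deps.dev",
    maven_central_host: str = "search.maven.org",
    scheme: str = "https",
) -> str:
    """Build the lookup URL: Maven Central for Maven artifacts, deps.dev for the rest."""
    if purl_type == "maven":
        # Assumed Solr query against the Maven Central search API for one GAV.
        params = {"q": f"g:{namespace} AND a:{name} AND v:{version}", "core": "gav", "wt": "json"}
        return f"{scheme}://{maven_central_host}/solrsearch/select?{urlencode(params)}"
    system = DEPS_DEV_SYSTEMS.get(purl_type)
    if system is None:
        raise ValueError(f"unsupported package type: {purl_type}")
    # Assumed: deps.dev takes the full package name as one percent-encoded path segment.
    package = f"{namespace}/{name}" if namespace else name
    return (
        f"{scheme}://{deps_dev_host}/v3/systems/{system}/packages/"
        f"{quote(package, safe='')}/versions/{quote(version, safe='')}"
    )


def from_epoch_millis(millis: int) -> datetime:
    """Convert an epoch-millisecond timestamp (assumed Maven Central format) to UTC."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
```

Passing `maven_central_host="localhost:8080"` and `scheme="http"` points the Maven lookup at a local test server, which is the kind of offline testing the last bullet describes.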

Tutorial and integration tests

  • Renamed the tutorial "Detecting a malicious Java dependency uploaded manually to Maven Central" to "Detecting Java dependencies manually uploaded to Maven Central".
  • Used the log4j-core artifact, which has an automated deployment workflow, instead of guava.
  • Fixed the integration tests and added a new one for log4j-core.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Sep 27, 2024
@behnazh-w behnazh-w force-pushed the behnazh/refactor-infer-publish-check branch 5 times, most recently from ac6cbcd to 7eac146, October 2, 2024 09:47
@behnazh-w behnazh-w force-pushed the behnazh/refactor-infer-publish-check branch from 1586789 to e592c5d, October 29, 2024 05:29
@behnazh-w behnazh-w marked this pull request as ready for review October 29, 2024 05:29
@behnazh-w behnazh-w requested a review from benmss October 29, 2024 05:30
@behnazh-w behnazh-w force-pushed the behnazh/refactor-infer-publish-check branch from e592c5d to c9761dc, November 6, 2024 06:03
@tromai (Member) left a comment

I have finished my review. Thanks.
Overall, no major changes are needed. Most of my comments are minor improvements or nitpicks.

@behnazh-w behnazh-w force-pushed the behnazh/refactor-infer-publish-check branch from 678abb1 to 17f3453, November 15, 2024 07:14
@tromai (Member) commented Nov 16, 2024

Except for a small question in #872 (comment), my approval still holds. The PR can be merged as is.

@behnazh-w behnazh-w force-pushed the behnazh/refactor-infer-publish-check branch from 17f3453 to c097331, November 18, 2024 22:58
@behnazh-w behnazh-w merged commit 4235041 into staging Nov 18, 2024
9 checks passed
art1f1c3R pushed a commit that referenced this pull request Nov 29, 2024
This PR renames `mcn_infer_artifact_pipeline_1` to `mcn_find_artifact_pipeline_1`. This check can support all the package registries now. When a verifiable provenance is found for an artifact, we use it to obtain the pipeline trigger. Otherwise, we use heuristics to find the triggering pipeline.

Signed-off-by: behnazh-w <[email protected]>
@behnazh-w behnazh-w deleted the behnazh/refactor-infer-publish-check branch December 4, 2024 08:34

3 participants