Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: introduce confidence scores for check facts #620

Merged
merged 6 commits into from
Feb 15, 2024

Conversation

behnazh-w
Copy link
Member

@behnazh-w behnazh-w commented Jan 31, 2024

Confidence score and justification type

  • This PR allows specifying confidence scores for check results, which is especially useful when a check reports multiple candidate results. All of these confidence scores are added to the check tables in the database. However, the fact that has the highest confidence is shown in the HTML/JSON report only.

  • The justifications are no longer required to be added manually to the CheckResultData. Instead, they are curated directly from the results in the table. If a column has specified JustificationType in the column mapping, it will be picked up automatically and rendered as plain text or href depending on the specified type. If a check fails or is skipped, we show a default Not Available. justification. This allows to create HTML/JSON reports from the database reproducibly.

Refactoring

For this feature to work, the following refactorings are done as well:

  • The checks are changed to add multiple facts to a result table.
  • The confidence score is added to CheckFacts ORM mapping, which all the check mappings inherit from. That caused some circular dependency issues for the Expectation ORM mapping. I have refactored and improved this mapping accordingly.
  • I have improved the check table columns that needed to be added as justification. For example, I have added asset_url to ProvenanceAvailableFacts. That required adding a new SLSAProvenanceData wrapper class for GitHub release provenances.

Documentation

I have added elaborate instructions for adding a new check and explained the current interface.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jan 31, 2024
@behnazh-w behnazh-w force-pushed the behnazh/improve-build-as-code branch from 9968c46 to 7f92fe2 Compare January 31, 2024 04:32
@behnazh-w behnazh-w force-pushed the behnazh/improve-build-as-code branch from 01d446e to 9aedac6 Compare February 1, 2024 03:49
@behnazh-w behnazh-w marked this pull request as ready for review February 1, 2024 05:32
@behnazh-w behnazh-w requested a review from tromai as a code owner February 1, 2024 05:32
@nicallen
Copy link
Member

nicallen commented Feb 5, 2024

Under this new schema, the confidence score is recorded for a check_facts entry, which currently contains multiple pieces of information that would be forced to share a confidence score. For example, the build_as_code_check entries contain both ci_service_name and deploy_command which are derived by different means and hence may have different levels of confidence, which cannot be expressed in the current schema. This could be fixed this by splitting the build_as_code_check and similar tables into smaller tables with more granularity of information. Are you intending to do that in later changes?

@behnazh-w
Copy link
Member Author

Under this new schema, the confidence score is recorded for a check_facts entry, which currently contains multiple pieces of information that would be forced to share a confidence score. For example, the build_as_code_check entries contain both ci_service_name and deploy_command which are derived by different means and hence may have different levels of confidence, which cannot be expressed in the current schema. This could be fixed this by splitting the build_as_code_check and similar tables into smaller tables with more granularity of information. Are you intending to do that in later changes?

For this particular example, I think of deploy_command as the main result of the build_as_code_check , which can have different confidence scores based on the analysis. The ci_service_name fact, however, is an informational column that shows in which CI configuration the deploy command was found. So, I don't see it necessary to split the table in this case. But if a check can generate more than one fact, and each fact needs to have a different score, I agree that splitting to two smaller tables would solve the problem.

@behnazh-w behnazh-w force-pushed the behnazh/improve-build-as-code branch from bc4374a to d25bf57 Compare February 11, 2024 22:29
@oracle oracle deleted a comment from behnazh-w Feb 12, 2024
@behnazh-w behnazh-w changed the title feat: introduce confidence scores for check facts feat!: introduce confidence scores for check facts Feb 13, 2024
Copy link
Member

@nathanwn nathanwn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR should be good to merge.
Thanks for the PR.

Signed-off-by: behnazh-w <[email protected]>
Copy link
Member

@benmss benmss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@behnazh-w behnazh-w merged commit db3231f into staging Feb 15, 2024
16 checks passed
@nathanwn nathanwn deleted the behnazh/improve-build-as-code branch February 15, 2024 01:59
art1f1c3R pushed a commit that referenced this pull request Nov 29, 2024
This PR changes the data model and allows specifying confidence scores for check results, which is especially useful when a check reports multiple candidate results. All of these confidence scores are added to the check tables in the database. However, the fact that has the highest confidence is shown in the HTML/JSON report only.

The justifications are no longer required to be added manually to the CheckResultData. Instead, they are curated directly from the results in the table. If a column has specified JustificationType in the column mapping, it will be picked up automatically and rendered as plain text or href depending on the specified type. If a check fails or is skipped, we show a default Not Available. justification. This allows to create HTML/JSON reports from the database reproducibly.

Signed-off-by: behnazh-w <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants