Fix composite license expression detection for AND/OR cases#4691
Open
Kaushik-Kumar-CEG wants to merge 2 commits into aboutcode-org:develop
AyanSinhaMahapatra (Member) requested changes on Jan 22, 2026 and left a comment:
@Kaushik-Kumar-CEG IMHO the main fix for this particular issue would be to add rules.
- You have lots of test failures; have you even checked these? You need to regenerate the test expectations and see whether your changes make sense or break other tests.
- For a case like `apache-2.0 AND (apache-2.0 OR mit)`, you need to ensure these are perfect matches, on the same line/right next to each other, before merging them. And the expressions matter.
- You have not crafted any test to show that the issue is actually fixed.
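The merge condition in the review comment can be sketched as a small check. This is only an illustration, not ScanCode's matching code: the `Match` class, its fields, and the `can_merge`/`merge` helpers are hypothetical, and the sketch assumes "same line/right next to each other" means both conditions, with at most one token (the operator keyword) between the two matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical, simplified stand-in for a license match."""
    expression: str  # e.g. "apache-2.0"
    line: int        # line on which the match was found
    start: int       # index of the first matched token
    end: int         # index of the last matched token (inclusive)
    score: float     # 100.0 means a perfect match


def can_merge(left: Match, right: Match, max_gap: int = 1) -> bool:
    """Return True only for perfect, same-line, adjacent matches."""
    both_perfect = left.score == 100.0 and right.score == 100.0
    same_line = left.line == right.line
    # Number of tokens strictly between the matches (a lone AND/OR keyword).
    gap = right.start - left.end - 1
    return both_perfect and same_line and 0 <= gap <= max_gap


def _wrap(expression: str) -> str:
    """Parenthesize an operand that is already a composite expression."""
    return f"({expression})" if " " in expression else expression


def merge(left: Match, right: Match, operator: str) -> Match:
    """Combine two mergeable matches into one composite expression."""
    if operator not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {operator}")
    if not can_merge(left, right):
        raise ValueError("matches are not perfect, same-line neighbours")
    combined = f"{_wrap(left.expression)} {operator} {_wrap(right.expression)}"
    return Match(combined, left.line, left.start, right.end, 100.0)


apache = Match("apache-2.0", line=1, start=0, end=0, score=100.0)
either = Match("apache-2.0 OR mit", line=1, start=2, end=4, score=100.0)
print(merge(apache, either, "AND").expression)
```

Refusing to merge anything that is imperfect or far apart keeps unrelated mentions of two licenses in the same file from being reported as one composite expression.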
    is_license_reference: yes
    relevance: 100
    ---
    Apache-2.0 AND MIT
    (No newline at end of file)
Member
We need to add license rules for the whole texts instead:
- licensed under Apache-2.0 AND MIT
- licensed under Apache-2.0 OR MIT
Force-pushed: fe7811b to 8e68253
Signed-off-by: Kaushik <kaushikrjpm10@gmail.com>
Force-pushed: 8e68253 to 9512c29
Author
@AyanSinhaMahapatra I updated the PR and my approach based on your review:
The PR is ready for review. Please let me know if any further changes are needed!
Signed-off-by: Kaushik <kaushikrjpm10@gmail.com>
Force-pushed: df08b72 to b944399


Fixes #4690
The Issue
ScanCode produced incorrect results when normalizing certain short composite license references involving AND/OR.
The "AND" Failure
* Input: `Apache-2.0 AND MIT`
* Old Behavior: Dropped "MIT" as noise. Output: `Apache-2.0`
* Expected: `Apache-2.0 AND MIT`
The "OR" Failure
* Input: `Apache-2.0 OR MIT`
* Old Behavior: Reported both the combined license AND the individual part. Output: `Apache-2.0 OR MIT` + `Apache-2.0`
* Expected: Only `Apache-2.0 OR MIT`
Summary of Changes
This PR addresses the issue by adding explicit license reference rules for the complete expressions.
Added rules to match the full phrases:
Treating these phrases as single license references ensures that the full composite expressions are detected correctly.
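To illustrate why a rule for the whole phrase helps, here is a toy matcher that tries longer phrase rules before shorter ones and consumes the tokens it matched. It is a minimal sketch and not ScanCode's detection engine: the `RULES` table, `tokenize`, and `detect` are hypothetical names chosen for this example.

```python
import re

# Hypothetical phrase rules: rule text -> license expression.
RULES = {
    "apache-2.0 and mit": "apache-2.0 AND mit",
    "apache-2.0 or mit": "apache-2.0 OR mit",
    "apache-2.0": "apache-2.0",
    "mit": "mit",
}


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9.+-]+", text.lower())


def detect(text, rules=RULES):
    """Return expressions found in text, preferring the longest rule."""
    tokens = tokenize(text)
    # Longest rules first, so a full phrase wins over its parts.
    ordered = sorted(
        ((tokenize(phrase), expr) for phrase, expr in rules.items()),
        key=lambda rule: len(rule[0]),
        reverse=True,
    )
    found = []
    pos = 0
    while pos < len(tokens):
        for rule_tokens, expression in ordered:
            size = len(rule_tokens)
            if tokens[pos:pos + size] == rule_tokens:
                found.append(expression)
                pos += size  # consume the span so shorter rules cannot re-match it
                break
        else:
            pos += 1  # no rule starts here; skip this token
    return found


print(detect("licensed under Apache-2.0 AND MIT"))
print(detect("licensed under Apache-2.0 OR MIT"))
```

With only the two single-license rules in the table, the same inputs would yield two separate matches and the operator between them would be lost, which is the kind of gap a full-phrase rule closes.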
Verification
Added data-driven license detection tests covering both scenarios.
The tests confirm that the expected composite license expressions are detected as intended.
I've regenerated the test expectations that were directly affected by the new rules. The changes show the fix working correctly (the redundant apache-2.0 detection is removed). All tests pass locally. The CI failures appear to be environment-related.
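The shape of such a data-driven check can be sketched as a table of inputs and expected expressions run against a detector. This is a hedged illustration only: `run_cases`, `CASES`, and both stub detectors are hypothetical, and the stubs simply replay the outputs described in this PR rather than calling ScanCode.

```python
def run_cases(detect, cases):
    """Return (text, expected, actual) for every case that fails."""
    failures = []
    for text, expected in cases:
        actual = detect(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


# Inputs and expected expressions taken from the two scenarios above.
CASES = [
    ("Apache-2.0 AND MIT", ["apache-2.0 AND mit"]),
    ("Apache-2.0 OR MIT", ["apache-2.0 OR mit"]),
]


def old_behaviour(text):
    """Stub that replays the two failures reported in the issue."""
    if " AND " in text:
        return ["apache-2.0"]  # "MIT" dropped
    return ["apache-2.0 OR mit", "apache-2.0"]  # redundant extra detection


def fixed_behaviour(text):
    """Stub that replays the expected results."""
    operator = "AND" if " AND " in text else "OR"
    return [f"apache-2.0 {operator} mit"]


print(len(run_cases(old_behaviour, CASES)), "failing cases before the fix")
print(len(run_cases(fixed_behaviour, CASES)), "failing cases after the fix")
```

Keeping the cases in a plain table makes it cheap to add further composite expressions later without touching the checking logic.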
Tasks