-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: migrate semantic
from osv-scanner
#316
Conversation
@another-rex @erikvarga thinking about this in the public context and especially in regards to error handling, I'm wondering if we want to start by making just about everything except for I don't know much about the desired uses for i.e. this usage of |
semantic/version.go
Outdated
// when parsed as the concrete Version relative to the subject Version. | ||
// | ||
// The result will be 0 if v == w, -1 if v < w, or +1 if v > w. | ||
CompareStr(str string) int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like this should be:
Comare(Version v) int
Instead of CompareStr
. This would solve some of the error / panic discussions above. And is also more inline with how most custom comparison within Go works (comparing objects of the same type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No because then it'd mean we have to handle being given a version from a different ecosystem, which itself would require returning an error
- that was the reason why I did CompareStr
, though yes if we're talking about returning an error at all here then we could explore changing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still possible for someone to provide a version from a different ecosystem as a string no?
Here's the user journey that would results in this:
semantic
is run in a server that exposes a osv.dev like API that accepts package names, version, ecosystem, etc and returns vulnerabilities.- A user provides some incorrect package like
foo
, claims the ecosystem isDebian
, and provides a non-debian version, let's say1.2.3-not-a-debian-vesion!@#$
. - The server retrieves a vulnerability with version range [
1.0.0
,2.0.0
]. - The server runs
lowerBound := semantic.Parse(1.0.0, "Debian")
. - The server runs
lowerBound.CompareStr("1.2.3-not-a-debian-vesion!@#$")
, leading to a panic.
I'm surprised osv-scanner doesn't crash when there's a non-debian version in say a status
file that's scanned.
For our use case, we'd likely do:
func CompareVuln(vulnVersion, packageVersion, ecosystem str) (error) {
defer func() {
if r := recover(); r != nil {
fmt.Println("Recovered:", r)
}
}()
vulnV, err := semantic.Parse(lowerV, ecosystem)
if err != nil { // handle }
_ := vulnV.CompareStr(packaveVersion)
// Rest omitted for brevity.
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure it's possible, but it just doesn't break anything because either implicitly or not, generally the native comparators resolve every input down to a result rather than erroring and so we do too i.e. in your example semantic
handles that just fine in the same way that dpkg
does:
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ go test ./pkg/semantic/...
--- FAIL: TestVersion_Compare_Ecosystems (0.00s)
--- FAIL: TestVersion_Compare_Ecosystems/Debian (0.00s)
compare_test.go:235: Expected 1.2.3-not-a-debian-vesion!@#$ to be less than 1.2.3, but it was greater than
compare_test.go:235: 1 of 31 failed
FAIL
FAIL github.com/g-rath/osv-detector/pkg/semantic 0.230s
FAIL
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ dpkg --compare-versions '1.2.3-not-a-debian-vesion!@#$' 'lt' '1.2.3'
dpkg: warning: version '1.2.3-not-a-debian-vesion!@#$' has bad syntax: invalid character in revision number
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ echo $?
1
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ dpkg --compare-versions '1.2.3-not-a-debian-vesion!@#$' 'gt' '1.2.3'
dpkg: warning: version '1.2.3-not-a-debian-vesion!@#$' has bad syntax: invalid character in revision number
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ echo $?
0
(again, I'd really like to say this is really the case for "all comparators" but it's been so long since I implemented some of these that I want to confirm that before throwing such a big claim around)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm looking at the code, wouldn't this panic with 1.2.3-not-a-debian:version!@#$
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup that one does it, which also matches dpkg --compare-version
and is why I didn't want to double down on "this never panics" 😅
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0 took 4s
❯ go test ./pkg/semantic/...
--- FAIL: TestVersion_Compare_Ecosystems (0.00s)
--- FAIL: TestVersion_Compare_Ecosystems/Debian (0.00s)
panic: failed to convert 1.2.3-not-a-debian to a number [recovered]
panic: failed to convert 1.2.3-not-a-debian to a number
goroutine 35 [running]:
testing.tRunner.func1.2({0x52fc20, 0xc00038c040})
/home/jones/.goenv/versions/1.21.11/src/testing/testing.go:1545 +0x238
testing.tRunner.func1()
/home/jones/.goenv/versions/1.21.11/src/testing/testing.go:1548 +0x397
panic({0x52fc20?, 0xc00038c040?})
/home/jones/.goenv/versions/1.21.11/src/runtime/panic.go:914 +0x21f
github.com/g-rath/osv-detector/pkg/semantic.convertToBigIntOrPanic({0xc00039c000, 0x12})
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/utilities.go:13 +0xa5
github.com/g-rath/osv-detector/pkg/semantic.parseDebianVersion({0xc00039c000?, 0x0?})
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/version-debian.go:153 +0xc7
github.com/g-rath/osv-detector/pkg/semantic.Parse({0xc00039c000?, 0x2?}, {0x55c248?, 0x4c0505?})
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/parse.go:29 +0x777
github.com/g-rath/osv-detector/pkg/semantic_test.parseAsVersion(0xc0001d0680, {0xc00039c000, 0x1e}, {0x55c248, 0x6})
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/compare_test.go:97 +0x6b
github.com/g-rath/osv-detector/pkg/semantic_test.expectCompareResult(0xc0001d0680, {0x55c248, 0x6}, {0xc00039c000, 0x1e}, {0xc00039c021, 0x5}, 0xffffffffffffffff)
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/compare_test.go:115 +0x92
github.com/g-rath/osv-detector/pkg/semantic_test.expectEcosystemCompareResult(0xc0001d0680, {0x55c248, 0x6}, {0xc00039c000, 0x1e}, {0xc00039c01f, 0x1}, {0xc00039c021, 0x5})
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/compare_test.go:141 +0x97
github.com/g-rath/osv-detector/pkg/semantic_test.runAgainstEcosystemFixture(0xc0001d0680, {0x55c248, 0x6}, {0x55f5da, 0x13})
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/compare_test.go:78 +0x328
github.com/g-rath/osv-detector/pkg/semantic_test.TestVersion_Compare_Ecosystems.func1(0x0?)
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/compare_test.go:235 +0x5a
testing.tRunner(0xc0001d0680, 0xc000121560)
/home/jones/.goenv/versions/1.21.11/src/testing/testing.go:1595 +0xff
created by testing.(*T).Run in goroutine 18
/home/jones/.goenv/versions/1.21.11/src/testing/testing.go:1648 +0x3ad
FAIL github.com/g-rath/osv-detector/pkg/semantic 0.006s
FAIL
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ dpkg --compare-versions '1.2.3-not-a-debian:version!@#$' 'gt' '1.2.3'
dpkg: error: version '1.2.3-not-a-debian:version!@#$' has bad syntax: epoch in version is not number
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ echo $?
2
@alowayed @zpavlinovic @erikvarga thanks for the reviews and discussions so far! I think there's been enough for me to do another pass before we continue further - notably, I think we should look at making I expect too I'll probably go ahead for now with making most things private to make it easier to iterate on the internals - I think that probably comes down to I'll also try to do a general refresher on the package and related native version comparers as a whole since while I do have confidence in what I've said so far, I wrote a lot of this 1-2 years ago and its required very little change besides adding new comparators, which while a great sign of stability (or lack of use... 😅) does mean I'm not feeling as confident as I'd like when trying to think through tradeoffs and make architecture decisions Feel free to continue discussions and review, but also feel free to wait until I've actioned the above 🙂 |
cc237d6
to
eecef1a
Compare
@alowayed @zpavlinovic @erikvarga @another-rex I've updated the implementation so that instead of panicking I think long-term there should be some exploring into refactoring the code to better leverage what's already been asserted to eliminate the code paths that cannot actually even happen e.g. there are functions where we convert to a There's a few places already where we could just ignore the error as we know for sure it can only ever be a number, but I'm guessing people's preference is to just always check-and-return the error (let me know if I'm wrong, as that would let me revert some logic changes for the Alpine implementation in particular) I'm probably still going to remove |
// returning an ErrUnsupportedEcosystem error if the ecosystem is not supported. | ||
func Parse(str string, ecosystem string) (Version, error) { | ||
//nolint:exhaustive // Using strings to specify ecosystem instead of lockfile types | ||
switch ecosystem { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the ecosystems we support in osv.dev (https://github.com/google/osv.dev/blob/master/osv/ecosystems/_ecosystems.py) here? The missing ones are all pretty simple to add I believe, either using an existing sorting method, or literally just an int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exploiting my IDE, it looks like these are the ones not in our switch
statement:
case "Bitnami":
SemverEcosystem()
case "SwiftURL":
SemverEcosystem()
// # Non SemVer-based ecosystems
case "Bioconductor":
Bioconductor()
case "Chainguard":
Chainguard()
case "GHC":
GHC()
case "Hackage":
Hackage()
case "Wolfi":
Wolfi()
// # Ecosystems which require a release version for enumeration, which is
// # handled separately in get().
// # Ecosystems missing implementations:
case "Android":
OrderingUnsupportedEcosystem()
case "ConanCenter":
OrderingUnsupportedEcosystem()
case "GitHub Actions":
OrderingUnsupportedEcosystem()
case "Linux":
OrderingUnsupportedEcosystem()
case "OSS-Fuzz":
OrderingUnsupportedEcosystem()
case "Photon OS":
OrderingUnsupportedEcosystem()
of those, the semver ones can just be dropped in, and we can't support the last 6, so that leaves 5 that would need custom implementations (though I wouldn't be surprised if Bioconductor is just CRAN).
I'm happy to look into those, but think they probably shouldn't be a blocker to landing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not blockers and can be done in a followup, the hardest part will just be the tests I think.
Bioconductor -> Semver
Chainguard -> Alpine
GHC -> Semver
Hackage -> SemverLike (just numbers and . separators)
Wolfi -> Alpine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have made #457 for this
PR looks good to me now apart from some nits - and that I'd still prefer cachedregexp to be introduced separately and discussing its merits there (vs. removing it in a separate PR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes! Similar to Erik, would like to revisit cachedregexp
in the future. From my end, I'd just like a mechanism to trigger all regex at server startup / initialization than lazily. But that's not a blocker to using the library.
@erikvarga @alowayed I've now officially removed The main thing left that could be actioned is adding additional ecosystems as mentioned by @another-rex , which I'm keen to do but want to push into a new PR since this has already been around a while and I don't think it's critical to landing this as-is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like our internal linter found a few more things that we didn't catch with the open source linter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We got a couple of linter errors on the python script files internally, can you address them here? They're mainly about the import order, line length and missing docstrings:
Missing module docstring [missing-module-docstring]
semantic/generators/generate-pypi-versions.py:1:1
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-pypi-versions.py:42:1
Redefining built-in 'dict' [redefined-builtin]
semantic/generators/generate-pypi-versions.py:43:3
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-pypi-versions.py:76:1
Use implicit True/False evaluation [g-explicit-bool-comparison]
semantic/generators/generate-pypi-versions.py:82:8
Redefining name 'f' from outer scope (line 153) [redefined-outer-name]
semantic/generators/generate-pypi-versions.py:109:26
Missing module docstring [missing-module-docstring]
semantic/generators/generate-redhat-versions.py:1:1
Importing a member of a module [g-importing-member]
semantic/generators/generate-redhat-versions.py:6:1
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-redhat-versions.py:53:1
Redefining built-in 'dict' [redefined-builtin]
semantic/generators/generate-redhat-versions.py:54:3
Missing class docstring [missing-class-docstring]
semantic/generators/generate-redhat-versions.py:85:1
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-redhat-versions.py:93:3
Redefining name 'f' from outer scope (line 286) [redefined-outer-name]
semantic/generators/generate-redhat-versions.py:96:42
Redefining name 'f' from outer scope (line 286) [redefined-outer-name]
semantic/generators/generate-redhat-versions.py:116:42
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-redhat-versions.py:119:3
'subprocess.run' used without explicitly defining the value for 'check'. [subprocess-run-check]
semantic/generators/generate-redhat-versions.py:121:11
Raising too general exception: Exception [broad-exception-raised]
semantic/generators/generate-redhat-versions.py:124:7
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-redhat-versions.py:140:3
'subprocess.run' used without explicitly defining the value for 'check'. [subprocess-run-check]
semantic/generators/generate-redhat-versions.py:149:11
Raising too general exception: Exception [broad-exception-raised]
semantic/generators/generate-redhat-versions.py:152:7
Raising too general exception: Exception [broad-exception-raised]
semantic/generators/generate-redhat-versions.py:164:5
Missing class docstring [missing-class-docstring]
semantic/generators/generate-redhat-versions.py:183:1
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-redhat-versions.py:209:1
Use implicit True/False evaluation [g-explicit-bool-comparison]
semantic/generators/generate-redhat-versions.py:215:8
Redefining name 'f' from outer scope (line 286) [redefined-outer-name]
semantic/generators/generate-redhat-versions.py:242:26
Extra separation in import group before 'org.json.JSONArray' [ImportOrder]
semantic/generators/GenerateMavenVersions.java:3:1
Extra separation in import group before 'java.io.*' [ImportOrder]
semantic/generators/GenerateMavenVersions.java:6:1
Wrong order for java.io.* import. Use /google/src/head/depot/google3/tools/java/fix_imports.py or the cider-v Organize Imports command to automatically sort your imports. [ImportOrder]
semantic/generators/GenerateMavenVersions.java:6:1
Using the '.*' form of import should be avoided - java.io.*. [AvoidStarImport]
semantic/generators/GenerateMavenVersions.java:6:15
Using the '.*' form of import should be avoided - java.util.*. [AvoidStarImport]
semantic/generators/GenerateMavenVersions.java:11:17
Missing module docstring [missing-module-docstring]
semantic/generators/generate-debian-versions.py:1:1
Importing a member of a module [g-importing-member]
semantic/generators/generate-debian-versions.py:6:1
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-debian-versions.py:50:1
Redefining built-in 'dict' [redefined-builtin]
semantic/generators/generate-debian-versions.py:51:3
Missing class docstring [missing-class-docstring]
semantic/generators/generate-debian-versions.py:75:1
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-debian-versions.py:83:3
Redefining name 'f' from outer scope (line 235) [redefined-outer-name]
semantic/generators/generate-debian-versions.py:86:42
Redefining name 'f' from outer scope (line 235) [redefined-outer-name]
semantic/generators/generate-debian-versions.py:106:42
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-debian-versions.py:109:3
'subprocess.run' used without explicitly defining the value for 'check'. [subprocess-run-check]
semantic/generators/generate-debian-versions.py:115:11
Missing class docstring [missing-class-docstring]
semantic/generators/generate-debian-versions.py:132:1
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-debian-versions.py:158:1
Use implicit True/False evaluation [g-explicit-bool-comparison]
semantic/generators/generate-debian-versions.py:164:8
Redefining name 'f' from outer scope (line 235) [redefined-outer-name]
semantic/generators/generate-debian-versions.py:191:26
Missing module docstring [missing-module-docstring]
semantic/generators/generate-alpine-versions.py:1:1
Importing a member of a module [g-importing-member]
semantic/generators/generate-alpine-versions.py:7:1
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-alpine-versions.py:54:1
Redefining built-in 'dict' [redefined-builtin]
semantic/generators/generate-alpine-versions.py:55:3
Missing class docstring [missing-class-docstring]
semantic/generators/generate-alpine-versions.py:79:1
One-line docstring summary should end with ".", ".)", "?", or "!" [g-short-docstring-punctuation]
semantic/generators/generate-alpine-versions.py:91:1
Exception 'Exception' not in Raises section of docstring [g-doc-exception]
semantic/generators/generate-alpine-versions.py:118:7
'subprocess.run' used without explicitly defining the value for 'check'. [subprocess-run-check]
semantic/generators/generate-alpine-versions.py:115:11
Raising too general exception: Exception [broad-exception-raised]
semantic/generators/generate-alpine-versions.py:118:7
Raising too general exception: Exception [broad-exception-raised]
semantic/generators/generate-alpine-versions.py:127:7
Using an f-string that does not have any interpolated variables [f-string-without-interpolation]
semantic/generators/generate-alpine-versions.py:127:23
'subprocess.run' used without explicitly defining the value for 'check'. [subprocess-run-check]
semantic/generators/generate-alpine-versions.py:130:11
Raising too general exception: Exception [broad-exception-raised]
semantic/generators/generate-alpine-versions.py:133:7
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-alpine-versions.py:138:3
Redefining name 'f' from outer scope (line 321) [redefined-outer-name]
semantic/generators/generate-alpine-versions.py:141:42
Redefining name 'f' from outer scope (line 321) [redefined-outer-name]
semantic/generators/generate-alpine-versions.py:161:42
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-alpine-versions.py:164:3
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-alpine-versions.py:191:3
'subprocess.run' used without explicitly defining the value for 'check'. [subprocess-run-check]
semantic/generators/generate-alpine-versions.py:196:11
Raising too general exception: Exception [broad-exception-raised]
semantic/generators/generate-alpine-versions.py:203:7
Missing class docstring [missing-class-docstring]
semantic/generators/generate-alpine-versions.py:218:1
Missing function or method docstring [missing-function-docstring]
semantic/generators/generate-alpine-versions.py:244:1
Use implicit True/False evaluation [g-explicit-bool-comparison]
semantic/generators/generate-alpine-versions.py:250:8
Redefining name 'f' from outer scope (line 321) [redefined-outer-name]
semantic/generators/generate-alpine-versions.py:277:26
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erikvarga for now I've just removed the generator scripts entirely, as those linting errors look like they'll need some work + I want to do a bit of tidying of the scripts anyway, so I don't think it's worth blocking landing this
This brings across the
semantic
package fromosv-scanner
, which provides support for parsing and comparing versions across different ecosystems per their unique specifications (or lack thereof).Main actions / discussion points:
cachedregexp
(though worth noting thatsemantic
includes some chonky regexps which do take a second or two to compile)semantic
weekly workflowResolves #257