Determine how different sorts of file names should be normalized #28
Comments
For other Linux distribution file names to look at, here are counts of how many packages each one has (snapshot of most of the counts from https://pkgs.org/ on Dec 30, 2024):
Other Linux distros whose packages may be worth examining include Wolfi (which has the most packages), Debian vs Ubuntu, Alpine, Arch, Mageia, openSUSE, OpenWrt, Fedora, and Oracle vs RHEL.
As mentioned in #5 (comment), in some cases a folder name could be used to identify the package that created it. We would need to look into this to make sure it is fairly accurate and doesn't introduce false positives (for example, a lot of packages have a "plugins" subfolder).
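A rough sketch of how that folder-name idea could guard against false positives: index folder names by the packages known to create them, and only report a match when exactly one package claims the name. The function names, the case-insensitive comparison, and the sample data below are all hypothetical, not taken from this project or its datasets.

```python
from collections import defaultdict


def build_folder_index(package_folders):
    """Map each lowercased folder name to the set of packages that create it."""
    index = defaultdict(set)
    for package, folders in package_folders.items():
        for folder in folders:
            index[folder.lower()].add(package)
    return index


def identify_package(folder_name, index):
    """Return a package name only when exactly one package creates the folder."""
    candidates = index.get(folder_name.lower(), set())
    if len(candidates) == 1:
        return next(iter(candidates))
    # Unknown or ambiguous names (such as "plugins") yield no match,
    # which avoids reporting a false positive.
    return None


# Hypothetical sample data for illustration only.
SAMPLE = {
    "example-editor": ["example-editor", "plugins"],
    "example-player": ["example-player", "plugins"],
}

index = build_folder_index(SAMPLE)
print(identify_package("example-editor", index))
print(identify_package("plugins", index))
```

Requiring a unique owner is deliberately conservative: it trades some recall for fewer wrong identifications, which seems to match the concern about shared names like "plugins".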
Different file names need to be normalized to have good odds of finding a match in our datasets. Generally, these are centered around a few things:
What needs to get done may vary based on the type of file: