feat: include inspector package urls as part of the malicious metadata facts for pypi packages #935

Merged: 10 commits, Dec 6, 2024

art1f1c3R
Member

To retain access to packages after they are taken off PyPI, this feature includes the inspector.pypi.io URL for each distribution file in the detail_information field of MaliciousMetadataFacts. The wheel_absence.py heuristic now returns the files.pythonhosted.org URL and the corresponding PyPI inspector URL instead of filenames. The unit test, test_wheel_absence.py, has been updated to reflect these changes.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Dec 3, 2024
@art1f1c3R art1f1c3R self-assigned this Dec 3, 2024
@art1f1c3R
Member Author

art1f1c3R commented Dec 3, 2024

One change I am unsure about: at wheel_absence.py:91, if _valid_url returns False, I currently add an empty string. I am not sure what action should be taken when the constructed inspector link is not accessible.

The check for a 200 OK response exists because of the way I construct the PyPI inspector link. I observed that the inspector link has the same format as the files.pythonhosted.org link, except that the files.pythonhosted.org prefix is replaced with the inspector URL, followed by the package name and then the version. Everything after that is the same (using blake2b_256 and the filename). This is only an observation, and I have not found it documented anywhere, so I added the check.
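A minimal sketch of the construction described above, not the code in wheel_absence.py. The function name `build_inspector_url`, the constants, and the example package are hypothetical; the only thing taken from the comment is the observed rule that the files.pythonhosted.org prefix is swapped for the inspector URL plus the package name and version.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed constants for illustration; the rule itself comes from the observation above.
PYTHONHOSTED_HOST = "files.pythonhosted.org"
INSPECTOR_BASE = "https://inspector.pypi.io/project"


def build_inspector_url(hosted_url: str, name: str, version: str) -> Optional[str]:
    """Map a files.pythonhosted.org URL to the matching inspector URL.

    Returns None when the input does not look like a hosted distribution URL.
    """
    parsed = urlparse(hosted_url)
    if parsed.netloc != PYTHONHOSTED_HOST or not parsed.path.startswith("/packages/"):
        return None
    # Keep everything from "/packages/" onwards and prepend name and version.
    return f"{INSPECTOR_BASE}/{name}/{version}{parsed.path}"


# Hypothetical package and digest path, used only to show the shape of the result.
example = build_inspector_url(
    "https://files.pythonhosted.org/packages/aa/bb/cccc/example-1.0.0.tar.gz",
    "example",
    "1.0.0",
)
```

Because the mapping rests on an observation rather than documented behaviour, the result is best treated as a candidate URL that still needs to be checked.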

@behnazh-w
Member

> One change I am unsure about: at wheel_absence.py:91, if _valid_url returns False, I currently add an empty string. I am not sure what action should be taken when the constructed inspector link is not accessible.
>
> The check for a 200 OK response exists because of the way I construct the PyPI inspector link. I observed that the inspector link has the same format as the files.pythonhosted.org link, except that the files.pythonhosted.org prefix is replaced with the inspector URL, followed by the package name and then the version. Everything after that is the same (using blake2b_256 and the filename). This is only an observation, and I have not found it documented anywhere, so I added the check.

I have added a suggestion in this comment to improve the return value. If the inspector HEAD request fails, the value corresponding to the inspector could be None.

@art1f1c3R
Member Author

art1f1c3R commented Dec 5, 2024

I took a look at the source code for inspector and confirmed that it organises its URLs in the way this new feature assumes. At https://github.com/pypi/inspector/blob/main/inspector/main.py, line 125:

@app.route(
    "/project/<project_name>/<version>/packages/<first>/<second>/<rest>/<distname>/"
)
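To make the route's structure concrete, the sketch below mirrors it as a regular expression and splits a path into the same named parts. This is an illustration only: `INSPECTOR_PATH` and `split_inspector_path` are hypothetical names, the sample path is made up, and each placeholder is assumed to match a single slash-free segment. The trailing slash is treated as optional here, which is a choice made for this example.

```python
import re
from typing import Dict, Optional

# One named group per placeholder in the route quoted above.
INSPECTOR_PATH = re.compile(
    r"^/project/(?P<project_name>[^/]+)/(?P<version>[^/]+)"
    r"/packages/(?P<first>[^/]+)/(?P<second>[^/]+)/(?P<rest>[^/]+)"
    r"/(?P<distname>[^/]+)/?$"
)


def split_inspector_path(path: str) -> Optional[Dict[str, str]]:
    """Return the route components of an inspector path, or None if it does not fit."""
    match = INSPECTOR_PATH.match(path)
    return match.groupdict() if match else None


# Hypothetical path, used only to show how the segments line up.
parts = split_inspector_path(
    "/project/example/1.0.0/packages/aa/bb/cccc/example-1.0.0.tar.gz/"
)
```

The `first`, `second`, and `rest` segments correspond to the three path components that follow `/packages/` in the files.pythonhosted.org URL, which is why the rest of the hosted path can be reused unchanged.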

Member

@tromai tromai left a comment

LGTM. Thanks.

@art1f1c3R art1f1c3R merged commit fd17eaa into staging Dec 6, 2024
11 checks passed
@art1f1c3R art1f1c3R deleted the art1f1c3R/pypi-inspector-link branch December 10, 2024 02:00