Commit 3c076ee

Merge branch 'main' into dependabot/pip/black-24.10.0

2 parents af79f09 + a77029c
7 files changed: +84 -55 lines changed

README.md

Lines changed: 29 additions & 18 deletions
@@ -15,9 +15,9 @@ This needs to change, and proper tooling is the first step.
 
 ![ModelScan Preview](/imgs/modelscan-unsafe-model.gif)
 
-ModelScan is an open source project from [Protect AI](https://protectai.com/) that scans models to determine if they contain
-unsafe code. It is the first model scanning tool to support multiple model formats.
-ModelScan currently supports: H5, Pickle, and SavedModel formats. This protects you
+ModelScan is an open source project from [Protect AI](https://protectai.com/?utm_campaign=Homepage&utm_source=ModelScan%20GitHub%20Page&utm_medium=cta&utm_content=Open%20Source) that scans models to determine if they contain
+unsafe code. It is the first model scanning tool to support multiple model formats.
+ModelScan currently supports: H5, Pickle, and SavedModel formats. This protects you
 when using PyTorch, TensorFlow, Keras, Sklearn, XGBoost, with more on the way.
 
 ## TL;DR
@@ -38,9 +38,9 @@ modelscan -p /path/to/model_file.pkl
 
 Models are often created from automated pipelines; others may come from a data scientist’s laptop. In either case the model needs to move from one machine to another before it is used. That process of saving a model to disk is called serialization.
 
-A **Model Serialization Attack** is where malicious code is added to the contents of a model during serialization(saving) before distribution — a modern version of the Trojan Horse.
+A **Model Serialization Attack** is where malicious code is added to the contents of a model during serialization (saving) before distribution — a modern version of the Trojan Horse.
 
-The attack functions by exploiting the saving and loading process of models. When you load a model with `model = torch.load(PATH)`, PyTorch opens the contents of the file and begins to running the code within. The second you load the model the exploit has executed.
+The attack works by exploiting the saving and loading process of models. When you load a model with `model = torch.load(PATH)`, PyTorch opens the contents of the file and begins running the code within. The second you load the model, the exploit has executed.
 
 A **Model Serialization Attack** can be used to execute:
 
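The attack mechanics described in this hunk can be sketched in a few lines of Python. This is an illustrative toy, not ModelScan code: the `Payload` class and the harmless `eval` call are ours, standing in for what a real attacker would replace with `os.system` or similar.

```python
import pickle

# Toy payload: __reduce__ tells pickle to call an arbitrary callable on load.
# A real attack would substitute os.system, exec, or a reverse shell here.
class Payload:
    def __reduce__(self):
        return (eval, ("6 * 7",))  # runs the moment the file is unpickled

blob = pickle.dumps(Payload())   # "saving the model" embeds the call
result = pickle.loads(blob)      # "loading the model" executes it
print(result)                    # eval("6 * 7") already ran -> 42
```

Note that merely calling `pickle.loads` triggers the payload; no attribute of the loaded object ever needs to be touched.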

@@ -51,14 +51,27 @@ A **Model Serialization Attack** can be used to execute:
 
 These attacks are incredibly simple to execute and you can view working examples in our 📓[notebooks](https://github.com/protectai/modelscan/tree/main/notebooks) folder.
 
+## Enforcing And Automating Model Security
+
+ModelScan offers robust open-source scanning. If you need comprehensive AI security, consider [Guardian](https://protectai.com/guardian?utm_campaign=Guardian&utm_source=ModelScan%20GitHub%20Page&utm_medium=cta&utm_content=Open%20Source). It is our enterprise-grade model scanning product.
+
+![Guardian Overview](/imgs/guardian_overview.png)
+
+### Guardian's Features:
+
+1. **Cutting-Edge Scanning**: Access our latest scanners, broader model support, and automatic model format detection.
+2. **Proactive Security**: Define and enforce security requirements for Hugging Face models before they enter your environment—no code changes required.
+3. **Enterprise-Wide Coverage**: Implement a cohesive security posture across your organization, seamlessly integrating with your CI/CD pipelines.
+4. **Comprehensive Audit Trail**: Gain full visibility into all scans and results, empowering you to identify and mitigate threats effectively.
+
 ## Getting Started
 
 ### How ModelScan Works
 
-If loading a model with your machine learning framework automatically executes the attack,
+If loading a model with your machine learning framework automatically executes the attack,
 how does ModelScan check the content without loading the malicious code?
 
-Simple, it reads the content of the file one byte at a time just like a string, looking for
+Simple: it reads the content of the file one byte at a time, just like a string, looking for
 code signatures that are unsafe. This makes it incredibly fast, scanning models in the time it
 takes for your computer to process the total filesize from disk (seconds in most cases). It is also secure.
 

@@ -78,7 +91,7 @@ it opens you up for attack. Use your discretion to determine if that is appropri
 
 ### What Models and Frameworks Are Supported?
 
-This will be expanding continually, so look out for changes in our release notes.
+This will be expanding continually, so look out for changes in our release notes.
 
 At present, ModelScan supports any Pickle derived format and many others:
 

@@ -90,7 +103,7 @@ At present, ModelScan supports any Pickle derived format and many others:
 | | [keras.models.save(save_format= 'keras')](https://www.tensorflow.org/guide/keras/serialization_and_saving) | Keras V3 (Hierarchical Data Format) | Yes |
 | Classic ML Libraries (Sklearn, XGBoost etc.) | pickle.dump(), dill.dump(), joblib.dump(), cloudpickle.dump() | Pickle, Cloudpickle, Dill, Joblib | Yes |
 
-### Installation
+### Installation
 ModelScan is installed on your system as a Python package (Python 3.9 to 3.12 supported). As shown above, you can install
 it by running this in your terminal:
 

@@ -114,7 +127,7 @@ pip install 'modelscan[ tensorflow, h5py ]'
 
 ModelScan supports the following arguments via the CLI:
 
-| Usage | Argument | Explanation |
+| Usage | Argument | Explanation |
 |----------------------------------------------------------------------------------|------------------|---------------------------------------------------------|
 | ```modelscan -h ``` | -h or --help | View usage help |
 | ```modelscan -v ``` | -v or --version | View version information |
@@ -143,9 +156,9 @@ Once a scan has been completed you'll see output like this if an issue is found:
 ![ModelScan Scan Output](https://github.com/protectai/modelscan/raw/main/imgs/cli_output.png)
 
 Here we have a model that has an unsafe operator for both `ReadFile` and `WriteFile` in the model.
-Clearly we do not want our models reading and writing files arbitrarily. We would now reach out
+Clearly we do not want our models reading and writing files arbitrarily. We would now reach out
 to the creator of this model to determine what they expected this to do. In this particular case
-it allows an attacker to read our AWS credentials and write them to another place.
+it allows an attacker to read our AWS credentials and write them to another place.
 
 That is a firm NO for usage.
 
@@ -182,7 +195,7 @@ to learn more!
 
 ## Licensing
 
-Copyright 2023 Protect AI
+Copyright 2024 Protect AI
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
@@ -201,9 +214,7 @@ limitations under the License.
 We were heavily inspired by [Matthieu Maitre](http://mmaitre314.github.io) who built [PickleScan](https://github.com/mmaitre314/picklescan).
 We appreciate the work and have extended it significantly with ModelScan. ModelScan is OSS’ed in a similar spirit to PickleScan.
 
-## Contributing
-
-We would love to have you contribute to our open source ModelScan project.
-If you would like to contribute, please follow the details on [Contribution page](https://github.com/protectai/modelscan/blob/main/CONTRIBUTING.md).
+## Contributing
 
-
+We would love to have you contribute to our open source ModelScan project.
+If you would like to contribute, please follow the details on the [Contribution page](https://github.com/protectai/modelscan/blob/main/CONTRIBUTING.md).

imgs/guardian_overview.png

1.82 MB
Binary file not shown.

modelscan/modelscan.py

Lines changed: 2 additions & 6 deletions
@@ -91,11 +91,7 @@ def _iterate_models(self, model_path: Path) -> Generator[Model, None, None]:
             with Model(file) as model:
                 yield model
 
-                if (
-                    not _is_zipfile(file, model.get_stream())
-                    and Path(file).suffix
-                    not in self._settings["supported_zip_extensions"]
-                ):
+                if not _is_zipfile(file, model.get_stream()):
                     continue
 
                 try:
@@ -114,7 +110,7 @@ def _iterate_models(self, model_path: Path) -> Generator[Model, None, None]:
                         continue
 
                     yield Model(file_name, file_io)
-                except zipfile.BadZipFile as e:
+                except (zipfile.BadZipFile, RuntimeError) as e:
                     logger.debug(
                         "Skipping zip file %s, due to error",
                         str(model.get_source()),
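The widened `except` above matters because the standard `zipfile` module raises a plain `RuntimeError`, not `BadZipFile`, when it hits an encrypted member without a password. A minimal sketch of the same defensive pattern (the function name and message are ours, not ModelScan's):

```python
import io
import zipfile

def iter_zip_members(data: bytes):
    """Yield (name, payload) for readable members; skip broken or encrypted archives."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for name in zf.namelist():
                # zf.read raises RuntimeError for encrypted entries,
                # zipfile.BadZipFile for corrupt archives.
                yield name, zf.read(name)
    except (zipfile.BadZipFile, RuntimeError) as exc:
        print(f"Skipping zip file due to error: {exc}")

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello")

print(dict(iter_zip_members(buf.getvalue())))  # {'a.txt': b'hello'}
print(list(iter_zip_members(b"not a zip")))    # prints the skip message, then []
```

Catching the error inside the iterator lets a directory scan log and continue instead of aborting on the first password-protected archive, which is exactly the behavior the new test data exercises.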

modelscan/settings.py

Lines changed: 1 addition & 0 deletions
@@ -128,6 +128,7 @@ class SupportedModelFormats:
         "bdb": "*",
         "pdb": "*",
         "shutil": "*",
+        "asyncio": "*",
     },
     "HIGH": {
         "webbrowser": "*",  # Includes webbrowser.open()
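This hunk adds `asyncio` to a tier of ModelScan's unsafe-globals settings (the neighboring `shutil`/`pdb` keys suggest the tier above HIGH, though the full dict is not shown here). The kind of lookup such a structure enables can be sketched as follows; the dict below is a hypothetical miniature, not the project's real settings:

```python
# Hypothetical miniature of a severity-tiered blocklist: "*" flags every
# attribute of a module; a set flags only the listed names.
UNSAFE_GLOBALS = {
    "CRITICAL": {"os": "*", "shutil": "*", "asyncio": "*",
                 "builtins": {"eval", "exec"}},
    "HIGH": {"webbrowser": "*"},
}

def severity_of(module: str, name: str):
    """Return the severity tier for a (module, name) global, or None if unlisted."""
    for tier, modules in UNSAFE_GLOBALS.items():
        names = modules.get(module)
        if names == "*" or (names is not None and name in names):
            return tier
    return None

print(severity_of("asyncio", "run"))      # CRITICAL
print(severity_of("webbrowser", "open"))  # HIGH
print(severity_of("math", "sqrt"))        # None
```

Keeping the blocklist as data rather than code is what lets a one-line settings change like this commit extend coverage to a new module.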

poetry.lock

Lines changed: 35 additions & 30 deletions
Some generated files are not rendered by default.

tests/data/password_protected.zip

162 Bytes
Binary file not shown.

tests/test_modelscan.py

Lines changed: 17 additions & 1 deletion
@@ -10,6 +10,7 @@
 import dill
 import pytest
 import requests
+import shutil
 import socket
 import subprocess
 import sys
@@ -331,6 +332,10 @@ def file_path(tmp_path_factory: Any) -> Any:
 
     initialize_data_file(f"{tmp}/data/malicious14.pkl", malicious14_gen())
 
+    shutil.copy(
+        f"{os.path.dirname(__file__)}/data/password_protected.zip", f"{tmp}/data/"
+    )
+
     return tmp
 
 
@@ -1361,7 +1366,18 @@ def test_scan_directory_path(file_path: str) -> None:
         "benign0_v3.dill",
         "benign0_v4.dill",
     }
-    assert results["summary"]["skipped"]["skipped_files"] == []
+    assert results["summary"]["skipped"]["skipped_files"] == [
+        {
+            "category": "SCAN_NOT_SUPPORTED",
+            "description": "Model Scan did not scan file",
+            "source": "password_protected.zip",
+        },
+        {
+            "category": "BAD_ZIP",
+            "description": "Skipping zip file due to error: File 'test.txt' is encrypted, password required for extraction",
+            "source": "password_protected.zip",
+        },
+    ]
     assert results["errors"] == []
 
 