A security-focused framework for dynamically scanning and testing machine learning (ML) models inside a controlled, secure Docker environment. This project helps detect suspicious or malicious behaviors in ML models, such as unauthorized network access, filesystem modification, or execution of dangerous system calls, providing developers with tools to ensure the integrity and safety of their models.
- Overview
- Features
- Repository Structure
- Modules
- Getting Started
- Documentation
- Usage Example
- Get Support
- Contributing
- Security
- License
- Oracle Resources
Dynamic Scan for Machine Learning Models (version 1.0) automates the setup, execution, and analysis of dynamic tests for ML models, with an emphasis on robust security and diagnostics. By leveraging isolated Docker sandboxes and PyTorch, the tool analyzes system call activity to flag indicators of compromise, such as unauthorized network connections (e.g., unexpected listening/binding ports) or risky file access/modification (e.g., attempts to alter /etc/passwd).
The framework is highly modular, supporting customizable test cases and configurable scanning environments. Its primary audience is ML developers, researchers, and security engineers seeking automated, reproducible assurance of safe ML behavior.
| | Feature | Description |
|---|---|---|
| ⚙️ | Architecture | Docker-based architecture for dynamic scanning and testing of ML models, integrating scripts for setup, execution, and result parsing, with an emphasis on modular, flexible testing workflows. |
| 🧩 | Modularity | Separate scripts handle environment setup, scanning, and testing, allowing for easy updates and maintenance. |
| 🧪 | Testing | Custom test cases such as `sample_test_case.py` exercise model compatibility and performance, while shell scripts provide error handling and resource management. |
| 🛡️ | Security | Network communication is disabled, root access is enforced for certain scripts, and tests run as a non-root user, ensuring isolated and secure testing environments. |
```
.
├── README.md
├── build.sh
├── config.ini
├── env_config.py
├── prepare/
│   ├── Dockerfile.template
│   ├── parse.sh
│   ├── requirements_sample.txt
│   └── scan.sh
├── run.sh
├── setup_env.py
├── share/
│   ├── cleanup_irrelevant_io.sh
│   ├── explain_sys_calls.py
│   ├── analyze_result.sh
│   ├── testcase_tempate.py
│   └── scan_template.sh
├── test_cases/
│   └── sample_test_case.py
├── CONTRIBUTING.md
├── SECURITY.md
└── LICENSE.txt
```
- `build.sh`: Builds the Docker image and sets up the required environment.
- `run.sh`: Main entry point; orchestrates scanning, logging, and reporting.
- `config.ini`: Central configuration for environment and test behavior.
- `env_config.py`: Handles dynamic import management via AST transformation.
- `setup_env.py`: Generates Docker Compose configs and environment setup.
- `share/`: Utility scripts for system call filtering, analysis, and secure test execution.
- `test_cases/`: Example and user-provided test cases for specific ML models.
| File | Summary |
|---|---|
| `requirements.txt` | Lists the Python libraries needed to run the scanner. |
| `setup_env.py` | Sets up a secure Docker environment for model scanning by generating a Docker Compose configuration. Ensures appropriate resource allocation and manages dependencies between the scanning and result-parsing services. |
| `run.sh` | Orchestrates the dynamic scanning process: verifies prerequisites, sets up loopback filesystems, and manages the Docker containers that analyze the specified model for suspicious behavior. Logs findings and reports potential issues. |
| `env_config.py` | Collects and manages import statements within Python functions, using AST transformations to centralize import handling for modular dynamic scanning and testing. |
| `build.sh` | Automates the Docker image build: generates a Dockerfile, constructs the image per the configuration, and cleans up unnecessary data. |
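The AST-based import handling described for `env_config.py` can be illustrated with Python's standard `ast` module. The `ImportCollector` class below is a hypothetical sketch of the general technique (collecting import statements that appear inside function bodies), not the project's actual implementation:

```python
import ast
import textwrap

class ImportCollector(ast.NodeVisitor):
    """Record import statements that appear inside function bodies."""

    def __init__(self):
        self.imports = []

    def visit_FunctionDef(self, node):
        # Scan the function body for Import / ImportFrom statements.
        for stmt in node.body:
            if isinstance(stmt, (ast.Import, ast.ImportFrom)):
                self.imports.append(ast.unparse(stmt))
        self.generic_visit(node)

source = textwrap.dedent("""
    def my_test_case(model_to_scan):
        from transformers import pipeline
        import torch
        return pipeline("fill-mask", model=model_to_scan)
""")

collector = ImportCollector()
collector.visit(ast.parse(source))
print(collector.imports)
# ['from transformers import pipeline', 'import torch']
```

Centralizing imports this way lets the scanner know exactly which packages a test case pulls in before it runs inside the sandbox.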
| File | Summary |
|---|---|
| `sample_test_case.py` | Custom test of an ML model within the secure sandbox. Loads the specified model and runs inference tasks, ensuring compatibility with transformer models without relying on external datasets or file access. |
| File | Summary |
|---|---|
| `scan_template.sh` | Automates error handling and cleanup during testing, ensuring robust logging, resource management, and detection of suspicious activities and potential storage attacks. |
| `testcase_tempate.py` | Drives dynamic testing of an ML model: executes a given model checkpoint, clears logs, and protects log file security and integrity, keeping model evaluation concise and isolated. |
| `analyze_result.sh` | Analyzes scan logs, detects anomalies (e.g., storage attacks, excessive file opens, Python errors), and generates reports. |
| `explain_sys_calls.py` | Parses the system call traces generated during scanning and produces structured representations and human-readable explanations for diagnostics and troubleshooting. |
| `cleanup_irrelevant_io.sh` | Filters system call logs to retain only relevant I/O operations, improving the efficiency of result analysis. |
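To illustrate the kind of parsing `explain_sys_calls.py` performs, the sketch below extracts structured fields from an strace-style log line with a regular expression. The pattern and function names here are illustrative assumptions, not the script's actual code:

```python
import re

# Matches lines of the form:
#   <pid> <syscall>(<args>) = <result>
STRACE_LINE = re.compile(
    r"^(?P<pid>\d+)\s+"      # leading process ID
    r"(?P<syscall>\w+)"      # system call name
    r"\((?P<args>.*)\)\s+"   # raw argument list
    r"=\s+(?P<result>.+)$"   # return value or error
)

def parse_strace_line(line: str) -> dict:
    """Return pid/syscall/args/result fields, or {} if the line doesn't match."""
    m = STRACE_LINE.match(line.strip())
    return m.groupdict() if m else {}

rec = parse_strace_line(
    '28 connect(4, {sa_family=AF_INET, sin_port=htons(8080), '
    'sin_addr=inet_addr("192.168.1.5")}, 16) = ? ERESTARTSYS'
)
print(rec["pid"], rec["syscall"])
# 28 connect
```

Once a trace line is structured like this, mapping syscall names and arguments to human-readable explanations becomes a lookup problem.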
| File | Summary |
|---|---|
| `scan.sh` | Executes tests securely: requires root access, disables unwanted network communication, runs tests as a non-root user, and sets up environment variables for scanning inside Docker. |
| `requirements_sample.txt` | Example Python package list for scan preparation, emphasizing PyTorch and related ML libraries. |
| `parse.sh` | Enforces root access, disables the network, configures the scan environment, and launches the result analysis scripts in the scan workflow. |
| `Dockerfile.template` | Template for the Docker image build; integrates the required tools, libraries, and Conda environment setup for the automated, sandboxed scan environment. |
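The non-root, time-limited execution model described for `scan.sh` can be sketched in plain Python. Everything below (the `run_test_case` helper, the `sudo -u` user switch, the exit-code convention) is an illustrative assumption, not the project's actual code:

```python
import os
import subprocess
import sys
import tempfile

def run_test_case(script, model, user=None, timeout=60):
    """Run a test-case script with a hard timeout, optionally as another user."""
    cmd = [sys.executable, script, model]
    if user is not None:
        # In the real sandbox this would switch to an unprivileged account.
        cmd = ["sudo", "-u", user] + cmd
    try:
        return subprocess.run(cmd, capture_output=True,
                              timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return 124  # conventional "timed out" exit status

# Demo with a trivial inline script (no privilege switch needed here).
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("import sys; print('scanning', sys.argv[1])")
    path = f.name
rc = run_test_case(path, "Meta-Llama-3.1-8B")
os.unlink(path)
print(rc)  # 0
```

A hung or runaway model load would surface here as a timeout rather than blocking the scan indefinitely.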
- Model to be scanned: must be downloaded and placed under the `share` directory before the scan starts. No network connection is available in the container for security reasons.
- Python: requires Python 3.11 or higher. Set the correct Python executable path in `config.ini`.
- Test case: write a simple function/script to load your model, and set its path in `config.ini` (`CUSTOM_TEST_CASE_FILE`).
- User: the script runner needs passwordless `sudo` privileges.
- Docker/Docker Compose: both must be installed.
- Operating System: Linux-based platforms recommended.
1. Clone the repository

   ```shell
   git clone https://github.com/oracle-samples/ml-model-dynscan.git
   cd ml-model-dynscan
   ```

2. Install dependencies (you may use a virtual environment such as `conda` or `venv`)

   ```shell
   pip install -r prepare/requirements_sample.txt
   ```

3. Build the Docker environment

   ```shell
   bash build.sh
   ```
Then prepare a test case, configure `config.ini`, and run a scan.
1. Model download: place your ML model (e.g., `Meta-Llama-3.1-8B`) in the `share` directory.

2. Configure: edit `config.ini` to specify the Python path, required dependencies, and your custom test case script. Example:

   ```ini
   [DEFAULT]
   PYTHON_PATH = /usr/bin/python3.11

   [TestCase]
   CUSTOM_TEST_CASE_FILE = test_cases/sample_test_case.py
   TIMEOUT_SECONDS = 60

   [Dockerfile]
   DOCKER_OS_BASE_IMAGE = container-registry.oracle.com/os/oraclelinux:9
   PYTHON_REQUIREMENTS = prepare/requirements_sample.txt

   [DockerEnv]
   image_name = ml_sec
   cpus = 8
   memory = 72GB
   nofile_limit = 4096
   nproc_limit = 512
   timezone = America/Los_Angeles
   shared_folder = ./share
   ```
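Settings in this INI format can be read with Python's standard `configparser` module; note that keys under `[DEFAULT]` are inherited by every other section. The snippet below is illustrative only, not the framework's actual loading code:

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""\
[DEFAULT]
PYTHON_PATH = /usr/bin/python3.11

[TestCase]
CUSTOM_TEST_CASE_FILE = test_cases/sample_test_case.py
TIMEOUT_SECONDS = 60
""")

# Keys under [DEFAULT] are visible from every section.
python_path = config["TestCase"]["PYTHON_PATH"]
timeout = config.getint("TestCase", "TIMEOUT_SECONDS")
print(python_path, timeout)
# /usr/bin/python3.11 60
```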
3. Make `run.sh` executable

   ```shell
   chmod +x run.sh
   ```

4. Run the scanner

   ```shell
   ./run.sh <Model_to_be_scanned>   # e.g. ./run.sh Meta-Llama-3.1-8B
   ```
Refer to the example `config.ini` above. Notable settings:

- `PYTHON_PATH` should point to a valid Python 3.11+ executable.
- `CUSTOM_TEST_CASE_FILE` must point to the path of your test script.
- `PYTHON_REQUIREMENTS` and the Docker image settings can be adjusted for custom environments and package dependencies.
- Resource limits (CPU, memory, open files, processes) may be tuned to your server resources and model needs.
Your test case script must:
- Define a single function with the model name as its only parameter.
- Place all imports inside the function.
- Be minimal/concise: avoid steps not needed for inference (to reduce false positives in monitoring).
- Avoid network, file download, or resource-intensive operations.
- Example:

```python
def my_test_case(model_to_scan: str):
    from transformers import pipeline
    import torch

    try:
        task = "fill-mask"
        fmask = pipeline(
            task=task,
            model=model_to_scan,
            tokenizer=model_to_scan
        )
        output = fmask(
            'What should be inside a custom testcase [MASK]!'
        )
        return output
    except Exception as exc:
        # Surface failures so the scanner can log them.
        raise RuntimeError(f"Test case failed: {exc}") from exc
```
Sample: No issue detected
```
ML_Security_Sandbox exited with code 0
ML_Security_Scan_Result_Parser exited with code 0
No system-affecting issue found.
```
Sample: Potential malicious behavior
```
Suspicious behaviors detected:
28 connect(4, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("192.168.1.5")}, 16) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
PID 28: The process is attempting to establish a network connection. It is trying to connect to IPv4 address 192.168.1.5 on port 8080. Exited with status: ERESTARTSYS (To be restarted if SA_RESTART is set).
```
Logs detailing any suspicious or rejected system calls will be produced and can be found in the `share/` directory.
- CONTRIBUTING.md: Contribution guidelines and requirements
- SECURITY.md: Responsible security disclosure policy
Oracle publishing standards require end-user product documentation at https://docs.oracle.com/ for official releases.
```shell
./run.sh Meta-Llama-3.1-8B
# Output and logs will indicate scan results
```
You must download your model (for example, from Hugging Face or a custom source) and place it in the `share/` directory prior to scanning.
- Open an issue on GitHub for bugs or enhancement requests.
- General Oracle developer questions:
- For security vulnerabilities, see SECURITY.md.
Contributions are welcome! Please review our contribution guidelines prior to submitting pull requests. Ensure any contributions conform to Oracle's policies and pass all applicable automated and manual checks.
Please consult the security guide for Oracle's responsible disclosure process and to report vulnerabilities.
Copyright (c) 2023, 2025 Oracle and/or its affiliates.
Released under the Universal Permissive License v1.0 as shown at https://oss.oracle.com/licenses/upl/.
Please review LICENSE.txt for details.