A security-focused framework for dynamically scanning and testing machine learning models in an isolated Docker environment, detecting suspicious behaviors such as unauthorized network access or system modifications.

oracle-samples/ml-model-dynscan

Dynamic Scan for Machine Learning Models

A security-focused framework for dynamically scanning and testing machine learning (ML) models inside a controlled, secure Docker environment. This project helps detect suspicious or malicious behaviors in ML models (such as unauthorized network access, filesystem modification, or execution of dangerous system calls), providing developers with tools to ensure the integrity and safety of their models.



Overview

Dynamic Scan for Machine Learning Models (version 1.0) automates the setup, execution, and analysis of dynamic tests for ML models, with an emphasis on security and diagnostics. Using isolated Docker sandboxes and PyTorch, the tool analyzes system call activity to flag indicators of compromise, such as unauthorized network connections (e.g., unexpectedly listening on or binding to ports) or risky file access or modification (e.g., attempts to alter /etc/passwd).
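To make the idea of flagging indicators of compromise from system call activity concrete, here is a minimal sketch in Python. It assumes strace-style trace lines and uses a small, hypothetical rule set (the syscall names, sensitive paths, and the function name flag_line are illustrative choices, not the project's actual rules or API).

```python
import re
from typing import Optional

# Hypothetical rule set, chosen for illustration only.
NETWORK_SYSCALLS = {"connect", "bind", "listen", "accept"}
SENSITIVE_PATHS = ("/etc/passwd", "/etc/shadow")
WRITE_FLAGS = ("O_WRONLY", "O_RDWR", "O_CREAT", "O_TRUNC", "O_APPEND")

# Matches strace-style lines: optional PID, syscall name, arguments, result.
LINE_RE = re.compile(
    r"^\s*(?:(?P<pid>\d+)\s+)?(?P<name>\w+)\((?P<args>.*)\)\s*=\s*(?P<ret>.*)$"
)


def flag_line(line: str) -> Optional[str]:
    """Return a short reason if the trace line looks suspicious, else None."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # not a syscall line (e.g. an exit marker)
    name, args = match.group("name"), match.group("args")
    if name in NETWORK_SYSCALLS:
        return f"network activity: {name}"
    if name in {"open", "openat"}:
        touches_sensitive = any(path in args for path in SENSITIVE_PATHS)
        wants_write = any(flag in args for flag in WRITE_FLAGS)
        if touches_sensitive and wants_write:
            return f"write access to sensitive file: {name}"
    return None
```

Read-only opens of a sensitive file are deliberately not flagged in this sketch, since many libraries read such files during normal start-up; a real rule set would likely need to be tuned per environment.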

The framework is highly modular, supporting customizable test cases and configurable scanning environments. Its primary audience is ML developers, researchers, and security engineers seeking automated, reproducible assurance of safe ML behavior.


Features

| Feature | Description |
| --- | --- |
| ⚙️ Architecture | The project uses a Docker-based architecture for dynamic scanning and testing of machine learning models, integrating scripts for setup, execution, and result parsing. It emphasizes modularity and flexibility in testing workflows. |
| 🧩 Modularity | The repository is highly modular, with separate scripts handling environment setup, scanning, and testing, which makes updates and maintenance easier. |
| 🧪 Testcase | Testing is driven by custom scripts such as sample_test_case.py, which check model compatibility and performance, and by shell scripts that handle errors and manage resources. |
| 🛡️ Security | Security measures include disabling network communications, enforcing root access for certain scripts, and using non-root users for test execution, ensuring isolated and secure testing environments. |

Repository Structure

.
├── README.md
├── build.sh
├── config.ini
├── env_config.py
├── prepare/
│   ├── Dockerfile.template
│   ├── parse.sh
│   ├── requirements_sample.txt
│   └── scan.sh
├── run.sh
├── setup_env.py
├── share/
│   ├── cleanup_irrelevant_io.sh
│   ├── explain_sys_calls.py
│   ├── analyze_result.sh
│   ├── testcase_tempate.py
│   └── scan_template.sh
├── test_cases/
│   └── sample_test_case.py
├── CONTRIBUTING.md
├── SECURITY.md
└── LICENSE.txt

Key Files

  • build.sh: Builds the Docker image and sets up the required environment.
  • run.sh: Main entry point; orchestrates scanning, logging, and reporting.
  • config.ini: Central configuration for environment and test behavior.
  • env_config.py: Handles dynamic import management via AST transformation.
  • setup_env.py: Generates Docker Compose configs and environment setup.
  • share/: Contains utility scripts for system call filtering, analysis, and secure test execution.
  • test_cases/: Example and user-provided test cases for specific ML models.

Modules

Root-level

| File | Summary |
| --- | --- |
| requirements.txt | Lists the libraries needed to run the scanner. |
| setup_env.py | Sets up a secure Docker environment for machine learning model scanning by generating a Docker Compose configuration. It allocates resources and manages dependencies between the scanning and result-parsing services. |
| run.sh | Orchestrates the dynamic scanning process: checks that prerequisites are met, sets up loopback filesystems, and manages the Docker containers that analyze the specified model for suspicious behavior. It logs findings and reports potential issues. |
| env_config.py | Handles the dynamic collection and management of import statements within Python functions, using AST transformations to centralize import handling. |
| build.sh | Automates the Docker image build: generates a Dockerfile, builds the image according to the config, and cleans up unnecessary data. |
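The summary of env_config.py mentions collecting import statements from inside Python functions through AST transformations. The following sketch shows one way such a transformation could look using Python's standard ast module. The names ImportCollector and hoist_imports are hypothetical, and this is an illustration of the general technique, not the code in env_config.py.

```python
import ast


class ImportCollector(ast.NodeTransformer):
    """Strip import statements from function bodies and remember them."""

    def __init__(self) -> None:
        self.collected = []  # source text of each removed import

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        kept = []
        for stmt in node.body:
            if isinstance(stmt, (ast.Import, ast.ImportFrom)):
                self.collected.append(ast.unparse(stmt))
            else:
                kept.append(stmt)
        # A function body may not be empty, so fall back to `pass`.
        node.body = kept or [ast.Pass()]
        return node


def hoist_imports(source: str) -> str:
    """Return the source with function-level imports moved to the top."""
    tree = ast.parse(source)
    collector = ImportCollector()
    tree = collector.visit(tree)
    ast.fix_missing_locations(tree)
    # Drop duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(collector.collected))
    return "\n".join(unique + [ast.unparse(tree)])
```

This sketch only looks at statements directly in the body of top-level (or class-level) synchronous functions; imports nested inside try blocks, inner functions, or async functions are left untouched. It relies on ast.unparse, which requires Python 3.9 or later.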

test_cases/

| File | Summary |
| --- | --- |
| sample_test_case.py | Facilitates custom testing of ML models within a secure sandbox. Loads a specified model and executes inference tasks, ensuring compatibility with transformer models without relying on external datasets or file access. |

share/

| File | Summary |
| --- | --- |
| scan_template.sh | Automates error handling and cleanup during system testing, ensuring robust logging, resource management, and detection of suspicious activities and potential storage attacks. |
| testcase_tempate.py | Facilitates dynamic testing of ML models by executing a given model checkpoint, clearing logs, ensuring log file security and integrity, and keeping model evaluation concise and isolated. |
| analyze_result.sh | Analyzes scan logs, detects anomalies (e.g., storage attacks, excessive file opens, Python errors), and generates reports. |
| explain_sys_calls.py | Parses and explains system call traces generated during scanning, providing structured representations and human-readable explanations for diagnostics and troubleshooting. |
| cleanup_irrelevant_io.sh | Filters system call logs to retain only relevant I/O operations, making scan results faster to analyze. |

prepare/

| File | Summary |
| --- | --- |
| scan.sh | Provides secure execution of tests, requiring root access, disabling unwanted network communications, and using a non-root user for test execution. Sets up environment variables for scanning inside Docker. |
| requirements_sample.txt | Example Python package list used for dynamic scanning preparation, emphasizing PyTorch and related libraries for ML tasks. |
| parse.sh | Enforces root access, disables network, configures the scan environment, and launches result analysis scripts in the scan workflow. |
| Dockerfile.template | Template for the Docker image build. Integrates required tools and libraries and the Conda environment setup, supporting the automated, sandboxed scan environment. |

Getting Started

Prerequisites

  • Model to be scanned: Must be downloaded and placed under the share directory before the scan starts. No network connection is available in the container for security reasons.
  • Python: Requires Python 3.11 or higher. Set the correct Python executable path in config.ini.
  • Test Case: Write a simple function/script to load your model, and set its path in config.ini (CUSTOM_TEST_CASE_FILE).
  • User: The script runner needs passwordless sudo privilege.
  • Docker/Docker Compose: Both must be installed.
  • Operating System: Linux-based platforms recommended.
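Several of these prerequisites can be checked before starting a scan. The sketch below is a hypothetical pre-flight helper (check_prerequisites is not part of the repository) that looks for the docker and sudo executables, verifies a Python version, and confirms the model is present under the shared directory. It assumes the interpreter running the check is the one configured for the scanner, and it does not verify passwordless sudo or Docker Compose.

```python
import shutil
import sys
from pathlib import Path


def check_prerequisites(share_dir, model_name,
                        which=shutil.which,
                        version=sys.version_info[:2]):
    """Return a list of human-readable problems; an empty list means OK.

    `which` and `version` are injectable so the check can be exercised
    without depending on the machine it runs on.
    """
    problems = []
    if tuple(version) < (3, 11):
        problems.append("Python 3.11 or higher is required")
    for tool in ("docker", "sudo"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if not (Path(share_dir) / model_name).exists():
        problems.append(f"model {model_name!r} not found under {share_dir}")
    return problems


if __name__ == "__main__":
    # Example: the model name is passed on the command line, as with run.sh.
    model = sys.argv[1] if len(sys.argv) > 1 else "Meta-Llama-3.1-8B"
    for problem in check_prerequisites("./share", model):
        print("prerequisite issue:", problem)
```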

Installation

  1. Clone the repository

    git clone https://github.com/oracle-samples/ml-model-dynscan.git
    cd ml-model-dynscan
  2. Install dependencies (You may use a virtual environment such as conda or venv)

    pip install -r prepare/requirements_sample.txt
  3. Build Docker environment

    bash build.sh

Usage

Prepare a test case, configure config.ini, and run a scan.

  1. Model Download: Place your ML model (e.g., Meta-Llama-3.1-8B) in the share directory.

  2. Configure: Edit config.ini to specify the Python path, required dependencies, and custom test case script. Example:

    [DEFAULT]
    PYTHON_PATH = /usr/bin/python3.11
    
    [TestCase]
    CUSTOM_TEST_CASE_FILE = test_cases/sample_test_case.py
    TIMEOUT_SECONDS = 60
    
    [Dockerfile]
    DOCKER_OS_BASE_IMAGE = container-registry.oracle.com/os/oraclelinux:9
    PYTHON_REQUIREMENTS = prepare/requirements_sample.txt
    
    [DockerEnv]
    image_name = ml_sec
    cpus = 8
    memory = 72GB
    nofile_limit = 4096
    nproc_limit = 512
    timezone = America/Los_Angeles
    shared_folder = ./share
  3. Make run.sh executable

    chmod +x run.sh
  4. Run the scanner

    ./run.sh <Model_to_be_scanned>
    # e.g.
    ./run.sh Meta-Llama-3.1-8B

Configuration

Refer to the example config.ini above. Notable settings:

  • The PYTHON_PATH should point to a valid Python 3.11+ executable.
  • CUSTOM_TEST_CASE_FILE must point to the path of your test script.
  • PYTHON_REQUIREMENTS and Docker image settings can be adjusted for custom environments and package dependencies.
  • Resource limits (CPU, memory, open files, processes) may be tuned per your server resources and model needs.
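As a rough illustration of how these settings could be read and sanity-checked, the sketch below uses Python's standard configparser with the section and key names from the example config.ini. The REQUIRED mapping and the load_scan_config function are hypothetical; the project's own scripts may parse and validate the file differently.

```python
import configparser

# Keys taken from the example config.ini; which ones are truly mandatory
# is an assumption made for this sketch.
REQUIRED = {
    "DEFAULT": ["PYTHON_PATH"],
    "TestCase": ["CUSTOM_TEST_CASE_FILE", "TIMEOUT_SECONDS"],
    "DockerEnv": ["image_name", "cpus", "memory", "shared_folder"],
}


def load_scan_config(text: str) -> configparser.ConfigParser:
    """Parse config text and raise ValueError listing any problems found."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []
    for section, keys in REQUIRED.items():
        if section not in parser:
            problems.append(f"missing section [{section}]")
            continue
        for key in keys:
            # Option names are matched case-insensitively by configparser.
            if not parser.has_option(section, key):
                problems.append(f"missing {key} in [{section}]")
    if not problems and parser.getint("TestCase", "TIMEOUT_SECONDS") <= 0:
        problems.append("TIMEOUT_SECONDS must be positive")
    if problems:
        raise ValueError("; ".join(problems))
    return parser
```

To check a file on disk, pass its contents, for example load_scan_config(open("config.ini").read()). Note that values in [DEFAULT] are visible from every other section, which is standard configparser behavior.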

Test Case Development

Your test case script must:

  • Define a single function with the model name as its only parameter.
  • Place all imports inside the function.
  • Be minimal/concise: avoid steps not needed for inference (to reduce false positives in monitoring).
  • Avoid network, file download, or resource-intensive operations.
  • Example:
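    def my_test_case(model_to_scan: str):
        # Imports live inside the function, as required above.
        from transformers import pipeline
        import torch

        try:
            task = "fill-mask"
            fmask = pipeline(
                task=task,
                model=model_to_scan,
                tokenizer=model_to_scan
            )
            output = fmask(
                'What should be inside a custom testcase [MASK]!'
            )
            return output
        except Exception as exc:
            # Report the failure so it appears in the scan output, then re-raise.
            print(f"Test case failed: {exc}")
            raise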
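
The structural rules in this section (one function, one parameter, imports inside the function) can be checked mechanically before a scan. The sketch below does so with Python's ast module; check_test_case is a hypothetical helper written for illustration and is not shipped with the repository.

```python
import ast


def check_test_case(source: str) -> list:
    """Return a list of rule violations found in a test case script."""
    problems = []
    tree = ast.parse(source)

    # Rule: define a single function.
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    if len(functions) != 1:
        problems.append(
            f"expected exactly one top-level function, found {len(functions)}"
        )

    # Rule: the model name is the function's only parameter.
    for fn in functions:
        args = fn.args
        count = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
        if count != 1 or args.vararg or args.kwarg:
            problems.append(
                f"{fn.name} must take the model name as its only parameter"
            )

    # Rule: all imports go inside the function, not at module level.
    if any(isinstance(n, (ast.Import, ast.ImportFrom)) for n in tree.body):
        problems.append("imports must be placed inside the function")
    return problems
```

The remaining rules (staying minimal and avoiding network or heavy operations) are judgment calls that a static check like this cannot decide.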

Result Analysis

Sample: No issue detected

ML_Security_Sandbox exited with code 0
ML_Security_Scan_Result_Parser exited with code 0

No system-affecting issue found.

Sample: Potential malicious behavior

Suspicious behaviors detected:
28    connect(4, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("192.168.1.5")}, 16) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
    PID 28: The process is attempting to establish a network connection. It is trying to connect to IPv4 address 192.168.1.5 on port 8080. Exited with status: ERESTARTSYS (To be restarted if SA_RESTART is set).

Logs detailing any suspicious or rejected system calls will be produced and can be found in the share/ directory.
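
To show how a raw trace line like the connect call in the sample can be turned into a readable explanation, here is a small sketch that handles only IPv4 connect calls in strace-style format. The regular expression and the explain_connect function are assumptions made for illustration; the actual logic in explain_sys_calls.py may differ.

```python
import re
from typing import Optional

# Matches only strace-style IPv4 connect() lines prefixed with a PID.
CONNECT_RE = re.compile(
    r"^\s*(?P<pid>\d+)\s+connect\(\d+,\s*\{sa_family=AF_INET,\s*"
    r"sin_port=htons\((?P<port>\d+)\),\s*"
    r'sin_addr=inet_addr\("(?P<addr>[\d.]+)"\)\},\s*\d+\)'
    r"\s*=\s*(?P<result>.+)$"
)


def explain_connect(line: str) -> Optional[str]:
    """Describe an IPv4 connect() trace line, or return None if no match."""
    match = CONNECT_RE.match(line)
    if match is None:
        return None
    return (
        f"PID {match['pid']}: attempted IPv4 connection to "
        f"{match['addr']} on port {match['port']} "
        f"(result: {match['result'].strip()})"
    )
```

Other address families (for example AF_INET6 or AF_UNIX) and other syscalls would each need their own pattern in this approach.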


Documentation

Oracle publishing standards require end-user product documentation at https://docs.oracle.com/ for official releases.


Usage Example

./run.sh Meta-Llama-3.1-8B
# Output and logs will indicate scan results

You must download your model (for example, from HuggingFace or a custom source) and place it in the share/ directory prior to scanning.


Get Support


Contributing

Contributions are welcome! Please review our contribution guidelines prior to submitting pull requests. Ensure any contributions conform to Oracle's policies and pass all applicable automated and manual checks.


Security

Please consult the security guide for Oracle's responsible disclosure process and to report vulnerabilities.


License

Copyright (c) 2023, 2025 Oracle and/or its affiliates.

Released under the Universal Permissive License v1.0 as shown at https://oss.oracle.com/licenses/upl/.

Please review LICENSE.txt for details.


Oracle Resources
