This repository contains the implementation of my MSc dissertation project, "Training AI for Information Security." The project uses machine learning to detect and classify cyber threats in network traffic, specifically employing transformer-based models for zero-shot classification tasks.
To install the project, follow these steps:
- Clone the repository:
git clone https://github.com/niting3c/AiPacketClassifier.git
- Change directory to the cloned repository:
cd AiPacketClassifier
- Install Conda if you have not already done so. You can download it from the official Conda website (for example, as the Miniconda or Anaconda installer).
- Create a Conda environment from the provided environment.yml file:
conda env create -f environment.yml
- Activate the Conda environment:
conda activate AiPacketClassifier
Note: This project has been tested on Python 3.9.5, and the required dependencies are listed in the environment.yml file.
Here are detailed descriptions of the main files in this repository:
- run.py: The main script. It initializes multiple zero-shot classification models from the Transformers library, processes the input files with each model, and writes the results. It uses the following functions:
  - load_models(): Loads the transformer models specified in the models.py file and initializes the zero-shot classifiers.
  - process_files(model_entry, directory): Processes the pcap files in the given directory using the specified model_entry, calling analyse_packet() and send_to_llm_model() for each pcap file (see pcapoperations.py below).
- utils.py: Helper functions for file-related operations such as creating file paths. It provides the following functions:
  - create_result_file_path(file_path, extension=".txt", output_dir="./output/", suffix="model"): Generates a new file path for a result file in the output directory. The file_path parameter is the original file path, extension is the desired file extension for the new file, output_dir is the directory for the new file (default "./output/"), and suffix is an extra folder inside that directory that keeps results separated (default "model").
  - get_file_path(root, file_name): Generates a file path by combining the given root and file_name.
- promptmaker.py: Functions that generate prompts for the classification tasks. These prompts guide the model's analysis of each packet and tell it how to report its findings. It provides the following function:
  - generate_prompt(protocol, payload): Generates a formatted prompt containing the specified protocol and payload, to be used as input for the transformer models.
- pcapoperations.py: Functions that handle pcap file operations, including reading packets from pcap files, analyzing them with the zero-shot classification models, and writing the results to an output file. It provides the following functions:
  - process_files(model_entry, directory): Processes the pcap files in the given directory using the specified model_entry, calling analyse_packet() and send_to_llm_model() for each pcap file.
  - analyse_packet(file_path, model_entry): Analyzes the packets in the pcap file located at file_path using the specified model_entry. It extracts the protocol and payload from each packet and prepares input objects for classification.
  - extract_payload_protocol(packet): Extracts the payload and protocol from the given packet.
  - send_to_llm_model(model_entry, file_name): Sends the prepared input objects to the zero-shot model for classification and stores the results in model_entry.
- llm_model.py: Functions that handle the interaction with the transformer models. They prepare the inputs for the classifier, generate the classifier's response, and process that response.
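The packet flow described for pcapoperations.py and promptmaker.py (read each packet, extract its protocol and payload, turn them into a prompt) can be illustrated with a standard-library-only sketch. It is not the repository's code: it parses classic libpcap files directly with struct instead of whatever packet library the project uses, handles only Ethernet/IPv4 traffic, and takes raw bytes rather than the project's packet objects and file paths. The prompt wording and the prepare_inputs name are hypothetical.

```python
import struct
from typing import Iterator, List, Optional, Tuple

# IANA-assigned IP protocol numbers for a few common transports.
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def read_pcap_frames(data: bytes) -> Iterator[bytes]:
    """Yield raw link-layer frames from a classic (microsecond) libpcap file."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError("only Ethernet captures (linktype 1) are handled")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec, captured length, original length.
        _, _, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len


def extract_payload_protocol(frame: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (protocol, payload) for an Ethernet/IPv4 frame, or None otherwise."""
    if len(frame) < 34:  # 14-byte Ethernet header + minimal 20-byte IPv4 header
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x0800:  # skip ARP, IPv6, VLAN-tagged frames, etc.
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    (total_len,) = struct.unpack("!H", ip[2:4])
    segment = ip[ihl:total_len]  # transport segment, Ethernet padding removed
    proto_num = ip[9]
    if proto_num == 6 and len(segment) >= 20:
        payload = segment[(segment[12] >> 4) * 4:]  # skip TCP header incl. options
    elif proto_num == 17 and len(segment) >= 8:
        payload = segment[8:]  # UDP header is a fixed 8 bytes
    else:
        payload = segment
    return IP_PROTOCOLS.get(proto_num, str(proto_num)), payload


def generate_prompt(protocol: str, payload: bytes) -> str:
    """Hypothetical prompt wording; promptmaker.py's real text may differ."""
    text = payload.decode("utf-8", errors="replace")
    return f"Protocol: {protocol}\nPayload: {text}\nIs this packet malicious or benign?"


def prepare_inputs(pcap_bytes: bytes) -> List[str]:
    """Hypothetical stand-in for analyse_packet(): one prompt per IPv4 packet."""
    inputs = []
    for frame in read_pcap_frames(pcap_bytes):
        extracted = extract_payload_protocol(frame)
        if extracted is not None:
            inputs.append(generate_prompt(*extracted))
    return inputs
```

In practice the bytes would come from a capture file, for example by opening a file from ./inputs in "rb" mode and passing its contents to prepare_inputs.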
- Make sure you have installed all necessary packages and activated the Conda environment (see the installation steps above).
- The run.py script expects its input files in the ./inputs directory, so populate that directory with the pcap files you want to process.
- To start the program, run:
python run.py
- The results are written to the ./output directory.
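The ./inputs to ./output flow above can be outlined as follows. This is a hypothetical sketch rather than the repository's run.py: the two path helpers follow the parameter descriptions given for utils.py (the exact naming scheme for result files is an assumption), and this process_files takes a generic classify callback instead of a model_entry and writes one output line per result.

```python
import os
from typing import Callable, Iterable, List


def get_file_path(root: str, file_name: str) -> str:
    """Combine a directory and a file name into a single path."""
    return os.path.join(root, file_name)


def create_result_file_path(file_path: str, extension: str = ".txt",
                            output_dir: str = "./output/", suffix: str = "model") -> str:
    """Map an input file to <output_dir>/<suffix>/<base name><extension>."""
    base_name = os.path.splitext(os.path.basename(file_path))[0]
    return os.path.join(output_dir, suffix, base_name + extension)


def process_files(classify: Callable[[str], Iterable[str]], directory: str = "./inputs",
                  output_dir: str = "./output/", suffix: str = "model") -> List[str]:
    """Run `classify` on every .pcap file under `directory`, one result file each."""
    written = []
    for root, _dirs, files in os.walk(directory):
        for file_name in sorted(files):
            if not file_name.endswith(".pcap"):
                continue  # ignore non-capture files in the inputs folder
            pcap_path = get_file_path(root, file_name)
            result_path = create_result_file_path(pcap_path, output_dir=output_dir, suffix=suffix)
            os.makedirs(os.path.dirname(result_path), exist_ok=True)
            with open(result_path, "w", encoding="utf-8") as out:
                for line in classify(pcap_path):
                    out.write(line + "\n")
            written.append(result_path)
    return written
```

With the defaults, results for ./inputs/sample.pcap would land in ./output/model/sample.txt; passing a different suffix per model keeps each model's results in its own subfolder, which matches the stated purpose of the suffix parameter.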
The project uses the following transformer models for zero-shot classification tasks:
- Deep Night Research's ZSC Text
- Facebook's BART Large MNLI
- Moritz Laurer's DeBERTa v3 base MNLI+FEVER+ANLI
- Sileod's DeBERTa v3 base tasksource NLI
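Loading the models listed above as zero-shot classifiers could look roughly like this with the Hugging Face transformers pipeline API. The Hub IDs are assumptions about the checkpoints behind the names in the list (models.py is the authority), the malicious/benign label set and the model-entry dictionary layout are hypothetical, and classify_inputs only plays the role described for send_to_llm_model(). The classifier factory is injectable, so the bookkeeping can be tried without downloading any weights.

```python
from typing import Any, Callable, Dict, List, Sequence

# Assumed Hugging Face Hub IDs for the four models listed above.
MODEL_IDS = (
    "deepnight-research/zsc-text",
    "facebook/bart-large-mnli",
    "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli",
    "sileod/deberta-v3-base-tasksource-nli",
)

# Hypothetical label set for illustration only.
CANDIDATE_LABELS = ["malicious", "benign"]


def hf_zero_shot(model_id: str) -> Callable[..., Any]:
    """Build a transformers zero-shot pipeline (lazy import; downloads weights)."""
    from transformers import pipeline
    return pipeline("zero-shot-classification", model=model_id)


def load_models(model_ids: Sequence[str] = MODEL_IDS,
                factory: Callable[[str], Callable[..., Any]] = hf_zero_shot
                ) -> List[Dict[str, Any]]:
    """Create one entry per model holding its classifier and a results store."""
    return [{"model_name": m, "classifier": factory(m), "results": {}} for m in model_ids]


def classify_inputs(model_entry: Dict[str, Any], file_name: str, inputs: List[str]) -> None:
    """Classify each prepared input and store (top label, score) pairs per file."""
    results = []
    for text in inputs:
        output = model_entry["classifier"](text, candidate_labels=CANDIDATE_LABELS)
        # For a single input string the pipeline returns a dict whose labels
        # and scores are sorted from most to least likely.
        results.append((output["labels"][0], output["scores"][0]))
    model_entry["results"][file_name] = results
```

With the default factory, load_models() downloads every listed checkpoint on first use; each entry's classifier can then be applied to the prompts built from a pcap file.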
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
Distributed under the MIT License. See LICENSE for more information.
Nitin Gupta
Project Link: https://github.com/niting3c/AiPacketClassifier
For specific requests or inquiries, feel free to contact me. Happy coding!