Releases: vulnerability-lookup/VulnTrain
Release 2.0.0
News
- Dataset generation: Introduced a new script to build datasets of structured vulnerabilities enriched with CWE identifiers and corresponding patches. Each entry now includes the Git commit message and the full diff (Base64-encoded). #10 by @3LS3-1F
- Model generation: Added a new trainer for predicting CWE classifications from vulnerability descriptions and associated patches (commit messages). #10 by @3LS3-1F
Related resources shared via Hugging Face: https://huggingface.co/collections/CIRCL/vlai-for-cwe-guessing-68bab22e3d71b513146d13b3
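Because the diff is stored Base64-encoded, consumers of the dataset need to decode it before inspecting the patch. The following is a minimal sketch of that step. The field names (`id`, `cwe`, `commit_message`, `patch_b64`) and the sample entry are assumptions made for illustration and may not match the actual dataset schema.

```python
import base64

# NOTE: the field names used here are hypothetical; check the published
# dataset for the real column names before relying on them.
def decode_patch(entry: dict) -> str:
    """Return the unified diff stored as Base64 text in a dataset entry."""
    raw = base64.b64decode(entry["patch_b64"])
    # Diffs may contain bytes that are not valid UTF-8, so replace them
    # instead of raising an exception.
    return raw.decode("utf-8", errors="replace")

# Build a made-up entry so the sketch runs without any download.
diff_text = (
    "--- a/app.py\n"
    "+++ b/app.py\n"
    "@@ -1 +1 @@\n"
    "-eval(user_input)\n"
    "+ast.literal_eval(user_input)\n"
)
example_entry = {
    "id": "CVE-0000-0000",  # placeholder identifier, not a real CVE
    "cwe": "CWE-95",
    "commit_message": "Replace eval with ast.literal_eval",
    "patch_b64": base64.b64encode(diff_text.encode("utf-8")).decode("ascii"),
}

print(example_entry["commit_message"])
print(decode_patch(example_entry))
```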
Changes
- Improved documentation and reorganized modules for better clarity and maintainability.
- Updated dependencies to their latest stable versions.
Release 1.5.0
Release 1.4.0
This version adds support for creating new AI-ready datasets based on the China National Vulnerability Database (CNVD). It also introduces a new training module designed to classify vulnerabilities using text classification models tailored for CNVD data. By default, hfl/chinese-macbert-base is used, but it is possible to use hfl/chinese-bert-wwm-ext or google-bert/bert-base-chinese.
By @3LS3-1F
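One way to expose this choice of base model on a command line is sketched below. The three model identifiers come from the note above; the `--base-model` flag and the `build_parser` helper are hypothetical and are not taken from VulnTrain's actual command-line interface.

```python
import argparse

# Model identifiers as listed in the release note; the first is the default.
DEFAULT_BASE_MODEL = "hfl/chinese-macbert-base"
SUPPORTED_BASE_MODELS = (
    DEFAULT_BASE_MODEL,
    "hfl/chinese-bert-wwm-ext",
    "google-bert/bert-base-chinese",
)

def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical --base-model option."""
    parser = argparse.ArgumentParser(
        description="Illustrative trainer options (not the real VulnTrain CLI)."
    )
    parser.add_argument(
        "--base-model",
        choices=SUPPORTED_BASE_MODELS,
        default=DEFAULT_BASE_MODEL,
        help="Pre-trained Chinese model to fine-tune.",
    )
    return parser

# With no arguments, the default model is selected.
args = build_parser().parse_args([])
print(args.base_model)
```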
Release 1.3.1
Updated dependencies and fixed issues due to changes in transformers.
Release 1.3.0
Changes
- Updated dependencies.
Release 1.2.0
Changes
- Dataset generation: CVSS data is now extracted from GitHub and PySec security advisories.
- Dataset generation: CVSS, CPE, title and description (summary) are now extracted from CSAF documents.
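To illustrate the CSAF item, here is a minimal sketch of pulling those four fields out of a CSAF 2.0 document that has already been loaded as a dictionary. It is not VulnTrain's extraction code: the `extract_fields` and `iter_cpes` helpers, the output keys, and the sample advisory are all assumptions. The sketch reads only the first vulnerability entry and only CVSS v3 scores.

```python
from typing import Any, Dict, Iterator, List

def iter_cpes(branches: List[Dict[str, Any]]) -> Iterator[str]:
    """Walk the nested product tree and yield every CPE string found."""
    for branch in branches:
        helper = branch.get("product", {}).get("product_identification_helper", {})
        if "cpe" in helper:
            yield helper["cpe"]
        yield from iter_cpes(branch.get("branches", []))

def extract_fields(csaf: Dict[str, Any]) -> Dict[str, Any]:
    """Collect title, description, CVSS and CPEs (hypothetical output keys)."""
    vulns = csaf.get("vulnerabilities", [])
    first = vulns[0] if vulns else {}
    # Prefer a note marked as summary or description.
    summary = next(
        (n.get("text") for n in first.get("notes", [])
         if n.get("category") in ("summary", "description")),
        None,
    )
    cvss = next((s["cvss_v3"] for s in first.get("scores", []) if "cvss_v3" in s), None)
    return {
        "title": first.get("title") or csaf.get("document", {}).get("title"),
        "description": summary,
        "cvss_vector": cvss.get("vectorString") if cvss else None,
        "cvss_score": cvss.get("baseScore") if cvss else None,
        "cpes": list(iter_cpes(csaf.get("product_tree", {}).get("branches", []))),
    }

# Made-up advisory used only to exercise the sketch.
sample = {
    "document": {"title": "Example advisory"},
    "product_tree": {"branches": [{
        "category": "vendor", "name": "Example",
        "branches": [{
            "category": "product_name", "name": "Widget",
            "product": {
                "name": "Widget 1.0", "product_id": "P1",
                "product_identification_helper": {
                    "cpe": "cpe:2.3:a:example:widget:1.0:*:*:*:*:*:*:*"
                },
            },
        }],
    }]},
    "vulnerabilities": [{
        "title": "Buffer overflow in Widget",
        "notes": [{"category": "summary", "text": "A crafted packet overflows a stack buffer."}],
        "scores": [{"products": ["P1"], "cvss_v3": {
            "version": "3.1",
            "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
            "baseScore": 9.8, "baseSeverity": "CRITICAL",
        }}],
    }],
}

print(extract_fields(sample))
```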
Release 1.1.0
News
- Trainers: Support for roberta-base in the text classifier, with improved settings for TrainingArguments.
- Validators: Validator for severity classification.
Release 1.0.0
News
- Introduced a new trainer to automatically classify vulnerabilities based on their descriptions, even when CVSS scores are unavailable.
- Added CVSS parsing to the dataset generation script.
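As a rough illustration of what CVSS parsing can involve, the sketch below splits a vector string into its metrics and maps a base score to the qualitative rating defined by CVSS v3. It assumes that severity labels of this kind serve as classification targets. The function names are hypothetical, and this is not the parsing code used by the dataset generation script.

```python
def parse_cvss_vector(vector: str) -> dict:
    """Split a CVSS vector string into its version and metric values."""
    version = None
    metrics = {}
    for part in vector.split("/"):
        key, _, value = part.partition(":")
        if key == "CVSS":
            # CVSS v3.x vectors start with a "CVSS:3.x" prefix; v2 vectors do not.
            version = value
        elif key:
            metrics[key] = value
    return {"version": version, "metrics": metrics}

def severity_from_score(score: float) -> str:
    """Map a base score to the CVSS v3 qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"

parsed = parse_cvss_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
print(parsed["version"], parsed["metrics"]["AV"])
print(severity_from_score(9.8))
```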
Changes
- Refactored the project structure for better organization.
- Improved CPE parsing.
- Enhanced the dataset generation script.
- Optimized the trainer for text generation on vulnerability descriptions.
- Improved command-line argument parsing.
- Improved the process of pushing the tokenizer and trainer to Hugging Face.
Release 0.5.1
Fixed configuration module name.
Release 0.5.0
Added support for a configuration file.