scikit-learn sensitive data leakage vulnerability

Moderate severity GitHub Reviewed Published Jun 6, 2024 to the GitHub Advisory Database • Updated Jun 17, 2024

Package

scikit-learn (pip)

Affected versions

< 1.5.0

Patched versions

1.5.0

Description

A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer. It was reported against versions up to and including 1.4.1.post1 and fixed in version 1.5.0. The vectorizer unexpectedly stored all tokens present in the training data in its stop_words_ attribute, rather than only the subset of tokens required for the TF-IDF technique to function. As a result, stop_words_ could contain tokens that were meant to be discarded and not stored, such as passwords or keys, and so leak sensitive information. The impact of this vulnerability depends on the nature of the data processed by the vectorizer.
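The remedy given in this advisory is to upgrade to 1.5.0. Where a vectorizer fitted on an affected version has to be saved or shared before upgrading, one possible stopgap is to drop the attribute first, since the description above says it is not needed for TF-IDF to function. The following is a minimal sketch under that assumption. The helper name scrub_stop_words is hypothetical and is not part of scikit-learn.

```python
def scrub_stop_words(vectorizer):
    """Delete a fitted vectorizer's stop_words_ attribute, if it has one.

    Hypothetical helper, not a scikit-learn API. Returns the number of
    tokens that were dropped, or 0 if there was nothing to remove.
    """
    leaked = getattr(vectorizer, "stop_words_", None)
    if leaked is None:
        # Attribute is absent (or already set to None): nothing to scrub.
        return 0
    dropped = len(leaked)
    # Remove the attribute so it is not included when the object is pickled.
    delattr(vectorizer, "stop_words_")
    return dropped
```

Assuming vec is a TfidfVectorizer that has already been fitted, scrub_stop_words(vec) would be called once before serialising it, for example before pickle.dump. The helper only checks for the attribute by name, so it is a no-op on objects that do not have it.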

References

Published by the National Vulnerability Database Jun 6, 2024
Published to the GitHub Advisory Database Jun 6, 2024
Reviewed Jun 17, 2024
Last updated Jun 17, 2024

Severity

Moderate 5.3 / 10

CVSS base metrics

Attack vector: Network
Attack complexity: High
Privileges required: Low
User interaction: None
Scope: Unchanged
Confidentiality: High
Integrity: None
Availability: None
CVSS:3.0/AV:N/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N
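
As a cross-check, the 5.3 base score can be reproduced from the vector string above. The sketch below applies the CVSS v3.0 base-score equations for the Scope: Unchanged case only, which is all this vector needs. The function name and the weights table are illustrative; the numeric weights are the ones published in the CVSS v3.0 specification.

```python
import math

# Metric weights from the CVSS v3.0 specification (Scope: Unchanged values for PR).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def base_score_scope_unchanged(vector):
    """Compute the CVSS v3.0 base score for a vector with S:U."""
    # Skip the leading "CVSS:3.0" prefix and parse the metric:value pairs.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")

    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}

    # Impact sub-score.
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss

    # Exploitability sub-score.
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]

    if impact <= 0:
        return 0.0
    # Round up to one decimal place, as the v3.0 specification requires.
    return math.ceil(min(impact + exploitability, 10) * 10) / 10


print(base_score_scope_unchanged("CVSS:3.0/AV:N/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N"))  # 5.3
```

For this vector the impact sub-score is about 3.60 and the exploitability sub-score about 1.62; their sum, about 5.22, rounds up to 5.3.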

Weaknesses

CVE ID

CVE-2024-5206

GHSA ID

GHSA-jw8x-6495-233v