
Merge zds-antispam into zds-site #6720


Open: wants to merge 51 commits into base: dev
51 commits
- f487ef4 · Adds the links for "Sujets créés", "Sujets suivis", "Messages post… · Jan 27, 2025
- 864b763 · Merge branch 'zestedesavoir:dev' into dev · wassimaarab, Feb 22, 2025
- 4a213e3 · Merge zds-antispam into zds-site · Jan-Ha-He, Mar 10, 2025
- 2c9422b · Merge zds-antispam into zds-site · Jan-Ha-He, Mar 10, 2025
- d33a543 · Merge pull request #1 from zestedesavoir/dev · Jan-Ha-He, Mar 25, 2025
- 008c371 · Merge pull request #2 from wassimaarab/dev · Jan-Ha-He, Mar 26, 2025
- ba2d1b4 · Add generation of spam profiles to test the filter. · Jan-Ha-He, Apr 8, 2025
- 7216a9f · Merge remote-tracking branch 'origin/dev' into dev · Jan-Ha-He, Apr 8, 2025
- 385b8f5 · Remove user creation in the database · Apr 15, 2025
- 717cb10 · Remove user creation in the database · Apr 15, 2025
- 611eabd · Reorganize into a new package, zds.antispam. · Jan-Ha-He, Apr 21, 2025
- da6705f · Add requirements, improve the code structure, add the … · Jan-Ha-He, Apr 21, 2025
- a4c08b0 · Add requirements, improve the code structure, add the … · Jan-Ha-He, Apr 21, 2025
- a61bd29 · Merge pull request #3 from Jan-Ha-He/restructuring · Jan-Ha-He, Apr 21, 2025
- c6aa6ab · Retrain the model only when necessary. · Jan-Ha-He, Apr 22, 2025
- 686b799 · Merge pull request #4 from Jan-Ha-He/restructuring · Jan-Ha-He, Apr 22, 2025
- 19ff103 · Resolve 14 of the 22 minor points (logger changes, etc.) · Jan-Ha-He, Apr 24, 2025
- 935ac60 · Add the antispam_train command and move the receivers to… · Jan-Ha-He, Apr 26, 2025
- 61d8797 · Add an antispam user for antispam notifications. · Jan-Ha-He, Apr 26, 2025
- 33f9d4d · Add code to create and use a new folder for the fil… · Jan-Ha-He, Apr 26, 2025
- 5a088cd · Combine retrain() and train(). · Jan-Ha-He, Apr 26, 2025
- fe96837 · Fix for the previous commit: load_model was missing in spam_… · Jan-Ha-He, Apr 26, 2025
- e145169 · Fix #2 for the previous commit: predict() is also missing · Jan-Ha-He, Apr 26, 2025
- a4dcaa0 · Fix #3 for the previous commit: train(), not retrain() · Jan-Ha-He, Apr 26, 2025
- 695b270 · Possible fix for the file path. · Jan-Ha-He, Apr 26, 2025
- a8465f0 · Possible fix #2 for the file path. · Jan-Ha-He, Apr 26, 2025
- dec6340 · Add the ability to test other attributes for spam, wi… · Jan-Ha-He, Apr 26, 2025
- 35febe6 · Add the ability to have several models for attributes … · Jan-Ha-He, Apr 27, 2025
- 78a545d · Fix the generation of test data and remove the li… · Jan-Ha-He, May 9, 2025
- e0f644f · Fix small items + add the is_spam field for Profile a… · Jan-Ha-He, May 9, 2025
- 46a467f · Change how scope is used in send_alert. · Jan-Ha-He, May 9, 2025
- a75b3ae · Resolve 14 of the 22 minor points (logger changes, etc.) · Jan-Ha-He, Apr 24, 2025
- bda16df · Add the antispam_train command and move the receivers to… · Jan-Ha-He, Apr 26, 2025
- 4a48286 · Add an antispam user for antispam notifications. · Jan-Ha-He, Apr 26, 2025
- 45f3756 · Add code to create and use a new folder for the fil… · Jan-Ha-He, Apr 26, 2025
- a80346e · Combine retrain() and train(). · Jan-Ha-He, Apr 26, 2025
- 62377d8 · Fix for the previous commit: load_model was missing in spam_… · Jan-Ha-He, Apr 26, 2025
- 11cba74 · Fix #2 for the previous commit: predict() is also missing · Jan-Ha-He, Apr 26, 2025
- 3ef3a4f · Fix #3 for the previous commit: train(), not retrain() · Jan-Ha-He, Apr 26, 2025
- ea392e1 · Possible fix for the file path. · Jan-Ha-He, Apr 26, 2025
- 4bed8d7 · Possible fix #2 for the file path. · Jan-Ha-He, Apr 26, 2025
- bbfbd81 · Add the ability to test other attributes for spam, wi… · Jan-Ha-He, Apr 26, 2025
- e33158c · Add the ability to have several models for attributes … · Jan-Ha-He, Apr 27, 2025
- ff74cc2 · Fix the generation of test data and remove the li… · Jan-Ha-He, May 9, 2025
- cb2ee3d · Fix small items + add the is_spam field for Profile a… · Jan-Ha-He, May 9, 2025
- 6d28d89 · Change how scope is used in send_alert. · Jan-Ha-He, May 9, 2025
- d604c97 · Test the spam_detector.py model · May 10, 2025
- 8db663c · Test the spam_manager.py model · May 10, 2025
- 2977b63 · Add documentation for the antispam model · May 10, 2025
- 1805583 · Merge remote-tracking branch 'origin/dev' into dev · Jan-Ha-He, May 12, 2025
- 724785e · Minor improvements: structure of spam posts and logging o… · Jan-Ha-He, May 13, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -14,6 +14,7 @@ base.db
/zds/_version.py
/geodata
/errors/css
/antispam-data

/tutoriels-private-test
/tutoriels-public-test
84 changes: 84 additions & 0 deletions doc/source/back-end-code/antispam.rst
@@ -0,0 +1,84 @@
.. _module-antispam:

====================
ZDS Anti-Spam Module
====================

The ``zds.antispam`` module provides a system for detecting unwanted content (spam).

Module structure
================

::

   zds/antispam/
   ├── __init__.py
   ├── apps.py
   ├── management/              # antispam_train command
   ├── receivers.py             # Signal handlers
   ├── spam_detector.py         # Main detection logic
   ├── spam_fields.py           # Monitored fields
   ├── spam_model_manager.py    # Model training and management
   └── tests/                   # Unit tests

Main features
=============

- Spam detection for several content types (member profiles, forum messages)
- Training of a dedicated model per content type
- Automatic moderation alerts, created by the ``antispam`` user

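Key components
==============

SpamDetector (spam_detector.py)
-------------------------------

.. autoclass:: zds.antispam.spam_detector.SpamDetector
   :members:
   :undoc-members:

Main methods:

- ``check_text(text, content_type)`` → ``bool``: returns ``True`` when the model for ``content_type`` (``"PROFILE"``, ``"FORUM"``) classifies ``text`` as spam; returns ``False`` for empty text or when prediction fails.
- ``send_alert(instance, field_name)`` → ``None``: creates an ``Alert``, authored by the ``antispam`` user, for the suspicious field of ``instance``.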

SpamModelManager (spam_model_manager.py)
----------------------------------------

.. autoclass:: zds.antispam.spam_model_manager.SpamModelManager
   :members:
   :undoc-members:

Features:

- Training models (``train(content_type)``)
- Predicting labels for a list of texts (``predict``), as called by ``SpamDetector.check_text``
- Saving and loading models

Typical usage
=============

Checking a profile biography (``profile`` is a ``zds.member.models.Profile`` instance):

.. code-block:: python

   from zds.antispam.spam_detector import SpamDetector

   detector = SpamDetector()
   if detector.check_text(profile.biography, "PROFILE"):
       detector.send_alert(profile, "biography")

Training a model:

.. code-block:: python

   from zds.antispam.spam_model_manager import SpamModelManager

   manager = SpamModelManager()
   manager.train("PROFILE")

From the command line, ``python manage.py antispam_train --model PROFILE`` does the same; without ``--model``, all models are trained.

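Signal integration
==================

``receivers.py`` registers a ``post_save`` handler. Whenever an instance of a Django model listed in ``spam_fields.py`` (currently ``Profile`` and forum ``Comment``) is saved, each configured field is passed to ``check_text``, and ``send_alert`` is called for the first field flagged as spam.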

Tests
=====

To run the tests:

.. code-block:: bash

   python manage.py test zds.antispam.tests
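The diff shown here does not include ``spam_model_manager.py``, so its internals are not visible. As a purely hypothetical sketch of the interface the documentation describes (per-scope ``train``, saving/loading, and ``predict`` on a list of texts as called by ``SpamDetector``), the class below keeps one model per scope and persists it with ``pickle``. The keyword-weight "model", the ``data_dir`` parameter, the file naming and the ``(text, label)`` training format are all assumptions for illustration; the real module presumably relies on scikit-learn, which this PR adds to ``requirements.txt``. It keeps the convention from ``check_text`` that a prediction of ``0`` means spam.

```python
import os
import pickle
import tempfile
from collections import Counter

SPAM, HAM = 0, 1  # assumed label convention, matching "0 indicates spam" in check_text


class ToySpamModelManager:
    """Hypothetical stand-in for SpamModelManager: one model per scope, pickled to disk."""

    def __init__(self, data_dir):
        self.data_dir = data_dir  # assumption: directory holding the model files
        self.models = {}  # in-memory cache: scope -> model

    def _path(self, scope):
        return os.path.join(self.data_dir, f"{scope.lower()}_model.pkl")

    def train(self, scope, samples):
        # samples: list of (text, label) pairs. Toy "model": every word seen in spam
        # gets weight = (count in spam) - (count in ham).
        spam_words, ham_words = Counter(), Counter()
        for text, label in samples:
            (spam_words if label == SPAM else ham_words).update(text.lower().split())
        weights = {word: spam_words[word] - ham_words[word] for word in spam_words}
        self.models[scope] = weights
        os.makedirs(self.data_dir, exist_ok=True)
        with open(self._path(scope), "wb") as f:
            pickle.dump(weights, f)

    def load_model(self, scope):
        # Load from disk on first use, then reuse the cached model.
        if scope not in self.models:
            with open(self._path(scope), "rb") as f:
                self.models[scope] = pickle.load(f)
        return self.models[scope]

    def predict(self, scope, texts):
        weights = self.load_model(scope)
        scores = [sum(weights.get(w, 0) for w in t.lower().split()) for t in texts]
        return [SPAM if score > 0 else HAM for score in scores]


with tempfile.TemporaryDirectory() as tmp:
    ToySpamModelManager(tmp).train(
        "PROFILE", [("buy cheap pills now", SPAM), ("I write Python tutorials", HAM)]
    )
    # A fresh instance has an empty cache, so predict() reloads the pickled model.
    result = ToySpamModelManager(tmp).predict("PROFILE", ["Cheap pills", "Python tutorials"])

print(result)  # [0, 1]
```

Caching the loaded model per scope means a long-lived manager reads each file once; whether the real manager does the same is not visible in this diff.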
2 changes: 2 additions & 0 deletions doc/source/back-end-code/arborescence-back.rst
@@ -14,6 +14,8 @@ There is one folder for each module of the site:
zds/
├── article/                   # articles module
│   └── ...
├── antispam/                  # antispam module
│   └── ...
├── featured/                  # featured-content module
│   └── ...
├── forum/                     # forums module
1 change: 1 addition & 0 deletions requirements.txt
@@ -17,6 +17,7 @@ lxml==5.3.0
Pillow==10.4.0
pymemcache==4.0.0
requests==2.32.3
scikit-learn==1.6.1
typesense==0.21.0
ua-parser==0.18.0

Empty file added zds/antispam/__init__.py
Empty file.
8 changes: 8 additions & 0 deletions zds/antispam/apps.py
@@ -0,0 +1,8 @@
from django.apps import AppConfig


class AntispamConfig(AppConfig):
    name = "zds.antispam"

    def ready(self):
        # Import the signal handlers so that their @receiver decorators get registered.
        from . import receivers  # noqa: F401
Empty file.
Empty file.
40 changes: 40 additions & 0 deletions zds/antispam/management/commands/antispam_train.py
@@ -0,0 +1,40 @@
from django.core.management.base import BaseCommand

from zds.antispam.spam_fields import spam_fields
from zds.antispam.spam_model_manager import SpamModelManager


class Command(BaseCommand):
    def __init__(self, *args, **kwargs):
        # Dynamically extract available models from spam_fields
        available_models = {field["scope"] for field in spam_fields}
        self.help = (
            "Retrain the spam filter model(s) and save them to a file.\n"
            f"The available models are: {', '.join(available_models)}.\n"
            "Use the --model option to specify a model to train, or omit it to train all models."
        )
        # Forward BaseCommand's optional arguments (stdout, stderr, no_color, force_color)
        super().__init__(*args, **kwargs)

    def add_arguments(self, parser):
        # Dynamically extract available models from spam_fields
        available_models = {field["scope"] for field in spam_fields}
        parser.add_argument(
            "--model",
            type=str,
            choices=available_models,
            help=f"Specify the model to train ({', '.join(available_models)}). If omitted, all models will be trained.",
        )

    def handle(self, *args, **options):
        model_manager = SpamModelManager()

        if options["model"]:
            self.stdout.write(f"Starting retraining of the {options['model']} spam filter model...")
            model_manager.train(options["model"])
            self.stdout.write(f"Retraining of the {options['model']} model completed successfully.")
        else:
            self.stdout.write("Starting retraining of all spam filter models...")
            # Dynamically train all models based on spam_fields
            for model in {field["scope"] for field in spam_fields}:
                model_manager.train(model)
            self.stdout.write("Retraining of all models completed successfully.")
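The command collects the available scopes into a ``set``; for strings, a set's iteration order can change from one interpreter run to the next (hash randomization), so the help text and the order in which models are trained are not stable. A minimal sketch of a single helper that returns each scope once in sorted order, exercised against a standalone ``argparse`` parser. ``available_scopes``, ``build_parser`` and the inline ``SPAM_FIELDS`` list are illustrative assumptions, not part of the PR; in Django the parser would come from ``add_arguments``.

```python
import argparse

# Illustrative stand-in for zds.antispam.spam_fields.spam_fields (models omitted).
SPAM_FIELDS = [
    {"scope": "PROFILE", "fields": ["biography", "sign"]},
    {"scope": "FORUM", "fields": ["text"]},
    {"scope": "PROFILE", "fields": ["extra"]},  # duplicate scope on purpose
]


def available_scopes(spam_fields):
    """Return each configured scope once, in a stable (sorted) order."""
    return sorted({field["scope"] for field in spam_fields})


def build_parser(spam_fields):
    scopes = available_scopes(spam_fields)
    parser = argparse.ArgumentParser(prog="antispam_train")
    parser.add_argument(
        "--model",
        choices=scopes,
        help=f"Model to train ({', '.join(scopes)}). If omitted, all models are trained.",
    )
    return parser


args = build_parser(SPAM_FIELDS).parse_args(["--model", "FORUM"])
print(available_scopes(SPAM_FIELDS))  # ['FORUM', 'PROFILE']
print(args.model)  # FORUM
```

Calling the same helper in ``__init__``, ``add_arguments`` and ``handle`` would also remove the three copies of the set comprehension.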
20 changes: 20 additions & 0 deletions zds/antispam/receivers.py
@@ -0,0 +1,20 @@
from django.db.models.signals import post_save
from django.dispatch import receiver

from zds.antispam.spam_detector import SpamDetector
from zds.antispam.spam_fields import spam_fields


@receiver(post_save)
def analyze_record(sender, instance, raw=False, **kwargs):
    """
    Signal handler to detect spam in configured fields.
    """
    if raw:
        # Skip fixture loading (loaddata), where instances are saved as-is.
        return
    for field_config in spam_fields:
        if isinstance(instance, field_config["model"]):
            detector = SpamDetector()
            for field in field_config["fields"]:
                field_value = getattr(instance, field, None)
                if field_value and detector.check_text(field_value, field_config["scope"]):
                    detector.send_alert(instance, field)
                    # Stop after the first flagged field: one alert per save is enough.
                    break
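To make the control flow of ``analyze_record`` easier to follow: for each configuration whose ``model`` matches the saved instance, fields are checked in order and the inner loop stops at the first flagged field, so at most one alert is raised per matching configuration and save. The stdlib-only simulation below replays that loop with the detector injected; ``FakeProfile``, ``StubDetector`` and its keyword rule are hypothetical and stand in for the Django model and the trained classifier.

```python
from dataclasses import dataclass


@dataclass
class FakeProfile:  # hypothetical stand-in for zds.member.models.Profile
    biography: str = ""
    sign: str = ""


class StubDetector:  # hypothetical stand-in for SpamDetector
    def __init__(self):
        self.alerts = []

    def check_text(self, text, scope):
        return "casino" in text.lower()  # toy rule instead of a trained model

    def send_alert(self, instance, field):
        self.alerts.append(field)


SPAM_FIELDS = [{"scope": "PROFILE", "model": FakeProfile, "fields": ["biography", "sign"]}]


def analyze_record(instance, detector, spam_fields=SPAM_FIELDS):
    """Same loop as receivers.analyze_record, with the detector passed in."""
    for field_config in spam_fields:
        if isinstance(instance, field_config["model"]):
            for field in field_config["fields"]:
                value = getattr(instance, field, None)
                if value and detector.check_text(value, field_config["scope"]):
                    detector.send_alert(instance, field)
                    break  # one alert per matching configuration


detector = StubDetector()
analyze_record(FakeProfile(biography="Best casino bonus", sign="Casino!"), detector)
analyze_record(FakeProfile(biography="I like Rust"), detector)  # clean, empty sign skipped
analyze_record("not a profile", detector)  # ignored: no matching model
print(detector.alerts)  # ['biography']
```

Both fields of the first profile are spammy, yet only ``biography`` is reported, because the loop breaks before ``sign`` is checked.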
88 changes: 88 additions & 0 deletions zds/antispam/spam_detector.py
@@ -0,0 +1,88 @@
import logging
from datetime import datetime

from django.contrib.auth.models import User
from django.utils.translation import gettext as _

from zds.antispam.spam_fields import spam_fields
from zds.antispam.spam_model_manager import SpamModelManager
from zds.utils.models import Alert


class SpamDetector:
    def __init__(self):
        self.logger = logging.getLogger(__name__)
        self.model_manager = SpamModelManager()

    def check_text(self, text, content_type):
        """
        Check if a given text is spam for the specified content type.
        """
        if not text:
            self.logger.warning(f"Skipped spam check: Empty text for content type '{content_type}'.")
            return False

        try:
            prediction = self.model_manager.predict(content_type, [text])[0]
            if prediction == 0:  # 0 indicates spam
                self.logger.info(
                    f"✘ Spam detected for content type '{content_type}'. Text: '{text[:30]}...' (Length: {len(text)})"
                )
                return True
            else:
                self.logger.info(
                    f"✔️ No spam detected for content type '{content_type}'. Text: '{text[:30]}...' (Length: {len(text)})"
                )
                return False
        except Exception:
            # logger.exception() also records the traceback.
            self.logger.exception(f"Error during spam detection for content type '{content_type}'")
            return False

    def send_alert(self, instance, field_name):
        """
        Create an alert for a spam-suspect field with detailed context.
        """
        try:
            # Find the spam field configuration for the instance
            field_config = next(
                (
                    config
                    for config in spam_fields
                    if isinstance(instance, config["model"]) and field_name in config["fields"]
                ),
                None,
            )
            if not field_config:
                self.logger.error(f"No spam field configuration found for {type(instance).__name__}.{field_name}")
                return

            # Extract scope and instance info
            scope = field_config["scope"]
            instance_info = field_config["get_instance_info"](instance)

            # Map scope to the correct Alert model field
            scope_to_alert_kwargs = {
                "PROFILE": "profile",
                "FORUM": "comment",
                "CONTENT": "content",
            }

            if scope not in scope_to_alert_kwargs:
                self.logger.error(f"Unsupported scope '{scope}' for alert creation.")
                return

            alert_kwargs = {
                "author": User.objects.get(username="antispam"),
                "scope": scope,
                # Translate the template first, then interpolate: an f-string would be
                # formatted before the catalogue lookup and never match a translation.
                "text": _("Potential spam detected in {instance_info}, field '{field_name}'.").format(
                    instance_info=instance_info, field_name=field_name
                ),
                "pubdate": datetime.now(),
                scope_to_alert_kwargs[scope]: instance,
            }

            # Create the alert
            Alert.objects.create(**alert_kwargs)
            self.logger.info(f"Spam-Alert for {instance_info}, field '{field_name}' created.")
        except User.DoesNotExist:
            self.logger.error("The 'antispam' user does not exist. Please create this user.")
        except Exception:
            self.logger.exception("Failed to create spam alert")
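``send_alert`` builds the ``Alert`` keyword arguments by mapping the configuration's scope to the foreign-key field that points at the flagged object. The framework-free sketch below isolates that step so it can be checked without a database. ``build_alert_kwargs`` is a hypothetical helper, the author and date are passed in rather than looked up, and ``_`` is an identity stand-in for Django's ``gettext``. The message template is looked up before it is formatted, which is what lets a translation catalogue match it; an f-string passed to ``gettext`` is interpolated first and so never matches.

```python
from datetime import datetime

# Same mapping as in SpamDetector.send_alert: scope -> Alert foreign-key field.
SCOPE_TO_ALERT_FIELD = {"PROFILE": "profile", "FORUM": "comment", "CONTENT": "content"}


def _(message):  # stand-in for django.utils.translation.gettext (identity here)
    return message


def build_alert_kwargs(scope, instance, instance_info, field_name, author, now):
    """Hypothetical helper returning the kwargs for Alert.objects.create(), or None."""
    target_field = SCOPE_TO_ALERT_FIELD.get(scope)
    if target_field is None:
        return None  # unsupported scope: the real method logs an error instead
    return {
        "author": author,
        "scope": scope,
        # Look up the template first, then interpolate, so the msgid stays constant.
        "text": _("Potential spam detected in {info}, field '{field}'.").format(
            info=instance_info, field=field_name
        ),
        "pubdate": now,
        target_field: instance,
    }


kwargs = build_alert_kwargs("FORUM", "<Comment 42>", "alice", "text", "antispam", datetime(2025, 5, 13))
print(kwargs["comment"], "|", kwargs["text"])
# <Comment 42> | Potential spam detected in alice, field 'text'.
print(build_alert_kwargs("WIKI", None, "", "", "antispam", datetime(2025, 5, 13)))  # None
```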
17 changes: 17 additions & 0 deletions zds/antispam/spam_fields.py
@@ -0,0 +1,17 @@
from zds.forum.models import Comment
from zds.member.models import Profile

spam_fields = [
    {
        "scope": "PROFILE",
        "model": Profile,
        "fields": ["biography", "sign"],
        "get_instance_info": str,
    },
    {
        "scope": "FORUM",
        "model": Comment,
        "fields": ["text"],
        "get_instance_info": lambda instance: str(instance.author.username),
    },
]
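Each entry must provide the four keys read by ``receivers.py``, ``antispam_train`` and ``SpamDetector.send_alert``, and its ``scope`` must be one that ``send_alert`` can map to an ``Alert`` field (``PROFILE``, ``FORUM`` or ``CONTENT``); otherwise the mistake only surfaces as a logged error when an alert is about to be sent. A hypothetical ``validate_spam_fields`` helper, which a unit test could call on the real list, might catch such problems earlier. The ``Profile`` and ``Comment`` classes below are empty stand-ins for the Django models.

```python
REQUIRED_KEYS = {"scope", "model", "fields", "get_instance_info"}
SUPPORTED_SCOPES = {"PROFILE", "FORUM", "CONTENT"}  # keys of send_alert's mapping


def validate_spam_fields(spam_fields):
    """Return a list of human-readable problems (empty if the config looks valid)."""
    problems = []
    for index, entry in enumerate(spam_fields):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {index}: missing keys {sorted(missing)}")
            continue
        if entry["scope"] not in SUPPORTED_SCOPES:
            problems.append(f"entry {index}: unsupported scope {entry['scope']!r}")
        if not isinstance(entry["model"], type):
            problems.append(f"entry {index}: 'model' must be a class")
        if not entry["fields"]:
            problems.append(f"entry {index}: 'fields' is empty")
        if not callable(entry["get_instance_info"]):
            problems.append(f"entry {index}: 'get_instance_info' must be callable")
    return problems


class Profile:  # stand-in for zds.member.models.Profile
    pass


class Comment:  # stand-in for zds.forum.models.Comment
    pass


config = [
    {"scope": "PROFILE", "model": Profile, "fields": ["biography", "sign"], "get_instance_info": str},
    {"scope": "FORUM", "model": Comment, "fields": ["text"], "get_instance_info": lambda c: "author"},
    {"scope": "WIKI", "model": Comment, "fields": [], "get_instance_info": str},  # deliberately broken
]
print(validate_spam_fields(config))
# ["entry 2: unsupported scope 'WIKI'", "entry 2: 'fields' is empty"]
```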