Merge zds-antispam into zds-site #6720
Open
Jan-Ha-He wants to merge 51 commits into zestedesavoir:dev from Jan-Ha-He:dev
f487ef4
Add links for "Sujets créés", "Sujets suivis", "Messages post…
864b763
Merge branch 'zestedesavoir:dev' into dev
wassimaarab 4a213e3
Merge zds-antispam into zds-site
Jan-Ha-He 2c9422b
Merge zds-antispam into zds-site
Jan-Ha-He d33a543
Merge pull request #1 from zestedesavoir/dev
Jan-Ha-He 008c371
Merge pull request #2 from wassimaarab/dev
Jan-Ha-He ba2d1b4
Add generation of spam profiles to test the filter.
Jan-Ha-He 7216a9f
Merge remote-tracking branch 'origin/dev' into dev
Jan-Ha-He 385b8f5
Remove the creation of users in the database
717cb10
Remove the creation of users in the database
611eabd
Reorganize into a new zds.antispam package.
Jan-Ha-He da6705f
Add requirements, improve the code structure, add the …
Jan-Ha-He a4c08b0
Add requirements, improve the code structure, add the …
Jan-Ha-He a61bd29
Merge pull request #3 from Jan-Ha-He/restructuring
Jan-Ha-He c6aa6ab
Retrain the model only when necessary.
Jan-Ha-He 686b799
Merge pull request #4 from Jan-Ha-He/restructuring
Jan-Ha-He 19ff103
Resolve 14 of the 22 minor points (logger changes, etc.)
Jan-Ha-He 935ac60
Add the antispam_train command and move the receivers to…
Jan-Ha-He 61d8797
Add an antispam user for antispam notifications.
Jan-Ha-He 33f9d4d
Add code to create and use a new folder for the file…
Jan-Ha-He 5a088cd
Merge retrain() and train().
Jan-Ha-He fe96837
Fix for the last commit: load_model was missing in spam_…
Jan-Ha-He e145169
Fix #2 for the last commit: predict() is missing too
Jan-Ha-He a4dcaa0
Fix #3 for the last commit: train(), not retrain()
Jan-Ha-He 695b270
Possible fix for the file path.
Jan-Ha-He a8465f0
Possible fix #2 for the file path.
Jan-Ha-He dec6340
Add the ability to test other attributes for spam, wi…
Jan-Ha-He 35febe6
Add the ability to have several models for attributes …
Jan-Ha-He 78a545d
Fix test data generation and remove the li…
Jan-Ha-He e0f644f
Fix minor items + add the is_spam field for Profile …
Jan-Ha-He 46a467f
Change how scope is used in send_alert.
Jan-Ha-He a75b3ae
Resolve 14 of the 22 minor points (logger changes, etc.)
Jan-Ha-He bda16df
Add the antispam_train command and move the receivers to…
Jan-Ha-He 4a48286
Add an antispam user for antispam notifications.
Jan-Ha-He 45f3756
Add code to create and use a new folder for the file…
Jan-Ha-He a80346e
Merge retrain() and train().
Jan-Ha-He 62377d8
Fix for the last commit: load_model was missing in spam_…
Jan-Ha-He 11cba74
Fix #2 for the last commit: predict() is missing too
Jan-Ha-He 3ef3a4f
Fix #3 for the last commit: train(), not retrain()
Jan-Ha-He ea392e1
Possible fix for the file path.
Jan-Ha-He 4bed8d7
Possible fix #2 for the file path.
Jan-Ha-He bbfbd81
Add the ability to test other attributes for spam, wi…
Jan-Ha-He e33158c
Add the ability to have several models for attributes …
Jan-Ha-He ff74cc2
Fix test data generation and remove the li…
Jan-Ha-He cb2ee3d
Fix minor items + add the is_spam field for Profile …
Jan-Ha-He 6d28d89
Change how scope is used in send_alert.
Jan-Ha-He d604c97
Test the spam_detector.py module
8db663c
Test the spam_manager.py module
2977b63
Add documentation for the antispam module
1805583
Merge remote-tracking branch 'origin/dev' into dev
Jan-Ha-He 724785e
Small improvements: structure of spam posts and logging of…
Jan-Ha-He
@@ -0,0 +1,84 @@
.. _module-antispam:

====================
ZDS Anti-Spam Module
====================

The ``zds.antispam`` module provides a system for detecting unwanted content.

Module structure
================
::

    zds/antispam/
    ├── __init__.py
    ├── apps.py
    ├── management/
    ├── receivers.py            # Signals
    ├── spam_detector.py        # Main detection logic
    ├── spam_fields.py          # Monitored fields
    ├── spam_model_manager.py   # Management of the trained models
    └── tests/                  # Unit tests

Main features
=============
- Spam detection in different types of content
- Training of a dedicated model per content type
- Automated alert system

Key components
==============

SpamDetector (spam_detector.py)
-------------------------------
.. autoclass:: zds.antispam.spam_detector.SpamDetector
    :members:
    :undoc-members:

Main methods:

- ``check_text(text, content_type)`` → bool
- ``send_alert(instance, field_name)`` → None

SpamModelManager (spam_model_manager.py)
----------------------------------------
.. autoclass:: zds.antispam.spam_model_manager.SpamModelManager
    :members:
    :undoc-members:

Features:

- Model training (``train(content_type)``)
- Saving and loading of models

Typical usage
=============

Simple detection on a profile biography:

.. code-block:: python

    from zds.antispam.spam_detector import SpamDetector

    detector = SpamDetector()
    # `profile` is the Profile instance whose biography is being checked.
    if detector.check_text(profile.biography, "PROFILE"):
        detector.send_alert(profile, "biography")

Training a model:

.. code-block:: python

    from zds.antispam.spam_model_manager import SpamModelManager

    manager = SpamModelManager()
    manager.train("PROFILE")

Signal integration
==================
The module automatically listens for model saves (Django's ``post_save`` signal) through ``receivers.py``.

Tests
=====
To run the tests:

.. code-block:: bash

    python manage.py test zds.antispam.tests
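To make the ``check_text`` contract described in this documentation concrete, here is a minimal, framework-free sketch. ``KeywordModelManager`` is a hypothetical stand-in for ``SpamModelManager`` (whose implementation is not shown on this page); it only mimics a ``predict(content_type, texts)`` call and the convention, visible in ``spam_detector.py``, that a prediction of ``0`` means spam. The keyword list is purely illustrative.

```python
class KeywordModelManager:
    """Hypothetical stand-in for SpamModelManager: flags texts containing known spam words."""

    SPAM_WORDS = {"casino", "viagra"}  # illustrative only, not taken from the PR

    def predict(self, content_type, texts):
        # Same convention as SpamDetector.check_text: 0 means spam, 1 means legitimate.
        return [0 if self.SPAM_WORDS & set(text.lower().split()) else 1 for text in texts]


def check_text(model_manager, text, content_type):
    """Return True when the text is classified as spam; empty text is never spam."""
    if not text:
        return False
    return model_manager.predict(content_type, [text])[0] == 0


manager = KeywordModelManager()
print(check_text(manager, "Best casino bonus here", "PROFILE"))
print(check_text(manager, "I enjoy writing Python tutorials", "PROFILE"))
```

A real model would replace the keyword lookup with a trained classifier, but the calling code would presumably keep the same shape: one text in, one boolean out.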
Empty file.
@@ -0,0 +1,8 @@
from django.apps import AppConfig


class AntispamConfig(AppConfig):
    name = "zds.antispam"

    def ready(self):
        # Import the receivers here so the signal handlers are registered
        # once the app registry is fully loaded.
        from . import receivers  # noqa
Empty file.
Empty file.
@@ -0,0 +1,40 @@
from django.core.management.base import BaseCommand

from zds.antispam.spam_fields import spam_fields
from zds.antispam.spam_model_manager import SpamModelManager


class Command(BaseCommand):
    def __init__(self, *args, **kwargs):
        # Derive the available models from spam_fields; sort them so the
        # help text and the training order are stable from one run to the next.
        self.available_models = sorted({field["scope"] for field in spam_fields})
        self.help = (
            "Retrain the spam filter model(s) and save them to a file.\n"
            f"The available models are: {', '.join(self.available_models)}.\n"
            "Use the --model option to specify a model to train, or omit it to train all models."
        )
        # Forward the arguments so the BaseCommand options (stdout, stderr, ...) keep working.
        super().__init__(*args, **kwargs)

    def add_arguments(self, parser):
        parser.add_argument(
            "--model",
            type=str,
            choices=self.available_models,
            help=f"Specify the model to train ({', '.join(self.available_models)}). If omitted, all models will be trained.",
        )

    def handle(self, *args, **options):
        model_manager = SpamModelManager()

        if options["model"]:
            self.stdout.write(f"Starting retraining of the {options['model']} spam filter model...")
            model_manager.train(options["model"])
            self.stdout.write(f"Retraining of the {options['model']} model completed successfully.")
        else:
            self.stdout.write("Starting retraining of all spam filter models...")
            # Train every model declared in spam_fields
            for model in self.available_models:
                model_manager.train(model)
            self.stdout.write("Retraining of all models completed successfully.")
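The option handling of this command can be tried outside Django with plain `argparse`. The sketch below is only an illustration: the `spam_fields` list is a reduced, hypothetical copy of the real configuration, `antispam_train` is the command name taken from the commit messages, and `models_to_train` is a helper invented for the example. It shows why sorting the set of scopes matters: a bare `set` has no guaranteed order, so the help text and the training order could otherwise change between runs.

```python
import argparse

# Hypothetical stand-in for zds.antispam.spam_fields.spam_fields (only "scope" matters here).
spam_fields = [
    {"scope": "PROFILE", "fields": ["biography", "sign"]},
    {"scope": "FORUM", "fields": ["text"]},
]


def build_parser():
    # Deduplicate the scopes, then sort them to get a deterministic order.
    available_models = sorted({field["scope"] for field in spam_fields})
    parser = argparse.ArgumentParser(prog="antispam_train")
    parser.add_argument("--model", type=str, choices=available_models)
    return parser, available_models


def models_to_train(argv):
    parser, available_models = build_parser()
    options = parser.parse_args(argv)
    # A single model when --model is given, every known model otherwise.
    return [options.model] if options.model else available_models


print(models_to_train(["--model", "FORUM"]))
print(models_to_train([]))
```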
@@ -0,0 +1,20 @@
from django.db.models.signals import post_save
from django.dispatch import receiver

from zds.antispam.spam_detector import SpamDetector
from zds.antispam.spam_fields import spam_fields


# No sender is given, so this handler runs after every model save;
# the loop below filters on the models declared in spam_fields.
@receiver(post_save)
def analyze_record(sender, instance, **kwargs):
    """
    Signal handler to detect spam in configured fields.
    """
    for field_config in spam_fields:
        if isinstance(instance, field_config["model"]):
            detector = SpamDetector()
            for field in field_config["fields"]:
                field_value = getattr(instance, field, None)
                if field_value and detector.check_text(field_value, field_config["scope"]):
                    detector.send_alert(instance, field)
                    break
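The dispatch logic of this handler can be exercised without Django. The following sketch uses hypothetical stand-ins (`Profile`, `Topic`, `RecordingDetector`) and assumes the `break` sits inside the innermost `if`, so that at most one alert is raised per saved instance; the indentation is not recoverable from this page, so treat that placement as an assumption.

```python
class Profile:
    """Stand-in for a monitored model."""

    def __init__(self, biography="", sign=""):
        self.biography = biography
        self.sign = sign


class Topic:
    """Stand-in for a model that is not listed in spam_fields."""

    title = "hello"


# Reduced, hypothetical copy of the real configuration.
spam_fields = [{"scope": "PROFILE", "model": Profile, "fields": ["biography", "sign"]}]


class RecordingDetector:
    """Hypothetical detector: any text containing 'spam' is spam; alerts are just recorded."""

    def __init__(self):
        self.alerts = []

    def check_text(self, text, content_type):
        return "spam" in text.lower()

    def send_alert(self, instance, field_name):
        self.alerts.append((type(instance).__name__, field_name))


def analyze_record(instance, detector):
    for field_config in spam_fields:
        if not isinstance(instance, field_config["model"]):
            continue  # unmonitored models are ignored
        for field in field_config["fields"]:
            value = getattr(instance, field, None)
            if value and detector.check_text(value, field_config["scope"]):
                detector.send_alert(instance, field)
                break  # assumed: stop at the first suspicious field


detector = RecordingDetector()
analyze_record(Topic(), detector)
analyze_record(Profile(biography="Buy SPAM now", sign="more spam"), detector)
print(detector.alerts)
```

Both fields of the second profile look like spam, yet only the first one produces an alert, which keeps moderators from receiving duplicates for a single save.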
@@ -0,0 +1,88 @@
import logging

from django.contrib.auth.models import User
from django.utils import timezone
from django.utils.translation import gettext as _

from zds.antispam.spam_fields import spam_fields
from zds.antispam.spam_model_manager import SpamModelManager
from zds.utils.models import Alert


class SpamDetector:
    def __init__(self):
        self.logger = logging.getLogger(__name__)
        self.model_manager = SpamModelManager()

    def check_text(self, text, content_type):
        """
        Check if a given text is spam for the specified content type.
        """
        if not text:
            self.logger.warning(f"Skipped spam check: Empty text for content type '{content_type}'.")
            return False

        try:
            prediction = self.model_manager.predict(content_type, [text])[0]
            if prediction == 0:  # 0 indicates spam
                self.logger.info(
                    f"✘ Spam detected for content type '{content_type}'. Text: '{text[:30]}...' (Length: {len(text)})"
                )
                return True
            else:
                self.logger.info(
                    f"✔️ No spam detected for content type '{content_type}'. Text: '{text[:30]}...' (Length: {len(text)})"
                )
                return False
        except Exception as e:
            # A failing model must never block the save: treat the text as legitimate.
            self.logger.error(f"Error during spam detection for content type '{content_type}': {e}")
            return False

    def send_alert(self, instance, field_name):
        """
        Create an alert for a spam-suspect field with detailed context.
        """
        try:
            # Find the spam field configuration for the instance
            field_config = next(
                (
                    config
                    for config in spam_fields
                    if isinstance(instance, config["model"]) and field_name in config["fields"]
                ),
                None,
            )
            if not field_config:
                self.logger.error(f"No spam field configuration found for {type(instance).__name__}.{field_name}")
                return

            # Extract scope and instance info
            scope = field_config["scope"]
            instance_info = field_config["get_instance_info"](instance)

            # Map scope to the correct Alert model field
            scope_to_alert_kwargs = {
                "PROFILE": "profile",
                "FORUM": "comment",
                "CONTENT": "content",
            }

            if scope not in scope_to_alert_kwargs:
                self.logger.error(f"Unsupported scope '{scope}' for alert creation.")
                return

            alert_kwargs = {
                "author": User.objects.get(username="antispam"),
                "scope": scope,
                # Translate the constant template first, then substitute the values:
                # an f-string inside _() can never match a translation catalog entry.
                "text": _("Potential spam detected in {}, field '{}'.").format(instance_info, field_name),
                # timezone.now() follows the USE_TZ setting (aware or naive as configured).
                "pubdate": timezone.now(),
                scope_to_alert_kwargs[scope]: instance,
            }

            # Create the alert
            Alert.objects.create(**alert_kwargs)
            self.logger.info(f"Spam-Alert for {instance_info}, field '{field_name}' created.")
        except User.DoesNotExist:
            self.logger.error("The 'antispam' user does not exist. Please create this user.")
        except Exception as e:
            self.logger.error(f"Failed to create spam alert: {e}")
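One detail of `send_alert` worth illustrating is how the alert text is built. Translation catalogs are keyed on the literal string passed to the translation function, so the message has to be a constant template, with the dynamic values substituted afterwards. The sketch below uses the standard library `gettext` module rather than Django's wrapper, and `build_alert_text` is a hypothetical helper; with no catalog installed, the lookup simply returns the template unchanged.

```python
import gettext

_ = gettext.gettext  # with no catalog installed, this returns the message unchanged


def build_alert_text(instance_info, field_name):
    # The literal passed to _() is constant, so it can be extracted and translated;
    # the dynamic values are substituted only after the lookup.
    return _("Potential spam detected in {info}, field '{field}'.").format(
        info=instance_info, field=field_name
    )


print(build_alert_text("some_user", "biography"))
```

Writing `_(f"...")` instead would interpolate the values before the lookup, producing a different message identifier for every user and field, none of which could exist in a catalog.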
@@ -0,0 +1,17 @@
from zds.forum.models import Comment
from zds.member.models import Profile

# One entry per monitored model: the scope names the spam model to use,
# "fields" lists the text attributes to check, and "get_instance_info"
# returns a short description of the instance for the alert message.
spam_fields = [
    {
        "scope": "PROFILE",
        "model": Profile,
        "fields": ["biography", "sign"],
        "get_instance_info": str,
    },
    {
        "scope": "FORUM",
        "model": Comment,
        "fields": ["text"],
        "get_instance_info": lambda instance: str(instance.author.username),
    },
]
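As a rough illustration of how this configuration is consumed, the sketch below reproduces the lookup done in `send_alert` with plain Python classes. `Profile` and `Comment` here are hypothetical stand-ins for the Django models, and `find_config` and `describe` are helper names invented for the example.

```python
class Profile:
    """Stand-in for zds.member.models.Profile."""

    def __init__(self, username):
        self.username = username

    def __str__(self):
        return self.username


class Comment:
    """Stand-in for zds.forum.models.Comment; `author` only needs a username."""

    def __init__(self, author):
        self.author = author


spam_fields = [
    {
        "scope": "PROFILE",
        "model": Profile,
        "fields": ["biography", "sign"],
        "get_instance_info": str,
    },
    {
        "scope": "FORUM",
        "model": Comment,
        "fields": ["text"],
        "get_instance_info": lambda instance: str(instance.author.username),
    },
]


def find_config(instance, field_name):
    # First entry whose model matches the instance and which monitors the field, else None.
    return next(
        (c for c in spam_fields if isinstance(instance, c["model"]) and field_name in c["fields"]),
        None,
    )


def describe(instance, field_name):
    config = find_config(instance, field_name)
    if config is None:
        return None
    return config["scope"], config["get_instance_info"](instance)


print(describe(Comment(Profile("alice")), "text"))
print(describe(Profile("bob"), "sign"))
print(describe(Profile("bob"), "text"))
```

Adding a new monitored model would then only require a new dictionary in the list, plus a matching entry in the scope-to-Alert mapping inside `send_alert`.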