Collect spam data in a smart way #73

Open
schlessera opened this issue Jan 29, 2017 · 15 comments · May be fixed by #344

Comments

@schlessera
Contributor

Anonymously collect non-detected spam comments.

What data to collect:

Comments that were not detected as spam and for which the site user manually clicked the "Spam" button.

When to collect:

The first time the site user clicks this "Spam" button, we should ask for permission to anonymously send the comment data to a centralized database in order to improve Antispam Bee.

How to collect:

At first, send to a HTTPS endpoint that stores everything in a simple database (probably NoSQL). We may need to evaluate a more scalable solution in the future. The collected data must not contain any mention of the sender or information about their user or system. It should contain as much information as possible about the actual spam content and where it originated.
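
As a rough illustration of the "How to collect" requirement, the sketch below builds a report from a strict whitelist of comment fields, so that nothing about the reporting site or its user can slip in by accident. The function name `asb_build_spam_report()` and the choice of fields are assumptions made for this example; they are not part of Antispam Bee.

```php
<?php
/**
 * Hypothetical helper (not part of Antispam Bee): build a spam report
 * from comment data using a strict whitelist of fields.
 *
 * @param array $comment Comment data keyed like WordPress comment fields.
 * @return array Report containing only whitelisted, string-cast values.
 */
function asb_build_spam_report( array $comment ) {
	// Fields describing the spam itself and where it originated.
	$allowed = array(
		'comment_author',
		'comment_author_email',
		'comment_author_url',
		'comment_author_IP',
		'comment_agent',
		'comment_content',
	);

	$report = array();
	foreach ( $allowed as $field ) {
		// Missing fields become empty strings so every report has the same shape.
		$report[ $field ] = isset( $comment[ $field ] ) ? (string) $comment[ $field ] : '';
	}

	return $report;
}

// Example: fields that are not whitelisted (post ID, user ID) are dropped.
$report = asb_build_spam_report(
	array(
		'comment_author'  => 'Cheap Pills',
		'comment_content' => 'Buy now!',
		'comment_post_ID' => 42,
		'user_id'         => 7,
	)
);
echo json_encode( $report ), "\n";
```

Inside WordPress, the JSON body could then be sent to the HTTPS endpoint with `wp_remote_post()`. The endpoint itself is still to be defined.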

@timse201
Contributor

timse201 commented Jan 29, 2017

Are we or the user allowed to do that?
Are there any copyright or legal issues?
And if we do it every time, there could be some false positives, either because someone dislikes a user and marks them as spam, or simply because someone posted a comment several times through misclicking, caching issues, etc.

But I agree:
if we are allowed to (no legal issues), then we should make it simpler to submit spam.

@websupporter
Contributor

websupporter commented Jan 29, 2017

I think it's a great idea. We should also include false positives.

I do not really see legal issues. In my understanding, someone who posts a comment gives the website owner the right to publish it. But honestly, I do not know how far this right can be stretched.

> there could be some false positives

Yes, but right now we have the same issue with our Google document. I think it's worth a shot.

There should be an option in the settings (e.g. "always send" / "never send"), perhaps not instead of but in addition to the question "Do you want to send this specific comment?", to guarantee a quicker workflow.

@schlessera
Contributor Author

An alternative would be to add a separate button beside the Spam & Trash buttons, something like "Send for Analysis". If users just want to get rid of their uninteresting newsletters, they will probably not click on "Send for Analysis" for these...

@schlessera
Contributor Author

And, yes, the original idea was to ask for permission once on clicking Spam and then have this be the new default.

@Zodiac1978
Member

We could use the comment status transition action hooks comment_unapproved_to_spam and comment_approved_to_spam, or we could provide a button / action link for this.

Possible problems: privacy concerns, because personal data from comments (IP, email, content, etc.) is submitted to us (or to a third-party service like Google Forms).

This feature needs consent from the user:
https://developer.wordpress.org/plugins/wordpress-org/detailed-plugin-guidelines/#7-plugins-may-not-track-users-without-their-consent
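
A minimal sketch of the hook-based variant follows. It only remembers which comments were manually moved to spam, so that consent can be requested before anything is transmitted. The callback name `asb_remember_manual_spam()` and the in-memory list are assumptions for illustration; a real implementation would need persistent storage and the consent step itself.

```php
<?php
/**
 * Hypothetical callback: remember comments that were moved to spam so that
 * consent can be requested later. Nothing is transmitted here.
 *
 * @param object $comment Comment object passed by the transition hook.
 * @return int Number of comments currently waiting for consent.
 */
function asb_remember_manual_spam( $comment ) {
	// In-memory list for this request only (assumption for the sketch).
	static $pending = array();

	// Key by comment ID so repeated transitions do not create duplicates.
	$pending[ (int) $comment->comment_ID ] = true;

	return count( $pending );
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
	add_action( 'comment_unapproved_to_spam', 'asb_remember_manual_spam' );
	add_action( 'comment_approved_to_spam', 'asb_remember_manual_spam' );
}
```

Both hooks pass the affected comment object to the callback, so one function can serve both transitions.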

@krafit
Member

krafit commented Apr 12, 2020

In my opinion, the best way to collect non-detected spam would be to add a link alongside “Mark as spam”, something like “Report to Antispam Bee”. When a user clicks that link, they'll have to confirm that they are about to disclose the comment and its metadata to the ASB team for further investigation and to improve ASB's filters before it's sent.

[Screenshot 2020-04-12 at 12:44:32]

@Zodiac1978
Member

Zodiac1978 commented Apr 12, 2020

To get even more data, we could use the action hooks that fire when someone marks a comment as spam and then ask whether the data may be sent (similar to how PoEdit does it):
[Screenshot 2020-04-12 at 12:49:22]

With an option to opt in to having this as the default.

@krafit
Member

krafit commented Apr 12, 2020

I thought about an opt-in, but I didn't like the privacy implications of making this the default for everyone once a single person has opted in.
But we could handle the opt-in the way PoEdit does, on a per-user basis. This way every user has the opportunity to give informed consent before sharing data (for the first time).
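
The per-user handling could look roughly like the sketch below: the stored choice of the current user decides whether a report is sent, skipped, or whether the consent dialog is shown first. The function names, the meta key `asb_report_consent` and the values `always` / `never` are assumptions for illustration.

```php
<?php
/**
 * Hypothetical decision helper: map a user's stored choice to the next step.
 *
 * @param string $stored Stored preference: 'always', 'never', or '' if unset.
 * @return string 'send', 'skip', or 'ask'.
 */
function asb_next_step_for_consent( $stored ) {
	if ( 'always' === $stored ) {
		return 'send';
	}
	if ( 'never' === $stored ) {
		return 'skip';
	}
	// No (or unknown) stored choice: ask for informed consent first.
	return 'ask';
}

/**
 * WordPress glue: read the choice of the current user only, never a
 * site-wide option. The meta key 'asb_report_consent' is an assumption.
 *
 * @return string 'send', 'skip', or 'ask'.
 */
function asb_next_step_for_current_user() {
	$stored = get_user_meta( get_current_user_id(), 'asb_report_consent', true );

	return asb_next_step_for_consent( is_string( $stored ) ? $stored : '' );
}
```

Storing the choice as user meta rather than as a plugin option means that one administrator opting in does not silently opt in everyone else on the site.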

@Zodiac1978
Member

Zodiac1978 commented Jul 21, 2020

If someone wants to test this feature: Here is a working addon plugin:

<?php
/**
 * Plugin Name: Report Spam
 * Description: Addon for Antispam Bee to report spam.
 * Plugin URI:  https://torstenlandsiedel.de
 * Version:     1.0
 * Author:      Torsten Landsiedel
 * Author URI:  http://torstenlandsiedel.de
 * License:     GPL 2
 * License URI: http://opensource.org/licenses/GPL-2.0
 */

if ( ! defined( 'ABSPATH' ) ) {
	exit; // Exit if accessed directly.
}

/**
 * Add comment action link to report spam to ASB
 *
 * @param array      $actions Array of actions.
 * @param WP_Comment $comment Comment object.
 * @return array Actions including the report link.
 */
function add_report_comment_action_link( $actions, $comment ) {

	// Resolve the host from the raw IP; URL-encoding it first would break IPv6 lookups.
	$raw_ip   = $comment->comment_author_IP;
	$raw_host = filter_var( $raw_ip, FILTER_VALIDATE_IP ) ? (string) gethostbyaddr( $raw_ip ) : '';

	// URL-encode comment data.
	$name    = rawurlencode( $comment->comment_author );
	$email   = rawurlencode( $comment->comment_author_email );
	$ip      = rawurlencode( $raw_ip );
	$host    = rawurlencode( $raw_host );
	$url     = rawurlencode( $comment->comment_author_url );
	$content = rawurlencode( $comment->comment_content );
	$agent   = rawurlencode( $comment->comment_agent );

	// Build action link.
	$target = ' target="_blank" ';
	$rel    = ' rel="noopener noreferrer" ';
	$href   = 'href="https://docs.google.com/forms/d/e/1FAIpQLSeQlKVZZYsF1qkKz7U78B2wy_6s6I7aNSdQc-DGpjeqWx70-A/viewform?c=0&w=1&entry.437446945=' . $name . '&entry.462884433=' . $ip . '&entry.1346967038=' . $host . '&entry.121560485=' . $email . '&entry.1210529682=' . $url . '&entry.1837399577=' . $content . '&entry.372858475=' . $agent . '" ';

	$action  = '';
	$action .= "<a $target $href $rel>";
	$action .= __( 'Report to Antispam Bee', 'antispam-bee' );
	$action .= '</a>';

	$actions['report_spam trash'] = $action;

	return $actions;
}
add_filter( 'comment_row_actions', 'add_report_comment_action_link', 10, 2 );

@Zodiac1978
Member

[Screenshot 2020-07-22 at 23:21:17]

@Zodiac1978
Member

Includes Comment User Agent as a new item (form is already extended for this) and it gets the host from the IP.

@Zodiac1978
Member

> there could be some false positives

We could add a checkbox at the end of the form, "This is a false positive and not spam", which could be ticked before sending the form. Although I don't think many people would use it ...

Zodiac1978 added a commit that referenced this issue Aug 13, 2020
Add report spam action link to spam list (#73)
@Zodiac1978 Zodiac1978 linked a pull request Aug 13, 2020 that will close this issue
@stkjj
Member

stkjj commented Feb 1, 2021

With regard to https://torstenlandsiedel.de/2021/01/31/antispam-bee-braucht-eure-juristische-hilfe/:

a) Self-hosted instead of Google, for sure (or at least a SaaS based within the EU with a proper data processing agreement).
b) If consent is given by the submitter, everything is fine. Can the consent be withdrawn? Legally yes, factually no: once the data has been worked with, we could of course remove it from the list of submissions, yet the findings derived from the case remain. At least as long as the submission is taken care of in a timely manner ;-).
c) Regarding the receiving entity: this is indeed the biggest flaw, as we are acting as a GbR, which means that any member of the GbR could be sued, fined, … This is the point where a discussion about changing the legal framework for the entity should take place. To stay focused on the matter, I'd suggest separating this from this issue. Happy to start this internal discussion on our Slack channel.

To get hands-on: the "Report to Antispam Bee" link should ideally open a modal with all necessary information*, e.g. which data is submitted, where it will be stored and for how long, who will have access to it and how it will be purged, as well as a note that the data is provided on a consensual basis. Finally, a confirm and a decline button, the former of which then submits the data to a GDPR-compliant server for further processing.

*Let me draft something later this week.

@stkjj
Member

stkjj commented Feb 1, 2021

For further discussion, a draft text for the modal (German first, then English):

Vielen Dank dass Du uns hilfst Antispam Bee besser zu machen.

Du bist gerade dabei den Kommentar von [Name des Kommentators] mit dem Inhalt [Inhalt des Kommentars] an uns zu melden, da Du es für nicht erkannten Spam hälst. Folgende Daten haben wir außerdem in dem Kommentar gefunden, die wir für die Auswertung und die Heuristik von Antispam Bee verwerten werden:

  • [IP Adresse]
  • [Host]
  • [UserAgent]
  • [eMail Adresse des Kommentator]
  • [Webseite des Kommentators]

Wir werten diese Daten [automatisiert|manuell] aus um damit die Spamerkennung von Antispam Bee zu verbessern. Sofern wir mehrfach gleichlautende Meldungen über einen Spamer bekommen, nutzen wir diese Daten auch um damit Blacklist Updater zu aktualisieren. Die Daten werden von uns in den nächsten x [Stunden|Tagen] verarbeitet und danach automatisch gelöscht. Für den Zeitraum der Verarbeitung werden die Daten ausschliesslich auf Servern mit Standort Deutschland gespeichert. Lediglich das Entwicklerteam von Antispam Bee hat darauf Zugriff. Um den Prozess schlank zu halten, bekommst Du von uns keine weitere Rückmeldung über die Verarbeitung, Speicherung oder Löschung, aber unser Dank wird Dir gewiss sein.

Wenn Du mit der Übermittlung dieser Daten einverstanden bist, kannst Du sie mit dem Button unten absenden.
Button: Verwerfen / Button: Absenden


Thank you for helping us to improve Antispam Bee.

You are about to report the comment by [commenter name] with the content [content of the comment] to us, because you believe it is undetected spam. We also found the following data in the comment, which we will use for Antispam Bee's evaluation and heuristics:

  • [IP address]
  • [Host]
  • [UserAgent]
  • [email address of the commenter]
  • [website of the commenter]

We evaluate this data [automatically|manually] to improve the spam detection of Antispam Bee. If we receive multiple identical reports about a spammer, we also use this data to improve Blacklist Updater. The data will be processed by us within the next x [hours|days] and then automatically deleted. During processing, the data is stored exclusively on servers located in Germany. Only the Antispam Bee developer team has access to it. To keep the process lean, you will not receive any further feedback from us about the processing, storage or deletion, but please accept our thanks for your help.

If you agree to submit this data, you can send it using the button below.
Button: Discard / Button: Submit
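
The retention promise in the draft ("processed ... and then automatically deleted") would have to be enforced on the receiving side. A minimal sketch, assuming each stored report carries a `received_at` Unix timestamp; the function name and that field are hypothetical:

```php
<?php
/**
 * Hypothetical server-side helper: keep only reports that are still within
 * the retention period; everything older is dropped.
 *
 * @param array $reports     List of reports, each with a 'received_at' Unix timestamp.
 * @param int   $now         Current Unix timestamp.
 * @param int   $max_age_sec Retention period in seconds.
 * @return array Reports younger than the retention period, reindexed from 0.
 */
function asb_purge_expired_reports( array $reports, $now, $max_age_sec ) {
	$kept = array_filter(
		$reports,
		function ( $report ) use ( $now, $max_age_sec ) {
			return ( $now - (int) $report['received_at'] ) < $max_age_sec;
		}
	);

	// Reindex so callers get a plain list.
	return array_values( $kept );
}
```

Run on a schedule, such a purge would make the "x [hours|days]" placeholder in the text a setting that can be checked rather than a statement of intent.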

@Zodiac1978 Zodiac1978 modified the milestones: 2.10, Future Release Jun 24, 2021
@florianbrinkmann florianbrinkmann modified the milestones: Future Release, 2.11 Aug 12, 2021
@florianbrinkmann florianbrinkmann self-assigned this Aug 12, 2021