The plugin below makes it possible to check the context of forum posts. This means a swear word alone would not trigger it; the context really matters. For example, "I f'ing love you guys" would not be automatically flagged. However, the post below would be (if the threshold is set to 75% toxicity), and the user would get a warning before posting it, with a suggestion to change the post.
It’s possible to customize the thresholds. My suggestion would be to check with the internal privacy team whether the current FP privacy statement covers this use case. I would also suggest not including DMs, only the publicly posted messages, since Google can already see/index those anyway.
This plugin would likely make the mods’ lives easier and make the posts on this forum less toxic. There may be a limit on free use of this API; for my own forum it’s not an issue, because I don’t have many users. But maybe this forum would hit its free-use limit (if there is any).
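For reference, the request shape is simple: an integration only needs to send the post text and read back a single toxicity score. The endpoint and payload below follow Perspective’s public docs, but the helper names and the 75% default threshold are my own assumptions, not the plugin’s actual code:

```python
import json

# Perspective's analyze endpoint (API key placeholder, not a real key).
API_URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
           "comments:analyze?key=YOUR_API_KEY")

def build_request(text: str) -> str:
    # JSON body Perspective expects for a TOXICITY check.
    return json.dumps({
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    })

def should_warn(toxicity_score: float, threshold: float = 0.75) -> bool:
    # The API returns the score under
    # attributeScores.TOXICITY.summaryScore.value (0.0 to 1.0);
    # warn the poster once it crosses the configured threshold.
    return toxicity_score >= threshold
```

So a post scoring 0.91 would trigger the warning, while an affectionate post with a low score (say 0.10) passes through untouched.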
Okay, I see there is some objection. But do note that Google already crawls this forum, so there is no extra privacy violation in that sense. The posts are already public and Google indexes them. Discourse even provides a special sitemap to make it easier for crawlers.
You can also test it out here: https://perspectiveapi.com/ (click the “Try it now” button). You can also configure the plugin to just notify mods, rather than blocking someone from posting a toxic message until it’s revised. In that setup, toxic posts are simply caught faster by the mods.
There’s a difference between a site getting crawled (in some cases maliciously) and directly / intentionally offering up all your stuff to a company because it happens anyway.
I don’t subscribe to the defeatist point of view in this case, why make it easier for them?!
Besides, not every communication on this forum is publicly accessible; restricted categories, DMs, etc. are not.
Even then, it would only be for curse words, and I don’t care for censoring only some words. What would be nice is discouraging unhelpful or hurtful input regardless of the characters written. And I think we need a human to go over it for that.
On top of that, I agree that analyzing every post is not the same as crawling for indexing purposes.
I’m fine with not using the Perspective API; I just thought it would be a nice addition to keep the forum civil. It does a good job, and it takes some load off the mods. A win-win, but I guess it’s a no-go.
I just want to highlight that the privacy argument isn’t a strong one. On paper for the GDPR it is, because you have to show you are in control of your data and know what’s happening. That’s why I noted for the admin that it might be good to first check with the internal privacy team if the current privacy statement covers this use case.
As noted, Perspective is context aware. It’s able to determine how a swear word or regular words are used. And it’s good at it. In my previous post I left a link where you can try it yourself by typing some sentences.
Yeah, I just happened to look up one of the countless YouTube videos containing the word f*ck in its video ID, in order to use it as a slightly hyperbolic example of the issues automatic word filters can bring with them, only to discover there is already a filter in place that blocks the video’s URL (which I would consider a false positive, since there is nothing actually offensive about it). xD
I personally am always a bit wary of automatically filtering words - especially if it’s a flat-out blocking filter and not one that flags the post for review.
Imagine for example a perfectly random character sequence of length 11, which is the length of a YouTube video ID. Let’s for simplicity assume that only lowercase letters are allowed. A 4-letter word has 11−4+1 = 8 possible starting positions in such a sequence, so the probability of it containing the word f*ck is approximately (8×(26^7))÷(26^11) = 8÷(26^4), which comes out to about 0.00175%. So if over a period of time 10000 IDs are posted in links, the probability of one or more containing the word f*ck by accident is 1−((1−0.0000175)^10000), which already comes out to about 16.06%. And that is not to be confused with a calculation of the overall false-positive rate - it ONLY considers the word f*ck and rather short character sequences. (Also I hope I didn’t miscalculate here, but the point still stands.)
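This back-of-the-envelope estimate can be checked numerically. A small sketch, assuming a union bound over the 11−4+1 = 8 possible starting positions (which slightly overcounts overlapping matches, but is accurate at these scales):

```python
# Union-bound estimate for a 4-letter word appearing somewhere in a
# random 11-character lowercase string, and the chance of at least
# one accidental hit across 10000 such IDs.
word_len = 4
id_len = 11
positions = id_len - word_len + 1      # 8 possible starting positions
p_single = positions / 26 ** word_len  # chance per ID, ~0.00175 %

n_ids = 10_000
p_any = 1 - (1 - p_single) ** n_ids    # chance of one or more hits, ~16 %

print(f"per-ID: {p_single:.7%}, over {n_ids} IDs: {p_any:.2%}")
```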
Note that I am not criticising the awesome efforts of the moderation team here, and I am well aware of what triggered this post today. I am simply wary of the potential overreliance on automatic moderation mechanisms without the possibility of initiating a review.
It’s probably the built-in Discourse ‘watched words’ feature, which is not context aware and thus easily circumvented by using special characters; it may also trigger false positives, unlike Perspective, which you can’t fool that easily.
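Both failure modes are easy to show with a toy example (the word list and test strings here are made up, not the forum’s actual watched words); a plain substring filter of this kind can be sketched as:

```python
# Naive watched-words style filter: plain substring match, no context.
WATCHED = {"badword"}

def naive_filter(text: str) -> bool:
    lowered = text.lower()
    return any(word in lowered for word in WATCHED)

print(naive_filter("b@dword, lol"))          # False: trivially circumvented
print(naive_filter("youtu.be/XbadWordY9z"))  # True: false positive inside an ID
```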
I’m fine if people prefer humans to do the work; I personally feel it’s a waste of those volunteers’ time.
And by stopping someone from posting a toxic reply, you can enforce the code of conduct up front instead of reacting after a breach of that code. You can customize the Perspective API integration as you want, including the trigger percentage of measured toxicity.
I don’t think people actually read what I’m writing. Once more: I have stated that you can configure this to your liking, such as only notifying mods; it is certainly possible to let humans make the final judgement. You can also do both: people may ignore the warning, a mod is notified, and then acts on it.
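To make that configurability concrete, here is a minimal sketch of the three modes described above. The mode names and return shape are my own invention for illustration, not the plugin’s actual settings:

```python
from enum import Enum

class Mode(Enum):
    BLOCK = "block"    # stop the post until it is revised
    NOTIFY = "notify"  # let it through, but flag it for the mods
    BOTH = "both"      # warn the poster AND notify the mods

def handle_post(score: float, threshold: float, mode: Mode) -> dict:
    # Below the threshold nothing happens; above it, the configured
    # mode decides whether the poster is blocked, mods are notified,
    # or both.
    if score < threshold:
        return {"blocked": False, "mods_notified": False}
    return {
        "blocked": mode in (Mode.BLOCK, Mode.BOTH),
        "mods_notified": mode in (Mode.NOTIFY, Mode.BOTH),
    }
```

With `Mode.BOTH`, a post scoring 0.9 against a 0.75 threshold triggers both the warning and a mod notification, so humans still make the final call.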