Automatically detect and handle toxic language

The plugin below checks the context of forum posts. This means that a swear word alone would not trigger it; the context really matters. For example, “I f’ing love you guys” would not be automatically flagged. However, the post below would be (if the threshold is set to 75% toxicity), and the user would get a warning before posting it, with a suggestion to revise the post.

It’s possible to customize the thresholds. My suggestion would be to check with the internal privacy team whether the current FP privacy statement covers this. I would also suggest not including DMs, only publicly posted messages, since Google can already see/index those anyway.

This plugin would likely make the mods’ lives easier and make the posts on this forum less toxic. There may be a limit on free use of this API; for my own forum it’s not an issue because I don’t have many users, but maybe this forum would hit its free-use limit (if there is one).

The API can be used free of cost, here are the API Reference docs.
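
For the curious, a check is a single HTTPS request. Here’s a minimal Python sketch (the endpoint and field names follow the public API reference; the key is a placeholder you get from the Google Cloud console):

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; request one via the Google Cloud console
URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def toxicity_score(text: str) -> float:
    """Ask the Perspective API how likely `text` is to be perceived as toxic (0..1)."""
    body = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(URL, params={"key": API_KEY}, json=body, timeout=10)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Context matters: a swear word in a friendly sentence scores lower
# than the same word used as an insult.
print(toxicity_score("I f'ing love you guys"))
```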

I have my reservations about this.
Yes, sometimes there’s toxic language on this forum.
But that’s why we have moderators.
Leaving this to a program gives me the creeps in terms of privacy.

11 Likes

Sending every post on this forum to Google seems like a very bad idea from a privacy standpoint.

I’d probably close my account, or at least stop posting, if this were implemented.

8 Likes

Okay, I see there is some objection. But do note that Google already crawls this forum, so in that sense there is no extra privacy violation. The posts are already public and Google indexes them. Discourse even has a special sitemap to make it easier for crawlers.

You can also test it out here https://perspectiveapi.com/ (click on the “Try it now” button). You can also configure the plugin to just notify mods, rather than blocking someone from posting a toxic message until it’s revised. In that sense, toxic posts simply get caught faster by the mods.

Anyway, thanks for considering.

1 Like

There’s a difference between a site getting crawled (in some cases maliciously) and directly/intentionally offering up all your stuff to a company because it happens anyway.
I don’t subscribe to the defeatist point of view in this case: why make it easier for them?!

Besides, not every communication on this forum is publicly accessible, restricted categories, DMs, etc. are not.

4 Likes

Even then, it would only be for curse words, and I don’t care for censoring only some words. What would be nice is discouraging unhelpful or hurtful input regardless of the characters written. And I think we need a human to go over it for that.
On top of that, I agree that analyzing every post is not the same as crawling for indexing purposes.

1 Like

As noted, you can configure this. And by default the plugin only checks public posts, so restricted categories, DMs, etc. are excluded; it only covers what guests can also see.

Crawlers use the neatly timestamped XML sitemap to keep track of every post. Not using the Perspective API would really not make it harder for Google.
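
If you want to see for yourself what crawlers get handed on a plate, here’s a quick sketch (it assumes the forum serves a standard sitemap at /sitemap.xml, as Discourse does; the forum URL is a placeholder):

```python
import re
import urllib.request

FORUM = "https://example-forum.org"  # placeholder

# Fetch the sitemap that crawlers are pointed at.
with urllib.request.urlopen(f"{FORUM}/sitemap.xml") as resp:
    xml = resp.read().decode()

# Every <lastmod> entry tells a crawler exactly which pages changed and when.
print(re.findall(r"<lastmod>(.*?)</lastmod>", xml)[:10])
```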

To put things in perspective (pun intended), here are my forum’s statistics. I have a small forum, with few posts or active users, but it gets data mined nonetheless.

I’m fine with not using the Perspective API; I just thought it would be a nice addition to keep the forum civil. It does a good job, and it takes some load off the mods. A win-win, but I guess it’s a no-go.

I just want to highlight that the privacy argument isn’t a strong one. On paper, for the GDPR, it is, because you have to show you are in control of your data and know what’s happening with it. That’s why I suggested the admins first check with the internal privacy team whether the current privacy statement covers this use case.

As noted, Perspective is context-aware: it can determine how a swear word, or regular words, are being used, and it’s good at it. In my previous post I left a link where you can try it yourself by typing some sentences.

Ah yes, this video ID is very offensive. Especially in this outrageous context! /s

:stuck_out_tongue_winking_eye:

It appears the Scunthorpe Problem has not been solved after all.

Edit:
LOL. :laughing: I would have loved to link to this very informative YouTube video, but it appears it is blocked already:


Good job, Discourse… You’re detecting accidental character sequences in links as offensive content. :person_facepalming:

1 Like

If you tried to link it here, it’s not like we haven’t done anything.

However, I would also vote against more Google than necessary.

1 Like

I’m sorry, my brain is currently incapable of parsing this sentence correctly. :see_no_evil: What do you mean by this?

Someone wants to implement steps to sort out foul language.

You tried to share a link (here in the forum?) containing foul language, and it was blocked.

I say: we have already done something to reduce foul language as much as possible.

2 Likes

I see, thanks for the clarification. :slight_smile:

Yeah, I just looked up one of the countless YouTube videos that contain the word f*ck in their video ID by accident, in order to use it as a slightly hyperbolic example of the issues automatic word filters can bring with them, only to discover there is already a filter in place that blocks the video’s URL (which I would consider a false positive, since there is nothing actually offensive about it). xD

I personally am always a bit wary of automatically filtering words, especially if it’s an outright blocking filter rather than one that flags the post for review.

Imagine, for example, a perfectly random character sequence of length 11, which is the length of a YouTube video ID. For simplicity, assume only lowercase letters are allowed. A 4-letter word can start at 11−4+1 = 8 positions, so the probability of the sequence containing the word f*ck is approximately (8×26^7)÷(26^11) = 8÷26^4, which comes out to about 0.00175%. So if over some period of time 10,000 IDs are posted in links, the probability of one or more containing the word f*ck by accident is 1−(1−0.0000175)^10000, which already comes out to about 16%. And that is not to be confused with a calculation of the overall false-positive rate: it ONLY considers the word f*ck and rather short character sequences.
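
For anyone who wants to check my arithmetic, here’s a quick script that both computes and simulates these numbers (same assumption as above: 11 uniformly random lowercase letters; real YouTube IDs draw from a 64-character alphabet, so the real-world probability is far lower still):

```python
import random
import string

WORD = "f" + "uck"  # split so the forum's own filter doesn't eat this post ;)

positions = 11 - len(WORD) + 1        # 8 possible starting positions
p_single = positions / 26**len(WORD)  # union bound, ~0.00175 %
p_any = 1 - (1 - p_single) ** 10000   # ~16 % over 10,000 IDs
print(f"one ID:     {p_single:.5%}")
print(f"10,000 IDs: {p_any:.2%}")

# Monte Carlo confirmation (takes a few seconds)
trials = 2_000_000
hits = sum(
    WORD in "".join(random.choices(string.ascii_lowercase, k=11))
    for _ in range(trials)
)
print(f"simulated:  {hits / trials:.5%}")
```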

Note that I am not criticizing the awesome efforts of the moderation team here, and I am well aware what triggered this post today. I am simply wary of potential overreliance on automatic moderation mechanisms without the possibility of initiating a review. :slight_smile:

It’s probably Discourse’s built-in “watched words” feature, which is not context-aware: it is easily circumvented with special characters and may also trigger false positives. Perspective, by contrast, can’t be fooled that easily.
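
To illustrate how a plain word filter cuts both ways (the video ID below is made up, and the regex just mimics a naive substring filter):

```python
import re

# A naive watched-words-style filter: plain substring matching, no context.
naive_filter = re.compile("f" + "uck", re.IGNORECASE)

samples = [
    "https://youtu.be/aFuCKx9Qz_b",  # hypothetical ID: innocent, but blocked
    "f\u00f9ck off",                 # the special character slips past entirely
]
for text in samples:
    verdict = "BLOCKED" if naive_filter.search(text) else "allowed"
    print(f"{verdict:7}  {text}")
```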

I’m fine if people prefer humans to do the work; I personally feel it’s a waste of those volunteers’ time.

And by stopping someone from posting a toxic reply, you can enforce the code of conduct instead of reacting after it has been breached. You can customize the Perspective API integration as you want, including the toxicity percentage that triggers it.

I don’t think people are actually reading what I’m writing :nerd_face: Once more: you can configure this to your liking, such as only notifying mods, so it is certainly possible to let humans make the final judgement. You can also do both: someone ignores the warning, a mod is notified, and the mod then acts on it.
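
To make it concrete, the decision logic boils down to something like this (the threshold and names are illustrative, not the plugin’s actual settings; the score would come from the API call sketched earlier in the thread):

```python
TOXICITY_THRESHOLD = 0.75  # the 75 % example from my first post

def notify_moderators(text: str, score: float) -> None:
    """Stub: the real plugin would add the post to the moderation queue."""
    print(f"mods notified (score {score:.0%}): {text!r}")

def handle_post(text: str, score: float, block_mode: bool) -> str:
    """Decide what happens to a post, given its Perspective toxicity score."""
    if score < TOXICITY_THRESHOLD:
        return "published"
    if block_mode:
        return "held back; author asked to rephrase"
    notify_moderators(text, score)  # humans still make the final call
    return "published, flagged for review"

print(handle_post("have a nice day", 0.02, block_mode=True))
print(handle_post("some toxic rant", 0.91, block_mode=False))
```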

I hope everything is clear now :slight_smile:

1 Like

Sure. That’s not the status quo for the current filter, though. Hence the current filter has mostly downsides, since it can break links (where you cannot choose your words), but…