Word Filter System Changes

Post by Thoul » Sun Jul 20, '14, 12:41 am

Today, the automatic word filter system used on the board has been upgraded. Please bear with me as I explain exactly what has changed and what you can expect going forward with these upgrades.

This upgrade to the filter system covers a few major points. As members may know, the forum software we use here automatically checks posts for disallowed terms and prevents messages containing them from being posted until those terms are removed. Most of the upgrades are improvements to that feature.

1) The List of Disallowed Terms

The list of disallowed terms has been emptied and reloaded from a new source. It may block a few more terms than before; if so, that is mostly unintentional. The "new" source is mostly the old list, plus a few additions based on research I did a couple of years ago. However, the list is now processed in a slightly different way, so there may be a few false positives. If you think you've run into one, such as a word that was allowed before and is now blocked in error, please let me know by PM so I can investigate.

2) Automatic Bypass Detection

In the past, one of our most often violated Participation Guidelines was the one that prohibits intentionally bypassing the disallowed terms list by changing a few characters in a word. For example, if "grisamatic" were a disallowed term, someone might try to slip it through by changing the i to 1 or the s to $ or 5. Catching these depended on either spotting them after they happened or anticipating them in advance. Neither approach worked well, and neither was fun. Anticipating them was especially difficult given all the possible combinations: gr1samatic, grisamat1c, gri$amatic, gr1$amatic, gri$amat1c, gr1$amat1c, and that's not counting substitutions for letters other than i or s. There are plenty more; every letter of the alphabet has at least 5 known substitutions. The letter M alone has 23!

The upgrades include a new method that lets the software detect most bypasses of this kind automatically and handle them just like the base disallowed term. Rather than working out all the possible combinations myself, I define a list of known substitutions, and the software builds the matches from that list for me. There are still a few it won't catch, either because I haven't seen those character substitutes yet or because they can't be handled automatically with any degree of reliability, but it is an improvement on the site staff's end of things.
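For the curious, here's roughly how building matches from a substitution list can work. This is a minimal Python sketch, not the board's actual code: the SUBSTITUTIONS table and the build_pattern function are made up for illustration, and a real substitution list would be far longer. Each letter of the disallowed term becomes a group that accepts the letter itself or any of its known look-alikes, so one rule covers every combination at once.

```python
import re

# Hypothetical substitution table for illustration only; the board's
# real list is not shown here and is much longer.
SUBSTITUTIONS = {
    "a": ["4", "@"],
    "i": ["1", "!", "|"],
    "s": ["$", "5"],
    "m": ["|\\/|", "/\\/\\"],  # multi-character look-alikes work too
}


def build_pattern(term: str) -> re.Pattern:
    """Compile a case-insensitive regex that matches `term` with any
    known substitution in place of each letter."""
    parts = []
    for ch in term.lower():
        options = [ch] + SUBSTITUTIONS.get(ch, [])
        # Escape every option so symbols like $ or | are matched literally.
        parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
    return re.compile("".join(parts), re.IGNORECASE)


pattern = build_pattern("grisamatic")
for text in ["gr1$amat1c", "GRI5AM@TIC", "grisomatic"]:
    print(text, bool(pattern.search(text)))
# gr1$amat1c True
# GRI5AM@TIC True
# grisomatic False
```

The last example shows the limit mentioned above: an o standing in for an a isn't caught, because that substitution isn't in the (hypothetical) list.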

This feature is somewhat experimental. I've tested it as thoroughly as I can on my own, but this is the first time it will be deployed in such an active environment under my watch. If you encounter any site issues that might be related to it, please let me know so I can investigate and, if necessary, disable it or revert to the old system.

It is also possible that some innocent words might be caught up by this feature. That's why the upgrades also include...

3) Whitelist

We now have a whitelist that allows specific words to be posted even when they match one of the disallowed term filter rules. Example: you can now post the word "arsenal." Guess why it was blocked before? Yeah. It's a completely innocent word, but it was caught as a side effect of blocking more problematic words. Currently the only whitelisted words are arsenal, arsenic, click*, and Dick (it's a proper name; don't abuse it!). I'll add more as they come up. Expect a Member Feedback topic about this in a few days.
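To illustrate the idea, here's a minimal Python sketch of a whitelist check sitting in front of a substring filter. It is not the forum software's implementation. BLOCKED_TERMS, WHITELIST, and both functions are hypothetical, the trailing * is assumed to mean "any ending" (as with click*), and words are compared in lowercase, which is also an assumption; the real board may treat case differently (for example, to allow only the capitalized name).

```python
import re
from fnmatch import fnmatchcase

# Hypothetical lists for illustration only; entries are lowercase.
BLOCKED_TERMS = ["arse"]
WHITELIST = ["arsenal", "arsenic", "click*"]


def is_whitelisted(word: str) -> bool:
    """True if the word matches a whitelist entry; '*' matches any ending."""
    return any(fnmatchcase(word, entry) for entry in WHITELIST)


def blocked_words(post: str) -> list[str]:
    """Return words that contain a blocked term and are not whitelisted."""
    hits = []
    for word in re.findall(r"\w+", post.lower()):
        if is_whitelisted(word):
            continue  # innocent word that happens to contain a blocked term
        if any(term in word for term in BLOCKED_TERMS):
            hits.append(word)
    return hits


print(blocked_words("The arsenal was stocked with arsenic."))  # []
print(blocked_words("Don't be an arse."))                      # ['arse']
```

Checking the whitelist per word, rather than per post, keeps a whitelisted word from letting a genuinely blocked word in the same post slip through.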

4) I know there was a #4, but it's completely slipped my mind. Oh well, if I remember I'll post it later. ;)

So that's it. Again, if you encounter any issues with these upgrades, such as words being blocked now that weren't before, please let me know by private message so I can look into it.

Thanks,
Thoul
Last edited by Thoul on Sun Jul 20, '14, 12:41 am, edited 4 times in total.
Thoul
Administrator
Posts: 12923
Joined: March 2007
Location: USA
Achievements: 123
Gender: Male

Re: Word Filter System Changes

Post by Thoul » Sun Jul 20, '14, 5:27 am

I remembered #4! Well, not really. I came up with a new #4, but it'll do just fine.

4) Logging

In the past, when a post triggered the blocked term filters, you received an error message asking you to adjust your post, with the triggering terms highlighted. That still happens, but now the triggering terms are also recorded in the site's logs at that moment so they can be reviewed later. This lets me see which terms triggered the filters and when, which helps me root out false positives and prevent them.
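As a rough picture of that logging step, here's a minimal Python sketch using the standard logging module. The logger name, the reject_post function, and the message format are all hypothetical; the board's real log entries may look quite different. The point is simply that the same terms shown to the poster are also written, with a timestamp, somewhere staff can review later.

```python
import logging

# Hypothetical logger name; the board's actual log setup is not described.
log = logging.getLogger("word_filter")


def reject_post(user: str, triggering_terms: list[str]) -> str:
    """Record which terms tripped the filter, then return the error
    message shown to the poster."""
    log.info("filter triggered: user=%s terms=%s",
             user, ", ".join(triggering_terms))
    return "Please adjust your post. Disallowed terms: " + ", ".join(triggering_terms)


# The timestamp (%(asctime)s) records *when* each trigger happened.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
print(reject_post("example_user", ["grisamatic"]))
# Please adjust your post. Disallowed terms: grisamatic
```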

5) Reporting False Positives

To report false positives and help improve the whitelist, please see the topic "Whitelist - Reporting Word Filter False Positives" in the Member Feedback forum. It's a sticky topic, so you can't miss it.

Also, a side note: I fixed the problems with the Font Color selector and the More Smilies popup in Google Chrome. The fix is a generic "hey, let's force this to work all the time" kind of thing, so it probably applies to all browsers, but I haven't tested that.