Today, the automatic word filter system used on the board has been upgraded. Please bear with me as I explain exactly what has changed and what you can expect going forward with these upgrades.
This upgrade to the filter system has a few major points. As members may know, the forum software we use here has a feature that automatically checks for disallowed terms and prevents messages containing those terms from being posted until the terms are removed. Most of the upgrades are improvements to this.
1) The List of Disallowed Terms
The list of disallowed terms has been emptied and reinstalled using a new source. It may block a few more terms than before, but where it does, that is mostly unintentional. The "new" source is mostly the old list with a few additions based on research I did a couple of years ago. The list is processed in a slightly different manner, however, so there may be a few false positives. If you think you've encountered one, or a word that was allowed before and is now blocked in error, please let me know by PM so I can investigate.
2) Automatic Bypass Detection
In the past, one of our most often violated Participation Guidelines was the one that prohibits intentionally bypassing the disallowed terms list by changing a few characters in a word. For example, if "grisamatic" was a disallowed term, someone might try to slip it through by changing the i to 1 or the s to a $ or 5. Catching these was a matter of spotting them after they happened or of my anticipating them. Neither was very good or fun. The latter was very difficult when considering all the possible combinations: e.g., gr1samatic, grisamat1c, gri$amatic, gr1$amatic, gri$amat1c, gr1$amat1c - and that's not including substitutions for letters beyond i or s. There are plenty more; each letter in the alphabet has at least 5 known substitutions. The letter M alone has 23!
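To give a feel for how quickly those combinations pile up, here is a small Python sketch that enumerates every spelling of the made-up term using only the three substitutions mentioned above. The `SUBSTITUTIONS` table and the `variants` function are hypothetical names for illustration, not anything taken from the forum software.

```python
from itertools import product

# Hypothetical table: only the substitutions named in the example above.
SUBSTITUTIONS = {"i": ["1"], "s": ["$", "5"]}

def variants(term):
    """Return every spelling of `term` reachable through SUBSTITUTIONS."""
    # For each letter, the choices are the letter itself plus its stand-ins.
    choices = [[ch] + SUBSTITUTIONS.get(ch, []) for ch in term]
    return ["".join(combo) for combo in product(*choices)]

spellings = variants("grisamatic")
print(len(spellings))  # 2 * 3 * 2 = 12, counting the original spelling
```

The count is the product of the number of choices per letter, so with five or more stand-ins for every letter, writing the list out by hand stops being practical very quickly.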
The upgrades include a new method that allows the software to detect most bypasses of this nature automatically and handle them just like the base disallowed term. Rather than trying to figure out all the possible combinations myself, I define a list of known substitutions and the software builds matches from that list for me. There are still a few it won't catch, either because I haven't seen the character substitutes or because they can't be handled automatically with any degree of reliability, but it is an improvement from the site staff end of things.
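One way such a feature could work (a minimal sketch, not the board's actual implementation) is to turn each disallowed term into a regular expression in which every letter also accepts its known stand-ins. The `SUBSTITUTIONS` table, `build_pattern`, and `is_blocked` below are all hypothetical names, and the table covers only the two letters from the example.

```python
import re
from typing import Iterable

# Hypothetical substitution table; the real list is not published.
SUBSTITUTIONS = {"i": "1", "s": "$5"}

def build_pattern(term):
    """Compile a regex matching `term` and its character-swapped variants."""
    parts = []
    for ch in term.lower():
        alternatives = ch + SUBSTITUTIONS.get(ch, "")
        # A character class accepts the letter itself or any listed stand-in.
        parts.append("[" + "".join(re.escape(c) for c in alternatives) + "]")
    return re.compile("".join(parts), re.IGNORECASE)

def is_blocked(message: str, disallowed_terms: Iterable[str]) -> bool:
    """True if any disallowed term, or a known variant of it, appears."""
    return any(build_pattern(t).search(message) for t in disallowed_terms)

print(is_blocked("nice try, gr1$amat1c", ["grisamatic"]))  # True
print(is_blocked("a grammatical point", ["grisamatic"]))   # False
```

A single pattern per term covers every combination at once, so only the substitution table needs maintaining. Because the pattern matches anywhere in the text, it will also fire on innocent words that happen to contain a disallowed term.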
This feature is slightly experimental. I've tested as well as I can on my own, but this is the first time it's going to be deployed in such an active environment under my watch. If you encounter any issues with the site that might be related to this, please let me know so I can investigate and, if necessary, disable this or revert to the old system.
It is also possible that some innocent words might be caught up by this feature. That's why the upgrades also include...
3) Whitelist
We now have a whitelist that can allow specific words to be posted even when they match one of the disallowed term filter rules. Example: you can now post the word "arsenal." Guess why it was blocked before? Yeah. It's a completely innocent word, but it was blocked as a side effect of blocking more problematic words. Currently the only whitelisted words are arsenal, arsenic, click*, and Dick (it's a proper name, don't abuse it!). I'll add more as they come up. Expect a Member Feedback topic in a few days regarding this.
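As a rough illustration of how a whitelist can override a filter match, here is a short Python sketch. It assumes the filter matches a disallowed term anywhere inside a word and that a trailing * in a whitelist entry means "any ending"; both are assumptions, not confirmed details of the software. "grisamatic" is the made-up term from earlier, "grisamatics" is an invented innocent word, and the function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical data for illustration only.
DISALLOWED = ["grisamatic"]
WHITELIST = ["grisamatics", "click*"]

def is_whitelisted(word):
    """True if `word` matches a whitelist entry; * acts as a wildcard."""
    lowered = word.lower()
    return any(fnmatchcase(lowered, entry.lower()) for entry in WHITELIST)

def blocked_words(message):
    """Return the words that hit the filter and are not whitelisted."""
    hits = []
    for word in re.findall(r"\w+", message):
        matches_filter = any(term in word.lower() for term in DISALLOWED)
        # The whitelist is checked last, so it overrides a filter match.
        if matches_filter and not is_whitelisted(word):
            hits.append(word)
    return hits

print(blocked_words("I study grisamatics"))  # []
print(blocked_words("That is grisamatic"))   # ['grisamatic']
```

Checking the whitelist per word rather than per message means one whitelisted word does not excuse a disallowed term elsewhere in the same post.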
4) I know there was a #4, but it's completely slipped my mind. Oh well, if I remember I'll post it later.
So that's it. Again, if you encounter any issues with these upgrades, such as words being blocked now that weren't before, please let me know by private message so I can look into it.
Thanks,
Thoul