Are there any profanity filter DLLs?


RifRaf(Posted 2009) [#1]
I did a quick Google search and got a lot of net-based ones, but I'm looking for a simple DLL that I can send a text string to and get text back with the bad words replaced within the string. It doesn't have to be comprehensive, in the sense of finding every phonetic spelling of a bad word, but it should cover the obvious and the common ones.

I could make my own lookup table and use Instr(), but I thought that would be far slower than a professional DLL.


D4NM4N(Posted 2009) [#2]
Probably quite easy and very quick to write your own module.

Here is a list of known words.
http://www.noswearing.com/dictionary
You could spoon that lot into a simple XML/CSV 'lookup & return' dataset.
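A minimal sketch of such a 'lookup & return' dataset, assuming a hypothetical CSV layout of one "word,replacement" row per entry, where an empty replacement means "mask with asterisks". It is written in Python for illustration; the sample words are placeholders, not entries from the linked list.

```python
import csv
import io

# Hypothetical file layout: one "word,replacement" row per entry.
# An empty replacement column means "mask the word with asterisks".
SAMPLE_CSV = "darn,\nheck,gosh\n"

def load_word_table(csv_text):
    """Parse 'word,replacement' rows into a dict keyed by lower-case word."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank rows
        word = row[0].strip().lower()
        replacement = row[1].strip() if len(row) > 1 else ""
        table[word] = replacement or "*" * len(word)
    return table
```

For the sample above, load_word_table(SAMPLE_CSV) returns {"darn": "****", "heck": "gosh"}. A real file would be read with open(path, newline="") and passed through the same loop.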

The problem with using phonetics is that a word like "Fuk" is not a swearword. It could be someone's name from East Asia, or something else, and filtering it might cause even more offence/irritation than simply allowing it. You also need to consider whitespace, e.g. a "Shitake mushroom", which again is a valid name. There are hundreds of embedded "swearwords" inside words that are not swearwords at all.
You could also easily detect things like "F¤%!": if F/f/S/s/C/c is followed by a combination of 3 non-alphanumeric chars, then do something with it. But do you really want to go -that- far?
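The symbol-masking rule above can be sketched as a single regular expression. This is a Python illustration under one assumption the post leaves open: "non-alphanumeric" is taken to exclude whitespace, so only symbol characters count. The pattern fires when a word starts with F, S or C followed by at least three such symbols.

```python
import re

# Word starting with F/f, S/s or C/c followed by three or more symbol
# characters (anything that is not a letter, digit or whitespace).
MASKED = re.compile(r"\b[FfSsCc][^A-Za-z0-9\s]{3}")

def looks_masked(text):
    """Return True if text contains something like 'F¤%!'."""
    return MASKED.search(text) is not None
```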
Then there is the other problem: what do you replace it with? Removing the words would make the text unreadable, and replacing them would make people look & feel stupid (possibly a good thing?? e.g. "duck you grassmole!" LOL). Or there is the 3rd option of *ing it out completely, which keeps the text readable, but everyone still knows what it says anyway.
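The three options can be compared side by side with a small Python sketch. It assumes the word has already been identified as bad, and it uses a hypothetical SUBSTITUTES table with placeholder words. Words without a substitute fall back to masking.

```python
# Hypothetical substitute table; real entries would come from your word list.
SUBSTITUTES = {"darn": "duck", "heck": "hound"}

def replace_word(word, mode):
    """Apply one of the three replacement strategies to a known bad word."""
    if mode == "remove":
        return ""                       # drop the word entirely
    if mode == "substitute":
        # fall back to masking if no substitute is defined
        return SUBSTITUTES.get(word.lower(), "*" * len(word))
    if mode == "mask":
        return "*" * len(word)          # keep the length, hide the letters
    raise ValueError("unknown mode: " + mode)
```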

It's a difficult line to draw. I would just go for replacing separate, actual swearwords and censoring them (#%¤"!). If you start going into phonetics and replacement, you risk alienating users.
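Censoring only separate, actual words usually means matching on word boundaries, so embedded hits like the "Shitake" case are left alone. This is a minimal Python sketch. BAD_WORDS holds hypothetical placeholders, and the mask string is arbitrary.

```python
import re

# Hypothetical placeholder list; real entries would come from your word file.
BAD_WORDS = ["darn", "heck"]

# \b anchors each entry to word boundaries, so "heck" on its own is caught
# but words that merely contain it ("checkered", "Heckler") are not.
PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, BAD_WORDS)) + r")\b",
                     re.IGNORECASE)

def censor(text, mask="#%!"):
    """Replace whole-word matches with the mask string."""
    return PATTERN.sub(mask, text)
```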


SculptureOfSoul(Posted 2009) [#3]
Otherwise, if you want something that would work (maybe not the fastest or most memory-efficient), I've got code to parse text into a list of individual words. Combine that with a hashtable (I've got one of those written for BMax, too) and you could easily have custom substitutions for each word.


RifRaf(Posted 2009) [#4]
but everyone still knows what it says anyway

Well, that's not my perception.

My son plays games. In fact, he's pretty good at most of the ones I let him play, and he wouldn't know what **** meant if he saw "**** you!", because he's not been around that kind of behaviour. He's 6 years old; my daughter is 9 and she wouldn't know what word should be there either.

If someone already knows how to fill in the blanks, then it won't matter to them if it's a bunch of asterisks. My concern is not teaching young kids via other players in Tiny Tanks; if they already know it, then it's someone else's fault.

If phonetics get through (and I know some will if people try), it's honestly too bad, but at least I can say to myself I tried. I think kids will like the game I'm making, and I want to do what I can to keep inappropriate language out.

Accidentally hitting a foreign name with the filter is really too bad as well, but I hope it will not cause worse offence. I hope most parents would appreciate the reason for a filter and understand that, if they live on the other side of the world and the game was made by some guy in the USA, that may happen.

I feel a responsibility, and it's a priority before I complete the game.
I am still hoping to find a DLL so that I know it's done right, but if not, I'll take a crack at it.


Digital Anime(Posted 2009) [#5]
Probably quite easy to write your own.


I agree with D4NM4N on this.

I don't think a DLL will be any faster. Also, if you make a word filter yourself, you could easily extend it to smileys as well (just a thought) by replacing ":)" with a smiley image.

Also, if you make it yourself, you can easily update it by making it learn new words. People will always find other ways to swear using numbers and symbols, or even use text/symbols in a bad but creative way, for example "( . )( . )".
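One common way to handle number/symbol substitutions, and to let the filter learn new words, is to normalise each word before looking it up. Here is a Python sketch. LEET_MAP is an assumed, deliberately small mapping, and the word list holds placeholders. A real mapping would need tuning, since it can also create false positives.

```python
# Assumed mapping of common digit/symbol stand-ins back to letters.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                          "5": "s", "7": "t", "@": "a", "$": "s"})

bad_words = {"darn", "heck"}   # hypothetical placeholder entries

def normalise(word):
    """Lower-case the word and fold digit/symbol stand-ins back to letters."""
    return word.lower().translate(LEET_MAP)

def is_bad(word):
    return normalise(word) in bad_words

def learn(word):
    """Add a newly reported word so later messages are caught."""
    bad_words.add(normalise(word))
```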

One idea I also like is replacing bad text with other text, for example replacing "F*** off!" with "Get lost!". It still has the same meaning. The only downside is that this will take more time to create.
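Phrase-level replacement can be sketched with a table of whole phrases and one combined regex. In this Python sketch the phrases are hypothetical placeholders. Longer phrases are tried first so they win over any shorter phrase they contain.

```python
import re

# Hypothetical phrase table (keys in lower case).
PHRASES = {"heck off": "get lost", "darn you": "bother you"}

# Longest phrases first, so overlapping shorter entries do not win.
_ordered = sorted(PHRASES, key=len, reverse=True)
_pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, _ordered)) + r")\b",
                      re.IGNORECASE)

def soften(text):
    """Replace each listed phrase, matched case-insensitively."""
    return _pattern.sub(lambda m: PHRASES[m.group(0).lower()], text)
```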


D4NM4N(Posted 2009) [#6]
He's 6 years old; my daughter is 9 and she wouldn't know what word should be there either.
Wow :O (seriously?) That's actually good/refreshing to hear.

Where do you live, then? It must be a utopia compared to where I grew up. When I was at primary school, I think every kid from the first grade (5 y/o) to the last grade (10 y/o) had "sat-on-the-mat-with-hands-on-head" for using the F-word or C-word more than a few times. That was back in the early 80s. I cringe to think what it is like now in schools, and judging by the language of the 9-year-olds on the bus to school, it's much worse (as now they know 100% what they all mean too).
-Of course we all censored ourselves at home. (because the language was "not suitable for parents" :D)


RifRaf(Posted 2009) [#7]
I had a long-winded reply written, but it went too far off topic.
I'm sorry that surprises you, though.

Now, back on the filter topic.
I think I'll see what I can come up with, since searching turned up no external solutions.

Thanks guys.


D4NM4N(Posted 2009) [#8]
I had a long-winded reply written, but it went too far off topic.
I'm sorry that surprises you, though.
It does surprise me (in a good way :)

I was thinking: if you know C++ basics (e.g. how to configure the IDE :), then you could create a dynamic library using my "b3d-style file and string commands for c++". I have a small collection of Blitz-imitated string and file commands in the C++ section @syntaxbomb, if they help you at all (they seem to work OK and as expected).

Although if it's just for you (rather than a product you want to sell), I don't see the point in using a DLL over an include.


SculptureOfSoul(Posted 2009) [#9]
What approach are you thinking of going with?

Whatever you do, don't search the string for all of the words separately! :p


RifRaf(Posted 2009) [#10]
Right.

D4NM4N, I am on the fence; it depends on how my current contract pans out. It's been in a holding pattern for about a month now. I would love to give it out for free. Failing that, I may add advertisements, and if I have to sell it to help get some food, I will. But I really hope to just put it on download.com or something.


Yeah, I know better than to search for every word via string parsing, but...


I was thinking of storing each bad word in a type, but as ASCII byte codes. Then I'd break the user's input into words, and each word down to its ASCII codes. Each word would be searched through the database, first by length in chars, then by starting ASCII code, then finally by checking the remaining ASCII codes for a match. This would search the entire database for each word, but without using any string functions. Do you think that would work out?
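The length → first code → remaining codes search can be sketched by indexing the list on (length, first byte), so each input word is only compared against the few entries sharing both. This is a Python illustration of the idea, not RifRaf's code. The word list is a placeholder, and non-ASCII characters are simply replaced with "?".

```python
from collections import defaultdict

def build_index(words):
    """Group bad words (as ASCII bytes) by (length, first byte)."""
    index = defaultdict(list)
    for w in words:
        b = w.lower().encode("ascii")
        index[(len(b), b[0])].append(b)
    return index

def is_listed(index, word):
    """Check length and first byte via the index, then the remaining bytes."""
    b = word.lower().encode("ascii", "replace")
    if not b:
        return False
    for candidate in index.get((len(b), b[0]), ()):
        if candidate[1:] == b[1:]:
            return True
    return False
```

The (length, first byte) key behaves like a very simple hash, which is essentially what the hash-table suggestion in the next reply generalises.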



I haven't written anything yet; that's just a starting point, I guess.


SculptureOfSoul(Posted 2009) [#11]
I think hashing would work faster, to be honest. It's kind of doing the same thing (converting a word into a seemingly random number), but it should be faster. You just create a hash table entry for every swear word, then parse the input into individual words, hash them and see if there is something there. If there is an entry, you either replace the word with asterisks, or you could store a substitute word in the hash table that would be returned and used in place of the original.
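A Python dict is a hash table, so the approach described above can be sketched in a few lines: split the input into runs of letters, look each one up, and return either a stored substitute or asterisks. The table entries are hypothetical. Here None means "mask", and spaces and punctuation pass through untouched.

```python
import re

# Hypothetical table: None means "mask with asterisks".
SWEAR_TABLE = {"darn": None, "heck": "gosh"}

def filter_message(text):
    def replace(match):
        word = match.group(0)
        key = word.lower()
        if key not in SWEAR_TABLE:       # one average O(1) hash lookup
            return word
        sub = SWEAR_TABLE[key]
        return sub if sub is not None else "*" * len(word)
    # Only runs of letters are looked up; everything else is kept as-is.
    return re.sub(r"[A-Za-z]+", replace, text)
```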


SculptureOfSoul(Posted 2009) [#12]
I can explain it better later if you'd like. I optimized the hash table code I wrote to be extremely fast; I was getting results faster than variable lookups in Lua (which uses tables that are basically hash tables under the hood).


D4NM4N(Posted 2009) [#13]
If you mean ASCII codes as numbers, then they would still need to be strings (as you would end up with astronomical numbers for some words), especially if using 16-bit chars rather than 8-bit.
So in effect you would be either doubling or quadrupling the number of bytes to compare.
(That's if that's what you meant; perhaps I misunderstood.)
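The "astronomical numbers" point is simple arithmetic. Packing a word into one integer needs 8 × length bits with 8-bit chars (16 × length with 16-bit chars). A 32-bit integer therefore holds only 4 eight-bit characters, or 2 sixteen-bit ones. This Python sketch makes the growth visible because Python integers are unbounded. It assumes every character code fits in bits_per_char bits.

```python
def pack(word, bits_per_char=8):
    """Pack character codes into one integer (assumes each code fits)."""
    value = 0
    for ch in word:
        value = (value << bits_per_char) | ord(ch)
    return value

if __name__ == "__main__":
    for w in ["heck", "checkered"]:
        print(w, pack(w).bit_length(), "bits at 8 bits/char")
```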

The way I would do it myself is to wrap whatever takes the string typed by the user immediately, rather than the streams or the point where it's printed on screen (chop it off at the head, so to speak; that way only 1 client needs to work harder).

Then I would get that wrapper to check against the few major profanities (there really are not that many that really matter), so it shouldn't be slow at all.
Although if the list gets -really- big (e.g. more than 30-40 words as a ballpark), then perhaps another way is needed, like some sort of chopping.
-Perhaps check the string to see if it has 4-char words, make a string out of those and then do an Instr against your list.
(It depends how long the typed message is, I guess :/)
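The length pre-screen can be sketched like this in Python. Only words whose length matches some list entry are looked at more closely. The list is a placeholder, and the second step here is a set lookup rather than the Instr search suggested above.

```python
import string

BAD_WORDS = {"darn", "heck"}                  # hypothetical placeholders
BAD_LENGTHS = {len(w) for w in BAD_WORDS}     # e.g. {4}

def flagged_words(text):
    """Return the listed words found in text, checking only plausible lengths."""
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    candidates = [w for w in words if len(w) in BAD_LENGTHS]
    return [w for w in candidates if w in BAD_WORDS]
```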

Also, I recommend using arrays to store the words rather than a type field, as they are more efficient than types for linear searches in B3D, and you only have 2 things to reference.


SculptureOfSoul(Posted 2009) [#14]
Oh, this is in B3D. Hmm, I guess my code wouldn't work then.

If you don't want to rely on the pre-screening D4NM4N is talking about (that'd be fastest, for sure, as the server isn't involved, but also the easiest to circumvent), I would look into using a hash table. They are basically associative arrays, or rather, arrays where you can use a string as an index (but they don't have to be sized like arrays and can grow as needed).

In pseudo-code it would be something like this:

If Swearwords[word-to-check] = True Then Return "****"


D4NM4N(Posted 2009) [#15]
That's one way (but does B3D have built-in 'intelligent' hash tables?).

I'm not sure if splitting:
"The quick brown fox jumped over the lazy dog"
into 9 strings and comparing each against, say, 30 strings would be faster (or slower?) than comparing the whole thing against those 30 strings.

I feel an experiment coming on :/
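Such an experiment might look like the following Python sketch, which uses 30 hypothetical list entries. Note that the two checks are not equivalent. A substring search of the whole message also flags words embedded inside other words, while splitting only matches whole words. No timings are claimed here; they depend on the machine and the language.

```python
import timeit

BAD_WORDS = ["word%02d" % i for i in range(30)]   # 30 hypothetical entries
BAD_SET = set(BAD_WORDS)
SENTENCE = "The quick brown fox jumped over the lazy dog"

def check_whole(text):
    # One substring search of the whole message per list entry.
    lowered = text.lower()
    return any(bad in lowered for bad in BAD_WORDS)

def check_split(text):
    # Split into words once, then one set (hash) lookup per word.
    return any(w in BAD_SET for w in text.lower().split())

if __name__ == "__main__":
    for fn in (check_whole, check_split):
        t = timeit.timeit(lambda: fn(SENTENCE), number=100_000)
        print(fn.__name__, round(t, 3), "s")
```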

By the way, ragtag, I assumed this is for B3D?


FlameDuck(Posted 2009) [#16]
The problem with using phonetics is that a word like "Fuk" is not a swearword.
More importantly, whether something is naughty is often a matter of context. To paraphrase George Carlin: you can prick your finger all you want, just don't finger your prick.

The reason there isn't a decent profanity filter as a DLL is that practically every language has some support for regular expressions. The probable reason there are ones for .NET is that the lion's share of .NET programmers are oblivious to regular expressions.

Here is pseudocode:
foreach (regex):
	text = regex.replace(text, "(Expletive deleted)")
As you can probably tell, wrapping that in a dll seems like a waste of time.
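A runnable Python rendering of that loop follows, with the result of each replacement assigned back to text. The two patterns are hypothetical examples: one catches a word stem plus its suffixes, the other a single-letter digit swap. re.IGNORECASE handles case.

```python
import re

# Hypothetical patterns; \b keeps them anchored at word starts/ends.
PATTERNS = [re.compile(p, re.IGNORECASE)
            for p in (r"\bdarn\w*", r"\bh[e3]ck\b")]

def expletive_filter(text):
    for regex in PATTERNS:
        text = regex.sub("(Expletive deleted)", text)
    return text
```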

I hope most parents would appreciate the reason for a filter and understand that, if they live on the other side of the world and the game was made by some guy in the USA, that may happen.
Really? In "Man on Fire", the original name for Dakota Fanning's character "Pita" was "Pinta", which, while fine in the book (set in Italy), made her a whore in Mexico.

If you want to avoid profane language, don't allow chat; do what Disney did with Toontown Online. Only allow emotes to other people, and only allow chat in private between people you know.


D4NM4N(Posted 2009) [#17]
I don't like regex, it's scary :D

But if blitz had regex that would certainly be the way.


_PJ_(Posted 2009) [#18]
I can honestly see no point in profanity filters.

Any media where "colourful" language may be encountered is going to have some clear age-rating.*
Where older people may still feel offended by such words, they can simply ignore them or ask the other person to stop. If it is code-based, an alternative option may be suitable (such as the 'blood/gore' settings in some games):
If (FilterProfanity)
	Return "Oh No! Get out of there!"
Else
	Return "Oh Shit! Get out of there!"
End If


Otherwise, filters themselves are a complete waste of time. They can easily be sidestepped (especially where typical 'internet-speak' is so abbreviated and phonetic), and more often than not, the filter just acts as an obstacle to communication.

Why can I discuss how much I like "Sega" with Italians, when I can't mention that I "lived in Scunthorpe until I moved to Essex"?
"Dick Whittington's pussy cat caught a cockerel with an arsenal of claws in the prickly bush." Sometimes filters just go too far, while ignoring (or being incapable of dealing with) "onanist", "zoophile", "vulva", "excrement" and much more.

Sure, you could allow users to create and edit their own profanity list. If you REALLY wanted one, that's probably the best way to go about it. Otherwise, all that checking and detail would likely be not only a real hassle to code, but would also slow things down (maybe not by a lot, but it's still unnecessary in my opinion).
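A user-editable list can be as simple as a plain text file with one word per line. This Python sketch assumes a hypothetical format (blank lines and lines starting with "#" are ignored) and a file path of your choosing. Neither is an established standard.

```python
from pathlib import Path

def load_user_list(path):
    """Read one word per line, skipping blanks and '#' comment lines."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line.lower())
    return words

def save_user_list(path, words):
    """Write the words back out, one per line, sorted for easy editing."""
    Path(path).write_text("\n".join(sorted(words)) + "\n", encoding="utf-8")
```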

Okay, enough ranting. I appreciate the topic was about whether filter DLLs or similar are available, rather than whether to use one at all, but I wanted to explain why I personally don't think it's worth the time or effort.


*internet play typically demands a minimum age of 13 years.


___________________________________________________________________


EDIT:
Apologies, RifRaf, I have just read your original post in more detail.
Of course, being a parent and taking responsibility for your children's sake is more worthwhile than I indicated above. I doubt a practical and comprehensive solution truly exists, but as you said yourself, at least you can say you tried. :)


RifRaf(Posted 2009) [#19]
George Carlin, I think he understood people more than most.

I understand all your points, but I still want something in here, even if I only block the top 50 to 100 bad English words. I'll put a language notice on it as well. Hmm, Webkinz only allows emotes too. I can't go that far, because I'm sure players will want to discuss what they are doing in the game.


I am using Sprite Candy for all input, so I'll probably put the filter in its input box update. That would cover name entry as well as in-game chat without bogging the server down. I'm only slightly concerned about someone circumventing it locally. Hopefully TT won't draw the L4D crowd; however, I've been in L4D and heard some kids who sound like they are 10 :)


D4NM4N(Posted 2009) [#20]
Are there even 50 to 100 bad English words? (99% of the ones on that list use the same few words.)

I was thinking 5-10 tops for the "really bad" ones, 10-15 "bigoted" ones, maybe 10 more "borderline" ones, and the rest are no longer considered "swearing"; they turn up in children's books like Harry Potter anyway.
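Tiered lists like these can be sketched by giving each tier a severity and letting each build choose a threshold. In this Python sketch, the tier names, severities and placeholder words are all hypothetical. A stricter build would pass a lower min_severity.

```python
# Hypothetical tiers with placeholder words.
TIERS = {
    "really_bad": {"darn"},
    "bigoted": {"placeholder_slur"},
    "borderline": {"heck"},
}
SEVERITY = {"really_bad": 3, "bigoted": 3, "borderline": 1}

def blocked(word, min_severity):
    """True if word is in any tier whose severity is >= min_severity."""
    w = word.lower()
    return any(w in words and SEVERITY[tier] >= min_severity
               for tier, words in TIERS.items())
```

For example, a kids' build might call blocked(word, 1) to catch every tier, and a build for older players blocked(word, 3) to catch only the worst.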


Ked(Posted 2009) [#21]
Here is a list of known words.
http://www.noswearing.com/dictionary

Firefox blocked that site on my end. http://www.noswearing.com redirected me to a weird URL that supposedly "steals private information" and "installs software on your computer."

Just a heads-up.


skidracer(Posted 2009) [#22]
Possibly irrelevant, but I would consider any game for kids that allowed them to play with strangers a really BAD idea.

If that is indeed the scope, then replacing chat with some kind of interface for sending preprogrammed pictograms/phrases may be a better solution.
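One way such a preprogrammed-phrase interface might work is to send only a small phrase ID and have every client look up the text locally, so no free text ever crosses the network. This Python sketch uses a hypothetical phrase list and a one-byte message format.

```python
# Hypothetical phrase list shared by every client.
PHRASES = ["Hello!", "Good game!", "Follow me!", "Need help!", "Thanks!"]

def encode_phrase(index):
    """Turn a phrase ID into a one-byte message (enough for 256 phrases)."""
    if not 0 <= index < len(PHRASES):
        raise ValueError("unknown phrase id")
    return bytes([index])

def decode_phrase(packet):
    """Return the phrase text, or None for malformed/unknown packets."""
    if len(packet) != 1 or packet[0] >= len(PHRASES):
        return None
    return PHRASES[packet[0]]
```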


RifRaf(Posted 2009) [#23]
Not irrelevant at all; it's a great point.

Once again I had a 5-paragraph reply, but while writing it I got an idea from your post.

Perhaps it's best to have two versions of the game, one for kids and one for young adults? The kids' version would block all chat messages and allow only emotes, while the young adult version would allow filtered chat.

Then people could get whichever version is most appropriate for them or their kids.


skidracer(Posted 2009) [#24]
Have you looked at Steam?

If your LAN mode is working well, jumping on their network may be a suitable next step. I've helped people establish rewards with a simple Blitz3D wrapper, and the packet stuff for multiplayer looked pretty easy in their API.

If you're thinking of hosting even a lobby I suspect you'll be wanting that filter for user names even if you drop chat.
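User names rarely contain spaces, so whole-word matching does not help much there. This Python sketch falls back to a substring test after folding digit stand-ins and dropping separators. That produces "Scunthorpe"-style false positives ("Checkers" contains a placeholder word), which is why it rejects the name and asks for another rather than silently censoring it. The word list and mapping are hypothetical placeholders.

```python
BAD_WORDS = {"darn", "heck"}   # hypothetical placeholders
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "@": "a", "$": "s"})

def username_ok(name):
    """Reject names containing a listed word after normalisation."""
    folded = name.lower().translate(LEET)
    letters = "".join(ch for ch in folded if ch.isalpha())  # drop _ - . digits
    return not any(bad in letters for bad in BAD_WORDS)
```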


Matty(Posted 2009) [#25]
Going back a few posts: I don't think performance is really an issue here, is it? I'm guessing you'll check the string when the user hits 'send', and I doubt anyone would notice a few extra milliseconds for a single chat line parsed one way or another. You won't need to do the check more than once every few hundred frames, given it takes a couple of seconds just to type the message, and messages won't need to be sent every frame either.


RifRaf(Posted 2009) [#26]
I haven't looked at it yet, but I'm interested in that.

I was talking with Phil, the author of Tank Universal, and he mentioned your wrapper as well. I would love to get my hands on it if possible.


_PJ_(Posted 2009) [#27]
Possibly irrelevant, but I would consider any game for kids that allowed them to play with strangers a really BAD idea.

If that is indeed the scope, then replacing chat with some kind of interface for sending preprogrammed pictograms/phrases may be a better solution.

Sorry, a bit off-topic, but I think skidracer has hit on a real issue here, and it's a problem with the internet at large.

Anyway, another solution to this particular issue, perhaps less involved than coding two completely different chat interfaces, may be to require that anyone joining the server is invited (by IP address, provided they are running the game), or has their username 'verified'/'okayed' by the current players/host.

I imagine it depends a lot on how the net-play is set up and how many players at a time (the theoretical maximum) your game is designed around.