Compare strings
| ||
I'm using BlitzMax's string compare function to compare two strings with each other to see how similar they are. It spits out a value to tell you how similar two strings are and a "0" would be a perfect match. However, it usually returns very odd results! For example, according to BlitzMax the string "talking super jeopardy!" is closer to "super mario bros 3" than "super mario bros. 3" is. I've made an example so you guys can test it out yourself.

SuperStrict

Framework brl.standardio

Local str1:String = "super mario bros 3"
Local str2:String = "super mario bros. 3"
Local str3:String = "talking super jeopardy!"

Print "Str1 & Str2 Similarity: " + str2.Compare(str1)
Print "Str1 & Str3 Similarity: " + str3.Compare(str1)
| ||
Compare returns:
0 : if the two strings are identical
<some number> : the difference in length of the two strings, if the longer one starts with the shorter one.
<some number> : the difference between the values of the first characters that are not identical (for example, the difference between "t" and "s").

(see blitz_string.c / bbStringCompare() for details)

Not sure you can use Compare() in the way you think you can. ;-)
| ||
I found your regex module Brucey, and it seems to return better results.

SuperStrict

Framework brl.standardio
Import bah.regex

Local str1:String = "super mario bros 3"
Local str2:String = "super mario bros. 3"
Local str3:String = "talking super jeopardy!"

Print "Str1 & Str2 Similarity: " + StringCompare(str1, str2)
Print "Str1 & Str3 Similarity: " + StringCompare(str1, str3)

Function StringCompare:Int(str1:String, str2:String)
	Local regex:TRegEx = TRegEx.Create(str1)
	Return regex.Compare(str1) - regex.Compare(str2)
EndFunction
| ||
For completeness, here's a small program showing the three examples I mentioned above:

SuperStrict

Framework brl.standardio

Local same1:String = "Hello World!"
Local same2:String = "Hello World!"

Print "* SAME *"
Print "same1 = " + same1
Print "same2 = " + same2
Print "compared = " + same2.Compare(same1)

Print "~n* LENGTH DIFFERENCE *"
Local small:String = "Hello"
Local big:String = "Hello World!"

Print "small = " + small
Print "big = " + big
Print "big - small lengths = " + (big.length - small.length)
Print "compared = " + big.Compare(small)

Print "~n* CHAR DIFFERENCE *"
Local diff1:String = "Hello"
Local diff2:String = "World"

Print "diff1 = " + diff1
Print "diff2 = " + diff2
Print "W - H = " + (Asc("W") - Asc("H"))
Print "compared = " + diff2.Compare(diff1)
| ||
The compare function is defined in brl.mod/blitz.mod/blitz_string.c

int bbStringCompare( BBString *x,BBString *y ){
	int k,n,sz;
	sz=x->length<y->length ? x->length : y->length;
	for( k=0;k<sz;++k ) if( n=x->buf[k]-y->buf[k] ) return n;
	return x->length-y->length;
}

Which might translate to a BlitzMax variant along the lines of:

SuperStrict

Framework brl.standardio

Local str1:String = "super mario bros 3"
Local str2:String = "super mario bros. 3"
Local str3:String = "talking super jeopardy!"

Print "Str1 & Str1 equal: " + stringCompare(str1, str1) +" str.Compare() = " + str1.Compare(str1)
Print "Str1 & Str2 equal: " + stringCompare(str1, str2) +" str.Compare() = " + str1.Compare(str2)
Print "Str1 & Str3 equal: " + stringCompare(str1, str3) +" str.Compare() = " + str1.Compare(str3)

Function stringCompare:Int( x:String, y:String )
	' sz = length of the shorter string
	Local sz:Int
	If x.length < y.length
		sz = x.length
	Else
		sz = y.length
	EndIf

	' mirrors the C loop (k < sz), so the last index checked is sz - 1
	For Local k:Int = 0 Until sz
		If x[k] - y[k] <> 0 Then Return x[k] - y[k]
	Next

	Return x.length - y.length
End Function

So it seems to do this: it compares the characters position by position (from character 0 up to, but not including, character min(lengthX, lengthY)). As soon as the charcodes differ, it returns the charcode difference. If there is no difference, it returns the difference in length.

Conclusion: it returns "0" for an equal string, and all other numbers mean: not equal.

So this is NO similarity check - for this you might code your function so that it checks for "equal characters" at the same position.

BUT ... there is more advanced stuff to do there:
- check equal characters (what happens to "super" versus "supper"? You have to check neighboring characters, because otherwise it matches "sup" on both and from then on each char is different)
- check for equal length
- check for similar sounding characters ("super mario" versus "super marin" versus "super mariu" versus "super marioo")

At the end you have to "weight" each of the factors according to your needs (is the "sound" of a string important, a similar length, ...)

EDIT: Seems Brucey was faster... maaan, I needed more time to generate the sample code and validate that _I_ understood it correctly.

bye
Ron
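To illustrate the "equal characters at the same position" idea and why it falls short for "super" versus "supper", here is a minimal sketch. SamePositionMatches is a hypothetical helper name chosen for this example, not something BlitzMax provides, and it is only one of several ways such a check could be written.

```blitzmax
SuperStrict

Framework brl.standardio

' Hypothetical helper (not part of BlitzMax): counts the positions at which
' both strings hold the same character, up to the length of the shorter string.
Function SamePositionMatches:Int(x:String, y:String)
	Local sz:Int = Min(x.length, y.length)
	Local matches:Int = 0

	For Local k:Int = 0 Until sz
		' indexing a string yields the character code at that position
		If x[k] = y[k] Then matches :+ 1
	Next

	Return matches
End Function

' Only "sup" lines up; the extra "p" shifts everything after it.
Print SamePositionMatches("super", "supper") ' prints 3
```

A count like this would probably need to be combined with the other factors listed above (length, neighboring characters) before it says much about similarity.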
| ||
I was expecting something like a Levenshtein distance, to tell me how "similar" strings are. But the regex module seems to function as I expected. (See code above)
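For reference, a Levenshtein distance could be sketched in BlitzMax roughly as follows. This is not taken from any BlitzMax module; the function name Levenshtein and the two-row table layout are assumptions made for this example. It returns the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other, so 0 means identical and small values mean "similar".

```blitzmax
SuperStrict

Framework brl.standardio

' Hypothetical helper (not part of BlitzMax): Levenshtein edit distance
' between a and b, computed with two rows of the usual distance table.
Function Levenshtein:Int(a:String, b:String)
	If a.length = 0 Then Return b.length
	If b.length = 0 Then Return a.length

	' prev = distances for the previous row, curr = row being filled in
	Local prev:Int[] = New Int[b.length + 1]
	Local curr:Int[] = New Int[b.length + 1]

	' turning an empty prefix of a into the first j characters of b takes j insertions
	For Local j:Int = 0 To b.length
		prev[j] = j
	Next

	For Local i:Int = 1 To a.length
		curr[0] = i
		For Local j:Int = 1 To b.length
			' substituting costs nothing when the characters already match
			Local cost:Int = 1
			If a[i - 1] = b[j - 1] Then cost = 0

			' cheapest of: deletion, insertion, substitution
			curr[j] = Min(Min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost)
		Next

		' swap the rows for the next iteration
		Local tmp:Int[] = prev
		prev = curr
		curr = tmp
	Next

	Return prev[b.length]
End Function

Print Levenshtein("super mario bros 3", "super mario bros. 3") ' prints 1
Print Levenshtein("super", "supper") ' prints 1
```

The comparison is case-sensitive as written; lowercasing both strings first would be one way to relax that.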
| ||
I've found something odd with the regex module though. If you convert the strings to Lower, they won't match anymore for some reason.

SuperStrict

Framework brl.standardio
Import brl.retro
Import bah.regex

Local str1:String = "super mario bros."
Local str2:String = "super mario bros."

Print "Str1 & Str2 Similarity: " + StringCompare(Lower(str1), Lower(str2))

Function StringCompare:Int(str1:String, str2:String)
	Local regex:TRegEx = TRegEx.Create(str1)
	Return regex.Compare(str1) - regex.Compare(str2)
EndFunction

That would result in -48, even though they're exactly the same. But if you don't convert it to lowercase, you get the result 0.
| ||
I've decided to use this: http://www.blitzbasic.com/codearcs/codearcs.php?code=2439 |
| ||
Never noticed the .Compare() method in String! The example posted is interesting... I need something like this.