Add dictionary to your word games

Monkey Forums/Monkey Code/Add dictionary to your word games

Arabia(Posted 2013) [#1]
Here is what I've come up with for being able to spell check words. Seems fast enough on HTML, haven't tried it on my tablet yet.

For this example, I'm using the same dictionary that Words with Friends uses (known as the Enhanced North American Benchmark Lexicon (ENABLE) dictionary) which can be found here: https://code.google.com/p/dotnetperls-controls/downloads/detail?name=enable1.txt

The dictionary files is 1,916,146 characters in length, and it loads the file, converts it to individual words and saves them in a Map on my crappy laptop in less than 5 seconds.

Let me know if I'm doing something the wrong way/slower way.

EDIT: Updated 9 Dec 13. Showing changes made by Ziggy. This code now executes in approximately 21 seconds on a Galaxy Tab 3, original version took approximately 40 seconds.

Strict

Import mojo 

Class dictionary Extends App
	
	Field text:String
	Field myMap:= New StringMap<String>

	Method OnCreate:Int()
		SetUpdateRate(60)
		text = LoadString("dictionary.txt")		' Load the complete  dictionary into a string
		
		Local newWord:= New StringStack
		
		' Separate the complete dictionary into individual words (look out for ASCII values 
		' of 10 & 13 as these a CR & LF.  Each word is then added to "myMap"
		
		For Local i:Int = 0 Until text.Length
			If text[i] <> 10 And text[i] <> 13 Then
				newWord.Push(String.FromChar(text[i]))
			Else
				If newWord.Length > 0 Then
					Local data:= newWord.Join()
					myMap.Insert(data, data)
				EndIf
				newWord.Clear()
			End If
		Next
		If IsAWord("test") Then Print "test is a word" Else Print "test is not a word"
		If IsAWord("iowier") Then Print "iowier is a word" Else Print "iowier is not a word"
		
		Return 0
	End Method
	
	Method OnUpdate:Int()
		Return 0
	End Method
	
	Method OnRender:Int()
		Cls(192, 192, 102)
		Return 0
	End Method
	
	Method IsAWord:Bool(s:String)
		Local check:String
		
		check = myMap.Get(s)
		' if the string "s" is not found in myMap, an empty string will be returned (word doesn't exist)
		If check.Length > 0 Then
			Return True
		Else
			Return False
		End If
	End Method

End Class

Function Main:Int()

	Local d:dictionary
	d = New dictionary

	Return 0
End Function



SLotman(Posted 2013) [#2]
Nice!

I just wonder if instead of looping char by char, using text.Split("~n") and then just loop through the array would be faster...? (It will use more memory for sure, but the array would be GC'ed after the function is done processing...


Arabia(Posted 2013) [#3]
No idea. You probably know a lot more about Monkey that I do.

I've not mucked around much with String functions, so text.Split("~n") I wasn't aware of. I might change it to that and see if there is any noticeable speed different. It's quick enough on the PC, not sure if the speed remains string functions run at similar speeds on the various mobile devices.

As for arrays, do you mean have an Array of 172,820 Strings? Again I don't know how Maps "search" for your item when you use .Get - but surely it's quicker than looping through an Array of 172,000+ items.

EDIT: The code as it stands on my 5 year old desktop PC takes about 680 millisecs to execute, hardly worth worrying about trying to optimize unless mobile devices run way slower.


ziggy(Posted 2013) [#4]
You'll make it a lot faster by not making string concatenation this way on non javascript targets. You better use a StringStack, and "join" when your're done.


Arabia(Posted 2013) [#5]

You'll make it a lot faster by not making string concatenation this way on non javascript targets. You better use a StringStack, and "join" when your're done.



I have no idea how to implement what you're suggesting - still learning. Feel free to improve on this code anyone that is capable :)


Arabia(Posted 2013) [#6]
Update:

I will have to muck around with this further, I put my unfinished game onto my Galaxy Tab 3 and it takes about 40 seconds to load the dictionary in - obviously this is not desirable.

Hopefully another way will be a lot quicker, otherwise I guess I could split the dictionary and load little bits at a time. Load 10%, show title screen, load another 10% etc. - but this seems messy to me.

Update Again:

I tried the text.Split("~n") that SLotman suggested to get the words into an Array before looping and putting them into a Map. It is about 10% faster than the way I was doing it, guessing if I transfer this onto my Tablet I won't seem much overall improvement.

So my options to try out now are:

1. Use a loading screen, splash screens and time during menu's to load the dictionary in parts - I don't really like this idea.
2. Just put them all in an Array and then search the array for valid words - should be too hard as they are already in alphabetical order <--- this is probably my best bet I think
3. Take suggestions from the audience on how I can do it


Snader(Posted 2013) [#7]
If your game is going to use online features, you could put the dictionairy in an online database and query it from within your game every time you need it.


Xaron(Posted 2013) [#8]
Hmm using a dictionary by myself I have loading times of about 100-200ms for ~130,000 words.

And I just do it that way, for every length there is an own file.
_words = New IntMap< String[] >
w = LoadString( "words/" + LANGUAGE + "/3.txt" )
_words3 = w.Split( "~n" )
_words.Set( 3, _words3 )



Arabia(Posted 2013) [#9]
If your game is going to use online features, you could put the dictionairy in an online database and query it from within your game every time you need it.


Yeah that's an option, also means users don't need to redownload the dictionary every time it's updated. I'll have a think about it.

I think the Array option might work okay though, if I use a binary search it should take less than 18 attempts to verify a word in a dictionary of 170,000 odd entries.

I'll play with that tomorrow and a new update to the code if it works.

Hmm using a dictionary by myself I have loading times of about 100-200ms for ~130,000 words.

And I just do it that way, for every length there is an own file.


Just so I'm understanding you correctly, you mean you have a file "3.txt" which is every 3 letter word, a file "4.txt" etc. ?


Xaron(Posted 2013) [#10]
Yes! I have seperate files for 3 up to 10 letter words.

I get the following results:
HTML5: Loading time: 78ms for 131401 words.
GLFW: Loading time: 34ms for 131401 words.
Android: Loading time: 476ms for 131401 words.


Arabia(Posted 2013) [#11]
And Xaron, you say you have loading times of about 100-200ms - I take it that's total for the whole 130k odd dictionary. Is this running on HTML or running on a mobile device?

My loading time of around 600ms was fine on HTML, obviously tablets are a whole different kettle of fish with this sort of stuff which is why the time blows out to 40 seconds.


Wylaryzel(Posted 2013) [#12]
How are you going to compare if a word is in the dictionary?

If you only compare if a word is in the dictionary I would use a StringStack:

dict:StringStack = New StringStack()
dict.Push(LoadString(File).Split("~n"))

Afterwards you can use the Contains Function of the stack:

dict.Contains(searchString) <- Returns True if the string is there, False if the string is not there.


The code is untested and you may need a removal function for special chars you don't want to have

Wyl


Arabia(Posted 2013) [#13]

How are you going to compare if a word is in the dictionary?



I'm storing every word in a Map, and then using this function:

	Method IsAWord:Bool(s:String)
		Local check:String
		
		check = myMap.Get(s)
		' if the string "s" is not found in myMap, an empty string will be returned (word doesn't exist)
		If check.Length > 0 Then
			Return True
		Else
			Return False
		End If
	End Method


Check will either return the word if it's found (i.e. the length will be more than 0 for the word size) or 0 if the word isn't found in the dictionary.


Wylaryzel(Posted 2013) [#14]
This could be achieved by:

dict.Contains(searchString)


Gerry Quinn(Posted 2013) [#15]
You could use a single string or array with a special character for the start of a word. To do a binary search you just go to a position and then move to the first word you find which you can then compare as normal.

I wouldn't be surprised if that was the best way for many purposes, as it would have the quickest start-up time and the least memory allocation. Unless you are really looking up a lot of words, it will probably test words fast enough too.

Another similar option is two arrays: one containing all the words in one block, and an array of ints containing the offset of the start of each word. Again, you could pre-generate them so there is no initial processing, and this second method could be searched about as fast as any data structure.


ziggy(Posted 2013) [#16]
I have no idea how to implement what you're suggesting - still learning. Feel free to improve on this code anyone that is capable :)

That's the suggested change. You should see a big difference in Android and iOS.
Strict

Import mojo

Class dictionary Extends App
	
	Field text:String
	Field myMap:= New StringMap<String>

	Method OnCreate:Int()
		SetUpdateRate(60)
		text = LoadString("dictionary.txt")		' Load the complete  dictionary into a string
		
		'Local newWord:String = ""
		Local newWord:= New StringStack
		
		' Separate the complete dictionary into individual words (look out for ASCII values 
		' of 10 & 13 as these a CR & LF.  Each word is then added to "myMap"
		
		For Local i:Int = 0 Until text.Length
			If text[i] <> 10 And text[i] <> 13 Then
				newWord.Push(String.FromChar(text[i]))
			Else
				If newWord.Length > 0 Then
					Local data:= newWord.Join()
					myMap.Insert(data, data)
				EndIf
				newWord.Clear()
			End If
		Next
		If IsAWord("test") Then Print "test is a word" Else Print "test is not a word"
		If IsAWord("iowier") Then Print "iowier is a word" Else Print "iowier is not a word"
		
		Return 0
	End Method
	
	Method OnUpdate:Int()
		Return 0
	End Method
	
	Method OnRender:Int()
		Cls(192, 192, 102)
		Return 0
	End Method
	
	Method IsAWord:Bool(s:String)
		Local check:String
		
		check = myMap.Get(s)
		' if the string "s" is not found in myMap, an empty string will be returned (word doesn't exist)
		If check.Length > 0 Then
			Return True
		Else
			Return False
		End If
	End Method

End Class

Function Main:Int()

	Local d:dictionary
	d = New dictionary

	Return 0
End Function



Arabia(Posted 2013) [#17]
That's the suggested change. You should see a big difference in Android and iOS.


Thanks ziggy. Finally got around to trying out your code, pretty much halves the load time on my Galaxy Tab to about 20 secs. Guessing this is probably more than acceptable, just need to make a nice loading screen for what I'm writing to let the user know nothing has crashed.

Thanks for all the help guys.

I've updated my first post to show the changes ziggy made. If anyone else can see any other speed improvements just post in here.


ziggy(Posted 2013) [#18]
If you change the file to not have any CR character, and to have a single char separator (instead of CR and LF) it could be a coma, it would save you some time as you have an extra iteration per separator when a CR & LF char combination is found. You could pre-format the txt document and see if it improves parsing speed (which I supose it'll do). Also, I'm not sure but I *think* having the document pre-sorted alphabetically could improve map insertion time.


Gerry Quinn(Posted 2013) [#19]
TBH I think 20 secs is a bit long, and if it were me I would pre-process the data into one of the forms I suggested. Then it should load as fast as 2 MB of raw data or images.

I have an idea that on Android a program is always supposed to respond within 4 seconds, isn't that correct? If so, to do a 20 second load you will need to do it in stages and show a progress bar.

Anyhoo, if it's working for you you can complete your game and plug in a faster dictionary if needed.


AdamRedwoods(Posted 2013) [#20]
I have an idea that on Android a program is always supposed to respond within 4 seconds, isn't that correct? If so, to do a 20 second load you will need to do it in stages and show a progress bar.

good point, i think android does kill apps that take about that long.

i also was attempting to create my own dictionary solution (based on this thread), but rather keep it disk-based as opposed to memory-based. i ran into problems, as it seems monkey+android has problems with random-access to existing asset files in the data folder (i posted a potential fix under bugs).


muddy_shoes(Posted 2013) [#21]
A while back I messed about with a dictionary for word games. I was using a dictionary source with around 180,000 words. I've just dug out the code and given it a run on my Android. The loading of the file via LoadString takes ~3secs and splitting it into a string array takes ~1.5secs. That's on my aged ZTE Blade. A more recent phone should be considerably faster.

I did a number of tests at the time with compressed files to reduce the load time but found that whatever I managed to gain in load time I lost more in the processing time. You can't generalise that across all Android devices, but it seems likely that there's not much to be won by pursuing something other than raw text.

Anyway, the result was very simple code.





And with a test app as a runnable example ( the twl06.txt dictionary file can be found all over the place, e.g. here, just be sure to either change the EOL markers to Unix or change the separator in the code to "~r~n") :

Edit: Looks like you can use the enable1.txt you're currently using if you make the separator change mentioned above and also change the FindWord code to force the input to lowercase with ToLower rather than uppercase.



I never did anything much with it but maybe it's useful to people trying to build something better.


SLotman(Posted 2013) [#22]
Here's my attempt, using the same dictionary from the first post:



On Windows Phone 8 (compiling in *release* mode):
Load time: 111 ms
Parse time: 236 ms
Dict time: 742 ms -> 190ms with StringStack

So yeah, not much more than a second to load and parse everything. Most in creating the stringmap...

I don't have an actual android device to test this, but on Bluestack's emulator, I get this:
Load time:330ms
Pase time:120ms
Dict time:440ms -> 140ms with StringStack

This time, even less than a second :)

Edit: Changed code to use StringStack, which is much faster!


Xaron(Posted 2013) [#23]
I wonder why it even should take that time of 40 seconds on Android?! I mean I haven't done any optimization in my code but even this takes only half a second on my mid class Android device.

So I guess there are some new word games coming soon? ;)


Wylaryzel(Posted 2013) [#24]
Please apologize if I sidetrack the thread a bit but with all the codes posted I still wonder why the dictionary is stored in an StringMap<String> instead of an StringStack ?

Without testing, but wouldn't the memory consumption significantly higher? And as the word is stored as key and as value isn't that something doubling?

Thx in advance


muddy_shoes(Posted 2013) [#25]
I wonder why it even should take that time of 40 seconds on Android?!


I don't know what the 40 second code did but looking at Arabia's 21 second code the char by char construction of words via a StringStack is going to be notably slower than the "native" Java split to an array. Map construction is going to be a chunk of time too. Building a large RB tree with all that node construction and rebalancing (and in-order insertion may be a worst case for this) isn't free.

Java on Android on mobile devices is just very poor at this kind of text parsing and manipulation. Code that performs perfectly adequately on other platforms turns out to be a performance catastrophe on Android.

I'm pretty sure the code I posted above that just loads and splits to an array and then binary searches that array is likely to be about as quick as you can do this at Monkey's level of abstraction. I'd be curious to see if your approach of multiple smaller dictionary files improves the raw string load time somehow, but I imagine you've just got a faster phone than me.


Arabia(Posted 2013) [#26]
Please apologize if I sidetrack the thread a bit but with all the codes posted I still wonder why the dictionary is stored in an StringMap<String> instead of an StringStack ?


It's in a Map because that is how I did it :) I am still feeling my way with Monkey and a Map seemed like the logical choice and a nice easy way of checking if a word was in a list of 170,000+ words.

Would like to see if someone could compile the code I currently have in the first post onto iOS and see how it performs.


I'm pretty sure the code I posted above that just loads and splits to an array and then binary searches that array is likely to be about as quick as you can do this at Monkey's level of abstraction.



For my current level of knowledge, I think your code is probably the way to go. I haven't implemented the binary search of an Array in my code yet, will try and give it a go later today.


SLotman(Posted 2013) [#27]
So I guess there are some new word games coming soon? ;)

I have one planned for a long time, would even use minib3d/blitzmax to do it... but time/resources are a problem, so at least mine shouldn't see the light of day anytime soon.

And yeah, I also used a StringMap, because it's what I know... I do not know everything about Monkey yet, even if I completed some games with it :)


SLotman(Posted 2013) [#28]
Heh, changed my sample to StringStack, and the 'dict time' reduced to 190ms on WP8!

On Bluestacks, 140ms - so StringStack is definitively faster and better in this case than StringMap :)


muddy_shoes(Posted 2013) [#29]
StringStack is definitively faster and better in this case than StringMap :)


At construction, sure. What about actually checking words? There's not much value in quickly constructing a dictionary that then cripples your game at runtime.

Another quick note on your code. You don't have to Trim the strings if you use the correct separator when splitting the string. Assuming your dict file is using Windows end of line indicators then the separator is "~r~n".


SLotman(Posted 2013) [#30]
Hmmm strange. The code runs fine on Bluestacks emulator, but on my crap android tablet, it doesn't seem to be working... it's reporting something like 2ms to load the file.

Yeah, loadstring is failing on the device...?! dict.txt is inside the apk, it works on the emulator, so what the h3ll is going on?!?!

Edit: figured out - my tablet is so crappy, it won't load more than 1mb text files with loadstring. Reduced the file to less than 1mb, and I got this:

load time: 2771ms (yikes! just half the dictionary! just loadstring!)
parse time: 1903ms (yikes ^2!)
dict time:1649ms (yikes ^3!)

To load everything (in two passes):
load time: 6565ms
parse time: 3944ms
dict time: 5789

Almost 17 seconds to load it all! Definitively a loading screen will be necessary...

Did I mention this tablet was reaaaally crappy?? :)

So, I would recommend to anyone doing word games to split the word files... will for sure avoid problems with low spec android devices!


Xaron(Posted 2013) [#31]
Well yes, I've splitted my files so I have one for every length (3 to 10). It's coming along nicely. :)




Arabia(Posted 2013) [#32]
Edit: figured out - my tablet is so crappy, it won't load more than 1mb text files with loadstring. Reduced the file to less than 1mb, and I got this:


So maybe that is something I need to look out for, devices which impose a 1mb limit?

How do they test your App when you submit it, I'm guessing they have testers with every imaginable device running Android - in which case my App may fail based on this file size alone.

So I guess there are some new word games coming soon? ;)


I'm hoping to get mine submitted before Christmas - I'll post here if/when it gets accepted. Not really my sort of game, but it's not so much about what you like to play but more about what other people like to play - and similar games have some massive amounts of downloads. Time will tell if I've wasted my time.


Supertino(Posted 2013) [#33]
Not adding to the actual topic but;

< 1MB files and 1024x1024 max textures is what I stick to to ensure older droid devices don't crap out.


Gerry Quinn(Posted 2013) [#34]
They won't go overboard testing your app on everything, it just won't work on some devices.

As for runtime speed, any data structure that you can binary search is going to be blindingly fast unless you are really churning through millions of searches per second.


SLotman(Posted 2013) [#35]
How do they test your App when you submit it

On Google Play? No testing at all. You send your app, it will be live in a couple of hours.


Arabia(Posted 2013) [#36]

Not adding to the actual topic but;

< 1MB files and 1024x1024 max textures is what I stick to to ensure older droid devices don't crap out.



Interesting and worth noting for anyone looking at using a dictionary in their game.

So the plans now are: split the dictionary up into maybe 4 files of < .5Mb each.

Implement binary search.

Just going to focus on getting this thing working on HTML first, will make the final speedup changes before I submit to Google Play.

On Google Play? No testing at all. You send your app, it will be live in a couple of hours.


Wow that surprises me. But good to know you can have an App up within hours of submitting.