My, what a big array you have!

Monkey Forums/Monkey Programming/My, what a big array you have!

ElectricBoogaloo(Posted 2013) [#1]
I have an array. It's quite big!
It contains a word list, which can then be scanned through to find whether words are in there or not.
It's all numbers, because in Blitz2D when I first created this array, it was quicker to sift through numbers than strings.
I've since ported it to C, and used it in DS homebrew and on iOS, and my use of numbers for the thing has resulted in many happy speed-based returns.

Today, I converted it to Monkey, and then hit F7.
Guess how long it took to compile?
Go on, have a guess!
..
I'll let you continue guessing, because it's been 10 minutes so far, and it's not done yet! ("Translating.."!)

So, that ain't going to work.

Assuming I'm targeting everything possible, what's the quickest solution you guys can think of for cramming 4MB worth of numbers into memory?


ziggy(Posted 2013) [#2]
What do you mean by "having an array"? Is it Monkey source code that initializes the array, so it's hardcoded? It would be much nicer to have the contents external and load them at runtime, wouldn't it? It would also be a lot faster.
A brief sample of what the code looks like would help to get an idea of where the compilation bottleneck is.


ElectricBoogaloo(Posted 2013) [#3]
Not sure you need an example, to be honest.
It's a great big megariffic array that's trying to convert, but isn't having a very happy time of things..

Looks a bit like this.
Public Global MyBigassArray:Int[]=[$2C32581,$2490CC00,$2C32585,$24908000,
$2C32585,$2490CC00,$2C33E4A,$2000000,$2C33E4A,
...
trimmed for length, but assume it's very very big.
...
$F900000,$2C3BDA5,$28590000,$2C3BDA5,$28594C00
]


"It would be much nicer to have the contents external and load them at runtime, wouldn't it?"
If that's what you think the answer should be, then yes, that would indeed be the answer to the question.
If you could then tell me how to do that, in such a way that would definitely work for all possible targets, then you'd be helping!

Thanks!


Grey Alien(Posted 2013) [#4]
How about converting the array into a 32-bit png and then loading that and reading out each pixel, which will be a 32-bit value like in your array?
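Here is a minimal sketch of the idea in Python rather than Monkey. It assumes each pixel is read back as a 32-bit ARGB integer with alpha in the top byte, which is the usual layout but is treated as an assumption here. It also shows why the approach depends on alpha surviving the round trip. If the reader forces the top byte to $FF, the stored value changes.

```python
def int_to_argb(value):
    """Split a 32-bit unsigned value into (a, r, g, b) bytes, alpha in the top byte (assumed layout)."""
    value &= 0xFFFFFFFF
    return (value >> 24) & 0xFF, (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def argb_to_int(a, r, g, b):
    """Reassemble the four channel bytes into one 32-bit value."""
    return (a << 24) | (r << 16) | (g << 8) | b


original = 0x2490CC00  # one of the values from the array in the first post
assert argb_to_int(*int_to_argb(original)) == original

# If the pixel reader reports alpha as 0xFF regardless, the top byte is lost:
_, r, g, b = int_to_argb(original)
corrupted = argb_to_int(0xFF, r, g, b)
print(hex(corrupted))  # 0xff90cc00
```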


ElectricBoogaloo(Posted 2013) [#5]
But I can't read from the PNG in memory, either. I'd have to load it, splodge it onto a screen whose resolution it might not fit onto, then grab little chunks at a time, and deal with it that way.
..
Honestly, I probably *could* do that, but it's very messy!
I've been using this "shove it in a big array" method for *checks* almost a decade! Surely it should be doable somehow, without requiring half-hour compilation times?


ElectricBoogaloo(Posted 2013) [#6]
*sigh*
I can't see this turning out very quick, but no matter, off we go!
Trial and Error, folks, and I'm guessing this one will very much be an error!


ziggy(Posted 2013) [#7]
I think it may be the parser needing too much memory for the conversion. How much memory is Trans using on your computer when you compile this? How big is "very big"? I would love to test your source code in Jungle IDE to see if I can optimize the parser; in any case, I'll make my own absurdly long array and see what I get.

Storing the data in an external byte array could help, though.


ElectricBoogaloo(Posted 2013) [#8]
Figured that wouldn't work. Can't seem to read Alpha, keep getting $FFxxxxxx values from ReadPixels().
Grrr!

No matter, was worth playing with it, and I can at least add basic pixel-reading to my MonkeyCanDo-list, so not a complete waste of time.
If I can be bothered to redo everything in 24-bit values, it might be achievable, but then you've no guarantees of exact colours, anyway, so it'll probably break on something, somewhere down the line.
Yippee!

*Deletes all that gfx stuff*


Dima(Posted 2013) [#9]
Reading pixels from the back buffer does not include the alpha component, but this 'could' be solved by loading the alpha channel separately from a text file or another PNG (you would have to store the alpha channel in one of the color channels or something), then combining both into a single int array with alpha and color channels respectively.

Another way would be to not use alpha channel in original png but instead pick a single mask color (like the old days) and after reading pixels convert all of those mask colors to alpha 0. This obviously isn't very flexible if you need to have variable alpha levels.

edit: now that I think about this, you could store 3 different alpha channels in one png, each color component r,g,b could represent different alpha channels for 3 different images, then combine the values to recreate original pngs with no quality loss.
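The idea in the edit above can be sketched in Python. It assumes pixels are handled as integers with alpha in the top byte. `pack_alphas`, `alpha_for` and `combine` are hypothetical helper names, not part of Monkey or mojo. Three 8-bit alpha masks share one RGB pixel, and each is reattached to a colour read back with its alpha forced to $FF.

```python
def pack_alphas(a0, a1, a2):
    """Store three 8-bit alpha values in the R, G and B channels of one 24-bit pixel."""
    return (a0 << 16) | (a1 << 8) | a2


def alpha_for(mask_pixel, index):
    """Pull alpha channel number `index` (0 = R, 1 = G, 2 = B) back out of a mask pixel."""
    shift = 16 - 8 * index
    return (mask_pixel >> shift) & 0xFF


def combine(rgb_pixel, alpha):
    """Attach an alpha byte to a colour whose own alpha byte is unreliable (assumed ARGB layout)."""
    return (alpha << 24) | (rgb_pixel & 0xFFFFFF)


mask = pack_alphas(0x00, 0x80, 0xFF)
colour_read_back = 0xFF123456  # alpha reported as 0xFF by the pixel reader
print(hex(combine(colour_read_back, alpha_for(mask, 1))))  # 0x80123456
```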


Fred(Posted 2013) [#10]
Why don't you re-save it as a text file with Blitz2D?
Then you can just load it with LoadString() and put your words into any kind of array.
You'd also be able to add words, or move to UTF-8 for an extended character set if needed.
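The parsing step can be sketched in Python, assuming the file holds one word per line. In Monkey, the text would come from LoadString() and be broken up with the String Split method. `parse_word_list` is a hypothetical name. Handling `\r\n` matters if the file was saved on Windows.

```python
def parse_word_list(text):
    """Turn the contents of a one-word-per-line text file into a list of upper-case words."""
    words = []
    for line in text.splitlines():  # copes with both \n and \r\n line endings
        word = line.strip().upper()
        if word:  # skip blank lines, e.g. a trailing newline
            words.append(word)
    return words


# In Monkey, `text` would come from LoadString(); here it is a literal.
sample = "aa\nAAH\r\naahed\n\n"
print(parse_word_list(sample))  # ['AA', 'AAH', 'AAHED']
```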


muddy_shoes(Posted 2013) [#11]
There are ways and means to load in your values but all of them will have trade-offs in terms of cross-platform use (as far as standard Monkey features go), performance and/or memory efficiency.

If I were you my first port of call would be to report the compilation speed problem to Mark. It seems likely that he's just not thought about performance in cases involving huge literal arrays and the problem may well be easily resolvable.


AdamRedwoods(Posted 2013) [#12]
Assuming I'm targeting everything possible, what's the quickest solution you guys can think of for cramming 4MB worth of numbers into memory?

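If it's an external binary file, use Monkey V67b and the DataBuffer class:
Import brl.databuffer

' file holds the path to your binary data file
Local buf:DataBuffer = DataBuffer.Load(file)
Print buf.PeekFloat(0)       ' 32-bit float at byte offset 0
Print buf.PeekString(4,5)    ' 5 bytes from byte offset 4, read as a string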
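To produce such a file from the hardcoded array, something like the following would do. It is a Python sketch that writes each value as a 4-byte little-endian integer. The byte order is an assumption, so check that your target's DataBuffer reads ints the same way. `words.bin` is a hypothetical file name. Reading value `i` then means peeking at byte offset `i * 4`.

```python
import os
import struct
import tempfile


def write_int_table(path, values):
    """Write 32-bit values back to back, 4 bytes each (assumption: little-endian)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}I", *values))


def read_int(data, index):
    """Fetch the index-th 32-bit value, i.e. what PeekInt(index * 4) would read."""
    return struct.unpack_from("<I", data, index * 4)[0]


values = [0x2C32581, 0x2490CC00, 0x2C32585, 0x24908000]  # first values from the array above
path = os.path.join(tempfile.gettempdir(), "words.bin")  # hypothetical file name
write_int_table(path, values)
with open(path, "rb") as f:
    data = f.read()
print(len(data), hex(read_int(data, 1)))  # 16 0x2490cc00
```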



muddy_shoes(Posted 2013) [#13]
Does 67b resolve the HTML5 incompatibilities?


NoOdle(Posted 2013) [#14]
It might be worth trying a different data structure. I have loaded the SOWPODS and Tournament Word List word lists with Monkey and didn't notice any speed issues. I used a StringMap holding a list for each starting letter, filled with words loaded from a text file.
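NoOdle's layout can be sketched in Python, with a dict standing in for Monkey's StringMap. `build_index` and `is_word` are hypothetical names. Each lookup only touches the words that share the first letter, but the search inside a bucket is still a linear scan.

```python
from collections import defaultdict


def build_index(words):
    """Group words into one list per starting letter (a dict stands in for StringMap)."""
    index = defaultdict(list)
    for word in words:
        index[word[0]].append(word)
    return index


def is_word(index, word):
    """Only the bucket for the word's first letter is scanned."""
    return bool(word) and word in index.get(word[0], [])


index = build_index(["AA", "AAH", "GOAT", "ZYGOTE"])
print(is_word(index, "GOAT"), is_word(index, "GOTE"))  # True False
```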


AdamRedwoods(Posted 2013) [#15]
Does 67b resolve the HTML5 incompatibilities?

yes, V67 goes back to the ArrayBuffer instead of DataView.


Samah(Posted 2013) [#16]
You could encode it as base64 and just store it as a string either in your code or a text file. Diddy has a base64 module.
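Diddy's base64 functions aren't shown in this thread, so this sketch uses Python's standard `base64` module to show the round trip. The ints are packed as 4-byte little-endian values (an assumption), encoded as text, and decoded back. Base64 grows the data by about a third, so 4MB of numbers becomes roughly 5.3MB of string.

```python
import base64
import struct


def ints_to_base64(values):
    """Pack 32-bit values (assumption: little-endian) and encode them as a base64 string."""
    raw = struct.pack(f"<{len(values)}I", *values)
    return base64.b64encode(raw).decode("ascii")


def base64_to_ints(text):
    """Decode the string and unpack it back into 32-bit values."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f"<{len(raw) // 4}I", raw))


values = [0x2C32581, 0x2490CC00]
encoded = ints_to_base64(values)
print(len(encoded))  # 12: 8 bytes of data become 12 characters
```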


ElectricBoogaloo(Posted 2013) [#17]
I guess it's bog-standard file access, then.
To tell you the truth, I've been trying to avoid file access while Mark's still playing with it. I'm not 100% sure it'll work 100% of the time.
The rest of this stuff I'm happy to work with, but file access looks like a WIP right now.
I really am struggling to come up with alternatives at this point, though.
I'll probably still stick with numbers. Numbers are quicker for "blank tile" searches, since you can simply AND-mask your search.
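The original post doesn't describe the bit layout of the numbers. This Python sketch uses a hypothetical one: 5 bits per letter, A=1 to Z=26, first letter in the lowest bits, and up to 6 letters per 32-bit value. It shows how an AND-mask handles blank tiles. A blank's bits are left out of the mask, so any letter matches there, and the length check stops shorter words from matching trailing blanks.

```python
LETTER_BITS = 5  # hypothetical layout: 5 bits per letter, up to 6 letters in 30 bits


def encode(word):
    """Pack a word into an int, A=1..Z=26, first letter in the lowest 5 bits."""
    code = 0
    for i, ch in enumerate(word):
        code |= (ord(ch) - ord("A") + 1) << (i * LETTER_BITS)
    return code


def pattern_to_mask(pattern):
    """'?' is a blank tile: its bits are left out of the mask so any letter matches."""
    value = mask = 0
    for i, ch in enumerate(pattern):
        if ch != "?":
            mask |= 0b11111 << (i * LETTER_BITS)
            value |= (ord(ch) - ord("A") + 1) << (i * LETTER_BITS)
    return value, mask


def find_matches(words, pattern):
    """Words of the same length whose masked code equals the pattern's code.
    (Real code would store precomputed codes rather than re-encoding each time.)"""
    value, mask = pattern_to_mask(pattern)
    return [w for w in words if len(w) == len(pattern) and (encode(w) & mask) == value]


words = ["ZYGOTE", "BIGOTS", "GOAT", "ZYGOMA"]
print(find_matches(words, "??GOTE"))  # ['ZYGOTE']
```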

*sigh* more of this, tomorrow. I need a sleep!


Grey Alien(Posted 2013) [#18]
Sorry the png thing failed. Shame there's no way to read the alpha. File access has worked for me since about V60 when I started using it. Worked on GLFW, HTML5, Flash and iPhone.


ElectricBoogaloo(Posted 2013) [#19]
Maybe not today!
Woke up with a banging headache, as well as a half-a-dozen possible alternative ways to do this flying through my head.
But after yesterday's trial-and-mostly-error, I've decided to give it a rest today.

There are lots of other little things I can be tackling, anyway. As handy as bulk data is, it's currently only useful for the word list, and I'm not really in the mood to code another word game just yet.
There are many other games to do.

I'll be back!


Gerry Quinn(Posted 2013) [#20]
You could use two PNGs, or encode each number in two pixels.
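Here is a Python sketch of the two-pixel option. Only the colour channels are used, so it doesn't matter that alpha reads back as $FF. The 16-bits-per-pixel split (high half in the first pixel, low half in the second, blue left at 0) is one arbitrary choice among several, and the function names are hypothetical.

```python
def split_to_pixels(value):
    """Store a 32-bit value in two RGB pixels, 16 bits in R and G of each (B unused)."""
    value &= 0xFFFFFFFF
    hi, lo = value >> 16, value & 0xFFFF
    return (hi >> 8, hi & 0xFF, 0), (lo >> 8, lo & 0xFF, 0)


def join_from_pixels(p0, p1):
    """Rebuild the value from the R and G channels of the two pixels; B is ignored."""
    hi = (p0[0] << 8) | p0[1]
    lo = (p1[0] << 8) | p1[1]
    return (hi << 16) | lo


p0, p1 = split_to_pixels(0x28594C00)  # a value from the array in the first post
print(p0, p1)  # (40, 89, 0) (76, 0, 0)
```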


Dima(Posted 2013) [#21]
One PNG can store three alpha channels, so technically you would need four PNGs in total for every three pixel-array images with alpha.


ElectricBoogaloo(Posted 2013) [#22]
Good grief, has it been that long since I last tackled this?

Well, I gave it another go, today.

First, I trimmed out all words that had more than 12 letters. That'll probably come back to haunt me later, but it meant I could cram the data into a much smaller space.

3 characters fit into 1 pixel (one per colour channel), with each letter value multiplied by 8 when saving and divided by 8 when loading, so devices with terrible colour issues should (*should*) keep the values intact.
(essentially AAA=$080808, ZZZ=$D0D0D0)
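Here is the encoding as a Python sketch. The letter numbering A=1 to Z=26 is inferred from the AAA=$080808 and ZZZ=$D0D0D0 examples, not stated outright. Round-to-nearest on decode is an assumed way to get the tolerance the ×8 is meant to buy. With it, a channel that drifts by up to 3 in either direction still decodes to the same letter.

```python
def encode_triplet(letters):
    """Pack up to three letters into a 24-bit RGB colour: A=1..Z=26, each times 8.
    Missing letters (end of a short word) become 0."""
    rgb = 0
    for i in range(3):
        n = ord(letters[i]) - ord("A") + 1 if i < len(letters) else 0
        rgb = (rgb << 8) | (n * 8)
    return rgb


def decode_triplet(rgb):
    """Divide each channel by 8, rounding to nearest (assumption) to absorb small colour errors."""
    out = ""
    for shift in (16, 8, 0):
        n = (((rgb >> shift) & 0xFF) + 4) // 8
        if n:
            out += chr(ord("A") + n - 1)
    return out


print(hex(encode_triplet("AAA")), hex(encode_triplet("ZZZ")))  # 0x80808 0xd0d0d0
```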

In BlitzMax, first, you flick through the wordlist, convert letters to colours, plot the colours, save the picture. Store it in blocks of 64x64 across a large 1024x1024 image, and... you're done.

Then in Monkey you load the picture in as an AnimImage, and at the start of each OnRender frame you load in the next chunk of wordlist data from one of the blocks.

*sigh*

162,726 words fit snugly into 159 blocks of data, which take 2.5 seconds to load at 60fps (one block per frame). It actually takes about twice that long on my Nexus 7, but I can live with that!
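The figures above can be checked with a short Python sketch. Padding every word to 12 letters gives 4 pixels per word, so a 64x64 block holds 1024 words and 162,726 words need 159 blocks. At one block per frame and 60fps, that is about 2.65 seconds, in line with the time quoted. The row-by-row ordering inside each block and across the 1024x1024 image is an assumption, and `word_location` is a hypothetical helper.

```python
import math

LETTERS_PER_PIXEL = 3
MAX_WORD_LETTERS = 12
PIXELS_PER_WORD = MAX_WORD_LETTERS // LETTERS_PER_PIXEL        # 4
BLOCK_SIZE = 64                                                # 64x64 blocks
ATLAS_SIZE = 1024                                              # 1024x1024 image
WORDS_PER_BLOCK = BLOCK_SIZE * BLOCK_SIZE // PIXELS_PER_WORD   # 1024
BLOCKS_PER_ROW = ATLAS_SIZE // BLOCK_SIZE                      # 16


def word_location(index):
    """(block, x, y) of the first pixel of word `index` in the image.
    Assumption: words run row by row inside a block, blocks row by row across the image."""
    block, slot = divmod(index, WORDS_PER_BLOCK)
    pixel = slot * PIXELS_PER_WORD
    bx = (block % BLOCKS_PER_ROW) * BLOCK_SIZE
    by = (block // BLOCKS_PER_ROW) * BLOCK_SIZE
    return block, bx + pixel % BLOCK_SIZE, by + pixel // BLOCK_SIZE


blocks_needed = math.ceil(162_726 / WORDS_PER_BLOCK)
print(blocks_needed, round(blocks_needed / 60, 2))  # 159 2.65
```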

I can then scan every word in the list twice per frame on the Nexus 7, or the browser, without even seeing a framerate hit. Obviously I wouldn't ever need to do 2 checks every frame! For 99% of word games it'll be one check every time the player says "Is this a word?", but given it works so blindingly fast, it should mean that players can do that without having to wait.
Sorted!


muddy_shoes(Posted 2013) [#23]
I still think you should directly email Mark about the compile time issue. If he can fix that then you could stick with your original method.

What word list are you using, by the way? Is it available?


skid(Posted 2013) [#24]
If you want to load an array of strings into Monkey quickly, load one big string and use the Split method.


It contains a word list, which can then be scanned through to find whether words are in there or not.



If you want an optimal solution for this in Monkey, use a Set, which should give Log(n) lookups: far faster than scanning an array, where the cost grows linearly with the number of words.

Function Main()


	Local dictionary:=New StringSet()
	Local count
	
	For Local a=65 To 90		' 65-90 = "A".."Z" (To is inclusive)
		For Local b=65 To 90
			For Local c=65 To 90
				Local word$=String.FromChar(a)+String.FromChar(b)+String.FromChar(c)
				dictionary.Insert word
				count+=1
			Next
		Next
	Next
	
	Print "Set contains "+count+" words"
	
	If dictionary.Contains("ABC") Print "ABC"
	If dictionary.Contains("123") Print "123"	
	
End
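An alternative keeps a plain array, closer to the original approach, and still gets Log(n) lookups by sorting the array once and binary-searching it. This is a swapped-in technique, not what skid posted. It is sketched in Python with the standard `bisect` module, and `contains` is a hypothetical name. For 162,726 words, a lookup needs at most 18 comparisons.

```python
import math
from bisect import bisect_left


def contains(sorted_words, word):
    """Binary search over a sorted list: O(log n) comparisons instead of a full scan."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


words = sorted(["ZYGOTE", "AAH", "GOAT", "AA"])
steps = math.ceil(math.log2(162_726))  # worst-case comparisons for the TWL06-sized list
print(contains(words, "GOAT"), contains(words, "GOTE"), steps)  # True False 18
```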



ElectricBoogaloo(Posted 2013) [#25]
If dictionary.Contains("??GOTE") then ....?!
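A set can still answer a blank-tile query by expanding each blank into every letter and testing each candidate. Here is that as a Python sketch, with a built-in set standing in for StringSet and a hypothetical function name. The cost is 26^k lookups for k blanks (676 for "??GOTE"), so it suits one or two blank tiles.

```python
from itertools import product
from string import ascii_uppercase


def matches_with_blanks(dictionary, pattern):
    """Expand each '?' into A-Z and check every candidate word with a set lookup."""
    blanks = [i for i, ch in enumerate(pattern) if ch == "?"]
    found = []
    for letters in product(ascii_uppercase, repeat=len(blanks)):
        candidate = list(pattern)
        for pos, letter in zip(blanks, letters):
            candidate[pos] = letter
        word = "".join(candidate)
        if word in dictionary:
            found.append(word)
    return found


dictionary = {"ZYGOTE", "BIGOTS", "GOAT"}
print(matches_with_blanks(dictionary, "??GOTE"))  # ['ZYGOTE']
```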

And, as much as Mark might be able to fix the initial issue, achieving pixel-data loading methods will have a multitude of other useful uses in future projects.
It ain't all about the words. It's about loading masses of data in a relatively short time, and being able to store it in a nice neat way.
One task, many uses, and works on all tested targets.

Oh yeah, and the word list is the slightly outdated, but still useful TWL06.


muddy_shoes(Posted 2013) [#26]
I'm sure you can find uses for what you've done. That doesn't really change the reason why you wanted to use compile-time array construction in the first place.


ElectricBoogaloo(Posted 2013) [#27]
.. Because I lazily wanted to reuse 7 year old code that I'd written for a DS, instead of attempting to redo the thing. Yeah, good point!

Seriously, people. Where's your love of coding?


muddy_shoes(Posted 2013) [#28]
I thought the point was to avoid the expensive load on platforms like Android.


ElectricBoogaloo(Posted 2013) [#29]
The point is, and always will be, to use the tools you're given, to achieve the task at hand.
Job done.


muddy_shoes(Posted 2013) [#30]
Okay, well if you're happy with that being your motivation. I may have misinterpreted your initial post asking about the "quickest solution".

I did mess around with this today and I'm loading the plain text TWL06 wordlist on my fairly weedy phone faster than your stated Nexus 7 time. If loading time is a concern for you then you might want to revisit the method at some point.


EdzUp(Posted 2013) [#31]
On the ??GOTE check, why not check for GOTE first, then, if it's in there, check for the rest? :)