Saving large amounts of data- major slowdown

Monkey Forums/Monkey Programming/Saving large amounts of data- major slowdown

Cygnus(Posted 2012) [#1]
Eyup!

I'm trying to save details for a tileset of 256x256 with 7 layers, each containing rotation, scale and direction vectors, so quite a bit of data. I have written a file system class of my own (heavily modified GFKs to use loadstring and actual files)

I can minimise the data required by each cell by maybe 30%, but this still leaves me with the same problem:

To be specific, at 393k of data, it dies.

I'm now working around it a little by saving each layer in its own file, which isn't optimal, but providing a layer isn't saturated it copes. if one layer gets too busy, it risks hitting 393k.

Is there a way to reliably handle this much data? For example could I say, I used to be able to define a 300k char array but this appears to be impossible with Monkey.


muddy_shoes(Posted 2012) [#2]
Your title says "slowdown" but your description says "dies" and then you say that defining a 300K char array is "impossible". What, exactly, is the problem? Is it slow, is it crashing or is it not compiling at all? What platform are you talking about? GLFW?


Cygnus(Posted 2012) [#3]
Sorry, It was way to late for me to make a post! I have not been clear.

The problem appears to be appending strings together. Appending strings to another string of large size is slow, that is the base of the problem:

I should not have said "Dies" - It just slows down to an impossible level. It never actually crashes. After the 393k, adding a character to the end of a string takes much much longer, meaning appending a row of 1x256x16bytes takes around 250ms on my machine. The more I append, the slower it gets.

This is on GLFW.

I am just wondering what the best way to save levels of this size is?

Since posting I've had an idea to turn my data string into an array. this way if there are 10x 393k string arrays, I only have to append 10 large strings before writing and no single string reaches the point of being slow to process.

About the 300k char array: I am able to access a string by using variable[10], but I was hoping I could define one of a certain size, then say variable[10]=256 (or variable[10]="a") however they are write only, so it is not possible to use them the way I require. Write access to this would prevent me from having to append any strings together at all.

Again, sorry about the babbling in my first post. Clearly I was getting tired!


Paul - Taiphoz(Posted 2012) [#4]
if this something that needs to be done in real time ?

you could try creating a class with the field you need and then creating an array of the class instance.

You could then do Variable[10].name ="" or .x=23 or .float=21.12

there is or will of course be some overhead, but you will also be able to manage your data structure better, just all depends on how and when you need to access your data.


Cygnus(Posted 2012) [#5]
It is actually level storage for my gravity platformer :-)

In fact it starts out like you have it there, but it would need to be stored to disk, thus definitely needs to be a string at some point.

I have tweaked my FileSystem library to work with this slowdown, manipulating the smaller string, periodically pushing this to the string that will be written to disk.

My thought is that appending 10 strings is going to be faster than appending 1 string 3000 times.

I've already minimised the amount of bytes in use, with all values being wrapped into bytes, so that now a single saturated layer can be written to disk without the slowdown, so we can flush the string.

If my latest idea works, then we won't really have a filesize limit anymore :-) If it works, I'll share source.

[edit]

Just wrote a 2.5mb file- slow, but a million times faster than appending to a 300k string. :-) Source to follow (after much testing.)

[edit 2]

Wrote 10mb in the same amount of time, so it's not the file writing that's slow now, and I am "flushing" the string at 65k. I'll go share the source now :-)


Dima(Posted 2012) [#6]
Are you mostly concerned with optimizing string concatenation or reducing the string length or both?

I think the first problem is related to strings being immutable which means long strings require large memory allocation for any modification. Don't know if Monkey permits pre-allocating large strings and filling them from an ascii array or something to avoid recreating the whole string in memory but that's how I would do it in Java. Maybe worth checking the Monkey docs deeper or someone else might want to chime in; I saw something about a StringStack class which contains a Join() method that returns a concatenated string by combining the whole stack - this would be up to the implementation of Monkey to handle correctly on different platforms but I haven't looked at the internals to know what's behind the scenes.
' right from the Monkey Docs - it is unknown to me however,
' if Monkey uses optimized implementation for all/any targets.
Local vec:=New StringStack
vec.AddLast "Hello"
vec.AddLast "there"
vec.AddLast "this"
vec.AddLast "is"
vec.AddLast "a"
vec.AddLast "StringStack"
Print vec.Join( " " )

Reducing the length of strings can also help with performance especially in the large strings that store level data. I usually encode the data to shrink filesizes and this works really well for 2D tile maps in particular. 2D maps can be unfolded into one dimensional representations to omit storing x/y coordinates and also any repeating tiles can be encoded with simple tags to further increase capacity. The idea is to only store what is needed avoiding empty space and redundant data. As an example my data file might look somethins like this:
name = Level 1
size = 10 x 10
' tile coords are represented by 1D array index that will automatically fill the 2D volume
' the line below encodes that 20 tiles 'x' should be inserted starting index 0
' and 10 tiles 'x' starting index 90.  For a level thats 10x10 in size this means 
' first two rows and one last row will contain tiles 'x'
x,0:20,90:10
' if no repetitive sequence is present simply store only the index
' the line below would fill the third row of the level with tiles 'y' every 'other' tile
' and fill the fourth row completely with 'y' tiles
y,20,22,24,26,28,30:10

I hope that makes sense to whomever might not be familiar - it's a very simplistic compression for repetitive data and transformation to one dimensional view - saves me a ton of memory and space for storing tile maps, probably won't work/help in all cases but worth a thought for improvement. There must be many little things that can save a bit of space - like storing entity position and orientation can probably be encoded or unrolled - maybe using 'angle' to store direction instead of a 2D vector.

Cheers,
Dmitriy


Cygnus(Posted 2012) [#7]
Thanks for your input, useful information. In fact I managed to encode all detail about every tile into a single byte. It needn't be human readable. My tilemap has null tiles that are simply none existent. I am down to 200k for an entire 127x127x7 map that details everything there is to know about the entire level.

As a result of my workaround I have written a filesystem class that is capable of writing large files :-)

Cheers for your input guys. :-)


Gerry Quinn(Posted 2012) [#8]
What about using Diddy's StringBuilder class?


Gerry Quinn(Posted 2012) [#9]
Edit: Or directly use the undocumented FromChars() function.


Dima(Posted 2012) [#10]
Thanks for pointing that out Gerry,

A quick test shows about 40% improvement using FromChars function. I've only tested in HTML5 (no other targets here). Maybe someone could try other targets and share results?
Import mojo

Class MyApp Extends App

	Method OnCreate()
		Local s$, len% = 10000000
		Local ascii%[] = New Int[len]
		For Local i% = 0 Until len
			ascii[i] = Rnd(97, 123)
		End

		Local dt% = Millisecs()
	        s = Concat1(ascii)
	        Print "Concat1: " + (Millisecs()-dt)
	
		dt = Millisecs()
	        s = Concat2(ascii)
	        Print "Concat2: " + (Millisecs()-dt)
	End
	
	Method Concat1$(ascii%[])
		Local s$ = ""
		For Local i% = 0 Until ascii.Length()
			s += "".FromChar(ascii[i])		
		End
		Return s
	End
	
	Method Concat2$(ascii%[])
		Return "".FromChars(ascii)
	End

End 

Function Main()
    New MyApp
End



Cygnus(Posted 2012) [#11]
FromChars would help fix it the way I originally wanted, nice! I may try to implement this as it'll speed up saving by the sounds of it. Cheers. :-)

At the moment I have a base string, and a write string. When base reaches 64k, it is appended to write string and cleared, so we are never appending single chars to a large string, and only occasionally messing with the larger string, with minimal slowdown. :-)