Ugly File Parsing...

Paul - Taiphoz(Posted 2017) [#1]
Sup peeps...


Right, so I have a file I need to parse, and I'm currently doing it like this (see below).

So I basically keep track of the line number I last looked at; then every few seconds I re-open the file, split the raw data into lines, jump to the last line I was on, and if there are new lines I parse those.

This works, and for what I'm using it for it's OK I guess, but the longer this log file gets, the longer the stall: the app locks up for a split second, not long enough to cause major issues but enough to be visible.

Is there a better way of doing this to avoid loading a massive log file into memory just to get the last x number of lines?

	Method ParseIntelChannel:Void(_channel:String = "TheCitadel", _chanID:Int = 0)
		'Print " Parsing " + _channel + " using lines id " + _chanID
		Local lines:String[]
		Local path:String = Self.FindIntel(_channel)
		Local file:FileStream
		Local data:String

		file = FileStream.Open(LocalRoot + path, "r")
		If Not file
			Error "Error File not Found [" + LocalRoot + path + "]"
		EndIf
		
		data = Self.AsciiToString(file.ReadString("ascii"))
		file.Close()	'close the stream, or the handle leaks on every poll.
		lines = data.Split("~n")
		
		If Self.currentIntelLine[_chanID] >= lines.Length - 1
			Self.currentIntelLine[_chanID] = lines.Length - 1
		EndIf
		
		If (lines[Self.currentIntelLine[_chanID]].Length()) > 25 Then 'if we have some text on this line then parse it.
			'Print " Parsing : " + lines[Self.currentIntelLine[_chanID]]
			Self.ParseIntelLine(lines[Self.currentIntelLine[_chanID]])
			Self.lastLineScanned[_chanID] = Self.currentIntelLine[_chanID]
			Self.currentIntelLine[_chanID] += 1
		EndIf
		
	End



Gerry Quinn(Posted 2017) [#2]
Maybe you could just record how many bytes of the file you have processed up to the last complete line, and chop that much off the start of the newly loaded version before splitting the remainder into lines?


MonkeyPlotter(Posted 2017) [#3]
Hmm, gave it a little thought, but I don't think you can open a data file a certain percentage of the way into the file...

I thought about copying the data into an array; then you can access the last x items in the array rather than processing the whole thing. That might actually work, using a variable to hold how far you got in the array previously.


Gerry Quinn(Posted 2017) [#4]
By chopping off the start, I meant copying the second part of the string. It means only one new string allocation, plus one for each line you found in the second part.

The alternative is to roll your own splitting function, so you can start it at any point. That's optimal for speed, and not that hard to do (just a matter of searching repeatedly for the split character, starting from the point where you last found it).
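
A roll-your-own split along those lines might look like this (an untested sketch; `SplitFrom` is a made-up name):

```
Import monkey.stack

' Split `data` into lines, starting at offset `start`, so text that was
' already handled on a previous pass is never re-scanned.
Function SplitFrom:String[](data:String, start:Int)
	Local out:=New StringStack
	Local pos:=start
	While pos < data.Length
		Local nl:=data.Find("~n", pos)
		If nl = -1
			out.Push(data[pos ..])	'trailing text with no newline yet
			Exit
		EndIf
		out.Push(data[pos .. nl])
		pos = nl + 1
	Wend
	Return out.ToArray()
End
```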


Paul - Taiphoz(Posted 2017) [#5]
It's a pain in the ass; it's forced me to stop all animation, because the hitching every few seconds was driving me nuts.

I think the initial file load is fine speed-wise; I suspect the major lag comes from the function I have that converts the byte data into ASCII, and then the part where I split that text into an array of lines on the newline character.

There must be a better, faster way of doing this, but honestly I've not properly touched Monkey in months and I'm drawing a blank.


Gerry Quinn(Posted 2017) [#6]
You could try something like this (haven't tested it, but the idea should be simple enough):
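
[The code block from this post was lost when the forum was archived; judging by the description above and the mention of `moreLines` below, it was presumably something like this untested sketch:]

```
' Keep a character offset into the file's text; on each poll, split
' only the tail that has appeared since the previous poll.
Field parsedTo:Int = 0

Method ReadNewLines:String[](data:String)
	Local moreLines:String[] = data[parsedTo ..].Split("~n")
	' The final element may be a partial line still being written,
	' so don't count it as processed yet.
	parsedTo = data.Length - moreLines[moreLines.Length - 1].Length
	Return moreLines
End
```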



If you want to keep all the lines in order, you can use a stack or list, and add the lines in moreLines to the end of it.

There are ways to speed things up further, but if the problem comes from splitting a long string into the same lines over and over, as it probably does, this should help. You get rid of a lot of string allocations.


MonkeyPlotter(Posted 2017) [#7]
The timing of this topic is great for me also, just yesterday my file parser bolked at trying to parse 3MBytes of ascii stuff. Breaking it down into manageable chunks could be my answer, it copes with half a MByte ok - which within a web browser still impresses me. My B3D of my tcx parser also crashes on a 3MByte file, must try harder ;)