Delimited Text Files

Glenn Dodd(Posted 2007) [#1]
Hi All,
I am using Streams to open text files.
The existing ones have all been fixed field length, so locating particular pieces of data posed no issues.
I now have TAB delimited files to deal with.
Is there an easy way to separate the fields, other than checking each character for the delimiter?

Cheers
Glenn


tonyg(Posted 2007) [#2]
Can you read it in one big string and then use 'Find'?
Local mystring:String = "Hello	World"
If mystring.contains("	")
	Print mystring.find("	")
Else
	Print "No it doesn't"
EndIf

Find returns 5, the zero-based position of the TAB, so you then know the first string is characters 0 to 4.
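
A minimal sketch of how the position returned by Find could be used to slice out both fields. The variable names (pos, firstField, remainder) are only illustrative, and ~t is the BlitzMax escape sequence for a TAB:

```blitzmax
Local mystring:String = "Hello~tWorld"   ' ~t is a TAB character
Local pos:Int = mystring.Find("~t")      ' zero-based index, or -1 if not found

If pos >= 0
	Local firstField:String = mystring[..pos]      ' characters 0 to pos-1
	Local remainder:String = mystring[pos + 1..]   ' everything after the TAB
	Print firstField
	Print remainder
Else
	Print "No TAB found"
EndIf
```

Repeating the same step on the remainder would pick off the next field, which is essentially what the split functions later in this thread do.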


Glenn Dodd(Posted 2007) [#3]
I was going to do "check each value in the string" or the "find method" but I thought both of these options would be a bit clumsy.
The files I was looking at had multiple data values (multiple delimiters) and I just thought there might be a better way to do it.
Something like OpenStream(filename), then ReadLine(filename, delimiter), which would read it into variables.

I guess a function using an array to store each field would be the most generic method.
Something like:
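' file is assumed to be a stream already opened with OpenStream or ReadFile
Local fields:String[] = GetDelimitedValues(ReadLine(file), ",")

Function GetDelimitedValues:String[](s:String, delimiter:String)
	Local fields:String[]
	Local pos:Int = s.Find(delimiter)
	While pos >= 0
		' grow the array by one slot and store the next field
		fields = fields[..fields.length + 1]
		fields[fields.length - 1] = s[..pos]
		s = s[pos + delimiter.length..]
		pos = s.Find(delimiter)
	Wend
	' whatever remains is the last field
	fields = fields[..fields.length + 1]
	fields[fields.length - 1] = s
	Return fields
End Function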

Then I just need to loop through the array and write the values to my permanent variables, and do whatever I want with them.

Is there a better method?

Cheers
glenn


Brucey(Posted 2007) [#4]
It's such a shame there's no built in "split" method for strings...

This is the one I use in my modules :
Function _stringSplit:String[](text:String, separator:String)
	Local splitArray:String[]
	Local fieldCount:Int = 1
	
	' how many elements ?
	Local loc:Int = text.find(separator)
	While loc >= 0
		loc = text.find(separator, loc + 1)
		fieldCount:+1
	Wend
	
	' set the array with the calculated size
	splitArray = New String[fieldCount]
	
	fieldcount = 0
	While True
		loc = text.find(separator)
		If loc >= 0 Then
			splitArray[fieldCount] = text[..loc]
			text = text[loc+1..]
		Else
			splitArray[fieldCount] = text
			Exit
		End If
		fieldCount:+1
	Wend
	
	Return splitArray
End Function



Grey Alien(Posted 2007) [#5]
here's mine:

' -----------------------------------------------------------------------------
' First String To Sub Chop (Return first part of String up To Substring AND Chop the source string)
' -----------------------------------------------------------------------------
Function ccFirstStringToSubChop$(s$ Var, sub$)
	'Pass in a String, this will only return the first part up to, but not including, the substring (Or End)
	'The source string will have the first part and the substring removed.	
	Local pos% = Instr(s$, sub$)
	'If pos = 0 then the substring was not found (end of the string reached), so return the whole thing.
	If pos = 0 Then
		Local ret$ = s$
		'now clear s$
		s$ = ""
		Return ret$
	Else
		Local ret$ = Mid(s$, 1, pos-1)
		s$ = Mid$(s$, pos+1, Len(s$)) 'leave remainder in s$
		Return ret$		
	EndIf
End Function
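
A short usage sketch, assuming ccFirstStringToSubChop above is in the same file. The variable name record and the sample data are made up for illustration. Because the function chops the source string on each call, it can be called in a loop until nothing is left:

```blitzmax
Local record$ = "apple,banana,cherry"

' each call returns the next field and removes it (and the comma) from record$
While record$ <> ""
	Print ccFirstStringToSubChop(record$, ",")
Wend
```

Two caveats with this loop: a trailing delimiter leaves an empty string, so a final empty field would be skipped, and the function as posted only steps over one character, so it suits single-character delimiters.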




Glenn Dodd(Posted 2007) [#6]
Hi Brucey,
Thanks for the function.
I have included a slightly modified version which handles delimiters of any length, not just one character.

Cheers
Glenn

Function _stringSplit:String[](text:String, separator:String)
	Local splitArray:String[]
	Local fieldCount:Int = 1

	' how many elements ?
	Local loc:Int = text.find(separator)
	While loc >= 0
		loc = text.find(separator, loc + 1)
		fieldCount:+1
	Wend

	' set the array with the calculated size
	splitArray = New String[fieldCount]

	fieldcount = 0
	While True
		loc = text.find(separator)
		If loc >= 0 Then
			splitArray[fieldCount] = text[..loc]
			Print splitArray[fieldCount]
			text = text[loc+Len(separator)..]
		Else
			splitArray[fieldCount] = text
			Print splitArray[fieldCount]
			Exit
		End If
		fieldCount:+1
	Wend

	Return splitArray
End Function


_stringSplit("hello,how are you|,I, am|, fine now","|,")

I can never remember how to get these codeboxes...


Glenn Dodd(Posted 2007) [#7]
Brucey,
Another question.
I often deal with files which use hex delimiters (1C is the official hex delimiter).
When I read or write to a file I use Chr(28) (28 is hex 1C). How would I add this to your function as the delimiter?

Grey - thanks for your function too. Also thanks for the last update to your game framework.

Cheers
Glenn


Brucey(Posted 2007) [#8]
You can use either {code} {/code} or {codebox} {/codebox} - replace the curly brackets with square brackets.

"How would i add this to your function as the delimiter?"

Can't you just pass it in like any other String?

Nice tweak to the function, btw... although I would do it slightly differently - caching the length so it's not recalculated every time :

Function _stringSplit:String[](text:String, separator:String)
	Local splitArray:String[]
	Local fieldCount:Int = 1
	Local separatorSize:int = separator.length
	
	' how many elements ?
	Local loc:Int = text.find(separator)
	While loc >= 0
		loc = text.find(separator, loc + separatorSize)
		fieldCount:+1
	Wend
	
	' set the array with the calculated size
	splitArray = New String[fieldCount]
	
	fieldcount = 0
	While True
		loc = text.find(separator)
		If loc >= 0 Then
			splitArray[fieldCount] = text[..loc]
			text = text[loc + separatorSize..]
		Else
			splitArray[fieldCount] = text
			Exit
		End If
		fieldCount:+1
	Wend
	
	Return splitArray
End Function
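
Coming back to the original question about TAB delimited files, here is a rough sketch of how _stringSplit might be used while reading a file line by line. It assumes the function above is in the same source file; "data.txt" is a hypothetical file name, and ~t is the BlitzMax escape for TAB:

```blitzmax
' "data.txt" is a placeholder; substitute your own TAB delimited file
Local file:TStream = ReadFile("data.txt")

If file
	While Not Eof(file)
		' split each line on the TAB character
		Local fields:String[] = _stringSplit(ReadLine(file), "~t")
		For Local i:Int = 0 Until fields.length
			Print "Field " + i + ": " + fields[i]
		Next
	Wend
	CloseStream file
Else
	Print "Could not open data.txt"
EndIf
```

Passing Chr(28) instead of "~t" should work the same way for files that use the hex 1C separator.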



Glenn Dodd(Posted 2007) [#9]
When you view the hex 1C character in a text file it looks like a square, so it isn't a character I can type from the keyboard.
And yes, your separator.length is better.
It is always a pleasure to read code by you professionals.
I have used several of your modules in my simple coding efforts and I have learned heaps...

Cheers
Glenn


Glenn Dodd(Posted 2007) [#10]
here is a little bit of code i use

WriteString(FileOutput,"DOM-ITEM" + Chr(28) + Chr(28) + Barcode + Chr(28) + Chr(28) + Chr(28) + "8000" + Chr(28) + "0" + Chr(28) + "0" + Chr(28) + "0" + Chr(28))
WriteString(FileOutput,Chr(28) + LocationName + Chr(28) + Chr(28) + Chr(28) + ReceiverAddress1 + Chr(28) + ReceiverAddress2 + Chr(28) + Chr(28) + ReceiverTown + Chr(28) + Chr(28) + Chr(28) + Chr(28))
WriteString(FileOutput,"CPOL ProductCode" + Chr(28) + "0" + Chr(28) + Chr(28) + "CPSR" + Chr(28) + "0" + Chr(28) + Chr(28) + Chr(28))
WriteString(FileOutput,Chr(28) + Chr(28) + Chr(28) + Chr(28))
WriteString(FileOutput,"0" + Chr(28) + ServiceCode + Chr(28) + Chr(10))



Glenn Dodd(Posted 2007) [#11]
I guess I should have used codebox...
Anyway, those five lines write a detail line to a text file.
When I read it back in, the delimiter is Chr(28).

If you create that file (adding the correct code to create it of course) and view it in Notepad, then the Chr(28) shows as a square.
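
One possible way to make long delimited lines like those easier to write is a small helper that joins an array of fields with the separator. JoinFields is a hypothetical name, not a built-in function, and the sample values are invented. The demo uses "|" as the separator only so the result is visible when printed:

```blitzmax
' Hypothetical helper: joins the fields with the separator placed between them
Function JoinFields:String(fields:String[], separator:String)
	Local result:String
	For Local i:Int = 0 Until fields.length
		If i > 0 Then result :+ separator
		result :+ fields[i]
	Next
	Return result
End Function

' Empty strings stand for empty fields
Print JoinFields(["DOM-ITEM", "", "12345", "8000"], "|")
```

For the real file the separator argument would be Chr(28), with the joined string passed to WriteString as before.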


Grey Alien(Posted 2007) [#12]
Try passing in Chr($1c)


Glenn Dodd(Posted 2007) [#13]
I'm not sure what you mean.
I am specifically looking at this line from Brucey:
Function _stringSplit:String[](text:String, separator:String)
where separator is defined as a string.

It may just be my understanding. Strings are printable characters? chr(28) or chr($1c) aren't printable.
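
On the printable-character question: a BlitzMax String can hold any character code, printable or not, so Chr(28) is simply a one-character string. A small sketch to check this (sep is just an illustrative variable name):

```blitzmax
' $1C is hexadecimal notation for the integer 28
Print "Decimal value: " + $1C

Local sep:String = Chr($1C)
Print "Length: " + sep.length     ' a string holding exactly one character

If sep = Chr(28) Then Print "Chr($1C) and Chr(28) are the same delimiter"
```

Because sep is an ordinary String, it can be passed as the separator argument to _stringSplit like any other delimiter.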


Glenn Dodd(Posted 2007) [#14]
Suppose I should have tested this before, but it works fine.




Glenn Dodd(Posted 2007) [#15]
Thanks for your help, Brucey and Grey.

Cheers
Glenn


tonyg(Posted 2007) [#16]
Have you tried it?
Local mystring:String = "Hello" + Chr($1C) + "World"
Local myarray:String[] = _stringSplit(mystring , Chr($1c) )
For Local all:String = EachIn myarray
	Print all
Next

<edit> I see you have


Glenn Dodd(Posted 2007) [#17]
Well, I did learn something new from your last post, Tony.
I have always converted the hex values to decimal, but I see I could have used Chr($1c) instead. If only I had known that when I first started working with hex values...

thanks again


Grey Alien(Posted 2007) [#18]
glad to help.


FlameDuck(Posted 2007) [#19]
"Is there an easy way to seperate the fields, other than checking each character for the delimiter?"
Use a Tokenizer.


Glenn Dodd(Posted 2007) [#20]
What is a tokenizer?
Is it part of Blitz?