Handle illegal characters in file paths.

ima747(Posted 2010) [#1]
Some things Windows just won't allow in file/folder names that Macs have no problems with, for example a trailing space or a |. These bad names are easy to create when moving files and folders from a Mac to a PC. Windows replaces them with what looks like a bullet with lots of space around it. When loading those files into my program I have no problems. However, if I save the path and then try to load them again, I get an access violation: it tries to access something that isn't allowed, and that also isn't the right character...

I assume this is a bug in the way that BlitzMax is reading the path (realpath also doesn't help) and converting it to a string. Preferably I'd like to just have this work as it should. Failing that, or in the meantime, is there a way to detect bad file paths?
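
For the "detect bad file paths" part, here is a minimal sketch in Python (not BlitzMax) of checking a single file or folder name against the Windows naming rules before it gets copied over. It covers the reserved characters, control characters, and trailing spaces or periods. The function name `windows_name_problems` is hypothetical, and the check does not cover everything Windows rejects (reserved device names such as CON are left out).

```python
# Characters Windows rejects in a file or folder name.
RESERVED_CHARS = set('<>:"/\\|?*')


def windows_name_problems(name):
    """Return a list of reasons `name` is not a valid Windows file name.

    `name` is a single path component, not a full path.
    An empty list means these checks found no problem.
    """
    problems = []
    for ch in name:
        if ch in RESERVED_CHARS:
            problems.append("reserved character %r" % ch)
        elif ord(ch) < 32:
            problems.append("control character U+%04X" % ord(ch))
    if name.endswith(" "):
        problems.append("trailing space")
    if name.endswith("."):
        problems.append("trailing period")
    return problems


print(windows_name_problems("notes | draft "))
print(windows_name_problems("notes.txt"))
```

This only helps on names as they exist on the Mac side. Once Windows has substituted its own placeholder characters, the original name is no longer there to inspect.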


Brucey(Posted 2010) [#2]
Where/how are you saving the path?

It's possible that you are losing the "real character" values when saving the text in ASCII format...


ima747(Posted 2010) [#3]
That's exactly what's happening actually, just stumbled on that and came back to post it...

So, how do I do something like WriteLine with > ASCII support? Is WriteString safe?


Brucey(Posted 2010) [#4]
Write it out as UTF8... and then read it back in as UTF8 - UTF8 represents each character as a sequence of one or more bytes, so nothing above ASCII gets lost.
BlitzMax internally uses 16-bit characters.
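
To illustrate why the character gets lost, here is a small language-neutral sketch in Python (not BlitzMax). It compares a writer that stores one byte per character with a UTF-8 round trip. The one-byte writer is a hypothetical model that keeps only the low 8 bits of each character code, and U+2022 is just a stand-in for whatever character Windows actually substitutes.

```python
def write_one_byte_per_char(text):
    # Hypothetical model of a one-byte-per-character writer:
    # keep only the low 8 bits of each character code.
    return bytes(ord(ch) & 0xFF for ch in text)


def read_one_byte_per_char(data):
    # Each byte becomes one character again.
    return data.decode("latin-1")


path = "photos/trip\u2022.jpg"  # U+2022 is a stand-in for the odd character

lossy = read_one_byte_per_char(write_one_byte_per_char(path))
safe = path.encode("utf-8").decode("utf-8")

print(lossy == path)  # False: the character above 255 was mangled
print(safe == path)   # True: UTF-8 round-trips every character
print(len("\u2022".encode("utf-8")))  # 3 bytes for this one character
```

In this model the saved path comes back with a different character in place of the original, so it no longer names a file that exists.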


ima747(Posted 2010) [#5]
How do I set the encoding method?

Checking the docs, WriteLine is 1 byte per character, which explains the ASCII limit, but I don't see a way to change encoding. WriteString says it will write each character, however it doesn't specify the character size it will use...


ima747(Posted 2010) [#6]
I think I got it thanks to http://www.blitzbasic.com/Community/posts.php?topic=83948#948429, add "utf8::" before the file path when opening the files. Seems to work so far...


Brucey(Posted 2010) [#7]
There is a TextStream which supports multi-byte streams... but for some reason it adds BOMs to the start of the file - each to their own, I suppose.

Or you can manage it yourself, by using the string's ToUTF8() method to get a byte-sequence that you can then write out to your stream. (remember to free the Byte Ptr!)
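
The "manage it yourself" route can be sketched like this, again in Python rather than BlitzMax: encode the string to UTF-8 bytes and write them into a binary stream next to other data. The 4-byte length prefix and the helper names `write_utf8_string` and `read_utf8_string` are assumptions of this sketch, not something from the BlitzMax API. Python manages the byte buffer itself, so the Byte Ptr freeing step has no counterpart here.

```python
import io
import struct


def write_utf8_string(stream, text):
    # Encode to UTF-8, then write a 4-byte little-endian length
    # followed by the bytes. The length prefix is an assumption of this sketch.
    data = text.encode("utf-8")
    stream.write(struct.pack("<I", len(data)))
    stream.write(data)


def read_utf8_string(stream):
    # Read the length first, then exactly that many bytes, and decode them.
    (length,) = struct.unpack("<I", stream.read(4))
    return stream.read(length).decode("utf-8")


buf = io.BytesIO()
buf.write(struct.pack("<i", 1234))               # some non-text data
write_utf8_string(buf, "photos/trip\u2022.jpg")  # the path, as UTF-8
buf.write(struct.pack("<f", 0.5))                # more non-text data

buf.seek(0)
(number,) = struct.unpack("<i", buf.read(4))
path = read_utf8_string(buf)
(value,) = struct.unpack("<f", buf.read(4))
print(number, path, value)
```

Storing the byte count rather than the character count matters, because a single character can take more than one byte in UTF-8.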


ima747(Posted 2010) [#8]
TextStream is the easy option, but I am storing a lot more than just text in the stream and I don't want to get bogged down with that.

Didn't know about ToUTF8(), will make a note for future, but having to manage the ptrs sounds like a good way to get myself in trouble, especially with multithreading and all the problems I've been having with the garbage collector...

Opening the files with "utf8::" prepended seems to be doing the trick nicely, and my raw data is still moving through as well. More testing will let me know if I've borked it up and need to try another approach. Thanks Brucey!