WriteStdout improper UTF8 output

plash(Posted 2010) [#1]
SuperStrict

Framework brl.blitz
Import brl.standardio

' The Japanese kana for 'a' (arrayified to avoid forum encoding issues)
Local ja_a:Short[] = [Short(12354)]
Local sja_a:String = String.FromShorts(ja_a, ja_a.Length)
Print(sja_a) ' This should look fine
WriteStdout(sja_a + "~n") ' This should not output "B", instead it should be this: http://en.wiktionary.org/wiki/Index:Japanese/%E3%81%82

' Now, simulate the WriteLine call
' WriteLine calls _FlushWrite once and WriteString twice (which also calls _FlushWrite)
Local ts:TTextStream = TTextStream(StandardIOStream)
ts.WriteString(sja_a + "~n")
' Here WriteLine would call WriteString again, with "~r~n" (another _FlushWrite)
' Instead of doing that, we'll just flush the stream
ts._FlushWrite()
TCStandardIO(ts._stream).Flush() ' Flush to stdio


The problem is in the use of _FlushWrite in TTextStream. It should really be called after the WriteString call (though I don't know if this would mess with anything).
I'm using Ubuntu 10.04 with Max 1.39.

EDIT: You'll probably have to run this in a terminal to see the Japanese character. I don't know how it will appear on Windows.


marksibly(Posted 2010) [#2]
Hi,

What problem?

The output is indeed messed up in the IDE (or is it? Not sure what I'm supposed to be seeing...), but that may be because the richedit control used for stdout isn't Unicode-friendly - something bmx can't and won't guarantee.

The important thing is whether it's getting sent the right data, and I find your example a little confusing in this respect - what's it supposed to do? Why all the downcasting?


plash(Posted 2010) [#3]
> The output is indeed messed up in the IDE
That's not what I'm referring to, but yes it does not look correct in the IDE. I'm referring to places where it should look correct (a Linux terminal, for example).

> what's it supposed to do? Why all the downcasting?
The downcasting (I assume you're talking about the array thing) was done to preserve the character when posting here, because the forums do not like most non-English characters. The subject of this thread is only the output of WriteStdout.

I don't know what more can be explained other than what is in the comments - simply that the TextStream needs to be flushed after calling WriteString.

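Here is what the code outputs in gnome-terminal:
(Terminal output not preserved in this archive. Going by the description below, the first and third lines show the kana and the second shows "B".)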


The first character is output using Print (the sequence of calls is Print -> WriteLine -> {_FlushWrite, WriteString(str) -> {_FlushWrite, WriteChar}, WriteString("~r~n") -> {_FlushWrite, ..}}).
The second is output by WriteStdout (the sequence is WriteStdout -> WriteString -> {_FlushWrite, WriteChar}; notice that there is only one _FlushWrite here, and it comes before any characters are written).
The third character is output by simulating the sequence of calls from Print (that is, a _FlushWrite after WriteString).


marksibly(Posted 2010) [#4]
Hi,

_FlushWrite is internal. It is used to align certain write ops to a new 'start of line', which is why it appears *before* those ops. From memory, it is purely a formatting feature, designed to make textstream output more readable.

It may indeed be doing something wrong, but you still haven't told me what the 'bug' is with your program, apart from it not calling _FlushWrite when you think it should be (and I think it shouldn't).


plash(Posted 2010) [#5]
> It may indeed be doing something wrong, but you still haven't told me what the 'bug' is with your program, apart from it not calling _FlushWrite when you think it should be (and I think it shouldn't).
The bug is that WriteStdout doesn't output correctly (as noted in the thread title) - and calling _FlushWrite after writing the string is the only solution I see.
Clearly we do not agree.

EDIT: I've changed the comment at the WriteStdout call because it was unclear. What I mean is that it's outputting "B" where it should be this: http://en.wiktionary.org/wiki/Index:Japanese/%E3%81%82 (as all the others are).


marksibly(Posted 2010) [#6]
Hi,

> The bug is that WriteStdout doesn't output correctly (as noted in the thread title)

WriteStdout is not designed to handle Unicode or UTF8 - it's just a very thin layer on top of 'printf' and only uses simple C-style strings.
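
A minimal sketch in C (not BlitzMax's actual source) of why a conversion to a simple C-style string would turn the kana into "B". The assumption, which is not confirmed in this thread, is that each 16-bit character is narrowed to a single byte by keeping only its low 8 bits. The kana's code is 12354, which is 0x3042, so the low byte is 0x42, the ASCII code for 'B'. The helper name `narrow_to_cstring` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies 16-bit code units into a C string by keeping
   only the low 8 bits of each unit (assumed behaviour, see the note above).
   `out` must have room for len + 1 bytes. */
static void narrow_to_cstring(const uint16_t *units, size_t len, char *out)
{
    for (size_t i = 0; i < len; ++i) {
        out[i] = (char)(units[i] & 0xFF); /* the high byte is discarded */
    }
    out[len] = '\0';
}
```

Calling `narrow_to_cstring` on the single unit 12354 leaves the one-character string "B" in `out`, which matches the output reported for WriteStdout earlier in the thread.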

This could probably be amended to use UTF8 instead, but it's really just meant for debugging anyway, when Print isn't appropriate because you're debugging a module or something.

It has nothing to do with textstreams - although both WriteStdout AND Print go through stdout so operations like flush on one will affect the other.

Apologies for the confusion though.


plash(Posted 2010) [#7]
> WriteStdout is not designed to handle unicode or UTF8 - it's just a very thin layer on top of 'printf' and only uses simple c style strings.
I did not realize this.

I don't know when I came to the conclusion that WriteStdout was even defined in brl.standardio (or that it even used the StandardIOStream). I seem to have lost my mind somewhere along the way when posting in this thread.
I do feel that there should be a UTF8-compliant, non-newline-writing Print function though, as I've found WriteStdout very useful (specifically for checking the output of a parser/generator, which was actually the cause of this entire thing).
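
As a rough illustration of what such a function might do underneath, here is a sketch in C, assuming the string is held as 16-bit code units. Both helper names (`utf8_encode_unit`, `write_utf8_no_newline`) are hypothetical and are not part of BlitzMax. The sketch treats each unit as a complete code point, so surrogate pairs are not combined.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: encodes one 16-bit code unit as UTF-8.
   Writes 1 to 3 bytes into `out` and returns the byte count.
   Surrogate pairs are not combined in this sketch. */
static size_t utf8_encode_unit(uint16_t u, unsigned char out[3])
{
    if (u < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)u;
        return 1;
    }
    if (u < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (u >> 6));
        out[1] = (unsigned char)(0x80 | (u & 0x3F));
        return 2;
    }
    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (u >> 12));
    out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (u & 0x3F));
    return 3;
}

/* Hypothetical helper: writes the units to `fp` as UTF-8 with no trailing
   newline, then flushes so the text appears immediately. */
static void write_utf8_no_newline(FILE *fp, const uint16_t *units, size_t len)
{
    unsigned char buf[3];
    for (size_t i = 0; i < len; ++i) {
        size_t n = utf8_encode_unit(units[i], buf);
        fwrite(buf, 1, n, fp);
    }
    fflush(fp);
}
```

For the kana 12354 the encoder produces the bytes E3 81 82, the same sequence that appears percent-encoded in the Wiktionary URL quoted above.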

> Apologies for the confusion though.
Likewise.