TRIM strings in java when type casting

Monkey Targets Forums/Android/TRIM strings in java when type casting

AdamRedwoods(Posted 2011) [#1]
Works in all targets but Android:
Elseif row.Contains("size")
	xy = row[ (row.FindLast(":")+1)..].Split(",")
	subimage.w = Int(xy[0])
	subimage.h = Int(xy[1])
	Print xy[0]+":"+xy[1]


Even with INT() casting, the subimage.w will crash in Android if there's a space before the number in the string.

Use Trim() and it works.
Elseif row.Contains("size")
	xy = row[ (row.FindLast(":")+1)..].Split(",")
	subimage.w = Int(xy[0].Trim())
	subimage.h = Int(xy[1].Trim())
	Print xy[0]+":"+xy[1]


Is this a bug or expected Java behavior?


Samah(Posted 2011) [#2]
It's just the way parsing works in Java. Any non-numeric character (apart from hyphen, for negative numbers) will make Integer.parseInt() fail.

Having said that, it should be up to the developer to always trim whitespace before conversion. The fact it works in other languages is more of a quirk due to the lenience in their parsing (and sometimes excessive leniency is bad and leads to hidden bugs).


AdamRedwoods(Posted 2011) [#3]
Then it should show up as in error in monkey.


FlameDuck(Posted 2011) [#4]
It's not necessarily an error in Monkey. As Monkey is its own language Mark can pretty much pick and chose whether his language should or should not accept a space as a valid character for String -> Integer conversion.

However he should of course fix it, in such a way so that it works consistently across platforms, regardless of whether whitespace is an accepted numeral in monkey. This seems like a legit bug report.


ziggy(Posted 2011) [#5]
@FlameDuck: +1


Samah(Posted 2011) [#6]
However he should of course fix it, in such a way so that it works consistently across platforms, regardless of whether whitespace is an accepted numeral in monkey.

Since the other targets seem to ignore the whitespace, maybe the Java translator should put an implicit trim() in the converted code when casting from String to Int.


marksibly(Posted 2011) [#7]
Hi,

Even forcing a trim() wont totally solve this problem - Java will still throw an exception if the string contains non-numeric characters. Most (all?) other targets will just stop converting at any non-numeric char. Should this be addressed too?

Either way, any solution WILL have a performance impact, and I'd be more in favor of leaving all data conversions as 'implementation' or 'target language' defined. You should trim() when necessary, and don't when not.

But this is obviously just gonna keep coming up again and again, and the performance impact of a fix is likely to be minimal, so here's what I'd suggest be used instead of the current inline 'Integer.parseInt()':

	static public int stringToInt( String str ){
		int n;
		str=str.trim();
		try{
			n=Integer.parseInt( str );
		}catch( NumberFormatException ex ){
			n=0;
			for( int i=0;i<str.length();++i ){
				int ch=(int)str.charAt(i);
				if( ch<'0' || ch>'9' ) break;
				n=n*10+(ch-'0');
			}
		}
		return n;
	}


(Haven't actually tried it though - will need some '-' handling too...and something similar for stringToFloat).

Or would a simple 'Trim()' be sufficient? Any other suggestions?

The thought of trimming EVERY string still strikes me as icky though!


Samah(Posted 2011) [#8]
I'd suggest something more along the lines of this:
public static int stringToInt(String str) {
	String s = str;
	for(int i = 0; i < s.length(); i++) {
		char c = s.charAt(i);
		if(c=='\n' || c=='\r' || c==' ' || c=='\t') {
			s = s.trim();
			break;
		}
	}
	int n = 0;
	try {
		n = Integer.parseInt(s);
	} catch(NumberFormatException ex) {
	}
	return n;
}

Since trim() is going to create a new String (unless there's no whitespace). In this situation we're only trimming if we find whitespace. Of course this means it could still unnecessarily trim if there's whitespace in the middle of your string, but that's the developer's problem. Either way, the exception handling should catch that.

You could also use Character.isWhitespace() which would handle different string encodings.


marksibly(Posted 2011) [#9]
Hi,

Well, I guess this really needs testing to see which is faster.

Strings are immutable, so it may be that .trim() does pretty much the same thing...plus it'll be native.


Samah(Posted 2011) [#10]
Strings are immutable, so it may be that .trim() does pretty much the same thing...plus it'll be native.

The issue is that trim() will create a new String object, firing off the GC. The loop is there so that it will only trim() if it needs to.

Alternatively you could create your own native parseInt implementation that ignores whitespace, just make sure you don't create any objects.


marksibly(Posted 2011) [#11]
Hi,

> The issue is that trim() will create a new String object, firing off the GC.

Are you sure?

One of the nice things about immutable strings is that methods can just return 'this' if they don't modify the string, as per your code.

If trim is well written it is likely to do this (although going by my experiences with Android, that's not exactly a given...) - and of course, being native, do it much faster.

[edit]In fact, going by the docs here this is exactly what happens:

Returns: A copy of this string with leading and trailing white space removed, or this string if it has no leading or trailing white space.

http://download.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#trim()
[/edit]

So it sounds like just adding a trim() would be pretty close to 'free' if the string is already trimmed. I think this is a good compromise for now.


muddy_shoes(Posted 2011) [#12]
I'm pretty sure that the Dalvik VM String objects are just pointers into shared pools of unicode character arrays anyway. Trimming a string would just create a new one of these reference objects pointing to the same character array. The GC implications are pretty minor as long as you're not doing it thousands of times in your game loop. Not that simply creating an object results in "firing off the GC" anyway.

Still, measuring beats speculation every time. Has anyone tested it?


Samah(Posted 2011) [#13]
@marksibly One of the nice things about immutable strings is that methods can just return 'this' if they don't modify the string, as per your code.

This is exactly the reason it creates a new string (since the source is immutable). But yes, it will just return "this" if the trim is unnecessary.

@marksibly So it sounds like just adding a trim() would be pretty close to 'free' if the string is already trimmed.

There is no free in Java. The closest you have is removing all strong references and forcing a System.gc(), but even that doesn't guarantee the object will get cleaned up.

@marksibly I think this is a good compromise for now.

Yep, do some performance testing and see how we go! :)

@muddy_shoes I'm pretty sure that the Dalvik VM String objects are just pointers into shared pools of unicode character arrays anyway.

It's called the string pool, which is where the compiler puts constants, and any strings that have been intern()ed by the developer. As far as I know, it stores full String objects, since there is much more to a string than just an array of characters.

@muddy_shoes Trimming a string would just create a new one of these reference objects pointing to the same character array.

And the keyword there is "new". It doesn't matter where it gets its source from; if there's a new keyword, it's a new object.


muddy_shoes(Posted 2011) [#14]
Yes, I know there's a new object -- that would be why the sentence you quoted actually says that it's a new object. The point is that just creating an object doesn't mean "firing off the GC". VM GCs "fire" based on strategies that are primarily dependent on levels of resource usage -- they don't just fire every time you create an object and then let it drop out of scope. Further, GC performance issues concern both GC frequency and the actual cost of the process. If the string object is no more than a set of references to a shared backing char array that already existed before the trim then the actual allocated object is probably trivial from a GC perspective.

Here, from the android docs for the string class:

"This class is implemented using a char[]. The length of the array may exceed the length of the string. For example, the string "Hello" may be backed by the array ['H', 'e', 'l', 'l', 'o', 'W'. 'o', 'r', 'l', 'd'] with offset 0 and length 5. Multiple strings can share the same char[] because strings are immutable."

Ultimately, it comes back to my point about measuring. Without measurement this comes under what I consider to be poking at goat entrails from a performance optimisation perspective. Beyond that, considering that we're talking about an operation that is highly unlikely to be performed in a performance sensitive part of a game, it seems like poking at entrails to divine the answer to a question of no importance.


therevills(Posted 2011) [#15]
http://developer.android.com/guide/practices/design/performance.html#object_creation

If you allocate objects in a user interface loop, you will force a periodic garbage collection


Thus, you should avoid creating object instances you don't need to. Some examples of things that can help:

* If you have a method returning a string, and you know that its result will always be appended to a StringBuffer anyway, change your signature and implementation so that the function does the append directly, instead of creating a short-lived temporary object.
* When extracting strings from a set of input data, try to return a substring of the original data, instead of creating a copy. You will create a new String object, but it will share the char[] with the data. (The trade-off being that if you're only using a small part of the original input, you'll be keeping it all around in memory anyway if you go this route.)