Hashmap and Hashset

Monkey Forums/Monkey Code/Hashmap and Hashset

maltic(Posted 2012) [#1]
In an attempt to learn monkey I created a HashMap class and a HashSet class that extends it. I am putting them here because--in my empirical testing--I noticed both classes consistently outperformed the Red-Black binary tree based Map and Set that come with Monkey, especially for sizes > 1000. Which makes sense given the asymptotic complexity of both.

Following are the times I saw in bench marking. Diddy's RealMillisecs was used to time the benchmarks. All tests were done in release mode.

HTML 500:
HashSet<string>: 2
StringSet: 2

HTML 5000
HashSet<string>: 5
StringSet: 13

HTML 500000:
HashSet<string>: 1037
StringSet: 2039

(GLFW was too fast to see a notable difference till the larger test case. Also the timer wasn't fine-grained enough)

GLFW 5000000:
HashSet<string>: 7000
StringSet: 21000

Feel free to use the code, change it or ignore it as you see fit. I hereby release it under public domain.

Class Hasher<T>
	Method hashCode:int(i:T)
		Return 0
	End
	Method equals:Bool(i:T, i2:T)
		Return False
	End
End

Class IntHasher Extends Hasher<int>
	Method hashCode:int(i:int)
		Return i
	End
	Method equals:Bool(i:int, i2:int)
		Return i = i2
	End
End

Class StringHasher Extends Hasher<String>
	Method hashCode:int(i:String)
		Local hash:Int = 1
		Local prime:Int = 31
		For Local j:Int = 0 To i.Length() -1
			hash = prime * hash + i[j]
		Next
		Return hash
	End
	Method equals:Bool(i:String, i2:String)
		Return i.Compare(i2) = 0
	End
End


'	A hashtable element represents an entry in the hash map/table.
'	Works like a singly linked list to deal with hash collisions.
Class HTElem<K, V>
	Field key:K
	Field value:V
	Field nxt:HTElem<K, V>
	Method New(k:K, v:V, n:HTElem<K, V>)
		key = k
		value = v
		nxt = n
	End
End

'	A hash map class that uses chained bucketing to deal with hash collisions.
' 	K and V represent the classes of the key and value items in the hash map.
'	To use this class one will have to provide a 'Hasher' object defined above.
Class HashMap<K, V>
	Private
	Field table:HTElem<K, V>[]		'	The array that backs the hash table
	Field numElems:Int = 0			'	Number of elements in the hash table
	Const MAX_LOAD:Float = 1.75	'	When the hashtable exceedes this density it will resize itself
	Const GROWTH_FACTOR:= 3			'	The multiple to with the table resizes
	Const DEFAULT_HT_SIZE:= 11		'	The default hash table size
	Const SHRINK_FACTOR:= 0.5
	Const MIN_LOAD:= 0.3
	Field hasher:Hasher<K>			'	The hasher object that provides equals and hashCode methods
	
	'	Resizes the hash map while keeping all its entries.
	Method resize:Void(size:int)
		Local tmp:HTElem<K, V>[] = table
		table = New HTElem<K, V>[size]
		For Local i:int = 0 Until tmp.Length
			Local curr:= tmp[i]
			While (curr <> Null)
				Self.add(curr.key, curr.value)
				curr = curr.nxt
			Wend
		Next
	End
	
	'	Gets the location for a key in the current table
	Method getLoc:Int(key:K)
		Return Abs(hasher.hashCode(key) Mod table.Length)
	End
	
	Public
	
	'	------Constructors-------
	' Create a new hashmap with a specific size
	Method New(h:Hasher<K>, size:Int)
		table = New HTElem<K, V>[size]
		hasher = h
	End
	'	Create a default hashmap
	Method New(h:Hasher<K>)
		table = New HTElem<K, V>[DEFAULT_HT_SIZE]
		hasher = h
	End
	
	'	Check if the hashmap contains a specific element
	Method contains:Bool(key:K)
		Local location:Int = Self.getLoc(key)
		Local curr:HTElem<K, V> = table[location]
		While (curr <> Null)
			If (hasher.equals(curr.key, key))
				Return True
			EndIf
			curr = curr.nxt
		Wend
		Return False
	End
	
	Method get:V(key:K)
		Local location:Int = Self.getLoc(key)
		Local curr:HTElem<K, V> = table[location]
		While (curr <> Null)
			If (hasher.equals(curr.key, key))
				Return curr.value
			EndIf
			curr = curr.nxt
		Wend
		Throw New Throwable
	End
	
	'	Add a new key/value pair
	Method add:Void(key:K, value:V)
		Local location:Int = Self.getLoc(key)
		Local tst:= table[location]
		table[location] = New HTElem<K, V>(key, value, table[location])
		Self.numElems += 1
		If (Self.numElems / Float(table.Length) >= MAX_LOAD)
			Self.resize(table.Length * GROWTH_FACTOR)
		EndIf
		tst = table[location]
	End
	
	'	Remove all instances of a key
	Method remove:Void(key:K)
		Local location:Int = Self.getLoc(key)
		Local curr:HTElem<K, V> = table[location]
		Local prev:HTElem<K, V> = Null
		While (curr <> Null)
			If (hasher.equals(curr.key, key))
				If (prev <> Null)
					prev.nxt = curr.nxt
				Else
					table[location] = table[location].nxt
				EndIf
			EndIf
			prev = curr
			curr = curr.nxt
		Wend
	End
	
End


'	A set that is backed by a hashmap
Class HashSet<T> Extends HashMap<T, Object>
	Method New(h:Hasher<K>, size:Int)
		Super.New(h, size)
	End
	Method New(h:Hasher<K>)
		Super.New(h)
	End
	Method add:Void(key:T, value:Object = Null)
		Super.add(key, Null)
	End
	
	Method remove:bool(key:T)
		Return Super.remove(key)
	End
	
	Method contains:bool(key:T)
		Return Super.contains(key)
	End
End


Here is an example of how one might use the code:
Local h:HashSet<String> = New HashSet<String>(New StringHasher())
h.add("hello hashset!")


You will have to define your own Hasher classes for different classes. However I might be able to make a generic one using reflection.

When using these classes with your own objects, or for any other type I haven't made a Hasher object for, you need to make sure you define a good hashCode method that spreads out entries evenly over the hash table. I suggest following the method used by StringHasher in the above code for every field element you want to use to form the hash code.

The hash map will automatically resize itself. However using the right constructor you can give it a starting size other than the default. This can be useful when you know roughly how many elements you will be housing in the hash map and will optimize performance a little.

The code has been moderately tested for speed and correctness on the HTML5 and GLFW targets. Feel free to report any bugs, ideas or improvements.


Gerry Quinn(Posted 2012) [#2]
Looks neat, though I'll admit that my current roguelike project is the first time I've ever used a Map or a Set for anything!


maltic(Posted 2012) [#3]
Thanks! I've had to use Maps and Sets for a lot of things. I think of them like a nice swiss army knife for lots of applications.


Goodlookinguy(Posted 2013) [#4]
I saw these back when you first posted them and didn't get around to testing them until now (5 months later...). Interestingly, on a small scale, both hash map and map typically ran at the same speed. It wasn't until very large sized maps that the speed difference became noticeable.

These are the results from two of my tests (red = hash multimap; blue = multimap)




All-in-all though, the (A)verage and (T)otal usually leaned towards hash map (most platforms, except flash, which was indecisive).

Thank you for making these, 'cause they do help with performance in certain cases where you need to squeeze out a bit more without causing major slowdowns.