Code archives/Miscellaneous/Hashtable type for BMax
This code has been declared by its author to be Public Domain code.
Download source code
| |||||
The format for this hash table is: key: any string value: any object THashtable.Constructor( size:int ) is the only way to create a hash table. You'll get an error if you try to create one with new. So an example call to insert a value into a new hash table might be: myObject:object = new object myHash:THashtable = THashtable.Constructor( 50 ) myHash.insertEntry( "myKey", myObject ) I tried to document the major functions in the code itself, and clarify some of the usage in the test portion of the code. Be warned however that the test area is some quick and ugly code! :) | |||||
''Constructor( capacity:int ) - returns a new Hashtable object with a maximum capacity = to '' the parameter provided. Utilize this function to create '' any and all hashtable objects. '' '' ''InsertEntry(name:string,o:object) - add an object to the table, the name is used to '' to generate an index 0-Capacity, selecting the list. '' multiple entries with the same name are permitted using '' this method. If you want to guarantee that only one '' entry exists per a given name, use the '' insertUniqueName( ) method '' ''insertUniqueName( name:string, o:object) - this method guarantees that the object specified '' will be the only object for that particular name. '' ''insertUniqueNamedObject( name:string, obj:object) - this method guarantees that there will only be '' one copy of the object specified by the obj parameter '' in the hashtable. There may be multiple entries for '' the name however. Using this instead of the '' standard InsertEntry() method when you want to make '' sure the same object (assuming it is the same name) '' is not duplicated in the table. '' ''RemoveNamed(name:string) - Removes all entries with the name specified from the hash table. '' Note, this does not return those objects. '' ''RemoveNamedObject(name:string, obj:object) - removes any entries from the bucket specified by the name '' parameter whose object value matches the object specified '' by the obj parameter. This method lets you quickly remove '' a single object without removing all objects that share the '' same name (such as RemoveNamed() does.) '' ''RemoveObject( obj:object ) - searches every bucket in the hash table for the specified object and removes '' all traces of it from the hash table. Note: This method can be relatively slow '' on large tables, and if possible you should use RemoveNamedObject() instead. '' ''GetEntry:object(name:string) - grab the first entry for a given name. You must cast it back to its original '' type before using it,myobject=myType(GetEntry(name:string)) '' ''GetMultipleEntries:object [](name:string) - returns an array of objects (entries) that match the '' given name parameter. This is useful when you intend '' to store a list of objects with a given name index '' ''Grow( growthsize:int ) - Increases the number of buckets in the hash array by growthsize. '' ''removeAllEntries() - removes all objects from the hash table '' ''removeNamed( name ) removes all entries with the given name from the hash table '' '' '' Regarding Unique Entries '' ------------------------ ''If you want to ensure there is only one entry for a given name key, there are two approaches you can take. ''If you want or need to ensure there is no entry for a given name key before you do an insert - or if there ''is an entry already present you want to handle it in some manner you should ''call getEntry( name ). If this is null, it's safe to insert. If this returns an object calling InsertEntry() ''will insert another entry with the same name. While you can access multiple entries with the same name via ''the method GetMultipleEntries(name), doing so is not ideal if you simply want a single entry per name. ''So the first approach is to call getEntry, and deal with the object returned (if any) and then do your ''insertion. ''The second approach comes into play if you don't need access to any entry that might already be stored ''under the insert name, but simply want to guarantee the entry you are inserting is the only one for ''that name. Calling insertUniqueName(name) will erase any entries already stored under "name" and guarantees that ''there is only one unique entry for the name provided. ''''''''''''''''''''''''''' '' ''HASHTABLE OBJECT '' ''- M.Laurenson/Defoc8 2006 ''- S.Hofslund/SculptureOfSoul 2006 '' ''- major modifications: the introduction of a capacity parameter, getMultipleEntries(), '' insertUniqueName(), insertUniqueNamedObject(), removeNamed(), removeNamedObject(), removeObject(), and '' Grow() methods and the Constructor() function some error checking code, document modification '' as well as a new and much more effective hash algorithm added by S.Hofslund (SculptureOfSoul) '''''''''''''''''''''''''''' Strict Type THashTable Field _table:TList[] Field _capacity:Int ?debug Global _indirectconstruct:Int ? Method New() ?debug Assert _indirectconstruct, "Use THashtable.Constructor() to create a new hash table." ? EndMethod Function Constructor:THashTable( capacity:Int ) ?debug _indirectconstruct = True ? Local retobj:THashTable = New THashTable retobj._capacity=capacity retobj._table=retobj._table[..capacity] For Local n:Int=0 Until capacity retobj._table[n]=New TList Next ?debug _indirectconstruct = False ? Return retobj EndFunction Method Grow( growthsize:Int ) If growthsize <= 0 Return Local oldtable:TList[] oldtable = _table _capacity = (_capacity + growthsize) _table = New TList[_capacity] For Local i:Int = 0 Until _capacity _table[i] = New TList Next 'regenerate indexes for and reinsert all of the old tables entries For Local n:Int = 0 Until oldtable.length For Local entry:gHashEntry = EachIn oldtable[n] InsertEntry( entry.name, entry.obj ) Next Next EndMethod Method genIndex(name:String) Local val:Int=0 Local temp:Int For Local n:Int=0 Until name.length 'the following is commented out because it is neither as fast or as efficient as the one used below ' val:+ (name[n])^2 + (name[n]*(n^2)) + (name[n]Mod 3 * name[n]) val= (val Shl 3) + val + name[n] 'commented out for the same reason ' val:+ (name[n])^(((name.length + 1 - n)Mod 2) + 2) Next 'call Abs on the index because it might be negative (in the case of an integer overflow). Return Abs(val Mod (_capacity ) ) EndMethod Method insertEntry(name:String ,obj:Object) Local index:Int=genIndex(name$) Local entry:gHashEntry=New gHashEntry entry.name=name entry.obj=obj entry.link=_table[index].AddLast(entry) EndMethod 'object must have an overriden compare method! Method insertSortedEntry(name:String,obj:Object,ascending:Int=True) Local index:Int=genIndex(name$) Local entry:gHashEntry=New gHashEntry entry.name=name entry.obj=obj entry.link=_table[index].AddLast(entry) _table[index].Sort(ascending) EndMethod Method sortEntriesNamed(name:String,ascending=True) Local index:Int=genIndex(name) _table[index].sort(ascending) EndMethod Method sortAll(ascending:Int=True) For Local iter:Int=0 Until _table.length _table[iter].sort(ascending) Next EndMethod Method insertUniqueNamedObject( name:String, obj:Object ) Local index:Int = genIndex(name) internal_removeNamedObject( obj, index ) Local entry:gHashEntry = New gHashEntry entry.name = name entry.obj = obj entry.link = _table[index].Addlast(entry) EndMethod Method insertUniqueName(name:String,obj:Object) Local index:Int=genIndex(name$) internal_removeNamed( name, index ) Local entry:gHashEntry=New gHashEntry entry.name=name entry.obj=obj entry.link=_table[index].AddLast(entry) EndMethod Method getEntry:Object(name:String) Local index:Int=genIndex(name) Local link:TLink = _table[index]._head Local entry:Object Rem the old for eachin variety of the loop, replaced below by the hand rolled loop. Uncomment this and comment the the below code to see for yourself the speed difference it makes (only noticeable when doing 1000's of operations though) For Local entry:gHashEntry=EachIn _table[index] If(entry.name=name) Return(entry.obj) EndIf Next EndRem 'Print "_table[index].count() =" + _table[index].count() For Local iter = 0 Until _table[index].count() 'handrolling the loop as it turns out to be much faster than a For..Eachin entry = link._succ._value 'do the keys match? If gHashEntry(entry).name = name Return gHashEntry(entry).obj EndIf link = link._succ 'Return(entry.obj) 'EndIf Next Return Null EndMethod Method getMultipleEntries:Object[](name:String) Local index:Int = genIndex(name) Local retarray:Object[] = New Object[getEntryCount(index)] Local objectcount:Int For Local entry:gHashEntry=EachIn _table[index] If entry.name = name retarray[objectcount] = entry.obj objectcount:+ 1 EndIf Next 'resize the array to objectcount elements. retarray = retarray[..objectcount] Return retarray EndMethod Rem The below are leftovers from a variety of tests. I figured someone might find these of interest. Anyhow, except in certain special situations, I found them to be slower than the code I'm using for getMultipleEntries. Feel free to experiment though. Method getMultipleEntries2:Object[](name:String) Local index:Int = genIndex(name) Local objectcount:Int For Local entry:gHashEntry=EachIn _table[index] If entry.name = name objectcount:+ 1 EndIf Next Local retarray:Object[] = New Object[objectcount] objectcount = 0 For Local entry:gHashEntry = EachIn _table[index] If entry.name = name retarray[objectcount] = entry.obj objectcount:+ 1 EndIf Next 'resize the array to objectcount elements. 'MemCopy( retarray, retarray, SizeOf(retarray[0]) * objectcount) 'retarray = retarray[..objectcount] 'above method is faster Return retarray EndMethod EndRem Rem Method getMultipleEntries3:Objwrapper(name:String) Local index:Int = genIndex(name) Local retarray:Object[] = New Object[getEntryCount(index)] Local objectcount:Int Local wrapper:objwrapper = New objwrapper For Local entry:gHashEntry=EachIn _table[index] If entry.name = name retarray[objectcount] = entry.obj objectcount:+ 1 EndIf Next 'resize the array to objectcount elements. 'MemCopy( retarray, retarray, SizeOf(retarray[0]) * objectcount) wrapper.objarray = retarray wrapper.length = retarray.length Return wrapper EndMethod EndRem 'simply a slightly faster version of removeNamed. Faster because the index is provided 'and doesn't need to be generated. This method is called by insertUniqueName Method internal_removeNamed( name:String, index:Int ) For Local entry:gHashEntry = EachIn _table[index] If(entry.name = name) entry.link.Remove() EndIf Next EndMethod Method removeNamed:Int( name:String ) 'returns the # of objects removed Local index:Int = genIndex(name) Local remove_count:Int = 0 For Local entry:gHashEntry = EachIn _table[index] If(entry.name = name) entry.link.Remove() remove_count:+ 1 EndIf Next Return remove_count EndMethod Method internal_removeNamedObject( obj:Object, index:Int ) For Local entry:gHashEntry = EachIn _table[index] If(entry.obj = obj) entry.link.Remove() EndIf Next EndMethod Method removeNamedObject:Int( name:String, obj:Object ) 'returns # removed Local index:Int = genIndex(name) Local remove_count:Int = 0 For Local entry:gHashEntry = EachIn _table[index] If(entry.obj = obj) entry.link.Remove() remove_count:+ 1 EndIf Next Return remove_count EndMethod 'this loops through every bucket in the hash table Method removeObject:Int( obj:Object ) 'returns # removed Local remove_count:Int = 0 For Local iter = 0 Until _table.length For Local entry:gHashEntry = EachIn _table[iter] If entry.obj = obj entry.link.remove() remove_count:+ 1 EndIf Next Next Return remove_count EndMethod Method removeAllEntries() For Local n:Int=0 Until _capacity _table[n].clear() Next EndMethod Method getEntryCount:Int(index:Int) If (index >= 0) And (index < _capacity) Return _table[index].count() EndIf EndMethod EndType Type gHashEntry Field name:String Field obj:Object Field link:TLink Method Compare(pHashEntry:Object) 'Try ?debug Print "gHashEntry.compare called" ? ' Print "Compare result: " + obj.Compare2(gHashEntry(pHashEntry).obj) Return obj.Compare(gHashEntry(pHashEntry).obj) 'Catch ex:Object 'EndTry EndMethod EndType '--------------------------------------------'Test Section Below-------------------------------------------- 'these are included just to demonstrate the usage and speed of the table. Comment them out at your leisure. '----------------------------------------------------------------------------------------------------------- Global HashTable:THashTable = THashTable.Constructor( 1024 ) Global testint:intobj = New intobj Global tempint:Int Global str1$ Global str2$ Global obj:Object 'this type is used as a wrapper for ints allowing them to be treated as objects (and thus stored in the hash table.) Type intobj Field val:Int EndType 'str1 = our key string, str2 = our object to be inserted. str1 = "Strength" str2 = "3" testint.val = 55 Hashtable.insertEntry( str1, str2) '----------------------Test: 900,000 lookups and assignments Local starttime:Int = MilliSecs() For Local iter = 0 To 900000 obj = Hashtable.getentry( str1 ) 'the following assignment works because strings are objects. If you wanted to assign a numeric 4 you'd have to wrap it 'in an object (as is done below with the intobj type.) obj = "4" Next Local endtime:Int = MilliSecs() Print "~n" Print "*********************************************************************************************" Print "Millisecs to execute 900,000 lookups and assignments:" + (endtime - starttime) Print "*********************************************************************************************" '------------------------------------------------------------------- 'clear the table for the next test. HashTable.RemoveAllEntries() '---------------Test: Insert 1024 entries into a hash table of size 1024 to see then print out the '---------------distribution list to see how well the hash algorithm is working Print "~n" Print "*********************************************************************************************" Print "The following is a list of the # of entries per hash table element given 1024 entries in a hash table with 1024 elements." Print "The string used as a key is the following: ~qc:\documents And settings\random\testing\directory\~q plus up to 8 random letters." Print "This test should prove that the distribution, even given extremely similar strings, is still quite uniform." Print "The format of the list below is N:x where X is the element # in the hash table (0-1024 here since the table has 1024 elements." Print "And the entry count is the # of entries stored at that hash element (ideally there will be as few duplicates as possible.)" Print "If you are interested try altering the hash algorithm and see how it affects the distribution and duplication count." Print "Also note that I'm not currently seeding the random generator anywhere, so you'll get the same results each time even though" Print "in a real world situation the results will vary (possibly quite a bit) with a different seed in place, so feel free to seed the random" Print "generator to get more real world results. That, or try altering the test string that is used in the key - this can lead to drastically different" Print "results as well, and a wide range of tests are needed to verify that the algorithm is efficient with it's distribution given a wide range of inputs." Print "My tests have shown the algorithm above to be better than those I've commented out (and many others I tried). YMMV." Print "*********************************************************************************************" Global teststr:String 'generate and insert the entries. We start with the base string and add up to 8 random characters before inserting. For Local iter = 0 Until 1024 teststr = "c:\documents and settings\random\testing\directory\" 'generate up to 8 random characters to add to the test string For Local iter2 = 0 Until 8 If Rand(2) = 1 teststr:+ Chr(Rand( 65,125 )) EndIf Next HashTable.insertEntry( teststr, testint ) Next 'this loops through the entire hash table, counting the # of objects stored at each index and also the total # of 'duplicates. A duplicate is equivalent to a hash collision - a duplicate ID generated from different inputs. Local duplicatecount:Int For Local iter6 = 0 Until hashtable._table.length Print "N:" + iter6 + "....entry count:" + HashTable._table[iter6].count() If HashTable._table[iter6].count() > 1 duplicatecount:+ HashTable._table[iter6].count() -1 EndIf Next Print "number of duplicates:" + duplicatecount 'clear the table for the next test. HashTable.RemoveAllEntries() 'Insert 1024 entries with the key "testXX" where XX is a number from 0-49. The object inserted is the string "entryXX" where 'XX is again a number from 0-49. Print "~n" Print "*********************************************************************************************" Print "Inserting 1024 entries with the key ~qtestXX~q where XX is a number from 1-50. The Object inserted is the String ~qentryXX~q where" Print "XX is again a number from 1-50 that may or may not match the key number." Print "*********************************************************************************************" Print "~n" 'insert 1024 entries with a key of testXX where XX = a number from 1-50. Obviously, there will be multiple entries 'for any given key. For Local iter = 0 Until 1024 tempint = Rand(50) HashTable.insertentry("test" + tempint, "entry" + Rand(50) ) Next Print "*********************************************************************************************" Print "Retrieving multiple entries for the key ~qtest25~q. The entry #'s were randomly generated during the insertion and are from 1-50." Print "*********************************************************************************************" 'this is how you retrieve multiple entries for a given key! The entries are returned as an array of 'objects that you'll need to cast before using. Local objarray:Object[] objarray = HashTable.getMultipleEntries( "test25" ) For Local objiter = 0 Until objarray.length 'since we know the object stored is a string, we cast to string and then print. Print String(objarray[objiter]) Next Print "~n" Print "*********************************************************************************************" Print "The following is the time it took to call ~qgetMultipleEntries~q 900,000 times." Print "*********************************************************************************************" 'This just demonstrates the speed of "getMultipleEntries." It is considerably slower than getEntry() - especially 'when there are many duplicate entries, but is obviously necessary if you need to retrieve more than one entry. starttime = MilliSecs() For Local iter5 = 0 Until 900000 objarray = HashTable.getMultipleEntries( "test25" ) Next Print "Millisecs:" + (MilliSecs() - starttime) Print 'And now to prove that InsertUniqueName() works, we'll call it with the key "test25" which we've already shown 'above to have multiple entries stored. The entries that were stored with the key "test25" will be deleted and 'only the entry inserted with "InsertUniqueName()" will exist. Print "*********************************************************************************************" Print "Now calling Hashtable.InsertUniqueName( ~qtest25~q, ~qthis is the only entry that exists now!~q)." Print "This function will guarantee that the key provided - in this case ~qtest25~q will only have one unique entry" Print "We know from the above test that there are currently multiple entries stored under the key ~qtest25~q - they" Print "will be deleted by this function and replaced by the object provided in this function call." Print "*********************************************************************************************" Print "~n" Hashtable.insertUniqueName( "test25", "this is the only entry that exists now!" ) objarray = HashTable.getMultipleEntries( "test25" ) For Local iter4 = 0 Until objarray.length Print "The object at key ~qtest25~q:" + String(objarray[iter4]) Next Print "So it is clear that insertUniqueName does indeed work." Print "~n~n" Print "***Scroll to the top to read all the test results!***" '------------------------------------------------deprecated-------------------------------------------- 'commented the lines below out because the objwrapper type is only used in a version of getMultipleEntries that I was testing 'but decided not to go with. Rem Global wrapper:objwrapper = New objwrapper Type Objwrapper Field objarray:Object[] Field length:Int EndType 'Version 3 is faster when there are few duplicates Rem Print "Time b. v3:" + MilliSecs() For Local iter = 0 Until 900000 wrapper = HashTable.getMultipleEntries3( "test25" ) Next Print "Time a. v3:" + MilliSecs() 'version 1 is faster when there are multiple duplicates Rem Print "Time b. v1:" + MilliSecs() For Local iter = 0 Until 900000 objarray = HashTable.getMultipleEntries( "test25" ) Next Print "Time a. v1:" + MilliSecs() EndRem '--------------------------------------------------------------------------------------------------------- |
Comments
None.
Code Archives Forum