Subclassing hierarchy of Strings and Arrays

BlitzMax Forums/BlitzMax Programming/Subclassing hierarchy of Strings and Arrays

Fabian.

(Posted 2007) [#1]

Hi,
since this post is a suggestion to improve the BlitzMax language, it is mainly addressed to Mark Sibly, however, everyone should feel free to post comments.

When I lately wrote some code working with arrays I had trouble with the fact that every BlitzMax array dimension has its own variable data type, e.g.

Local Array [,]

is not the same type like

Local Array [,,]

The first one contains a two-dimensional array, the second one conatins a three-dimensional array. This is needed for the compiler to decide how many indexes are needed to index this array. However, this is not usefull if you want to write a function which accepts all types of arrays. Therefore I thought about how the language needs to be designed to make this possible. And I came to the result that it's generally not the best that Arrays (and also Strings) are directly extended from type Object. If I showed this graphically I would write:

 Object
 |
 |--> User defined type
 |
 |--> String
 |
 |--> 1D Array
 |
 |--> 2D Array
 |
 |--> 3D Array
 |
  ...
 |
 '--> *D Array

I don't like the idea that everything is directly extended from type Object. It would be much better to work with a hierarchy, where all sorts of arrays are extended from a base array class and String and this base array class are extended from a class indicating that they're both containing collections of elements. It would look like this:

 Object
 |
 |--> User defined type
 |
 '--> Collection
      |
      |--> String
      |
      '--> Array
           |
           |--> 1D Array
           |
           |--> 2D Array
           |
           |--> 3D Array
           |
            ...
           |
           '--> *D Array

So String is extended from Collection and Collection is extended from Object; one-dimensional array is extended from Array and Array is extended from Collection and Collection is extended from Object. The big advantage of this is that you now can write functions which accept any type of array, e.g.:

Function SearchArray ( arr:Int Array , find:Int )     'search an index in an array
  Local ArrDims:Int [] = arr.Dimensions ( )
  Local DimCount:Int = Len ( ArrDims )   'this way we can get the number of dimensions
  Local Total:Int = 1
  For Local N:Int = 0 Until Len ( ArrDims )  'Calculates the total number of elements
    Total :* ArrDims [ N ]
  Next
  Local ArrPtr:Int Ptr = arr          'Get a pointer to the fist array element
  For Local Index:Int = 0 Until Total  'Iterate through all elements
    If ArrPtr [ Index ] = find
      Return Index                      'If we found the element, return the index
    EndIf
  Next
  Return -1            'Nothing found
EndFunction

And since String and Array are both extended from Collection, you now also can write code, which works with both, stirngs and arrays. And this could be really usefull to solve another issue: you can now remove these ugly bits of the language, as you said yourself - Len could become a real function, and wouldn't be an operator anymore! It could look like this:

Function Len ( param:Collection )         'Len accepts a collection as parameter
  Return param.length
EndFunction

A Collection is either a String or an Array, and both, strings and arrays provide a length field, so Collection should provide a length field, too.
You can see that this subclassing hierarchy can solve some problems and would be therefore nice to have. I can imagine the only issue of this change in the language can be the backwards compatibility; the most programmes written before this change, working with strings and arrays would compile as well with the change, e.g.:

Local Arr:Float [] = [ 1# , 2# , 3# , 4# ]
DrawPoly Arr

It wouldn't matter that an array is now also a Collection, the code compiles just as well. However, a big problem is that this change would need two more keywords (I named them 'Array' and 'Collection' above, but those don't need to be the actual names), and if someone used previously these keywords as function or variable names, this person would get a lot of compiling errors. Therefore I suggest to add a special syntax to the top of the source files, indicating that these keywords aren't used as identifiers and can be therefore accessed from this code. It could look like

Strict 1.26

SuperStrict 1.26

If the compiler doesn't detect this at the top of the source file, it switches to a compatibility mode, that means whereever these two keywords appear in this source file they aren't seen as keywords, instead they're treated as normal identifiers for functions, types or variables.
Additional you could do this vice versa with Len: till now Len is a keyword indicating that the Len-operator is used. If the compiler detects a "Strict/SuperStrict 1.26" statement it doesn't treat Len as keyword anymore - Len is then a function sitting in the brl.retro module.

I know it would be a lot of work to implement this in the compiler, but the result of this change would really solve this and maybe some other issues. So you can see what the work is worth - how do you think about that?

FlameDuck

(Posted 2007) [#2]

I think it's a bit of a mess. Most people don't semantic;ly consider Strings as being Char collections. Also there are other types of Collections than the ones you've mentioned, (currently TList and TMap, but one could easilly imagine more comming later).

Instead I would simply propose that a parameter myArray:Int[] would accept any Int Array of any dimensions. "Cannot convert from Int Array to Int Array" is a somewhat meaningless error message.

SculptureOfSoul

(Posted 2007) [#3]

I agree with Fabian on this. Lisp does something like this - strings, arrays, vectors, lists - they are all of a type "Sequence." A lot of functions work on all types of sequence (and are - I'm assuming - overloaded (or defined as generic in Lisps case) functions that act appropriately given a specific argument).

A 2 dimensional array is nothing more than an array of arrays, essentially. Maps and Trees are nothing more than lists of lists.

That's one really nice thing about lisp and this type of a classification hierarchy - you can call the reverse function on a string such as "hello" and get "olleh" - you can call it on a list of numbers (1 2 3 4 5 6) and get (6 5 4 3 2 1), you can call it on a list of lists ( (1 2 3) (4 5 6) ) and get ( (4 5 6) (1 2 3) ), you can call it on arrays, etc.

FlameDuck

(Posted 2007) [#4]

A 2 dimensional array is nothing more than an array of arrays, essentially.

However there is a difference between rectangular (2D) and jagged (Array of Arrays) arrays.

Maps and Trees are nothing more than lists of lists.

That's because in LISP (List Processing) everything is a list, as opposed to OO languages where everything is an object.

That's one really nice thing about lisp and this type of a classification hierarchy

This has nothing to do with "this kind of structure". It is because everything in Lisp is a list. Now oppowsed to what they thought back in 1958, a list is not the best solution for every problem.

Before you answer, think about this: How would you reverse a non-linaer datastructure (like a Graph)? I know you're a big fan of Lisp, but you are in a minority.

Dreamora

(Posted 2007) [#5]

To your last question: By in order traversing the tree. This gives you the correct representing key list.

But it would make any efficient data structure (which list isn't in most cases) pointless if it was a list.

There are other collections missing beside Map and List: TStream and TBank to name the most important one.

And I highly disagree on the 1.26 mode switches. Hell we already have compatibility mode for stone age blitz (by not using strict at all), with this addition I would start feeling like BM was the worst prethought language ever if it needs 3+ different running modes to work somewhere usefull.
BM needs to become consistent, not even more trashed by another different operation mode. (to me, even the stone age blitz behavior in non strict should be removed, ie local are local to scope, ALWAYS and the like. There is no reason to support stone age blitz because for that stone age blitz already exists, right? :))

People that do not want to have new features are always free not to upgrade to the newest version as it is with any language (ever tried C++ GCC 3.4 to 4.0+? )
With this behavior *updating the language and writing a list with actual changes to the language etc*, we at least would have the chance to see current gen compiles for C++ end on Linux and Windows, where we are several years back and thus cut in optimations (the windows version is that old that it does not even know the Pentium3 flag so not even SSE support, just MMX ...)

This naturally would mean that Compiler updates and regular modul updates would need to be split. Currently this done through sync mod, but it was mentioned that there are problems and this might be dropped.
Compiler updates then could be of 2 simple types: sub minor *Bugfix* updates (like 1.2.2 to 1.2.4) where it only fixes things or minor *Upgrade* (1.2 to 1.3) where the feature set actually changes or behavior is different.

SculptureOfSoul

(Posted 2007) [#6]

Flame, I respect you and your opinions, but seriously - not everything in lisp is a list. It has native data structures - vectors, floats, hash tables, etc. I'm not making this up - I promise. Oh, and those native data structures are not all represented under the hood as lists, either.

The only way that the statement that "everything in lisp is a list" is true is regarding the actual code (not what it represents). What do I mean? Well, where most languages have a special syntax that then gets parsed into a tree for the compiler to deal with, the code you write in lisp *is* that parsed tree of commands.

For instance

(setf (aref MyArray (- (length MyArray) 1) ) 25)

The above code is stored as a list - a list of functions. But that doesn't mean that everything referenced in the code is a list.

For instance, the code above set's the last element of "MyArray" to 25. "MyArray" really is an array. Lisp will by default treat it as an array of symbols (which can represent anything - functions, floating point numbers, lists, etc). However, you can specify that it is a floating point array, for instance, and the compiler will optimize it as such - i.e., it'll be a block of contigous memory of size float * # of elements, just like in Blitz or C++.

I'm not trying to advocate Lisp as the be all end-all of languages - but it does have 50+ years of design that's gone into it and if you'd take the time to reacquaint yourself with it you'd find that more and more languages are incorporating features Lisp has had for years now (closures, higher-order functions, anonymous functions, dynamic typing, interactive compilation, etc etc.) I'm not saying Blitz should become Lisp (that'd be pointless) - I'm just saying that it's a good language do draw ideas from as it has a ton of features that other languages are just finally starting to support or can't feasibly support (i.e. Lisp's macros).

I think you'd like the language if you took a look at it. It's not going to replace C++ or Blitz - it's not the right tool for extremely resource intensive things like modern games - but it's not this archaic mess, either.

Anyhow, regarding your challenge - how do you reverse a non-linear data-structure? Well, I don't think you'd necessarily want to, in the first place - and secondly, a non-linear data-structure probably wouldn't inherit from a linear data-structure anyways. You could still reverse it - it all depends on how it is stored in memory - but it wouldn't have any meaning, of course.

I'll admit the choice of using "reverse" as an example was not all that good. You wouldn't want to use reverse on anything but linear data, it wouldn't make sense. I was really just trying to get across the point that a good object hierarchy will allow you to pass "logical" types to a function and let the function deal with them appropriately, without having to specify a separate function. Do I think it'd come up often where I'd need a function that can operate on one or more dimensional arrays? No - but that doesn't mean that it shouldn't be possible.

The one thing that I wouldn't mind seeing is dynamic typing available in Blitz - with the option of statically declaring type to aid the compiler in code optimization.

Anyhow, I'll try to stop bringing up Lisp if you do me the favor of just looking into it a bit and verifying what I'm saying that not everything is a list! :)

Dreamora

(Posted 2007) [#7]

You can actually have kind of dynamic typing if you want to. ObjectFromHandle and HandleFromObject will cast everything down to Int or up from it so you can handle anything as int.
But this isn't a usefull idea performance wise which is the main point behind true code compilation.
If you don't need that high general performance you always could use LUA, Squirrel or some other scripting language which has far more features for specific needs and which are normally typeless or dynamic typed as well. For some of them there exists modules for BM (LUA, squirrel, microc most likely even others)

Fabian.

(Posted 2007) [#8]

And I highly disagree on the 1.26 mode switches...(till the rest of that post ;-)

I actually didn't suggested that everything without this declaration at the top of the file should be compiled exactly like it did with version 1.24; what the compile would do if it found a "Strict/SuperStirct 1.26" line at the top of the file, is just *hiding* the new keywords. This means the new features still exist and are present, there's just no way to access them, like an operator without a name or symbol associate with it. If you write 'Collection' in your code the compiler refers to it like it was a function/variable... name.

But this isn't the thing that caused me to start this thread, it doesn't have to do that much with the subclassing system I suggested. The only reason why I posted it at all was that I tried to avoid posts like "I've written so much code using 'Array' and 'Collection' as variable name, this won't work anymore with this change!"
I wouldn't mind if a plain 'Strict' or 'SuperStrict' statement also activates this changes.

Gabriel

(Posted 2007) [#9]

This seems to me like one of the things Mark referred to the other day as something he might do differently if he ever makes another language. I see what you mean and there is probably a good case for it, although I don't consider myself a low-level programmer so I'm probably not the ideal person to judge. I certainly have come across a couple of the issues you suggest this would fix, and you're right, it would fix them. I think it's probably too far down the line for Mark to retro-fit all this back into Max. I don't know, I could be wrong, but I suspect it would be a pretty ugly job to try to build this hierarchy back into Max now. You say that it need not break any of our code with a strict-like construct, but that still leaves all the modules which have to work regardless of strict, superstrict and whatnot.