Binary Searching

Monkey Forums/Monkey Programming/Binary Searching

BurpyMcFistyGuts(Posted 2014) [#1]
What would the best way to do a binary search routine for lists - especially as they can't be accessed via an index.

Whilst other types would allow indexing, they don't allow sorting...


Gerry Quinn(Posted 2014) [#2]
I don't believe it can be done (in less than O(N), I mean)

You'd need some more elaborate data structure that maintains indexes too. But how do you do that and keep insertion quick?

If you want to perform multiple binary searches on a pre-existing or rarely-changing list, the answer of course is to use ToArray().


BurpyMcFistyGuts(Posted 2014) [#3]
Insertion would always be at the back (or front) of an array - GLBasic has basically a generic stack that can be re-sorted, and directly indexed


Gerry Quinn(Posted 2014) [#4]
Well, if you maintain a separate index stack that is updated every time you do a sort, this should be easy enough. It's similar to having a z-order list pointing at data in a stack, or maybe the same thing backwards. Remember sorting is O( n log n ), so adding an O( n ) procedure doesn't increase Big-O complexity.

What would be hard or impossible to get is the quick insert anywhere option that pure lists provide.


maltic(Posted 2014) [#5]
For what purpose do you require the binary search? There may be better solutions than using a 'conventional' sequential container.

For example, you could use Monkey's Set (or code your own if you are feeling masochistic) to store your items. Insert is O(lg n), as is checking if an element is in the set. This is much better than trying to sort a list at every insertion. You can still iterate through the elements of a set like a list, just remember that you will only be able to easily iterate through the items of the set in sorted order--probably not a big deal, since you needed to sort your List to binary search it anyway. In fact you can think of Monkey's Set (which is actually a self balancing binary search tree) as a list that is always sorted, and essentially binary searches itself when you try to find an element.


----------------
Edit: Not sure why this got posted twice?


maltic(Posted 2014) [#6]
For what purpose do you require the binary search? There may be better solutions than using a 'conventional' sequential container.

For example, you could use Monkey's Set (or code your own if you are feeling masochistic) to store your items. Insert is O(lg n), as is checking if an element is in the set. This is much better than trying to sort a list at every insertion. You can still iterate through the elements of a set like a list, just remember that you will only be able to easily iterate through the items of the set in sorted order--probably not a big deal, since you needed to sort your List to binary search it anyway. In fact you can think of Monkey's Set (which is actually a self balancing binary search tree) as a list that is always sorted, and essentially binary searches itself when you try to find an element.


Gerry Quinn(Posted 2014) [#7]
It gets posted twice if it hangs for a while and you click again on Reply, I think.


AdamRedwoods(Posted 2014) [#8]
I thought about this again, and perhaps my mind is racing ahead of me...

my bucket deque collection:
http://monkeycoder.co.nz/Community/posts.php?topic=6037

...is a list of fixed-size stacks. so you get the worst of both. in theory, sorting this after populating would be a drag, but sorting WHILE you add a new item would be less overhead. although the bucket deque doesn't currently do this, you can add a sorted insert by keeping the high/low values of each bucket, and that's how the search/sort would be handled. then the stack would be sorted with an insert, and an insert on this bucket deque is minimal (...is current stack full? if so, is next stack full? if not, insert in beginning of next stack. if next stack is full, insert new empty stack (since it's a list).

indexed Gets are a little slower since it needs to iterate through the lists, but again, it's a mix of worlds. a potential payoff is tighter memory management than a normal stack, and a possibility for better cache hits.

DISCLAIMER it's late at night for me here, and all of this is just theory.


muddy_shoes(Posted 2014) [#9]
I'm not sure what the problem is to be solved here. OP wants a binary searchable data structure but for some reason thinks that "Whilst other types would allow indexing, they don't allow sorting..."

The answer is that there's nothing stopping you sorting an array.


maltic(Posted 2014) [#10]
@AdamRedwoods Good thought, but that solution is only better than keeping a list by a constant factor, and is thus theoretically the same big O complexity. The worst case for insert is still linear (O(n / m) => O(n) assuming m is constant and m is the number of buckets; n is the size of a bucket), as you might be inserting a maximal value into a Deque that is sorted in ascending order--or vice-versa. In addition, the get query can be equally as bad for the same reason. Iteration remains very efficient however, being O(n) with good cache coherency.

Let us look at the requirements: he wanted a list which is binary searchable. A binary search, in the simplest possible terms, checks if an item is or is not inside a contiguous collection. In addition, a list has one definitive feature, it can be iterated over. So, we need some collection which can be iterated over, and which can answer a 'contains' query as quickly as possible--preferably O(lg n), like a binary search.

I think the best possible solution is to use a HashSet or TreeSet (as is included with Monkey) for the contains query, and iterate over the elements of this set for the 'list' like behavior. For a HashSet iteration is trivial, but has a potential overhead depending on the implementation (bounded by a constant factor, however, so still O(n)). The HashSet provides a contains query in expected case O(1) time. I already talked about using a TreeSet in my previous post.

It is also worth pointing that that diddy has a Java-like ArrayList and an implementation of Quicksort.


muddy_shoes(Posted 2014) [#11]
"Let us look at the requirements: he wanted a list which is binary searchable."

Nope. Read the posts again. The OP specifically states that they don't need the midpoint insertion performance of a list and are actually looking for something like an array-based stack. All they need is to write, or look on the forum for, an array sort.


Nobuyuki(Posted 2014) [#12]
Use a Map. They are automatically sorted and a Get from a map uses red-black tree searching, a type of binary search.

You can always steal the code for searching the nodes from Map<T> itself, if you need this finding ability for a different type of container, but the truth is that on random data, a search from beginning to end is probably just as (in)efficient as anything else. To make assumptions about the data to implement a more optimized search has trade-offs where it would be more efficient in the expected scenario, and much less efficient in unexpected ones.

So basically, what everyone else has been saying -- if you want both things you should write your own container for the expected data. For search-heavy structures, use a Map, and for search-light / index-heavy structures, use a different container like a List or Stack. My go-to container for most purposes in monkey is a Stack -- it's actually more like a dynamic array than a real Stack (think VB.NET's List<T> or Java ArrayList) -- you can index it at o(1), and - if you need to - iterate through it without creating an ObjectEnumerator to decrease GC sweep time.

Edit: and if all you need is some way to sort an array, may I suggest Timsort? For very large arrays of typical (similar) data it's going to be faster than pretty much anything else. Particularly with Y-sort or Z-ordering routines, the data is typically mostly-organized after one sort and this sorting algorithm can do educated guesses to "leap" past pre-sorted data pretty quick.


maltic(Posted 2014) [#13]
@muddy_shoes "What would the best way to do a binary search routine for lists" also "It is also worth pointing that that diddy has a Java-like ArrayList and an implementation of Quicksort"

@Nobuyuki Nice Timsort implementation, I didn't even know this existed!


muddy_shoes(Posted 2014) [#14]
"What would the best way to do a binary search routine for lists". Yes, that's what's known in the trade as the customer asking the wrong question. The posts show that the only reason a list is being used is because it has a sort function, presumably because sorted data is a prerequisite for a binary search. It's the equivalent of "How do I embed a sound file into an Excel spreadsheet so I can email it to a colleague?" -- the first part of the question is just a red herring.


maltic(Posted 2014) [#15]
@muddy_shoes I see what your saying, and it makes perfect sense. I just mentally parsed it differently. I heard list an thought "OK, this chap wants a container that he can iterate over, add elements too, and remove elements from", then I read binary search and thought "...and he also wants a fast contains query". Clearly the answer is some kind of Set. I think you (and most other people) just parsed it differently. Seeing the word sort and thinking, hey, why not simply sort an array yourself or use an ArrayList?

Since the OP wasn't clear, I'd say its up to him which solution is relevant to his problem.