Let;s talk about scenegraphs

BlitzMax Forums/BlitzMax Programming/Let;s talk about scenegraphs

JoshK(Posted 2007) [#1]
We had a discussion like this on here a while ago, but I can't find it and I was busy at the time. What I am looking for is a technique to keep the world object handling limited to the AABB of the camera frustum so that a scene with 500 visible object and 10,000 total objects will handle about the same speed as a scene with 500 objects. First let me establish a few ground ideas.

You have to have a huge number of objects and a very large scene many times bigger than the camera view range before scenegraphs provide any benefit.

For our purposes the world will be considered a 2D area and the y axis (height) will be ignored. The vertical span of almost any scene is insignificant compared to the horizontal distance.

Each entity has a known AABB, center, and radius. We want to avoid even having to go through a complete list of entities, since that itself would cause considerable slowdown with thousands of objects.

Here is our scene. The red rectangle represents the area we are interested in. The black dots are entities with different radii. We want to find all entities that lie inside or partially intersect the rectangle:


I have a technique worked out that uses an array of sectors, but I am more interested in a binary search. What would be really cool is a 2-dimensional TMap where I can grab all entities between points x0,z0 and x1,z1.

Let's assume the entity radii are negligible, and just work with their positions. We can sort all entities in an X TMap by their x position and a z TMap by their Z position. We can even return all entities within a certain range on both TMaps. But how to combine the results of the X and Z axes in a fast way?

Here the blue area represents the area that any entity within would be returned in an x axis query. The green is the z result. The red is the desired combination of the two, eliminating all entities not found in the other results.


I can handle everything up to the combination of the results from the TMap. It's easy to retrieve a TMap range of nodes.

Then the final step of elimination would be a sphere test for each remaining entity against the camera frustum.


Paposo(Posted 2007) [#2]
Hello.

¿Using arrays of arrays?

One array ordered by X. Each element of this array are another array ordered by Z
You select first for X and after traverse each selected elements for Z.

Bye,
Paposo


Dreamora(Posted 2007) [#3]
There are different approaches.

Paposos looks like a variant of a scanline algorithm which will definitely work.

A different approach would be the usage of a quad tree (as you are only interested in 2D dimensions)
If we are talking of your engine here: as you use a physics system behind, you might consider to checkout if it provides such a feature as well, otherwise you potentially do twice the same work, as physics, at least newer newton and physx, need such algorithms for fast collision testing and resting


ImaginaryHuman(Posted 2007) [#4]
Wouldn't you use like a BSP or Octree to divide up the space and make it easy to search? ie the Octree acts like bounding boxes?

Or maybe construct a linked list `web` of objects, linked based on their physical locations to each other, so that you are then basically doing a lookup of all connected items within a given coordinate area?


JoshK(Posted 2007) [#5]
You guys are all thinking in sector-based approaches, which are easy to visualize, but sort of more complicated. With a binary search, the size of the world or the number of sectors doesn't matter. It just works automatically with no adjustment of sectors size, initialization, etc. And it doesn't have the problem of efficiency of the sector size.

The scanline approach is pretty much what I was doing before. Each entity's position is just rounded off to find which sector it is in, and that information saved. But it requires a lot of data storage and a defined world size, and the technique I am describing feels a lot cleaner and simpler.

With the binary search I am describing, all entities would always be listed in order on the x axis:

entity2 - entity3 - entity 1 - entity0 - entity4

And on the z axis:

entity1 - entity2 - entity3 - entity4 - entity0

A query of all entities within the range x0-x1 might return this:
entity3 - entity1 - entity0

And a query of all entities within the range z0-z1 might return this:
entity2 - entity3 - entity4

We would then eliminate entities 4,1,0,2 from the results, since they do not occur in both results, and we would be left with entity3 as the only one within the 2D range in question.


Wayne(Posted 2007) [#6]
I like it, very fast and efficient, but how would one pick up the entities that lie outside your box but overlap ?

After thinking about it, I suppose each entity's bounding box coordinates would just be added to the list and sorted, and would work the same as each entity's origin.

It's great, I like it !


JoshK(Posted 2007) [#7]
I would only consider the entity position. Even a large mesh's radius would be insignificant compared to the size of the camera frustum. It would be up to the artist to make sure an enormous building, for example, is broken up into small enough pieces. We're talking like STALKER-size buildings that span a quarter mile. If two entities occur at the same x and/or z position, you just sort them with the super/Object sort method, which I believe just compares their memory addresses or handle numbers. It wouldn't matter how those two objects were sorted, as long as it was always the same.

This could be used to retrieve the entities within any arbitrary AABB of any size, in a simpler way than octrees or things with fixed sectors sizes.

The only problem is how to test the results of the X and Z queries against each other, in a non-linear way? Maybe take the X results, insert them all into a temporary TMap sorting them by Z position, then get the objects within the Z range from that TMap? That seems like a lot of data sorting.

Maybe I can temporarily cut off the ends of the z TMap, then check each X result to see if it occurs in the trimmed z tmap. The z-check would be binary, but going through the X results would be linear, and it doesn't feel right.

I think what's really needed is some kind of 2-dimensional TMap so a 2D range of nodes can be retrieved. But the TMap source is pretty hard to visualize, and a 2D version...ugghhh. Mark, save me.


Wayne(Posted 2007) [#8]
The only problem is how to test the results of the X and Z queries against each other, in a non-linear way?

So your thinking perhaps: Sort z, Step thru x and binary search z for the match?


Wayne(Posted 2007) [#9]
Are the objects static or dynamic ?

I can see this working in 2d or 3d, and the sorting goes hand in hand with the binary searches, and feels right to me.


JoshK(Posted 2007) [#10]
An object would just have its node position resorted any time the object moved. That would just be a matter of removing a TMap node and inserting a new one with the new position, fast enough for real-time. So static or dynamic, it would not matter. I don't think a hierarchical tree would work, because it would require rebuilding the tree any time something moved.

So close, but I can't seem to find a way to combine the axis data, either as the result of two separate binary searches, or combining them into a sort of quad search. I can't grab a 2D area quickly without sorting out a lot of outliers on either axis in a linear method.

I drew it out on paper and tried starting with the outer nodes on each axis and tracing their neighbors inwards, but it is possible to create isolated nodes, so that approach is not right.


dmaz(Posted 2007) [#11]
The only problem is how to test the results of the X and Z queries against each other, in a non-linear way? Maybe take the X results, insert them all into a temporary TMap sorting them by Z position, then get the objects within the Z range from that TMap? That seems like a lot of data sorting.

So close, but I can't seem to find a way to combine the axis data, either as the result of two separate binary searches, or combining them into a sort of quad search. I can't grab a 2D area quickly without sorting out a lot of outliers on either axis in a linear method.

unless I misunderstand... just take the x results and insert them into a TMap by "object"... then take the z results and see if the same object already exists... if it does, that's an object you're looking for.


JoshK(Posted 2007) [#12]
Yes, that is possible. The question is whether that approach is fast enough to be of use in real-time. It would require linear sorting of perhaps 10x the number of objects within the rectangle. What it basically amounts to is a binary search on one axis followed by a linear search on the other.

Here's a test I made using a scene graph with a fixed-size array. It sorts 100,000 points onscreen and only draws those within a rectangle. As you can see, the scenegraph mode is many times faster than the linear method, but it requires tuning of the sector size and how many objects there are. There's definitely an optimal balance:
http://www.leadwerks.com/post/SceneGraph.zip

Here's the source code. It's kind of nice because it is so simple. You just create a scenegraph with the scale and subdivision you want, then insert objects of any type into it based on their x and z positions using InsertSceneNode(). You can then use ForEachSceneNodeDo() specifying a rectangle and a callback function to perform for each object within the range. This is much faster than adding all the objects to a list. If you delete an object, just use RemoveSceneNode() and if you move it, just call InsertSceneNode() again and the program will detect the old scene node and remove it automatically.

Any values that overrun the edges of the scenegraph get clamped to the scenegraph area, so you don't have to worry about going outside the allowed area. Also, the scenegraph is centered at 0. In most cases you can just create it with the scale you want (number of sectors/size of the area you want covered) and then insert your entities based on their actual x,z positions.

Strict

Rem
bbdoc:
EndRem
Type TSceneGraph
	Field grid:TMap[,]
	Field size
	Field scale:Float
	Field nodes:TMap=New TMap
	
	Method sector:TMap(grid_x,grid_y)
		grid_x=Max(0,grid_x)
		grid_x=Min(size-1,grid_x)
		grid_y=Max(0,grid_y)
		grid_y=Min(size-1,grid_y)
		If Not grid[grid_x,grid_y] grid[grid_x,grid_y]=New TMap
		Return grid[grid_x,grid_y]
	EndMethod
	
EndType

Rem
bbdoc:
EndRem
Function CreateSceneGraph:TSceneGraph(size,scale#=1.0)
	Local graph:TSceneGraph=New TSceneGraph
	Local grid:TMap[size,size]
	graph.grid=grid
	graph.size=size
	graph.scale=scale
	Return graph
EndFunction

Rem
bbdoc:
EndRem
Function InsertSceneNode(graph:TSceneGraph,o:Object,grid_x,grid_y)
	grid_x=grid_x*graph.scale+graph.size/2
	grid_y=grid_Y*graph.scale+graph.size/2
	Local map:TMap=TMap(MapValueForKey(graph.nodes,o))
	If map MapRemove(map,o)
	If Not graph.grid[grid_x,grid_y] graph.grid[grid_x,grid_y]=New TMap
	graph.grid[grid_x,grid_y].insert o,o
	MapInsert graph.nodes,o,graph.grid[grid_x,grid_y]
EndFunction

Rem
bbdoc:
EndRem
Function RemoveSceneNode(graph:TSceneGraph,o:Object)
	Local map:TMap=TMap(MapValueForKey(graph.nodes,o))
	If map
		MapRemove(map,o)
		MapRemove(graph.nodes,o)
	EndIf
EndFunction

Rem
bbdoc:
EndRem
Function ForEachSceneNodeDo(graph:TSceneGraph,grid_x,grid_y,width,height,Callback:Int(o:Object))
	Local x,y
	Local o:Object
	
	grid_x=Floor(grid_x*graph.scale)+graph.size/2
	grid_y=Floor(grid_y*graph.scale)+graph.size/2
	width=Ceil(width*graph.scale)
	height=Ceil(height*graph.scale)
	
	If grid_x>graph.size-1 Return
	If grid_y>graph.size-1 Return
	If width<=0 Or height<=0 Return
	If grid_x<0
		width:+grid_x
		grid_x=0
	EndIf
	If grid_y<0
		height:+grid_y
		grid_y=0
	EndIf
	If grid_x+width>graph.size-1 width=graph.size-1-grid_x
	If grid_y+height>graph.size-1 height=graph.size-1-grid_y
	
	For x=grid_x To grid_x+width
		For y=grid_y To grid_y+height
			If graph.grid[x,y]
				For o=EachIn graph.grid[x,y].values()
					If Callback(o)=False Return
				Next
			EndIf
		Next
	Next
	
EndFunction




'---------------------------------------------------------
' Test Program
'---------------------------------------------------------

Graphics 512,512,0

Local list:TList=New TList

Type Point
	Field x,y
EndType

Local sectors=8
Local points=10000

Local scale#=sectors/512.0

Local graph:TSceneGraph=CreateSceneGraph(sectors,scale)

Global scenex,sceney

Local p:point

For Local n=1 To points
	p=New point
	p.x=Rand(0,511)
	p.y=Rand(0,511)
	InsertSceneNode(graph,p,p.x-256,p.y-256)
	list.addfirst(p)
Next

Local modename$="Scenegraph"

Global size=128,hsize

MoveMouse GraphicsWidth()/2,GraphicsHeight()/2

Global FPSLASTUPDATETIME

Local mode,time

While Not KeyHit(KEY_ESCAPE)

	If AppTerminate() End
	
	Cls()
	
	scenex=MouseX()
	sceney=MouseY()
	
	If KeyHit(KEY_MINUS) size=Max(10,size-10)
	If KeyHit(KEY_EQUALS) size:+10

	hsize=size/2
	
	Select mode
	Case 0
		ForEachSceneNodeDo(graph,scenex-hsize-256,sceney-hsize-256,size,size,Callback)
	Case 1
		For p=EachIn list
			If pointinrect(p.x,p.y,scenex-hsize,sceney-hsize,size,size) Plot p.x,p.y
		Next
	EndSelect
	
	If KeyHit(KEY_SPACE)
		mode=Not mode
		Select mode
		Case 0 modename$="Scenegraph"
		Case 1 modename$="Linear"
		EndSelect
	EndIf
	
	time=MilliSecs()
	
	SetColor 255,255,255
	DrawText "UPS: "+FPS(),0,0
	DrawText "Mode: "+modename,0,20
	DrawText "space: toggle mode",0,40
	DrawText "-/+: change rect size",0,60
	Flip 0
	
	FPSLASTUPDATETIME=FPSLASTUPDATETIME+(MilliSecs()-time)

Wend

End

Function Callback:Int(o:Object)
	Local p:Point=Point(o)
	If Not pointinrect(p.x,p.y,scenex-hsize,sceney-hsize,size,size) Return 1
	Plot p.x,p.y
	Return 1
EndFunction

Function PointInRect:Int(x0,y0,x1,y1,w,h)
	If x0<x1 Return 0
	If y0<y1 Return 0
	If x0>x1+w Return 0
	If y0>y1+h Return 0
	Return 1
EndFunction

Function FPS#(frequency=200)
	Global FPSFRAMECOUNT
	Global FPSLASTCOUNT#
	Local time=MilliSecs()
	FPSFRAMECOUNT=FPSFRAMECOUNT+1
	Local elapsed#=time-FPSLASTUPDATETIME
	If elapsed>=frequency
		FPSLASTCOUNT=FPSFRAMECOUNT/elapsed*1000.0
		FPSFRAMECOUNT=0
		FPSLASTUPDATETIME=time
	EndIf
	Return FPSLASTCOUNT
End Function 



Regular K(Posted 2007) [#13]
*ahem*

Yoink! I'll be keeping that. I'll be damned if I know how it works, but it works well.


JoshK(Posted 2007) [#14]
Here's a slightly updated version. It fixes a potential crash and also uses a more intuitive scaling method.

This isn't really anything all that advanced, but it works and it's simple. If you have the right sector size relative to the camera range, there's really no limit to how big your scene can be, once it gets out of the camera range. I ran a scene with 1,000,000 objects and 300,000,000 polys and since it only drew what was in the immediate area, I got the maximum framerate.




JoshK(Posted 2007) [#15]
I implemented a faster improved version of the above technique and added a routine that sorts any scene into batches of like meshes. Without actually rendering anything, the framerate is over 1000 FPS with one million meshes, so the sorting and culling overhead is pretty minimal.

The next step is to switch over to the instanced rendering on the GE Force 8 series. I am very curious to see what the FPS counter will read if I implement GPU batches with this scene.




Wayne(Posted 2007) [#16]
I almost wet myself, will this all be integrated into LWE ?
I'd like to run that demo, can you please post it ?

Good stuff!


JoshK(Posted 2007) [#17]
I'm going to implement GPU instancing, but then it will only run on the GE Force 8 series. This is not for the current engine.


Dreamora(Posted 2007) [#18]
Just one idea, don't know why I did not came up with this sorting method before:

Why don't you use just RadixSort / BucketSort (the books intermix them sadly a little).

Those sorting algorithms are great for sorting either strings or other elements that consists of different "bases" (numbers with 1, 10, 100, 1000 positions, strings with their positions, but as well 2D and 3D data where each coordinate is just a new basevector to a vector space).
The only thing you would need to do there is to decide on the most usefull "bucket size" ie the units in world space which form a bucket / sector


RadixSort is one of the fastest sort algorithms known, it can be programmed in a parallel way without too many problems and above all it is not comparision based like others which further reduces the workload


marksibly(Posted 2007) [#19]
Hi,

BSP/Quadtree/OctTree will be as fast as anything TMap based, as they are all essentially 'log(n)' algorithms. QuadTrees and OctTrees are really just 'unrolled' BSP trees.

Done right, a binary tree based geometric algorithm will be faster than a TMap algorithm because it knows more about the data - but the complexity will be the same so it will scale the same.

However, there is a practical 'constant time' solution to this given that you're essentially working in 2D:

1) Create an x/z array of lists, where each element contains a list of entities that intersect the x/z cell.

2) At rendertime, manually render a top-down 'shadow' of the view frustum on to the array so that you can work out which array cells lie within the frustum.

3) Render the stuff within each 'visible' array cell.

This is *constant complexity*, which means it doesn't matter how big the scene, it will be the same speed (ignoring array size issues!).


JoshK(Posted 2007) [#20]
Mark, that is exactly what I have done. :)

Although a hierarchal approach could potentially save time and provide finer culling in some circumstances, I have found that overall the 2D array approach is much faster. I mean, it's pretty much instantaneous. You can add a single entity to multiple cells if it spans them. To avoid "grabbing" the same entity twice, just add an iterator attached to the object, so it gets skipped if it has already been set to the current function iterator.

By the way, GPU instancing gave me a free 30-40% speed boost on the GE Force 8 series, and it was very easy to implement. The way it makes you compile an array of matrices actually gave me an idea to speed up the older batched VBO approach as well.

This is pretty cool because even if the world contains 10,000,000 objects, the renderer never even sees them, and the number of objects in lists you have to go through is pretty low. Still, I have found arrays and memcopy() to be so much faster when dealing with large numbers, it would be nice to figure out a way to use them more. Of course, I am testing with an unrealistic number of objects, so maybe I should just let it be.




Filax(Posted 2007) [#21]
Good job :)


Chroma(Posted 2007) [#22]
This sounds pretty remarkable. What's it for? ....


ImaginaryHuman(Posted 2007) [#23]
How would you go about using your method (Leadwerks) for efficient collision detection and also how would it handle a scene where every single entity is dynamic in size, shape and position?

Also same question to you Mark, how would a frustum-shaped scanline polygon-type array traversal perform for dynamic moving objects and for collision checks?

If there are lots of empty spaces between objects then you're looking at a lot of cells only to find they are empty, right? And the more objects move around, since you're using spacial partitioning you have to remove the objects from a cell and reinsert it in another (or up to 4 others)?

Is this mainly only a good technique for static scenes?


Scaremonger(Posted 2007) [#24]
How do you go about tweaking the sectors and size for optimal performance?


Chroma(Posted 2009) [#25]
So is it possible to combine this scenegraph stuff with the BlitzTiles terrain engine?


Nate the Great(Posted 2009) [#26]
for even more speedup, dont use tlists, use arrays.


JoshK(Posted 2009) [#27]
What I have found, after years of using different approaches, is that no automatic technique can come close to hand-placed zones. This works in a variety of scene sizes and complexities, and there's really no way to automate it efficiently.


Nate the Great(Posted 2009) [#28]
ok so what exactly is this doing? in the demo with 100,000 pixels, is it putting them in a tmap and then only drawing the ones in the box? so they must be static :( what if they are moving?


Arowx(Posted 2009) [#29]
Cool OK so I could use this in a 2d game to limit the visual range of what's drawn!

What about collision detection tests, could I use this get potential collisions quickly?

Regards

Merx


Nate the Great(Posted 2009) [#30]
What about collision detection tests, could I use this get potential collisions quickly?


there are thousands of ways to do this... ok maybe hundreds but theres a lot of options. maybe we should start a new thread for this but here is what I have found fastest.

1. If objects in your game are unlikely to move very far very fast then simply make a tlist of objects within a certain radius loop through those every frame to check for collisions. now every say 10-60 frames, depending on how far objects move every frame, you need to update the tlist, ie clear it and find the objects that are close now. This can be made almost instant if done in the background with threading :)

2. My personal favorite: fixed grid algorithm. Make an array (2d for simplicity, or 1d for optimization) of grid cells that have pointers to tlists as follows.



then before you check collisions, erase all tlists and put the particles/players in the tlist of the square they are in. Then all you need to do is check for collisions in the gridcells around them for example:



You know the red dot would never collide with the green or yellow dots because it is not in an touching cell, however the red dot may collide with the blue dot because their cells are touching.


Chroma(Posted 2009) [#31]
Heck, I just figured out that the BlitzTiles terrain is broken up into x about of segments as designated by the BT_Tiles variable and stored in an array. If BT_Tiles is set to 8 then it breaks up the terragen map into an 8x8 grid!

Hehehehe! I can DEFINITELY use the above scenegraph stuff to hide the terrain segments individually. Here's to endless terrain with no frame hit. Zing!!