Let;s talk about scenegraphs
BlitzMax Forums/BlitzMax Programming/Let;s talk about scenegraphs
| ||
We had a discussion like this on here a while ago, but I can't find it and I was busy at the time. What I am looking for is a technique to keep the world object handling limited to the AABB of the camera frustum so that a scene with 500 visible object and 10,000 total objects will handle about the same speed as a scene with 500 objects. First let me establish a few ground ideas. You have to have a huge number of objects and a very large scene many times bigger than the camera view range before scenegraphs provide any benefit. For our purposes the world will be considered a 2D area and the y axis (height) will be ignored. The vertical span of almost any scene is insignificant compared to the horizontal distance. Each entity has a known AABB, center, and radius. We want to avoid even having to go through a complete list of entities, since that itself would cause considerable slowdown with thousands of objects. Here is our scene. The red rectangle represents the area we are interested in. The black dots are entities with different radii. We want to find all entities that lie inside or partially intersect the rectangle: I have a technique worked out that uses an array of sectors, but I am more interested in a binary search. What would be really cool is a 2-dimensional TMap where I can grab all entities between points x0,z0 and x1,z1. Let's assume the entity radii are negligible, and just work with their positions. We can sort all entities in an X TMap by their x position and a z TMap by their Z position. We can even return all entities within a certain range on both TMaps. But how to combine the results of the X and Z axes in a fast way? Here the blue area represents the area that any entity within would be returned in an x axis query. The green is the z result. The red is the desired combination of the two, eliminating all entities not found in the other results. I can handle everything up to the combination of the results from the TMap. It's easy to retrieve a TMap range of nodes. Then the final step of elimination would be a sphere test for each remaining entity against the camera frustum. |
| ||
Hello. ¿Using arrays of arrays? One array ordered by X. Each element of this array are another array ordered by Z You select first for X and after traverse each selected elements for Z. Bye, Paposo |
| ||
There are different approaches. Paposos looks like a variant of a scanline algorithm which will definitely work. A different approach would be the usage of a quad tree (as you are only interested in 2D dimensions) If we are talking of your engine here: as you use a physics system behind, you might consider to checkout if it provides such a feature as well, otherwise you potentially do twice the same work, as physics, at least newer newton and physx, need such algorithms for fast collision testing and resting |
| ||
Wouldn't you use like a BSP or Octree to divide up the space and make it easy to search? ie the Octree acts like bounding boxes? Or maybe construct a linked list `web` of objects, linked based on their physical locations to each other, so that you are then basically doing a lookup of all connected items within a given coordinate area? |
| ||
You guys are all thinking in sector-based approaches, which are easy to visualize, but sort of more complicated. With a binary search, the size of the world or the number of sectors doesn't matter. It just works automatically with no adjustment of sectors size, initialization, etc. And it doesn't have the problem of efficiency of the sector size. The scanline approach is pretty much what I was doing before. Each entity's position is just rounded off to find which sector it is in, and that information saved. But it requires a lot of data storage and a defined world size, and the technique I am describing feels a lot cleaner and simpler. With the binary search I am describing, all entities would always be listed in order on the x axis: entity2 - entity3 - entity 1 - entity0 - entity4 And on the z axis: entity1 - entity2 - entity3 - entity4 - entity0 A query of all entities within the range x0-x1 might return this: entity3 - entity1 - entity0 And a query of all entities within the range z0-z1 might return this: entity2 - entity3 - entity4 We would then eliminate entities 4,1,0,2 from the results, since they do not occur in both results, and we would be left with entity3 as the only one within the 2D range in question. |
| ||
I like it, very fast and efficient, but how would one pick up the entities that lie outside your box but overlap ? After thinking about it, I suppose each entity's bounding box coordinates would just be added to the list and sorted, and would work the same as each entity's origin. It's great, I like it ! |
| ||
I would only consider the entity position. Even a large mesh's radius would be insignificant compared to the size of the camera frustum. It would be up to the artist to make sure an enormous building, for example, is broken up into small enough pieces. We're talking like STALKER-size buildings that span a quarter mile. If two entities occur at the same x and/or z position, you just sort them with the super/Object sort method, which I believe just compares their memory addresses or handle numbers. It wouldn't matter how those two objects were sorted, as long as it was always the same. This could be used to retrieve the entities within any arbitrary AABB of any size, in a simpler way than octrees or things with fixed sectors sizes. The only problem is how to test the results of the X and Z queries against each other, in a non-linear way? Maybe take the X results, insert them all into a temporary TMap sorting them by Z position, then get the objects within the Z range from that TMap? That seems like a lot of data sorting. Maybe I can temporarily cut off the ends of the z TMap, then check each X result to see if it occurs in the trimmed z tmap. The z-check would be binary, but going through the X results would be linear, and it doesn't feel right. I think what's really needed is some kind of 2-dimensional TMap so a 2D range of nodes can be retrieved. But the TMap source is pretty hard to visualize, and a 2D version...ugghhh. Mark, save me. |
| ||
The only problem is how to test the results of the X and Z queries against each other, in a non-linear way? So your thinking perhaps: Sort z, Step thru x and binary search z for the match? |
| ||
Are the objects static or dynamic ? I can see this working in 2d or 3d, and the sorting goes hand in hand with the binary searches, and feels right to me. |
| ||
An object would just have its node position resorted any time the object moved. That would just be a matter of removing a TMap node and inserting a new one with the new position, fast enough for real-time. So static or dynamic, it would not matter. I don't think a hierarchical tree would work, because it would require rebuilding the tree any time something moved. So close, but I can't seem to find a way to combine the axis data, either as the result of two separate binary searches, or combining them into a sort of quad search. I can't grab a 2D area quickly without sorting out a lot of outliers on either axis in a linear method. I drew it out on paper and tried starting with the outer nodes on each axis and tracing their neighbors inwards, but it is possible to create isolated nodes, so that approach is not right. |
| ||
The only problem is how to test the results of the X and Z queries against each other, in a non-linear way? Maybe take the X results, insert them all into a temporary TMap sorting them by Z position, then get the objects within the Z range from that TMap? That seems like a lot of data sorting. So close, but I can't seem to find a way to combine the axis data, either as the result of two separate binary searches, or combining them into a sort of quad search. I can't grab a 2D area quickly without sorting out a lot of outliers on either axis in a linear method. unless I misunderstand... just take the x results and insert them into a TMap by "object"... then take the z results and see if the same object already exists... if it does, that's an object you're looking for. |
| ||
Yes, that is possible. The question is whether that approach is fast enough to be of use in real-time. It would require linear sorting of perhaps 10x the number of objects within the rectangle. What it basically amounts to is a binary search on one axis followed by a linear search on the other. Here's a test I made using a scene graph with a fixed-size array. It sorts 100,000 points onscreen and only draws those within a rectangle. As you can see, the scenegraph mode is many times faster than the linear method, but it requires tuning of the sector size and how many objects there are. There's definitely an optimal balance: http://www.leadwerks.com/post/SceneGraph.zip Here's the source code. It's kind of nice because it is so simple. You just create a scenegraph with the scale and subdivision you want, then insert objects of any type into it based on their x and z positions using InsertSceneNode(). You can then use ForEachSceneNodeDo() specifying a rectangle and a callback function to perform for each object within the range. This is much faster than adding all the objects to a list. If you delete an object, just use RemoveSceneNode() and if you move it, just call InsertSceneNode() again and the program will detect the old scene node and remove it automatically. Any values that overrun the edges of the scenegraph get clamped to the scenegraph area, so you don't have to worry about going outside the allowed area. Also, the scenegraph is centered at 0. In most cases you can just create it with the scale you want (number of sectors/size of the area you want covered) and then insert your entities based on their actual x,z positions. Strict Rem bbdoc: EndRem Type TSceneGraph Field grid:TMap[,] Field size Field scale:Float Field nodes:TMap=New TMap Method sector:TMap(grid_x,grid_y) grid_x=Max(0,grid_x) grid_x=Min(size-1,grid_x) grid_y=Max(0,grid_y) grid_y=Min(size-1,grid_y) If Not grid[grid_x,grid_y] grid[grid_x,grid_y]=New TMap Return grid[grid_x,grid_y] EndMethod EndType Rem bbdoc: EndRem Function CreateSceneGraph:TSceneGraph(size,scale#=1.0) Local graph:TSceneGraph=New TSceneGraph Local grid:TMap[size,size] graph.grid=grid graph.size=size graph.scale=scale Return graph EndFunction Rem bbdoc: EndRem Function InsertSceneNode(graph:TSceneGraph,o:Object,grid_x,grid_y) grid_x=grid_x*graph.scale+graph.size/2 grid_y=grid_Y*graph.scale+graph.size/2 Local map:TMap=TMap(MapValueForKey(graph.nodes,o)) If map MapRemove(map,o) If Not graph.grid[grid_x,grid_y] graph.grid[grid_x,grid_y]=New TMap graph.grid[grid_x,grid_y].insert o,o MapInsert graph.nodes,o,graph.grid[grid_x,grid_y] EndFunction Rem bbdoc: EndRem Function RemoveSceneNode(graph:TSceneGraph,o:Object) Local map:TMap=TMap(MapValueForKey(graph.nodes,o)) If map MapRemove(map,o) MapRemove(graph.nodes,o) EndIf EndFunction Rem bbdoc: EndRem Function ForEachSceneNodeDo(graph:TSceneGraph,grid_x,grid_y,width,height,Callback:Int(o:Object)) Local x,y Local o:Object grid_x=Floor(grid_x*graph.scale)+graph.size/2 grid_y=Floor(grid_y*graph.scale)+graph.size/2 width=Ceil(width*graph.scale) height=Ceil(height*graph.scale) If grid_x>graph.size-1 Return If grid_y>graph.size-1 Return If width<=0 Or height<=0 Return If grid_x<0 width:+grid_x grid_x=0 EndIf If grid_y<0 height:+grid_y grid_y=0 EndIf If grid_x+width>graph.size-1 width=graph.size-1-grid_x If grid_y+height>graph.size-1 height=graph.size-1-grid_y For x=grid_x To grid_x+width For y=grid_y To grid_y+height If graph.grid[x,y] For o=EachIn graph.grid[x,y].values() If Callback(o)=False Return Next EndIf Next Next EndFunction '--------------------------------------------------------- ' Test Program '--------------------------------------------------------- Graphics 512,512,0 Local list:TList=New TList Type Point Field x,y EndType Local sectors=8 Local points=10000 Local scale#=sectors/512.0 Local graph:TSceneGraph=CreateSceneGraph(sectors,scale) Global scenex,sceney Local p:point For Local n=1 To points p=New point p.x=Rand(0,511) p.y=Rand(0,511) InsertSceneNode(graph,p,p.x-256,p.y-256) list.addfirst(p) Next Local modename$="Scenegraph" Global size=128,hsize MoveMouse GraphicsWidth()/2,GraphicsHeight()/2 Global FPSLASTUPDATETIME Local mode,time While Not KeyHit(KEY_ESCAPE) If AppTerminate() End Cls() scenex=MouseX() sceney=MouseY() If KeyHit(KEY_MINUS) size=Max(10,size-10) If KeyHit(KEY_EQUALS) size:+10 hsize=size/2 Select mode Case 0 ForEachSceneNodeDo(graph,scenex-hsize-256,sceney-hsize-256,size,size,Callback) Case 1 For p=EachIn list If pointinrect(p.x,p.y,scenex-hsize,sceney-hsize,size,size) Plot p.x,p.y Next EndSelect If KeyHit(KEY_SPACE) mode=Not mode Select mode Case 0 modename$="Scenegraph" Case 1 modename$="Linear" EndSelect EndIf time=MilliSecs() SetColor 255,255,255 DrawText "UPS: "+FPS(),0,0 DrawText "Mode: "+modename,0,20 DrawText "space: toggle mode",0,40 DrawText "-/+: change rect size",0,60 Flip 0 FPSLASTUPDATETIME=FPSLASTUPDATETIME+(MilliSecs()-time) Wend End Function Callback:Int(o:Object) Local p:Point=Point(o) If Not pointinrect(p.x,p.y,scenex-hsize,sceney-hsize,size,size) Return 1 Plot p.x,p.y Return 1 EndFunction Function PointInRect:Int(x0,y0,x1,y1,w,h) If x0<x1 Return 0 If y0<y1 Return 0 If x0>x1+w Return 0 If y0>y1+h Return 0 Return 1 EndFunction Function FPS#(frequency=200) Global FPSFRAMECOUNT Global FPSLASTCOUNT# Local time=MilliSecs() FPSFRAMECOUNT=FPSFRAMECOUNT+1 Local elapsed#=time-FPSLASTUPDATETIME If elapsed>=frequency FPSLASTCOUNT=FPSFRAMECOUNT/elapsed*1000.0 FPSFRAMECOUNT=0 FPSLASTUPDATETIME=time EndIf Return FPSLASTCOUNT End Function |
| ||
*ahem* Yoink! I'll be keeping that. I'll be damned if I know how it works, but it works well. |
| ||
Here's a slightly updated version. It fixes a potential crash and also uses a more intuitive scaling method. This isn't really anything all that advanced, but it works and it's simple. If you have the right sector size relative to the camera range, there's really no limit to how big your scene can be, once it gets out of the camera range. I ran a scene with 1,000,000 objects and 300,000,000 polys and since it only drew what was in the immediate area, I got the maximum framerate. |
| ||
I implemented a faster improved version of the above technique and added a routine that sorts any scene into batches of like meshes. Without actually rendering anything, the framerate is over 1000 FPS with one million meshes, so the sorting and culling overhead is pretty minimal. The next step is to switch over to the instanced rendering on the GE Force 8 series. I am very curious to see what the FPS counter will read if I implement GPU batches with this scene. |
| ||
I almost wet myself, will this all be integrated into LWE ? I'd like to run that demo, can you please post it ? Good stuff! |
| ||
I'm going to implement GPU instancing, but then it will only run on the GE Force 8 series. This is not for the current engine. |
| ||
Just one idea, don't know why I did not came up with this sorting method before: Why don't you use just RadixSort / BucketSort (the books intermix them sadly a little). Those sorting algorithms are great for sorting either strings or other elements that consists of different "bases" (numbers with 1, 10, 100, 1000 positions, strings with their positions, but as well 2D and 3D data where each coordinate is just a new basevector to a vector space). The only thing you would need to do there is to decide on the most usefull "bucket size" ie the units in world space which form a bucket / sector RadixSort is one of the fastest sort algorithms known, it can be programmed in a parallel way without too many problems and above all it is not comparision based like others which further reduces the workload |
| ||
Hi, BSP/Quadtree/OctTree will be as fast as anything TMap based, as they are all essentially 'log(n)' algorithms. QuadTrees and OctTrees are really just 'unrolled' BSP trees. Done right, a binary tree based geometric algorithm will be faster than a TMap algorithm because it knows more about the data - but the complexity will be the same so it will scale the same. However, there is a practical 'constant time' solution to this given that you're essentially working in 2D: 1) Create an x/z array of lists, where each element contains a list of entities that intersect the x/z cell. 2) At rendertime, manually render a top-down 'shadow' of the view frustum on to the array so that you can work out which array cells lie within the frustum. 3) Render the stuff within each 'visible' array cell. This is *constant complexity*, which means it doesn't matter how big the scene, it will be the same speed (ignoring array size issues!). |
| ||
Mark, that is exactly what I have done. :) Although a hierarchal approach could potentially save time and provide finer culling in some circumstances, I have found that overall the 2D array approach is much faster. I mean, it's pretty much instantaneous. You can add a single entity to multiple cells if it spans them. To avoid "grabbing" the same entity twice, just add an iterator attached to the object, so it gets skipped if it has already been set to the current function iterator. By the way, GPU instancing gave me a free 30-40% speed boost on the GE Force 8 series, and it was very easy to implement. The way it makes you compile an array of matrices actually gave me an idea to speed up the older batched VBO approach as well. This is pretty cool because even if the world contains 10,000,000 objects, the renderer never even sees them, and the number of objects in lists you have to go through is pretty low. Still, I have found arrays and memcopy() to be so much faster when dealing with large numbers, it would be nice to figure out a way to use them more. Of course, I am testing with an unrealistic number of objects, so maybe I should just let it be. |
| ||
Good job :) |
| ||
This sounds pretty remarkable. What's it for? .... |
| ||
How would you go about using your method (Leadwerks) for efficient collision detection and also how would it handle a scene where every single entity is dynamic in size, shape and position? Also same question to you Mark, how would a frustum-shaped scanline polygon-type array traversal perform for dynamic moving objects and for collision checks? If there are lots of empty spaces between objects then you're looking at a lot of cells only to find they are empty, right? And the more objects move around, since you're using spacial partitioning you have to remove the objects from a cell and reinsert it in another (or up to 4 others)? Is this mainly only a good technique for static scenes? |
| ||
How do you go about tweaking the sectors and size for optimal performance? |
| ||
So is it possible to combine this scenegraph stuff with the BlitzTiles terrain engine? |
| ||
for even more speedup, dont use tlists, use arrays. |
| ||
What I have found, after years of using different approaches, is that no automatic technique can come close to hand-placed zones. This works in a variety of scene sizes and complexities, and there's really no way to automate it efficiently. |
| ||
ok so what exactly is this doing? in the demo with 100,000 pixels, is it putting them in a tmap and then only drawing the ones in the box? so they must be static :( what if they are moving? |
| ||
Cool OK so I could use this in a 2d game to limit the visual range of what's drawn! What about collision detection tests, could I use this get potential collisions quickly? Regards Merx |
| ||
What about collision detection tests, could I use this get potential collisions quickly? there are thousands of ways to do this... ok maybe hundreds but theres a lot of options. maybe we should start a new thread for this but here is what I have found fastest. 1. If objects in your game are unlikely to move very far very fast then simply make a tlist of objects within a certain radius loop through those every frame to check for collisions. now every say 10-60 frames, depending on how far objects move every frame, you need to update the tlist, ie clear it and find the objects that are close now. This can be made almost instant if done in the background with threading :) 2. My personal favorite: fixed grid algorithm. Make an array (2d for simplicity, or 1d for optimization) of grid cells that have pointers to tlists as follows. then before you check collisions, erase all tlists and put the particles/players in the tlist of the square they are in. Then all you need to do is check for collisions in the gridcells around them for example: You know the red dot would never collide with the green or yellow dots because it is not in an touching cell, however the red dot may collide with the blue dot because their cells are touching. |
| ||
Heck, I just figured out that the BlitzTiles terrain is broken up into x about of segments as designated by the BT_Tiles variable and stored in an array. If BT_Tiles is set to 8 then it breaks up the terragen map into an 8x8 grid! Hehehehe! I can DEFINITELY use the above scenegraph stuff to hide the terrain segments individually. Here's to endless terrain with no frame hit. Zing!! |