Optimizing OpenGL?


Sicilica(Posted 2016) [#1]
I've been playing with 3D recently (as per usual), but I get heavy slowdown even when drawing only around 500 polys, and I'm not sure why.

My app only uses about 30-40MB of RAM while running, and most of the time it runs at a stable 60fps while using under 2% of my CPU, but sometimes it dips to a really inconsistent framerate in the 40s for a while and uses 20-30% (though it never uses more than that to run any faster). Most of the time it runs slow like that when it first starts up and then eventually settles into 60fps, but it continues to dip periodically as long as it runs.

The source is at https://drive.google.com/open?id=0B9dkZ_sSmEVhZ2ZrbWlEOHhyUms in case anyone wants to look at it, but unfortunately it's hard to point at any one part of it as the problem, so I don't see how I can post a snippet. Instead, I'll just try to explain my reasoning on a few points and maybe one of you can tell me where I'm mistaken.


So the model format I'm using doesn't store any position information for the vertices; it contains only the rigging information for the bones, and all vertex positions have to be calculated from the bone structure. The biggest drain on the CPU is almost certainly calculating the positions of all those vertices every frame. The positions could be cached on keyframes of the animation, but I need to smoothly interpolate the animation, so all the per-vertex work has to be done per frame anyway (though I would certainly cache any static pose when I start optimizing, e.g. if an idle frame is held for a few seconds).
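
(For reference, the per-frame CPU work amounts to something like the sketch below. This is a hypothetical C version with made-up Vertex/Bone types, not the actual Monkey code: a weighted blend of bone transforms applied to every vertex, every frame.)

```c
/* Hypothetical C sketch of the per-frame CPU skinning cost (not the real code). */
typedef struct { float m[16]; } Mat4;                 /* column-major 4x4        */
typedef struct { Mat4 pose; } Bone;                   /* interpolated this frame */
typedef struct { int bone[4]; float weight[4]; float bind[3]; } Vertex;

/* Transform a point by a 4x4 matrix, assuming w = 1. */
static void transform_point(const Mat4 *m, const float in[3], float out[3]) {
    for (int r = 0; r < 3; r++)
        out[r] = m->m[r]*in[0] + m->m[4+r]*in[1] + m->m[8+r]*in[2] + m->m[12+r];
}

/* Every frame, for every vertex: blend the transforms of the bones it is
   weighted to. This is the O(vertices x weights) work the CPU is doing now. */
void skin_on_cpu(const Vertex *v, int count, const Bone *bones, float *out) {
    for (int i = 0; i < count; i++) {
        float acc[3] = {0.0f, 0.0f, 0.0f}, p[3];
        for (int w = 0; w < 4; w++) {
            transform_point(&bones[v[i].bone[w]].pose, v[i].bind, p);
            for (int k = 0; k < 3; k++) acc[k] += v[i].weight[w] * p[k];
        }
        out[i*3+0] = acc[0]; out[i*3+1] = acc[1]; out[i*3+2] = acc[2];
    }
}
```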

Calculating those vertex positions would be a really good job for CUDA cores rather than the CPU, since they could all run in parallel and in lockstep very easily; I would just need to figure out the maximum number of weights any one vertex can have. Unfortunately, I have no experience using a GPU for anything other than drawing polygons (you know, cracking passwords or whatever other logic problems), so I'm not sure how you would go about it. A geometry shader would perhaps be a really clean way to do it, but we only have access to vertex and fragment shaders in GLES, unless I'm mistaken?

I'm also sending all of the position information to the GPU on every draw call (i.e. every frame), which seems like an expensive amount of data to transfer - but from what I understand, that's what you're supposed to be doing? It's possible I just grossly misunderstood VAOs at some point.

Since the slowdowns are inconsistent and last for a while, it would make a little sense if I were introducing a bunch of overhead in the quaternion code in some situation like a divide by zero or something - but since I'm using a fixed "time" step to advance the animation each time I draw it, it should be a completely deterministic system. It only drifts from exact intervals between frames by whatever floating-point error accumulates, which is, again, deterministic.

The object-oriented format I'm storing models in probably adds a lot of overhead with pointer dereferencing too - but I doubt it's significant compared to the heavy calculations that are happening?

It's also worth noting that a lot of the code is pretty rough and unoptimized (especially the code to load models into memory...), but I'm only worrying about the work that needs to be done every frame right now, since clearly loading has nothing to do with it.



So, yeah. Anyone have any experience with 3D and got an idea what's going on?


nullterm(Posted 2016) [#2]
You shouldn't calculate vertex positions etc. per frame on the CPU. That's what the GPU is for.

Vertex data should be uploaded via vertex buffers to the GPU at load time, once.
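
Something along these lines, done once when the model is loaded (a plain GLES2/C sketch, not Monkey code; the names are placeholders):

```c
#include <GLES2/gl2.h>

/* Done once at load time: copy the mesh's vertex data into a GPU buffer.
   'verts' and 'sizeBytes' stand in for whatever the model loader produces. */
GLuint upload_mesh(const void *verts, long sizeBytes) {
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)sizeBytes, verts, GL_STATIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    return vbo;  /* keep the handle around and just bind it again at draw time */
}
```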

Then when drawing, your vertex and fragment shaders do the math to transform (and skin, if applicable) from model space to world/camera space to draw. At least for GLES2.

CPU Monkey code should be only telling the GPU what model (vertex buffer) to draw and where (transform, skin pose, etc).
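
Roughly like this per frame, per model (again a plain GLES2/C sketch; every name here - prog, a_Position, u_Model, u_Bones, the 8-float vertex layout - is an assumption):

```c
#include <GLES2/gl2.h>

/* Per frame, per model: point the shader at the already-uploaded vertex buffer,
   hand it this frame's transforms, and issue the draw call. */
void draw_model(GLuint prog, GLuint vbo, const float *modelMat,
                const float *boneMats, int boneCount, int vertCount) {
    glUseProgram(prog);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    GLint aPos = glGetAttribLocation(prog, "a_Position");
    glEnableVertexAttribArray((GLuint)aPos);
    glVertexAttribPointer((GLuint)aPos, 3, GL_FLOAT, GL_FALSE,
                          8 * sizeof(GLfloat), (const void *)0);
    /* ...same for normals, bone indices and bone weights... */

    /* The only per-frame uploads: one model matrix plus one matrix per bone. */
    glUniformMatrix4fv(glGetUniformLocation(prog, "u_Model"), 1, GL_FALSE, modelMat);
    glUniformMatrix4fv(glGetUniformLocation(prog, "u_Bones"), boneCount, GL_FALSE, boneMats);

    glDrawArrays(GL_TRIANGLES, 0, vertCount);
}
```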

GPU shaders will always do it far far far faster than you can on the CPU.


Sicilica(Posted 2016) [#3]
So, send the rigging data to the vertex shader, then send the skeleton data in uniforms as it animates?

I guess that makes a lot of sense, except I'll have to decide on the maximum number of bones I could ever have in a single model so I can set up those uniforms. To clarify, I would want to allocate a separate VAO for each mesh that I load, right? I've only ever used a single VAO for each attribute and loaded the data per render call, but I guess that is a lot of bandwidth to use.

Thanks!


nullterm(Posted 2016) [#4]
Vertex buffers should have: position, normal, and rigging info like which bone each vertex is weighted to (or multiple bones with a weight for each, but that's a step up).

Then you upload the model's transformation matrix and the matrices (or quaternions + translations) of the bones as a uniform array to the shader.

In the vertex shader, you grab the bone index (and/or bone weighting) from the vertex attributes (from the vertex buffer) and use the transform for that bone.

Then you tell GLES to draw your model, and the GPU does all the matrix math for you.
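
For example, a minimal GLES2 (GLSL ES 1.00) skinning vertex shader could look like the sketch below; the attribute/uniform names and the 32-bone cap are assumptions, and it blends up to four weighted bones per vertex (the single-bone case is the same with one term):

```glsl
// Hypothetical GLES2 (GLSL ES 1.00) skinning vertex shader; the names and the
// 32-bone cap are assumptions, not taken from the thread's actual code.
const int MAX_BONES = 32;

attribute vec3 a_Position;    // bind-pose position, from the vertex buffer
attribute vec4 a_BoneIndex;   // up to 4 bone indices (passed as floats in GLES2)
attribute vec4 a_BoneWeight;  // matching weights, expected to sum to 1.0

uniform mat4 u_Bones[MAX_BONES];  // this frame's interpolated bone transforms
uniform mat4 u_ModelViewProj;     // model space -> clip space

void main() {
    vec4 p = vec4(a_Position, 1.0);
    vec4 skinned = a_BoneWeight.x * (u_Bones[int(a_BoneIndex.x)] * p)
                 + a_BoneWeight.y * (u_Bones[int(a_BoneIndex.y)] * p)
                 + a_BoneWeight.z * (u_Bones[int(a_BoneIndex.z)] * p)
                 + a_BoneWeight.w * (u_Bones[int(a_BoneIndex.w)] * p);
    gl_Position = u_ModelViewProj * skinned;
}
```

With something like that in place, the per-frame CPU work should shrink to interpolating the bone poses and filling the uniform array, while the per-vertex math runs on the GPU.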