I recently watched Mike Acton's now-famous CppCon 2014 talk, "Data-Oriented Design and C++".

Inspired, I started rewriting a camera I had built as part of a school assignment. It's a simple camera that uses a 2D rendering API to create a 3D space of billboard sprites.

My implementation used templated Vector4 and Matrix4x4 structures representing world-space points and various transformations respectively. Each billboard drawn is represented by a dedicated Sprite class, which holds a pointer to the 2D sprite provided by the API as well as a world-space position. I collect these in a std::vector of dynamically allocated Sprite *.
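To make the starting point concrete, here is a minimal sketch of that layout. The actual Vector4, Matrix4x4, and API sprite types are not shown in the post, so everything below is an assumption standing in for them:

```cpp
#include <vector>

// Minimal sketch of the layout described above; Vector4, Matrix4x4,
// ApiSprite, and the field names are assumptions, not the real code.
template <typename T>
struct Vector4 { T x, y, z, w; };

template <typename T>
struct Matrix4x4 { T m[4][4]; };

struct ApiSprite;  // opaque 2D sprite type provided by the rendering API

struct Sprite {
    ApiSprite*     apiSprite;  // pointer to the API's 2D sprite
    Vector4<float> position;   // world-space position
};

// Sprites collected as dynamically allocated pointers.
std::vector<Sprite*> sprites;
```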

The first optimization I made was to change my std::vector&lt;Sprite *&gt; to a std::vector&lt;Sprite&gt;, to ensure that the sprites are stored in contiguous memory. I also replaced the std::vector with a fixed-size stack array, and swapped the depth sorting for my own implementation of the Quicksort algorithm, since I could no longer trivially use std::sort. After these changes, there was no noticeable improvement in performance.
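A hypothetical sketch of that change, with sprites stored by value in a fixed-size stack array and depth-sorted by a hand-rolled quicksort (the Sprite type and its `depth` field are assumptions standing in for the real one):

```cpp
#include <cstddef>
#include <utility>

// Assumed stand-in for the real Sprite: only the depth matters here.
struct Sprite {
    float depth;  // distance from the camera, filled in each frame
};

constexpr std::size_t kMaxSprites = 1024;
Sprite sprites[kMaxSprites];  // contiguous storage, no per-element allocation

// Hoare-partition quicksort, ordering back-to-front (largest depth first)
// so nearer billboards are drawn last.
void quicksortByDepth(Sprite* a, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    if (lo >= hi) return;
    const float pivot = a[(lo + hi) / 2].depth;
    std::ptrdiff_t i = lo, j = hi;
    while (i <= j) {
        while (a[i].depth > pivot) ++i;  // farther sprites stay left
        while (a[j].depth < pivot) --j;
        if (i <= j) { std::swap(a[i], a[j]); ++i; --j; }
    }
    quicksortByDepth(a, lo, j);
    quicksortByDepth(a, i, hi);
}
```

(For what it's worth, std::sort also works on a raw array via `std::sort(sprites, sprites + count, cmp)`; the hand-rolled version just mirrors the exercise.)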

There are other, more significant limitations in my camera implementation that probably have a larger impact on performance:

The first is that even though my Sprite array is in contiguous memory, each Sprite internally allocates a render object for the render API on the heap, so setting each render object's final render position likely causes a cache miss anyway.
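One way to limit that cost, sketched below under assumed names (RenderObject and setPosition are stand-ins for whatever the real API exposes), is to keep the hot per-frame math on contiguous data and touch each heap-allocated render object exactly once, in a final pass:

```cpp
#include <cstddef>
#include <vector>

struct ScreenPos { float x, y; };

// Hypothetical stand-in for the API's heap-allocated render object.
struct RenderObject {
    float x = 0, y = 0;
    void setPosition(float px, float py) { x = px; y = py; }
};

struct Sprite {
    RenderObject* render;  // lives on the heap, owned by the API
    float wx, wy;          // world-space position (simplified to 2D)
};

void submitFrame(std::vector<Sprite>& sprites, float camX, float camY) {
    std::vector<ScreenPos> screen(sprites.size());

    // Hot loop: reads and writes only contiguous memory.
    for (std::size_t i = 0; i < sprites.size(); ++i)
        screen[i] = { sprites[i].wx - camX, sprites[i].wy - camY };

    // Cold loop: the only pass that chases the heap pointers.
    for (std::size_t i = 0; i < sprites.size(); ++i)
        sprites[i].render->setPosition(screen[i].x, screen[i].y);
}
```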

The transformation pipeline also does redundant work that could be avoided by caching results in memory: for example, it calculates the camera-space position of each sprite twice, the first time only to sort by depth.
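A minimal sketch of that caching idea, with the transform collapsed to a single subtraction for brevity (the Sprite fields and the function name are assumptions):

```cpp
#include <algorithm>
#include <vector>

struct Sprite {
    float worldZ;       // stand-in for the full world-space position
    float cachedDepth;  // camera-space depth, computed once per frame
};

void updateAndSort(std::vector<Sprite>& sprites, float cameraZ) {
    // Compute the camera-space depth exactly once per sprite.
    for (Sprite& s : sprites)
        s.cachedDepth = s.worldZ - cameraZ;

    // The depth sort reuses the cached value instead of re-transforming.
    std::sort(sprites.begin(), sprites.end(),
              [](const Sprite& a, const Sprite& b) {
                  return a.cachedDepth > b.cachedDepth;  // back to front
              });
}
```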

Finally, sorting is done on all sprites rather than only on those that survive culling for the current camera state. It would be much more efficient to build a collection containing only the visible sprites and sort that.
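That could look something like the sketch below: cull first into an index list, then sort only the survivors. The depth-range test is a hypothetical stand-in for a real frustum check:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Sprite {
    float depth;  // camera-space depth
};

std::vector<std::uint32_t> visibleSorted(const std::vector<Sprite>& sprites,
                                         float nearZ, float farZ) {
    std::vector<std::uint32_t> visible;
    visible.reserve(sprites.size());

    // Culled sprites never enter the sort at all.
    for (std::uint32_t i = 0; i < sprites.size(); ++i)
        if (sprites[i].depth >= nearZ && sprites[i].depth <= farZ)
            visible.push_back(i);

    // Sort only the visible indices, back to front.
    std::sort(visible.begin(), visible.end(),
              [&](std::uint32_t a, std::uint32_t b) {
                  return sprites[a].depth > sprites[b].depth;
              });
    return visible;
}
```

Sorting indices instead of the sprites themselves also keeps the original array untouched, which matters if other systems rely on its order.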