The cost of performance vs. productivity: an example
Sometimes, as programmers, we find it hard to strike the balance between the need to write things fast (productivity) and the need to make things run fast (performance). I lean strongly toward the second.
I just started a personal project: a Rogue-like game. The basis of such a game is always some sort of matrix in which the character moves. My conditions for this project are to write the game in C++, using as much of C++11 as possible, for learning purposes.
So the first idea was to create a Matrix class as a vector of vectors; that idea fell apart quite fast, since I didn't really like it and didn't feel like building a Matrix that way. So, obviously, I thought of rolling my own. Unfortunately, I can't really post the code here (WordPress thinks it's too smart for its own good and decided that my template code is XML).
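Since the original code didn't survive WordPress, here is a rough sketch of what such a Matrix class looks like, assuming the T* variant from the benchmark below; the names and the exact interface are illustrative, not the actual project code:

```cpp
#include <cstddef>

// Minimal sketch: a flat T* buffer of rows*cols elements,
// indexed as row * cols + col. Copying is disabled to keep it short.
template <typename T>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : _rows(rows), _cols(cols), _data(new T[rows * cols]()) {}
    ~Matrix() { delete[] _data; }

    Matrix(const Matrix&) = delete;
    Matrix& operator=(const Matrix&) = delete;

    T& at(std::size_t r, std::size_t c) { return _data[r * _cols + c]; }
    const T& at(std::size_t r, std::size_t c) const { return _data[r * _cols + c]; }

    std::size_t rows() const { return _rows; }
    std::size_t cols() const { return _cols; }

private:
    std::size_t _rows, _cols;
    T* _data;
};
```

The single flat allocation is exactly why this beats a vector of vectors: one `new[]`, contiguous memory, no per-row bookkeeping.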
I created a Matrix template class and ran only the initialization code. First I used an std::vector to hold the matrix data, then a plain T*. For the data itself, I used an Entity class and a NullEntity derived from Entity that sets the entity name to "NULL". A matrix of 5000×5000 items was created. The creation times are below:
| Container type  | Item type (T) | Run time     |
|-----------------|---------------|--------------|
| STL vector      | shared_ptr    | 1410–1440 ms |
| STL vector      | C++ pointer   | 290–310 ms   |
| C++ vector (T*) | shared_ptr    | 1010–1030 ms |
| C++ vector (T*) | C++ pointer   | 100–110 ms   |
So for 25 million items, the time cost of using a shared pointer is about 900 ms (or about 1100 ms when the STL vector's reallocations are involved). Put differently, the shared_ptr overhead amounts to roughly 1 ms for every 25,000-odd items created.
The benefits of using shared_ptr are substantial: I would be able to safely reuse these objects throughout the program, and their lifetimes would be well controlled. Not having to track every possible copy of every object is a real saving in development time. That, plus the fact that each cell should hold not a single entity but a stack of entities, makes me think the cost may be worth it. Is the cost acceptable? I don't know; I shall think about it. But I certainly don't like the fact that the running time jumps by an order of magnitude.
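To make the "stack of entities per cell" idea concrete, here is a hedged sketch of what that could look like with shared_ptr; the Entity/NullEntity classes follow the description above, but the Cell alias and the member names are my own illustration:

```cpp
#include <memory>
#include <string>
#include <vector>

// Entity with a name, as described; NullEntity sets the name to "NULL".
class Entity {
public:
    explicit Entity(std::string name) : _name(std::move(name)) {}
    virtual ~Entity() = default;
    const std::string& name() const { return _name; }
private:
    std::string _name;
};

class NullEntity : public Entity {
public:
    NullEntity() : Entity("NULL") {}
};

// A cell holds a stack of entities; shared_ptr lets the same entity
// appear in several places without manual lifetime tracking.
using Cell = std::vector<std::shared_ptr<Entity>>;
```

With this setup, moving an entity between cells is just pushing and popping the same shared_ptr; the object is destroyed automatically once no cell (or inventory, or whatever) references it any more.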
Of course, I should optimize in different areas. I should not keep a 5000×5000 matrix in memory, so my numbers should become acceptable. But premature optimization can be the death of any project, so I shall go the C++11/STL way nonetheless. The optimization should be made at a higher level, at the design level: for example, use a sparse matrix rather than a full one, and don't keep all the data in memory at once. But decisions of this kind I shall make after I have at least something working, not now.
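For the record, the sparse-matrix idea could be sketched as below; this is one possible design (a hash map keyed by packed coordinates), not the approach the project necessarily ends up with:

```cpp
#include <cstdint>
#include <unordered_map>

// Sketch of a sparse matrix: only occupied cells are stored,
// keyed by a 64-bit packed (row, col) coordinate.
template <typename T>
class SparseMatrix {
public:
    void set(std::uint32_t r, std::uint32_t c, const T& value) {
        _cells[key(r, c)] = value;
    }
    // Empty cells read back as a default-constructed T.
    T get(std::uint32_t r, std::uint32_t c) const {
        auto it = _cells.find(key(r, c));
        return it == _cells.end() ? T{} : it->second;
    }
    std::size_t occupied() const { return _cells.size(); }

private:
    static std::uint64_t key(std::uint32_t r, std::uint32_t c) {
        return (static_cast<std::uint64_t>(r) << 32) | c;
    }
    std::unordered_map<std::uint64_t, T> _cells;
};
```

A dungeon level that is mostly empty floor then costs memory proportional to the number of interesting cells, not to 5000×5000.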
As a PS, an optimization trick: what's the difference between

```cpp
T *ptr = _data;
for (unsigned i = 0; i < nItems; ptr++, i++)
    *ptr = value;
```

and

```cpp
for (unsigned i = 0; i < nItems; i++)
    _data[i] = value;
```
Answer: around 10 ms for 25 million items.
PPS: Seeing how difficult it is to post code on WordPress makes me seriously consider creating my own publishing platform.