The pipe dream of O(1) memory allocation
There is a piece of well-established knowledge that says that memory allocation in managed languages is a constant-time operation. It is so; everyone knows it. But here is something to consider: even finding an item in an unsorted list is O(1), provided the item is always first in the list.
So let’s discuss how the O(1) memory allocation works. It’s so simple (obviously, since it’s O(1)) that anyone can understand it. The algorithm is described below:
- Memorize the current heap pointer and advance the heap top pointer by the number of bytes required for the allocation.
- Success! Return the memorized heap pointer and call the constructor!
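The two steps above can be sketched as a toy bump allocator (a minimal sketch with hypothetical names, not any runtime's actual code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy bump ("pointer-bump") allocator over a fixed arena.
// allocate() is exactly the two steps above: memorize the current top,
// advance it, return the memorized pointer for the constructor to use.
struct BumpHeap {
    std::uint8_t* base;      // start of the heap arena
    std::size_t   capacity;  // arena size in bytes
    std::size_t   top = 0;   // current "heap top" offset

    void* allocate(std::size_t bytes) {
        if (top + bytes > capacity) return nullptr;  // arena exhausted
        void* memorized = base + top;  // step 1: memorize the current pointer
        top += bytes;                  // step 1: advance the heap top
        return memorized;              // step 2: hand it to the constructor
    }
};
```

Note that consecutive allocations come back adjacent in memory, which is a genuine nicety of this scheme.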
O(1)! Constant time! Success! Dorin, see, it was simple! Let’s rejoice and do a victory dance! Move to managed languages! All those people managing memory by hand in C/C++ were doing it wrong! Etc!
Now let’s examine the first step: memorize the current heap pointer and advance it. What if we are on two separate threads, and we try to allocate memory at the same time? We’ll both memorize the same heap pointer, but… Oh, oh, never mind. We will use synchronization… Yay?!?!
No yay. Synchronization is a costly operation; depending on the implementation it may or may not require a userspace-to-kernel switch. But let’s assume that the best engineers in the world implemented the best lock-free method; can it really be done? Maybe, by using something like the Interlocked family of functions on Windows, but most likely it isn’t that simple. Why? Let us continue our investigation, keeping in mind that after allocation the memory is used by the constructor (or zeroed by the virtual machine environment).
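A lock-free bump step is indeed possible with an atomic fetch-and-add, the std::atomic analogue of Windows’ Interlocked functions. A sketch, under the assumption of a fixed, already-mapped arena (which is exactly the assumption the rest of this post dismantles):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lock-free bump allocation: the heap top is advanced with one atomic
// fetch_add, so two threads can never be handed the same bytes.
struct AtomicBumpHeap {
    std::uint8_t*            base;
    std::size_t              capacity;
    std::atomic<std::size_t> top{0};

    AtomicBumpHeap(std::uint8_t* b, std::size_t c) : base(b), capacity(c) {}

    void* allocate(std::size_t bytes) {
        // fetch_add returns the previous top: that is the memorized pointer.
        std::size_t old = top.fetch_add(bytes, std::memory_order_relaxed);
        if (old + bytes > capacity) return nullptr;  // arena exhausted
        return base + old;
    }
};
```

This only works while the arena lasts; once it is exhausted, the slow path (and the kernel) is unavoidable.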
What happens if we reach the end of the memory space allocated for the heap? Oh, but that would never happen, because we have an address space of… Oh, wait.
Pipe dream broken. No program will ever be handed 2 GB of memory from the start. It will start with a bit of memory, then get slightly more and slightly more, until it actually grows to a respectable size. And now you have to deal with the OS memory allocator, which is a different beast altogether. It handles memory pages, it handles swap space, it handles the mapping between virtual and physical addresses. And this is definitely not an O(1) operation: there are a lot of memory pages to be handled, and little chance of growing the heap in constant time. Add to this synchronization inside the kernel, which may block other processes or threads from performing similar operations (or any operation, if you have something like the infamous “Big Kernel Lock“), and you have a clearer picture of what’s going on.
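For a concrete picture: on POSIX systems, growing the heap ultimately bottoms out in a system call such as mmap (Windows would use VirtualAlloc instead). A sketch of the slow path a runtime hits when its arena runs out; the function names here are hypothetical:

```cpp
#include <sys/mman.h>  // POSIX-only: mmap / munmap
#include <cassert>
#include <cstddef>

// Slow path: ask the kernel for fresh, zeroed pages. This is a system
// call with page-table work behind it -- nothing like a pointer bump.
void* grow_heap(std::size_t bytes) {
    void* pages = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return pages == MAP_FAILED ? nullptr : pages;
}

void release_heap(void* pages, std::size_t bytes) {
    munmap(pages, bytes);
}
```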
And we’re only skimming the surface. The memory is returned to the user, but what happens in the optimistic case, in which no new pages need to be allocated by the kernel? Nothing much; the worst that can happen is a major page fault, bringing unused memory back from swap. Why would that happen?
Because life is tough. Let’s say you have an algorithm that allocates a lot of memory and then frees it, and then another algorithm uses a similar amount of memory. When that is freed, let’s say the garbage collector kicks in, clears stuff, and compacts the heap.
All is fun and games, until you realize that the garbage collector can choose to dispose of the freed memory pages and release them back to the operating system. Or, hopefully, it will choose to keep those pages around. But, you know, unused memory may at any time be swapped to disk by an operating system under memory pressure, i.e. one with not enough resources to go around.
So what can happen at any memory access (but is more likely during a ‘new’ operation) is a hard page fault, which requires much more complicated operations to happen in the background.
I wish the story ended here. It’s a clear exposition of how a constant-time operation can go wrong, and the problem is that it probably will. The discussion is not over, though.
When you allocate an object, you also allocate space for all the members of that object. But your objects will probably not contain only primitive types: they will contain more objects, and those will contain more objects. The more the merrier. C++ has something quite fun here: you don’t need to hold a pointer to every object your class contains; you can embed the object directly, inline in memory. You don’t really have that option in managed languages. So each contained object means another memory allocation (possibly including the kernel round trip for taking the critical section; I am not sure it can happen without the kernel round trip, but even so, it is an expensive operation). We are not talking about the code that runs in the constructor, since that would be unfair (that is a cost that depends only on the implementer, who should be aware of it).
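The inlining point can be shown with a toy pair of C++ classes (hypothetical names): the first stores its members inline, one allocation total; the second mimics the managed layout, where every contained object is a separate allocation behind a reference:

```cpp
#include <cassert>

struct Point { double x = 0, y = 0; };

// C++ style: members embedded inline; `new Line` is one allocation.
struct Line {
    Point a;  // lives inside Line itself, no extra allocation
    Point b;
};

// Managed-language style: every contained object sits behind its own
// pointer, so constructing one LineBoxed costs three allocations.
struct LineBoxed {
    Point* a;
    Point* b;
    LineBoxed() : a(new Point{}), b(new Point{}) {}
    ~LineBoxed() { delete a; delete b; }
};
```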
I am not saying all these things happen all the time. Synchronization can be made cheaper by using lighter synchronization primitives, and some are surely in place; I haven’t looked at a memory allocator for a managed language recently, but I can make some guesses. The potential costs are there, and the O(1) is, at best, an amortized O(1).
I’m not saying that memory allocation will be cheaper in C/C++. Using the stock memory allocation in C/C++ will also cost a lot, due to the various strategies malloc implements. However, you can implement allocation strategies that better suit your needs. For example, you can allocate one huge chunk of memory up front and have your algorithm fill it on demand. When the algorithm is done, you simply discard the whole chunk, avoiding the costs of both allocation and deallocation of individual objects. I’m not saying this is customary, but it can be done, and should be done, especially by memory-constrained developers (the forgetful developers are the worst!).
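The “huge chunk, filled on demand” strategy can be sketched like this (a hypothetical arena, not a production allocator; no per-object destructors are run, so it suits plain data):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// One big malloc up front; individual allocations are pointer bumps,
// and the whole chunk is discarded in one free (or reused via reset()).
struct Arena {
    char*       buf;
    std::size_t cap;
    std::size_t used = 0;

    explicit Arena(std::size_t n)
        : buf(static_cast<char*>(std::malloc(n))), cap(n) {}
    ~Arena() { std::free(buf); }  // discard everything at once

    void* alloc(std::size_t n) {
        if (buf == nullptr || used + n > cap) return nullptr;
        void* p = buf + used;
        used += n;
        return p;
    }

    void reset() { used = 0; }  // "free" every object in O(1)
};
```

The point of the design: deallocation cost is paid once per chunk, not once per object.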
There is, however, a real O(1) memory allocation: stack allocation. The stack is preallocated and the memory least likely to hit the disk, making the chance of a page fault quite slim. C# unlocks this memory through the unsafe ‘stackalloc‘ keyword, which is essentially a glorified name for the alloca C call.
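In C/C++ terms, a sketch of what stackalloc/alloca buys you (alloca is non-standard; <alloca.h> is assumed to be available, as it is on most Unix toolchains):

```cpp
#include <alloca.h>  // non-standard; most Unix toolchains provide it
#include <cassert>
#include <cstddef>

// A runtime-sized scratch buffer on the stack: allocation is a single
// stack-pointer adjustment, and the memory vanishes, in O(1), when the
// function returns -- so never return a pointer into it.
long sum_first_n(std::size_t n) {
    long* scratch = static_cast<long*>(alloca(n * sizeof(long)));
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = static_cast<long>(i + 1);
        total += scratch[i];
    }
    return total;
}
```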
So memory allocation in managed languages is an O(1) operation the same way O(log n) is O(1). It depends on how you choose to see things; it’s easy to forget the impact garbage collection has on that O(1), and, unfortunately, managed-space developers tend to forget that their code runs on real operating systems, on real hardware.