1. Free list
2. Pool
3. Affix

Free list

A free list allocator keeps a linked list of deallocated blocks and uses another allocator to do the actual allocations. The idea is to allocate fixed-size memory blocks; when we deallocate a block, instead of releasing it we add it to a linked list so the next allocation can reuse it. To do that, the minimum block size needs to be greater than or equal to sizeof(intptr_t), because we store the linked list's next pointer inside the freed block itself. The general concept is simple, but there are several variations of free lists. I won't cover the other types; I prefer to keep it simple and practical.

Free lists have quite a few downsides, such as a different memory flow under multithreading, fragmentation, unbounded growth, and inflexibility.
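
Throughout these code chunks I'll assume the MemoryBlock structure and the support allocator interface from part 1. I won't repeat them here, but a minimal sketch of what the snippets below rely on would be something like this:

struct MemoryBlock
{
	void* Ptr{};
	size_t Size{};
};

Any support allocator just needs Allocate, Deallocate and Owns functions taking and returning MemoryBlock.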

Here are some basic free list code chunks.

Class body and structs:

template < typename TSupportAllocator, size_t BlockSize, size_t ToleranceMin, size_t ToleranceMax >
class FreeListAllocator
{
	static_assert(BlockSize >= sizeof(intptr_t), "BlockSize needs to be greater or equal sizeof(intptr_t).");

	struct Node { Node* Next{}; };
	Node* mHead{};
	TSupportAllocator mAllocator{};
};

A basic body has just a Node type to build the free list and a static assertion to check that BlockSize is greater than or equal to the size of a pointer. There is also something I haven't talked about yet: the tolerances. The idea is that when we call Allocate with a size, the free list path is only used if that size falls within the tolerance range.
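
For example, assuming a malloc-based support allocator from part 1 (here called Mallocator just for illustration) and using the Allocate/Deallocation functions shown below, an instantiation could look like this:

// Hypothetical instantiation: 64-byte blocks, requests between 48 and 64 bytes reuse the free list.
using SmallBlockAllocator = FreeListAllocator<Mallocator, 64, 48, 64>;

SmallBlockAllocator lAlloc;
MemoryBlock lMb = lAlloc.Allocate(56);  // in tolerance: free list if non-empty, otherwise a fresh 64-byte block
lAlloc.Deallocation(lMb);               // the block is pushed onto the free list
MemoryBlock lMb2 = lAlloc.Allocate(60); // also in tolerance: reuses the block that was just freed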

Allocation function:

MemoryBlock Allocate(size_t Size, size_t Alignment = sizeof(std::max_align_t))
{
	if((Size >= ToleranceMin && Size <= ToleranceMax) && mHead)
	{
		void* lPtr = mHead;
		mHead = mHead->Next;
		return MemoryBlock{lPtr, Size};
	}
	return mAllocator.Allocate(BlockSize, Alignment);
}

First, the size is checked to determine whether it's within the tolerance range; if it is and the free list head is valid, we take the head pointer, advance the linked list and return the memory block. If the free list conditions aren't met, we allocate a new memory block from the support allocator and return it.

Deallocation function:

void Deallocation(MemoryBlock Mb)
{
	if(Mb.Size != BlockSize)
	{
		mAllocator.Deallocate(Mb);
		return;
	}
	Node* lNewNode = static_cast<Node*>(Mb.Ptr);
	lNewNode->Next = mHead;
	mHead = lNewNode;
}

Really simple: if the memory block size is different from BlockSize, it goes back to the support allocator; otherwise, it's added to the free list.

Owns function:

[[nodiscard]] bool Owns(MemoryBlock Mb) const
{
	return Mb.Size == BlockSize || mAllocator.Owns(Mb);
}

Owns isn't actually used internally.

There are flaws in the Owns and Deallocation functions. In Owns, since we don't walk the elements of the free list, we can't actually tell whether the memory block is owned by this allocator; the simple check "Mb.Size == BlockSize" is error-prone, because the user can pass a memory block allocated by another allocator whose size happens to equal this allocator's BlockSize. Iterating the free list would be an insanely slow way to verify ownership, so the user just has to be careful when mixing multiple allocators.
The same problem happens in the Deallocation function: the user can pass a memory block allocated by another allocator with the same size as BlockSize. So, how do we make this safe? Since iterating the linked list is out of the question, we can prepend an extra pointer to every allocated block holding the address of the free list allocator. Each allocation becomes a fat pointer with a minimum size of 16 bytes, so we can enable this only in debug builds, on the assumption that by the time we ship a release build the program has already been tested.

Now the allocation function is done like this:

MemoryBlock Allocate(size_t Size, size_t Alignment = sizeof(std::max_align_t))
{
#ifndef NDEBUG
	if((Size >= ToleranceMin && Size <= ToleranceMax) && mHead)
	{
		intptr_t* lPtr = reinterpret_cast<intptr_t*>(mHead);
		mHead = mHead->Next;
		// The node reused the header word while the block sat in the free list,
		// so write the allocator address back before handing the block out again.
		*lPtr = reinterpret_cast<intptr_t>(this);
		return MemoryBlock{lPtr + 1ull, Size};
	}
	const MemoryBlock lMemoryBlock = mAllocator.Allocate(BlockSize + sizeof(intptr_t), Alignment);
	intptr_t* lIntPtr = static_cast<intptr_t*>(lMemoryBlock.Ptr);
	*lIntPtr = reinterpret_cast<intptr_t>(this);
	return MemoryBlock{lIntPtr + 1ull, BlockSize};
#else
	if((Size >= ToleranceMin && Size <= ToleranceMax) && mHead)
	{
		void* lPtr = mHead;
		mHead = mHead->Next;
		return MemoryBlock{lPtr, Size};
	}
	return mAllocator.Allocate(BlockSize, Alignment);
#endif
}

So, the differences are how we get the user pointer and how we allocate from the support allocator. When reusing a block from the free list, we advance one intptr_t from the casted head, because the deallocation function stores the node at the fat-pointer base, whose first word normally holds the address of the free list allocator; since the node overwrote that word, we also write the allocator address back before returning the block. If there is no head, we allocate BlockSize plus sizeof(intptr_t) from the support allocator and set the first word of the fat block to the address of the allocator.
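
To visualize the fat block implied by the code above:

// Debug fat block, BlockSize + sizeof(intptr_t) bytes in total:
// [ intptr_t header: address of this allocator ][ BlockSize bytes handed to the user ]
//                                                ^ MemoryBlock.Ptr returned by Allocate
// While the block sits in the free list, the Node reuses the header word, which is why
// the allocator address is written back when the block is reused.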

Next, we can create a function that performs the debug ownership check:

#ifndef NDEBUG
[[nodiscard]] bool OwnsConditionDebug(MemoryBlock Mb) const
{
	return *(reinterpret_cast<intptr_t*>(Mb.Ptr) - 1ull) == reinterpret_cast<intptr_t>(this);
}
#endif

[[nodiscard]] bool Owns(MemoryBlock Mb) const
{
#ifndef NDEBUG
	return OwnsConditionDebug(Mb) || mAllocator.Owns(Mb);
#else
	return Mb.Size == BlockSize || mAllocator.Owns(Mb);
#endif
}

OwnsConditionDebug just reads the intptr_t stored right before the memory block to get the allocator address and compares it with this allocator's address.

Now, as we know the memory block came from this free list allocator, we can safely deallocate:

void Deallocation(MemoryBlock Mb)
{
#ifndef NDEBUG
	if(!OwnsConditionDebug(Mb)) return; // Can be asserted too
	// The block is ours: reuse the fat-pointer base (one intptr_t before Mb.Ptr) as the node.
	Node* lNewNode = reinterpret_cast<Node*>(static_cast<intptr_t*>(Mb.Ptr) - 1);
#else
	if(Mb.Size != BlockSize)
	{
		mAllocator.Deallocate(Mb);
		return;
	}
	Node* lNewNode = static_cast<Node*>(Mb.Ptr);
#endif
	lNewNode->Next = mHead;
	mHead = lNewNode;
}

These safety measures don't actually need to be there if you know what you're doing; for me, right now, it feels safer. But keeping extra information on each allocation increases the memory used by every block (doubling it for the minimum 8-byte case), so I believe it's only viable in debug builds: once the program is well tested, there's no need to keep the extra information in allocations.

There is a type of free list that I really like; I believe it's called a segregated free list. Basically, we keep an array of free lists, one for each power-of-two size:

struct Node
{
    Node* Next{};
};
// One free list per power-of-two size class, indexed by log2(size) - 3.
Node* mFreeLists[40]{};

The code chunk above stores one free list per power-of-two size class, from 2^3 up to 2^42 (40 classes in total). It works like this: when we allocate a new memory block we round the size up to a power of two (the allocator only hands out power-of-two sizes), then we apply log2 to get the index into the free list array; as the minimum size is 8, we subtract 3 from the log2 result. For example, a 100-byte request rounds up to 128, log2(128) is 7, so it uses index 7 - 3 = 4.
So, we need cheap functions to round a size up to a power of two and to compute log2.

We can check whether a number is a power of two like this:

return (Size & (Size - 1)) == 0;

It's just a bitwise AND of the value with itself minus one: if the result is zero, the number is a power of two. Note that zero also passes this test, so a zero size needs to be handled separately.
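
Wrapped into a helper (not in the original chunk, and with the zero case handled explicitly), it could look like this:

[[nodiscard]] static bool IsPowerOfTwo(size_t Size)
{
	// A power of two has a single set bit; Size - 1 sets every bit below it,
	// so the AND clears the whole value.
	return Size != 0 && (Size & (Size - 1)) == 0;
}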

We can round a size up to the next power of two with something like this:

[[nodiscard]] static size_t GetPowerOfTwo(size_t Size)
{
	if ((Size & (Size - 1)) == 0)
	{
		// Already a power of two, return it unchanged.
		return Size;
	}
	// Round up: smear the highest set bit into every lower bit, then add one.
	Size--;
	Size |= Size >> 1;
	Size |= Size >> 2;
	Size |= Size >> 4;
	Size |= Size >> 8;
	Size |= Size >> 16;
	Size |= Size >> 32;
	Size++;
	return Size;
}

The first check is important for performance: if the size is already a power of two, just return it. Otherwise, the shift cascade rounds it up to the next power of two for 64-bit values.

And I found a fast implementation of log2 on Stack Overflow:

static const uint32_t Log2Tab64[64] = {
	63,  0, 58,  1, 59, 47, 53,  2,
	60, 39, 48, 27, 54, 33, 42,  3,
	61, 51, 37, 40, 49, 18, 28, 20,
	55, 30, 34, 11, 43, 14, 22,  4,
	62, 57, 46, 52, 38, 26, 32, 41,
	50, 36, 17, 19, 29, 10, 13, 21,
	56, 45, 25, 31, 35, 16,  9, 12,
	44, 24, 15,  8, 23,  7,  6,  5};

static uint64_t Log2(uint64_t Value)
{
	Value |= Value >> 1;
	Value |= Value >> 2;
	Value |= Value >> 4;
	Value |= Value >> 8;
	Value |= Value >> 16;
	Value |= Value >> 32;
	return Log2Tab64[((uint64_t)((Value - (Value >> 1))*0x07EDD5E59A4E28C2)) >> 58];
}
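
As a side note, if C++20 is available, the <bit> header already provides both operations; these one-liners (an alternative, not part of the original code) behave the same for nonzero sizes:

#include <bit>

[[nodiscard]] static size_t GetPowerOfTwo(size_t Size) { return std::bit_ceil(Size); }
[[nodiscard]] static uint64_t Log2(uint64_t Value) { return std::bit_width(Value) - 1ull; }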

And finally we can write the allocation function, which looks something like this:

MemoryBlock Allocate(size_t Size, size_t Alignment = sizeof(std::max_align_t))
{
    const auto lSize = GetPowerOfTwo(Size);
    const auto lIndex = Log2(lSize) - 3ull;
    if(lIndex >= 40) return MemoryBlock{};

    auto& lHead = mFreeLists[lIndex];
    if(lHead)
    {
        void* lPtr = lHead;
        lHead = lHead->Next;
        return MemoryBlock{lPtr, lSize};
    }
    
    return mAllocator.Allocate(lSize, Alignment);
}

As I said above, we round the size up to a power of two, compute the index as log2 minus 3, and return an empty block if the index is 40 or more (requests smaller than 8 bytes also end up here, because the subtraction wraps around to a huge index). Otherwise, we take a block from the matching free list, or allocate a new one from the support allocator if the list is empty.

Now, the deallocation works like this:

void Deallocation(MemoryBlock Mb)
{
	const auto lSize = GetPowerOfTwo(Mb.Size);
	const auto lIndex = Log2(lSize) - 3ull;
	if(lIndex >= 40) return;

	Node* lNewNode = static_cast<Node*>(Mb.Ptr);
	lNewNode->Next = mFreeLists[lIndex];
	mFreeLists[lIndex] = lNewNode;
}

Almost the same as the allocation: we round the memory block size up to a power of two, compute log2 minus 3, and push the block onto the corresponding free list.
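
Hypothetical usage, with made-up names (SegregatedFreeListAllocator, Mallocator) for a wrapper class holding the array and the two functions above:

SegregatedFreeListAllocator<Mallocator> lAlloc;

MemoryBlock lMb = lAlloc.Allocate(100);  // rounded up to 128, free list index log2(128) - 3 = 4
lAlloc.Deallocation(lMb);                // pushed onto mFreeLists[4]
MemoryBlock lMb2 = lAlloc.Allocate(120); // also rounds up to 128, reuses the block just freed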

To keep the free lists from growing forever, we can cap the number of elements per free list. Deallocation then looks like this:

void Deallocation(MemoryBlock Mb)
{
	const auto lSize = GetPowerOfTwo(Mb.Size);
	const auto lIndex = Log2(lSize) - 3ull;
	if(lIndex >= 40) return;

	Node* lNewNode = static_cast<Node*>(Mb.Ptr);
	lNewNode->Next = mFreeLists[lIndex];
	mFreeLists[lIndex] = lNewNode;
	++mFreeListsCurrent[lIndex];

	if(mFreeListsCurrent[lIndex] >= mFreeListsMax)
	{
		mFreeListsCurrent[lIndex] = 0ull;
		Node* lNode = mFreeLists[lIndex];
		while(lNode)
		{
			void* lPtr = lNode;
			lNode = lNode->Next;
			mAllocator.Deallocate(MemoryBlock{lPtr, lSize});
		}
		mFreeLists[lIndex] = nullptr;
	}
}

So, we increment a counter for that free list index, and once it reaches the maximum, we walk the whole free list and hand every block back to the support allocator.
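
The counters used above aren't in the earlier member snippet; they could be declared something like this (the cap value is just an example):

uint64_t mFreeListsCurrent[40]{};                 // deallocations into each free list since its last flush
static constexpr uint64_t mFreeListsMax = 128ull; // flush a free list after this many deallocations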

Pool allocator

A pool allocator basically allocates a fixed number of objects at construction and keeps reusing them until destruction. The way I'll make it: a pre-allocated object buffer, a cursor that hands out the next unused object, and a free list so objects can be returned to the pool. A support allocator is also needed for the allocation at construction and the deallocation at destruction.

template < size_t ElementSize, typename TSupportAllocator >
class PoolAllocator
{
	static_assert(ElementSize >= sizeof(intptr_t), "ElementSize needs to be greater or equal sizeof(intptr_t).");

	struct FreeList
	{
		FreeList* Next{};
	};

	// The support allocator must be declared before mData: members are initialized
	// in declaration order, and mData's initializer uses mAllocator.
	TSupportAllocator mAllocator{};
	MemoryBlock mData{};
	uint64_t mCursor{};
	FreeList* mFreeList{};

public:
	static constexpr size_t ALIGNMENT = RoundToAligned(ElementSize);

public:
	PoolAllocator(const uint64_t Capacity)
		: mData{mAllocator.Allocate(ElementSize*Capacity, ALIGNMENT)} {}

	~PoolAllocator() { mAllocator.Deallocate(mData); }
};

The allocation and deallocation functions are very straightforward:

MemoryBlock Allocate(size_t, size_t)
{
	void* lPtr;
	if(mFreeList)
	{
		lPtr = mFreeList;
		mFreeList = mFreeList->Next;
	}
	else
	{
		// Out of capacity: the cursor would run past the pre-allocated buffer.
		if((mCursor + ElementSize) > mData.Size) return MemoryBlock{};
		lPtr = static_cast<uint8_t*>(mData.Ptr) + mCursor;
		mCursor += ElementSize;
	}
	return MemoryBlock{lPtr, ElementSize};
}

So, if there is an element in the free list, take it; otherwise, check the cursor bounds and take the pointer at the current cursor position, advancing the cursor.

And the deallocation function just adds the pointer back to the free list.

void Deallocate(MemoryBlock Mb)
{
	FreeList* lNewNode = reinterpret_cast<FreeList*>(Mb.Ptr);
	lNewNode->Next = mFreeList;
	mFreeList = lNewNode;
}

As we use a fixed buffer that is only released in the destructor, deallocation can simply push the block onto the free list.
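
Hypothetical usage, again with Mallocator as a placeholder support allocator:

PoolAllocator<64, Mallocator> lPool{256};              // one buffer with room for 256 elements of 64 bytes

MemoryBlock lA = lPool.Allocate(64, lPool.ALIGNMENT);  // both parameters are ignored, blocks are always ElementSize
MemoryBlock lB = lPool.Allocate(64, lPool.ALIGNMENT);
lPool.Deallocate(lA);                                  // lA's slot goes onto the internal free list
MemoryBlock lC = lPool.Allocate(64, lPool.ALIGNMENT);  // reuses lA's slot before advancing the cursor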

Affix

This allocator is very useful for debugging, safety, or when you just want more information attached to your allocations. The idea is to take two type parameters, a prefix and a suffix, and insert a value of each at the beginning and at the end of every allocation. With this allocator we can carry information between custom allocators, or use the filename and line number as a header so that at runtime we can check where an allocation came from, among other things.

template < typename TSupportAllocator, typename TPrefix, typename TSuffix >
class AffixAllocator
{
	TSupportAllocator mAllocator{};

	struct InternalPrefix
	{
		intptr_t AllocatorAddress;
		size_t AllocationSize;
		TPrefix Prefix;
	};
	struct InternalSuffix
	{
		intptr_t AllocatorAddress;
		TSuffix Suffix;
	};
};

Here we define internal structures for the prefix and the suffix, holding the allocator address and the user-supplied values. The prefix structure also needs the allocation size so we can locate and check the suffix in the Owns and Deallocate functions.

For the prefix and suffix types, we can do something like this:

struct AffixAllocator_FilePrefix
{
    char Filename[64];
    int32_t LineNumber;
};

struct AffixAllocator_FileSuffix
{
};

So, the allocation function allocates a memory block whose size is the internal prefix plus the requested size plus the internal suffix.
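
The resulting layout of a single allocation looks like this:

// [ InternalPrefix (allocator address, size, TPrefix) ][ Size bytes of user data ][ InternalSuffix (allocator address, TSuffix) ]
//                                                       ^ MemoryBlock.Ptr returned to the user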

MemoryBlock Allocate(size_t Size,
	TPrefix&& Prefix,
	TSuffix&& Suffix,
	size_t Alignment = sizeof(std::max_align_t))
{
	auto lMb = mAllocator.Allocate(sizeof(InternalPrefix) + sizeof(InternalSuffix) + Size, Alignment);
	uint8_t* lData = static_cast<uint8_t*>(lMb.Ptr);
	*reinterpret_cast<InternalPrefix*>(lData) =
		InternalPrefix{reinterpret_cast<intptr_t>(this), Size, std::move(Prefix)};
	*reinterpret_cast<InternalSuffix*>(lData + sizeof(InternalPrefix) + Size) =
		InternalSuffix{reinterpret_cast<intptr_t>(this), std::move(Suffix)};
	return MemoryBlock{lData + sizeof(InternalPrefix), Size};
}
#define AllocateDefaultPrefixSuffix(Size)	\
Allocate(Size, AffixAllocator_FilePrefix{__FILE__, __LINE__}, AffixAllocator_FileSuffix{})
#define AllocateDefaultPrefixSuffixAligned(Size, Alignment)	\
Allocate(Size, AffixAllocator_FilePrefix{__FILE__, __LINE__}, AffixAllocator_FileSuffix{}, Alignment)

It seems like a lot but it's really simple. The prefix and suffix are taken as parameters so the user can pass whatever information their system needs. First, the allocation is done with the total size. Second, the internal prefix and suffix are built with the allocator address, the allocation size and the user values. Third, they are written to the beginning and end of the allocated memory. Fourth, the memory block pointing at the user portion is returned. I also made macros to automatically fill the prefix with the filename and line number.
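
Hypothetical usage with the file/line prefix and the macros above (Mallocator again as a placeholder support allocator):

AffixAllocator<Mallocator, AffixAllocator_FilePrefix, AffixAllocator_FileSuffix> lAffix;

MemoryBlock lMb = lAffix.AllocateDefaultPrefixSuffix(256); // the prefix records __FILE__ and __LINE__
// ... use the 256 bytes at lMb.Ptr ...
lAffix.Deallocate(lMb);                                    // Owns passes, the whole affixed block is freed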

The Owns function simply reads the prefix and suffix and checks the stored addresses.

[[nodiscard]] bool Owns(MemoryBlock Mb) const
{
	auto lAddress = reinterpret_cast<intptr_t>(this);
	auto lData = static_cast<uint8_t*>(Mb.Ptr) - sizeof(InternalPrefix);
	auto lPrefix = reinterpret_cast<InternalPrefix*>(lData);
	auto lSuffix = reinterpret_cast<InternalSuffix*>(lData + sizeof(InternalPrefix) + lPrefix->AllocationSize);
	return lPrefix->AllocatorAddress == lAddress && lSuffix->AllocatorAddress == lAddress;
}

For the Deallocate function we just check Owns first and then hand the whole block (prefix, user data and suffix) back to the support allocator.

void Deallocate(MemoryBlock Mb)
{
	if(!Owns(Mb))
	{
		return;
	}
	auto lData = static_cast<uint8_t*>(Mb.Ptr) - sizeof(InternalPrefix);
	mAllocator.Deallocate(MemoryBlock{lData, sizeof(InternalPrefix) + sizeof(InternalSuffix) + Mb.Size});
}

To conclude: there are many allocators out there, built for all sorts of specific reasons. About the MemoryBlock structure, I was thinking it might be safer to make it a nested struct of each allocator; we could still reinterpret_cast between them since the size is the same, but we couldn't accidentally deallocate a memory block allocated by another allocator without noticing the type mismatch. Allocators are a huge topic, and there are more things to add later.

In part 3, I'll talk about three more allocators that I find interesting: allocators with stats, bitmap block, and segregator.

Part 3
