Custom object memory allocation in Delphi: Bypassing FastMM for fun and profit

Want to avoid memory fragmentation? Want to write a memory pool for a certain type? Just want to write some cool code? This post delves into custom-allocating your classes.

Contents

Memory Fragmentation
Taking control of TObject’s memory
Custom per-class allocation with contiguous memory
Conclusion

Memory Fragmentation

Before Delphi Seattle, which fixes this, did you ever see Delphi give an ‘Out of Memory’ error and close, yet when you checked its memory usage in Task Manager, it was only using a few hundred megabytes? How did Delphi – how does any app – run out of memory when so much memory is supposedly still available?

The answer is memory fragmentation.

What is memory fragmentation?

Memory fragmentation is a situation where even though the total sum of free memory is larger than the amount you want to allocate, there is no single part, no contiguous block, of memory that is big enough for the allocation you want to make. That means that even though you have lots of free memory in total, you don’t have any single piece big enough, and when you try to allocate, your app raises an ‘Out of Memory’ exception and quits. Were you really out of memory? No, but in practice, since memory couldn’t be allocated, yes.

Consider this, where the periods represent free memory – here ten units big (ten bytes, ten gigabytes, whatever). It starts off empty:

……….

1	..........

Then you make an allocation:

AA……..

1	AA........

and another:

AABB……

1	AABB......

and a few more:

AABBCDEF..

1	AABBCDEF..

Then you free a few:

..BB.D.E.F..

1	..BB.D.E.F..

Your memory is only 50% allocated – not too bad. You have five units free. But suppose you want to allocate three units? 3 < 5, so it should work, but there is no single contiguous space in your memory that allows you to allocate three units. End result? ‘Out of Memory’. It’s even worse if you repeat the frees and allocations a few times – or for a few hours.

Six months ago I released Parnassus Navigator – a tool you should check out :) – and version 1.0 caused this same error, ‘Out of Memory’, to occur more quickly than normal for those who often got it in Delphi, and even occasionally for those who had never seen it before when they had large projects – i.e. were probably in memory-stressed situations. Clearly something was wrong. Now, I can’t fix Delphi (Embarcadero did; Delphi 10 Seattle has no out of memory issues, and though to my knowledge they only mentioned increasing the address space, it’s likely they also did something like the technique described here) but I can fix my own software. Some investigation showed memory fragmentation was probably the cause, especially since it made the existing Delphi bug more common. How do you go about reducing or completely preventing fragmentation?

One answer is to take control of allocation – specifically where memory is allocated – yourself. But how, and how do you do that without completely replacing the memory manager?

Taking control of TObject’s memory

We rarely think about exactly what happens when we create an object. Somewhere, memory is allocated and the constructor is called. This is common across all languages – it’s why you see the alloc/init idiom in Objective-C code – but unlike Objective-C the details aren’t something we see in Delphi at all. In general, assuming it allocates memory, it (whatever “it” is – the new object, compiler-generated code, what?) presumably calls into FastMM somehow and allocates some memory, which is now the object, and then calls the constructor – but where, how? We don’t normally need to know.

The answer is in TObject.NewInstance, which you can find in System.pas. This is a virtual (woohoo!) class function that looks like this:

class function TObject.NewInstance: TObject;
begin
  Result := InitInstance(_GetMem(InstanceSize));
{$IFDEF AUTOREFCOUNT}
  Result.FRefCount := 1;
{$ENDIF}
end;

class function TObject.NewInstance: TObject;

begin

Result := InitInstance(_GetMem(InstanceSize));

{$IFDEF AUTOREFCOUNT}

Result.FRefCount := 1;

{$ENDIF}

end;

There is a similar method TObject.FreeInstance.

Ignoring the refcounting, this is what returns a new object, and as you can see it is allocating memory of the instance size – the size of the class and all its fields and data – and then initialising it. (I don’t believe this is where the constructor is called from – InitInstance, at least the Pascal version, only seems to be fixing up the vtables.) But if we want to control memory allocation, i.e. to replace GetMem, this is where we do it – and since both methods are virtual you can override them in your own classes easily.

Try it! Copy the System implementation into overridden methods in a test class and create and free a few times. Works nicely.

This capability of Delphi – to be able to hook into an object’s allocation – is, I think, one of the least-known features of the RTL.

I only found out about this in 2013, but I’ve been wanting to write a blog post about it ever since. Back then I was writing a memory manager for Delphi – something I still haven’t released – and I wanted to make it object-oriented where possible. Perhaps a mistake; it may add too much overhead. But I used this technique to bootstrap the memory manager classes’ object instantiation before the MM itself was up and running.

Note that unfortunately you have to copy the entire implementation, i.e. InitInstance(_GetMem(InstanceSize)) and maybe the refcounting. I wish that the call to GetMem (which calls into the memory manager, usually FastMM) was also virtual, say a method called AllocInstance. (InstanceSize, too, is not virtual, and you could do some neat things if it was.) Currently to change one thing – memory allocation – you have to copy a bunch of code that does other stuff too. We’ll see how this makes life more complicated later, for TInterfacedObject, although if you’ve already guessed refcounts, you’re right. However, ignoring low-level architectural design, being able to replace the default way a class is allocated like this is really neat.

You might also find this post by Allen Bauer interesting – it explains the history of NewInstance and InitInstance and a case where he’s overridden them too.

So far so good. But what’s an example of using this in practice? The below is an example avoiding memory fragmentation, by replacing the call to GetMem with a call to a custom allocator for the class. Let’s dig in.

Custom per-class allocation with contiguous memory

First, let’s consider a use case. (The below code is not appropriate for the majority of your class allocations.) Let’s look at Parnassus Navigator. Navigator allocates a lot of objects of two different types:

The minimap’s syntax highlighting is built from a series of small objects, representing syntactical areas and how they should draw. Rendering the minimap is very fast and easy: loop through them all and call Render() (this renders to an offscreen bitmap, which is cached and blitted whenever necessary). But there can be many of them, and they are freed and recreated slowly as the file changes.
Navigator’s Go To window keeps track of useful unit areas, and so has objects representing the uses/type/whatever clauses, representing each method, each field, each property, etc. This is of course not nearly so many, but it reparses as the file changes. (Why? Because it can only generate those from a syntactically correct file, and it doesn’t know if it’s correct until it tries. Would you be happy pressing Ctrl+G and getting a ‘Could not parse’ error? No. But if it was valid a minute ago, that last valid data is what’s used.)

Now, it doesn’t do this often – certainly not each time the file changes. It’s smart, guesses at appropriate parsing times, etc. (And does it in a secondary low-priority thread, too, so you will never see any work done in the IDE’s main thread.) But still, over hours and in large units, it can add up to allocating, freeing and re-allocating a lot of objects. And in Delphi, which has a lot of extra stuff going on inside a small memory space, this doesn’t help the fragmentation situation.

This is a perfect use case for a custom memory allocator for those classes. If you allocate a single chunk of memory once, and then allocate and free from that chunk, the rest of the IDE’s memory space will remain absolutely unaffected. In practice, this memory is quite small – usually 32-64KB max – and avoiding any frees and allocs in the normal memory manager completely prevents the code being a cause of any fragmentation in normal memory. Delphi itself (pre-Seattle) might get very fragmented, but none of that will be caused by or exacerbated by my code.

This is the kind of unusual, and specific, use-case for this kind of custom allocator. In general you should just let the memory manager do its thing.

Allocator behaviour

The basic idea is not to allocate memory from, effectively, a random location, but to allocate all or many objects next to each other. This means you allocate a small number of large pieces of memory, maybe even only once, not many pieces of small memory many times. Instead of “holes” appearing all over the memory space, allocations are strictly controlled. Frees do not immediately return to the memory manager but are available to be given out again. Since small allocations are not given back to the memory manager, and memory is allocated and freed in larger chunks far less often (if freed at all, see below) fragmentation is drastically reduced.

In terms of implementation, of which this is demonstration code only, the code below is generic and can be used for any class. It allocates a chunk of memory. That is split into a number of pointers, each incremented by the class instance size. These are pushed onto a stack. A class’s overridden NewInstance calls the allocator for itself, and is given a new pointer back by popping off the stack. If the stack is empty, it allocates another large chunk of memory, again subdividing. This means you might have, say, one or two or three individual allocations, rather than hundreds of thousands of memory-manager-managed allocs and frees.

Freeing – should you do it?

Now what about freeing? Freeing returns the pointer to the allocator, which in this sample code pushes it back on the stack of available pointers. This implementation is simple and regards freeing an object as returning its memory to the pool, but does not free the larger allocations. It’s quite simple to do so: one past implementation of this I wrote kept separate stacks per allocation, and when every pointer was returned – the count of items in the stack was the same as the number the allocation split into – it freed the larger one. However, this isn’t required for my purposes when writing this – and, I think, in many situations where you’ll need this code you won’t need to either.

Why? Consider the situation that causes this: many allocations, and frees. If you are constantly allocating after freeing, why return the freed object’s memory to the memory manager in the intermediate time? Just keep it.

Code

Let’s see this in practice!

Let’s define an allocator than can handle any class type, and bulk-allocate for it. This implementation uses VirtualAlloc (because if we’re going to bypass FastMM, let’s demonstrate really bypassing it) although you could also call GetMem. VirtualAlloc returns memory allocated in page-sized blocks, usually 4KB, so each allocation should allocate a multiple of the Windows block size.

It also keeps a list of all the large allocations, so that they can be freed on shutdown. You could expand this to free when when all pointers belonging to them are returned. There is also a define that does some book-keeping that’s useful for debugging, eg to make sure you haven’t got any memory leaks.

Pretty simple:

uses
  System.Generics.Collections;

{.$define COUNT_ALLOCATIONS}

type
  TAllocator&lt;T&gt; = class
  private
    FInstanceSizeBytes : NativeUInt;
    FBulkAllocSizeBytes : NativeUInt;
    FIndividualObjects : TStack;
    FBulkAllocations : TStack;
    {$ifdef COUNT_ALLOCATIONS}
      FNumNews,
      FNumReturns,
      FNumBulkAllocs : Int64;
    {$endif}
    const NUM_ALLOCS : NativeUInt = 4096 * 4;
    procedure BulkAllocate;
  public
    constructor Create;
    destructor Destroy; override;

    function New : Pointer;
    procedure Return(const P : Pointer);
  end;

implementation

uses
  Winapi.Windows;

{ TAllocator&lt;T&gt; }

constructor TAllocator&lt;T&gt;.Create;
begin
  inherited;

  FInstanceSizeBytes := TClass(T).InstanceSize;
  FBulkAllocSizeBytes := FInstanceSizeBytes * NUM_ALLOCS; // 4K blocks in Windows, so alloc several of them

  FIndividualObjects := TStack.Create;
  FIndividualObjects.Capacity := NUM_ALLOCS;
  FBulkAllocations := TStack.Create;
  FBulkAllocations.Capacity := 16;

  {$ifdef COUNT_ALLOCATIONS}
    FNumNews := 0;
    FNumReturns := 0;
    FNumBulkAllocs := 0;
  {$endif}

  BulkAllocate;
end;

destructor TAllocator&lt;T&gt;.Destroy;
var
  P : Pointer;
begin
  {$ifdef COUNT_ALLOCATIONS}
    assert(FNumNews = FNumReturns); // Otherwise objects leaked
  {$endif}

  FIndividualObjects.Clear;

  // Free all bulk allocs
  while FBulkAllocations.Count &gt; 0 do
    VirtualFree(FBulkAllocations.Pop, 0, MEM_RELEASE);

  FBulkAllocations.Clear;

  FIndividualObjects.Free;
  FBulkAllocations.Free;

  inherited;
end;

procedure TAllocator&lt;T&gt;.BulkAllocate;
var
  P : Pointer;
  Item : NativeUInt;
  I : Integer;
begin
  P := VirtualAlloc(nil, FBulkAllocSizeBytes, MEM_COMMIT or MEM_RESERVE, PAGE_EXECUTE_READWRITE);
  FBulkAllocations.Push(P);

  // Now, split into a set of pointers which can become individual objects, ie
  // checks of memory FInstanceSize big each
  Item := NativeUInt(P);
  while Item &lt; NativeUInt(P) + FBulkAllocSizeBytes do begin
    FIndividualObjects.Push(Pointer(Item));
    Inc(Item, FInstanceSizeBytes);
  end;
  assert(Item = NativeUInt(P) + FBulkAllocSizeBytes);

  {$ifdef COUNT_ALLOCATIONS}
    Inc(FNumBulkAllocs);
  {$endif}
end;

function TAllocator&lt;T&gt;.New: Pointer;
begin
  if FIndividualObjects.Count = 0 then
    BulkAllocate;

  Result := FIndividualObjects.Pop;
  ZeroMemory(Result, FInstanceSizeBytes);

  {$ifdef COUNT_ALLOCATIONS}
    Inc(FNumNews);
  {$endif}
end;

procedure TAllocator&lt;T&gt;.Return(const P: Pointer);
begin
  FIndividualObjects.Push(P);

  {$ifdef COUNT_ALLOCATIONS}
    Inc(FNumReturns);
  {$endif}
end;

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

uses

System.Generics.Collections;

{.$define COUNT_ALLOCATIONS}

type

TAllocator<T> = class

private

FInstanceSizeBytes : NativeUInt;

FBulkAllocSizeBytes : NativeUInt;

FIndividualObjects : TStack;

FBulkAllocations : TStack;

{$ifdef COUNT_ALLOCATIONS}

FNumNews,

FNumReturns,

FNumBulkAllocs : Int64;

{$endif}

const NUM_ALLOCS : NativeUInt = 4096 * 4;

procedure BulkAllocate;

public

constructor Create;

destructor Destroy; override;

function New : Pointer;

procedure Return(const P : Pointer);

end;

implementation

uses

Winapi.Windows;

{ TAllocator<T> }

constructor TAllocator<T>.Create;

begin

inherited;

FInstanceSizeBytes := TClass(T).InstanceSize;

FBulkAllocSizeBytes := FInstanceSizeBytes * NUM_ALLOCS; // 4K blocks in Windows, so alloc several of them

FIndividualObjects := TStack.Create;

FIndividualObjects.Capacity := NUM_ALLOCS;

FBulkAllocations := TStack.Create;

FBulkAllocations.Capacity := 16;

{$ifdef COUNT_ALLOCATIONS}

FNumNews := 0;

FNumReturns := 0;

FNumBulkAllocs := 0;

{$endif}

BulkAllocate;

end;

destructor TAllocator<T>.Destroy;

var

P : Pointer;

begin

{$ifdef COUNT_ALLOCATIONS}

assert(FNumNews = FNumReturns); // Otherwise objects leaked

{$endif}

FIndividualObjects.Clear;

// Free all bulk allocs

while FBulkAllocations.Count > 0 do

VirtualFree(FBulkAllocations.Pop, 0, MEM_RELEASE);

FBulkAllocations.Clear;

FIndividualObjects.Free;

FBulkAllocations.Free;

inherited;

end;

procedure TAllocator<T>.BulkAllocate;

var

P : Pointer;

Item : NativeUInt;

I : Integer;

begin

P := VirtualAlloc(nil, FBulkAllocSizeBytes, MEM_COMMIT or MEM_RESERVE, PAGE_EXECUTE_READWRITE);

FBulkAllocations.Push(P);

// Now, split into a set of pointers which can become individual objects, ie

// checks of memory FInstanceSize big each

Item := NativeUInt(P);

while Item < NativeUInt(P) + FBulkAllocSizeBytes do begin

FIndividualObjects.Push(Pointer(Item));

Inc(Item, FInstanceSizeBytes);

end;

assert(Item = NativeUInt(P) + FBulkAllocSizeBytes);

{$ifdef COUNT_ALLOCATIONS}

Inc(FNumBulkAllocs);

{$endif}

end;

function TAllocator<T>.New: Pointer;

begin

if FIndividualObjects.Count = 0 then

BulkAllocate;

Result := FIndividualObjects.Pop;

ZeroMemory(Result, FInstanceSizeBytes);

{$ifdef COUNT_ALLOCATIONS}

Inc(FNumNews);

{$endif}

end;

procedure TAllocator<T>.Return(const P: Pointer);

begin

FIndividualObjects.Push(P);

{$ifdef COUNT_ALLOCATIONS}

Inc(FNumReturns);

{$endif}

end;

Then, the class that uses this overrides the NewInstance and FreeInstance methods. This sets up an allocator as a class var, and then uses it to allocate and free its instance memory.

class constructor TExample.ClassCreate;
begin
  FAllocator := TAllocator&lt;TExample&gt;.Create;
end;

class destructor TExample.ClassDestroy;
begin
  FAllocator.Free;
end;

class function TExample.NewInstance: TObject;
begin
  Result := InitInstance(FAllocator.New);
end;

procedure TExample.FreeInstance;
begin
  FAllocator.Return(Self);
end;

class constructor TExample.ClassCreate;

begin

FAllocator := TAllocator<TExample>.Create;

end;

class destructor TExample.ClassDestroy;

begin

FAllocator.Free;

end;

class function TExample.NewInstance: TObject;

begin

Result := InitInstance(FAllocator.New);

end;

procedure TExample.FreeInstance;

begin

FAllocator.Return(Self);

end;

And voila. Every instance of that class now comes out of that pool.

This gives you:

A good example of overriding NewInstance and FreeInstance
A simple example of a class-based memory pool, which in the use case detailed above, completely removes each class it’s used with as being a cause of fragmentation.

Things to note

Freeing: as written above, this returns all freed objects to the pool (stack), but never deallocates the large-chunk allocations until the programs ends. It could do that if you track which allocation a pointer belongs to. However, check if this is necessary: if you’re repeatedly allocating and freeing a hundred thousand objects, keeping the memory available is what you want. Is a constant allocation of, say, a few megabytes a bad thing, when it avoids fragmentation?
What about records? Excellent question. You can indeed also use this technique with records, but with some modifications. Records are normally stack-allocated, so to use this you need to always use pointers to your record, not the record type itself. In your record, you could define a class function New which returns a new PYourRecord, and a normal method Free which returns the current record pointer to the allocator. Getting the instance size of a record is of course very simple; just use SizeOf. This gives you code that reads far more like you are using classes.
Clearing memory. Delphi does not always zero-initialise the memory it returns. In fact, GetMem is not guaranteed to be zeroed, but AllocMem is. I don’t personally like this; I think zeroing is a Good Thing. For a class allocator, it doesn’t matter because InitInstance clears the memory anyway. (I don’t know why NewInstance doesn’t just use AllocMem instead of GetMem.) For records, you should of course always be careful to clear the returned pointer.
TInterfacedObject. TInterfacedObject’s NewInstance also handles the initial refcount. If you don’t take care of this in your overridden NewInstance you will get mysterious behaviour as your refcounted objects will start with a count of 0, and free themselves unexpectedly.
Instance size: you might want to round this up to a multiple of, say, eight bytes in order to align well. In my own code, the instance sizes were already powers of two.
Performance: popping an item off a stack is quite cheap, but Delphi’s default TStack is not as fast as it could be, mostly because it can call a number of events plus has some other specialised code. You might want to write a stripped-down, super-performant stack, since a stack of pointers doesn’t need to be very complicated. Testing with that, I got allocation times on par with FastMM (slower, but same order of magnitude.)
Be careful! If you don’t know why you should use this technique, don’t use it. Are you sure fragmentation is your problem? Are you sure FastMM’s behaviour isn’t good enough? Are you sure (if using the unmodified code above) that never returning memory to the operating system is really the desired behaviour? Are you sure you want related objects close to each other in memory? (This can really benefit, and also really mess up, your CPU cache.)
Hacks: there are all sorts of neat, and really hacky, things you could do either by overriding NewInstance or by hooking it. A class-based memory pool is only one of them. What if you want to store a flag along with object instances? You could allocate InstanceSize+n, then access memory at InstanceSize and above for something custom. (Why you wouldn’t do this with a field, I don’t know, but… :)) You could bootstrap objects before the memory manager has been initialised, such as in your own MM, as I did (see above.) You could ensure a singleton class truly is only a singleton, by making the allocator raise an exception if more than one is allocated. (Yes, you could do this in far less hacky ways, too – a class var keeping track of instances comes to mind.) Truthfully, probably the main reason to do this is when you have a good reason for controlling how and where object memory is allocated.

Why can’t FastMM do this?

The memory manager interface in Delphi is very low-level. It allows you to allocate and free a certain size, but there is no indication what that size is for. It can’t, for example, know that TMyObject will benefit from being bulk-allocated.

FastMM itself does pool allocations of small sizes, allocations of less than 2KB. This is, in fact, a very similar technique – a single larger allocation, subdivided, and is meant to reduce fragmentation

So why does using this sometimes prevent OoM errors, instead of just using FastMM? I don’t know enough about the internals of FastMM to say, but my guess is that it itself gets fragmented. It has to handle all sorts of allocation sizes, and allocations can even be shrunk while remaining at the same address. Maybe it’s unable to allocate the new large blocks at some point. What I can tell you is that using the above code, suitably modified for your use case, will drastically reduce fragmentation.

Conclusion

If you’ve read this far, fantastic! There are two main things:

How to change how a Delphi class is allocated and freed, without changing or hooking the memory manager
An example bulk-allocation or memory pool for a use case of many often-allocated and often-freed objects.

If you enjoyed the blog post, feel free to join in the conversation in the comments below.