Improved Sliced Array implementation

by Iztok Kacin in Coding

If you don’t know what sliced array is, I suggest that you first read the previous post, where I presented the sliced array. There I presented the data structure that is similar to classical dynamic array, but is more efficient for deletes and inserts. Also, adding elements is a bit faster, because memory allocation is more efficient. But the downside was that accessing item by index was slower then with classical dynamic array. And because accessing the item by index is the most important feature of such a structure, I was not satisfied with the results. So as usual, I looked for a way to do it better while still being efficient for deletes and inserts. And I did it, I found a way. First lets see what changed from previous implementation.

SlicedArray2

As you can see a lot changed. I threw out most of the pointers for doubly linked lists that were actually not needed. I also changed how each slice is implemented. The biggest problem with the old implementation was, that the slices were of variable sizes. That proved it more difficult to access the slice by random index. I could find it very fast due to the tricks I wrote about, but not fast enough. 2-3 times slower access by index is not something I was happy about. Furthermore using the helper pointer to remember last accessed slice made even reading thread unsafe. That was unacceptable. So I though of another solution. The slices need to be of equal size that is a must, no way around that if I want to be super efficient. But how do we then solve deletes and inserts. If I have to maintain equal slice size, that means I have to move items in all slices if I delete or insert one item. Well at least in slices above the targeted one. That way I gain nothing. Then I thought about a neat solution, I remembered how a typewriter works 🙂

What if we add some buffer space bellow and above the slice limits. This way if we insert or delete an item, we only have to move one item per slice. To demonstrate, if we insert an item into the N-th slice, we have to move all the items in that slice for one item upwards. Then, we have to move the last item of the N-th slice to the next slice to maintain equal item count. But because we have some buffer space in each slice, we can simply append the item before next slice items and we don’t have to move other items at all! Then just repeat that for each slice to the end. So lets say we have 100 slices. if we insert an item into the 10-th slice we only have to move items in that slice and then just move one item in each of the 90 slices upwards. No need to move items in all 90 slices. I we run out of buffer space in one of the slices, then we move all items to the middle again to reposition them. The bigger the buffer the more operations we can do, before we must move again. Just like the typewriter. It shifts to the right and when we come to the end of the paper we reposition it again. We don’t reposition for each letter.

It turns out this is very efficient. We don’t have many slices. For instance typical setup I use in testing is for one million items. Each slice has 10.000 items, so I have 100 slices. Each slice has for 1000 items of buffer space, that is 10% and it means I have 500 items buffer bellow and above slice limits. I tested with 500 items buffer and the speed was still mostly the same. We don’t need a lot of buffer space. Its simple math really. If you have only 10 items of buffer on each side you already are 10 time more efficient on deletes and inserts. The record structure for the slice now come down to this:

   TArraySlice = record

    Last: Integer;
    Start: Integer;
    Index: Integer;
    Data: TAnyValues;
  end;

And the lookup control is now simply:

  PSliceData = ^TSliceData;
  TSliceData = array of PArraySlice;

So quite simplified. The tests show great improvement. The next picture shows the same test as last time, for one million items (10.000 items per slice, 1.000 items buffer per slice). I only added sort test to see some real-time performance and changed Integer list with TValue list because TValue is something we compete against.

SpeedTest2

Nice, isn’t it. We retained around 10 times faster deletes and insert and we are now also very, very fast on indexed access. We got down from approximately 270 ms to around 110 ms. For 1.000.000 items look-up we are now only around 20 ms slower then pure dynamic array (eg. TList). You have to take into account that that 113 ms is mostly calling Random function for random indexed access. The true difference is seen in the iteration by index test. There it is 38 vs 29. I say this is more then acceptable. If that difference bothers you then well… 🙂

We can also note how slow variants and especially TValue are on deletes and inserts. 65 seconds on TValue against 0.1 with my implementation. Staggering difference. That is because TAnyValue is only 9 bytes in size on both 32 and 64 bit and TValue is just huge. That is why memory footprint is also very important. Few recognize it but when you move data around it shows. Its not just the pure memory consumption that hits you its also all operations. You need to work with so many more bytes in memory.

I have also done a stress and regression test for IAnyArray. I ran all basic operations in a loop for 5 hours, while checking the content of the array against TList after each cycle. If there was an inconsistency, I was informed. I ironed out all the major bugs I hope. Download section now has a separate AnyValue download entry, so you can download only AnyValue units. Also the updated code is already there, with all the tests and demos. Just don’t expect that you will know how to use the StressTest. That is only meant for me to test for bugs.

I truly think that this structure is better then TList in almost any way. The only downside is, it consumes more memory, but even that is not a given. I allocate memory in chunks of 10.000 items or whatever your slice size is and TList just doubles the size of the array each time it runs out of space. So it will quickly be less efficient even for that, at large item numbers. If you need an array with powerful interface and efficiency for large number of items then this is it.

If you find any bugs or have any thoughts on any of what I wrote, don’t be shy, drop a comment. Also I dare you all out there, to make it even faster if you can. What about a little competition? Take my code and lets see if you can squeeze something more out of it. 🙂

Example of Cromis Library usage

by Iztok Kacin in Coding

What good is a Library of code if you have no use for it, or if the use is so clumsy you don’t want to use it. Almost all the code in Cromis Library was written out of my personal need to solve some problems. The main thing I strive to achieve besides speed and efficiency is ease of use and high level abstraction. I like code that on one hand writes almost like a script language and still gives you full power down to the metal on the other. Today I had to quickly write a program to solve the following problem:

I had two directories full of XML files.  Each XML represents a survey. In one directory I had completed surveys and in the other I had sessions of those surveys. Each survey can be made in one or more sessions. The only ID I could match between them was SampleID (incremental number of the survey). I had some missing data for some external data linking, so I had to pull session ID from session XML files to the finished survey files. The steps are like that.

  1. Go through all the session files
  2. Sort them by save time
  3. Store the file names in cardinal hash table. If two or more exist with same SampleID, only use the one that is the oldest. That is why the sort.
  4. Go through all the finished surveys files
  5. For each XML check if the data is missing and if it is get it from hash table
  6. If file was changed write it back to HD

Ok you may think this should be a lot of code to write. Well it is not if you use my tools as I did. I uses AnyValue, IAnyArray, SimpleStorage and Cardinal HashTable.

procedure TForm2.Button1Click(Sender: TObject);
var
  TempHashList: TCardinalHashTable;
  TempArray: IAnyArray;
  ValueItem: PAnyValue;
  Document: IDocument;
  AnyValue: TAnyValue;
  SampleID: Integer;
  Answer: IElement;
begin
  TempHashList := TCardinalHashTable.Create;
  try
    TempArray := CreateAnyArray;
 
    CreateCollection(eSourceTemp.Text).Get
    (
      'Parameters/SampleID',
      procedure(const Document: IDocument; const Element: IElement; const Data: Pointer)
      begin
        AnyValue := TAnyValue.Null;
        AnyValue['Timestamp'] := Document.Data.GetAttr('SaveDate').AsDateTime;
        AnyValue['SampleID'] := Document.Data.Get(['Parameters', 'SampleID']).AsInteger;
        AnyValue['FullName'] := Document.Path;
        AnyValue['Filename'] := Document.Name;
        TempArray.Push(AnyValue);
      end,
      nil
    );
 
    TempArray.Sort
    (
      function(Item1, Item2: PAnyValue): Integer
      begin
        Result := CompareDateTime(Item1^['Timestamp'].AsDateTime,
                                  Item2^['Timestamp'].AsDateTime);
      end
    );
 
    for ValueItem in TempArray.Enum.Reverse do
      TempHashList.Item[ValueItem^['SampleID'].AsInteger] := ValueItem^;
 
    for Document in CreateCollection(eSourceData.Text) do
    begin
      if not Document.Data.Exists(['Answers', 'ZAPIS_WID']) then
      begin
        SampleID := Document.Data.Get(['Parameters', 'SampleID']).AsInteger;
 
        if TempHashList.ContainsKey(SampleID) then
        begin
          Answer := Document.Data.Ensure(['Answers', 'ZAPIS_WID']);
          Answer.EnsureAttr('et').AsInteger := 0;
          Answer.EnsureAttr('date').AsDateTime := Now;
          Answer.EnsureAttr('source').AsString := 'setvar';
          Answer.EnsureAttr('srcid').AsInteger := 2;
          Answer.AsString := TempHashList.Item[SampleID]['Filename'];
          Document.SaveChanges;
        end;
      end;
    end;
  finally
    TempHashList.Free;
  end;
end;

Sure this could be done with something else also. Probably there are tools out there to achieve the same, or not. I don’t know. I wrote this as an example, on how to use the code, to solve problems with as little code as possible. The only thing I still lack, as I solve XML problems very often is a true script interface for all this so no actual compile would be needed. It would be nice to have 🙂 By the way do you even see in the code that I work with XML files in a directory or that I work with XML at all? Also the array of TAnyValue is very abstract, no a simple dynamic array.

And as I said in previous posts, TAnyValue is becoming the central powerhouse for all this. Tomorrow I will post the new version of TAnyValue and IAnyArray that now uses the new improved sliced array data structure.

Is it possible to build a better dynamic array? Presenting sliced array.

by Iztok Kacin in Coding

From the last post about array handling in TAnyValue (and in general), I was looking to make the IAnyArray as powerful and useful as I could. As I told you, my goal is to make a strong infrastructure around and on top of TAnyValue. And so I thought about it. Simple TList clone with more features was not something I was very thrilled about. So I looked the web about improving upon dynamic array structure. To my surprise, I found very little and even that was not what I wanted. I presume most of you know what dynamic array is and know its strengths and weaknesses. But lets just summarize.

1. Dynamic array

DynamicArray

Very simple structure. It uses contiguous chunk of memory of same type of elements (integers, string, record, object…). Its strengths:

    • Adding element to the end: O(1)
    • Accessing n-th element by index: O(1)
    • Removing element from the end: O(1)
    • Good caching, fast indexed walk-through because of caching.

And its weaknesses:

    • Removing element from the middle: O(n)
    • Adding element to the middle: O(n)
    • Can have problems with resizing at large sizes because of memory fragmentation

 

2. Single linked list

 SingleLinkedList

 

Also very simple structure. It uses pointers to link each record in one way (double linked lists do it both ways).  Its strengths:

    • Adding element to the end: O(1)
    • Removing element from the middle: O(1)
    • Adding element to the middle: O(1)
    • Removing element from the end: O(1)
    • Easily re-sized, because we only need to allocate memory for one new element at a time.

And its weaknesses:

    • Accessing n-th element by index: O(n)
    • Bad caching, elements can be all over the memory

 

Also found this nice table comparing them both:

Operation Array Singly Linked List
Read (any where) O(1) O(n)
Add/Remove at end O(1) O(1)
Add/Remove in the interior O(n) O(1)
Resize O(n) N/A
Find By position O(1) O(n)
Find By target (value) O(n) O(n)

So as you see each has its strengths and its weaknesses. If we could just take best of each one and make an efficient hybrid, all would be fine. Well that is not so easy to do. If you go one way, you suffer on the other end and vice verse. But after a lot of thinking and testing, I found a good implementation of something i now call sliced array. It is a hybrid, that tries to take best of both worlds and I think it mainly succeeds at that. It has a lot of strengths and very, very little weaknesses. It looks like this:

SlicedArray

I “sliced” the large array into several small arrays and then turned everything into a double linked list. I also added a look-up control array which is very important as you will see later. So how does it all work. Lets look at how a single slice is implemented:

  TArraySlice = record
    Count: Int64;
    Lookup: PSliceLookup;
    ArrayData: TAnyValues;
    NextSlice: PArraySlice;
    PrevSlice: PArraySlice;
  end;

Simple actually. It holds an array of TAnyValue, pointers to previous and next slice, pointer to look-up record and count of elements it holds. Look-up record looks like this

  TSliceLookup = record
    Index: Integer;
    SumCount: Int64;
    Slice: PArraySlice;
  end;

This is even simpler. It holds the index in the look-up array, pointer to slice and count of all elements in all the slices to this one + elements in this slice that the look-up points to. Now for the goods and bads of the sliced array.

Its strengths:

  • Adding element to the end: O(1)
  • Removing element from the middle: O(SliceSize)
  • Adding element to the middle: O(SliceSize)
  • Removing element from the end: O(1)
  • Good caching. Slices are contiguous in memory and so it supports locality of access, which is very important
  • Easily re-sized because we only need to allocate memory for one new slice. No need to re-size whole memory space used by the structure.

And its weaknesses:

  • Accessing n-th element by index: Hard to tell without analyzing mathematically, but its still fast

So we only have one weakness, but that could be a huge weakness because random access by index is very, very important. If we can’t do it very, fast then the structure is not worth the trouble. After all we want a general purpose structure, a workhorse like TList.

To make it, I implemented a few tricks. I can use the look-up structure. The dumbest way would be to just go over the look-up array and check if our index we are searching falls inside the lower and upper index bound of the slice. Once the slice is found we just do the same as for classic dynamic array. So that would be O(SliceCount + 1). Not to shabby if we don’t have to many slices but not near good enough. Specially because the more slices we have, the more we trim down internal insert and delete. I used two tricks to improve the search for the correct slice. First one is a helper pointer that always points to the last slice accessed. So if we do linear iteration from 0 to  Count -1, we do not search for the correct slice, ever. We just use the pointer to the last accessed slice and do the offset to the correct element. When we come to the end of the slice we just jump to the next one. Simple but very efficient. This way also the locality of access is supported if slices are big enough.

The next trick is (for truly random access) to calculate where the index should be (it does not mean it is there). If we have 100 elements in each slice and we want to access 350-th element we can assume that it is in the 4-th slice. So we jump directly there. If we missed, and we can miss as slices get degraded from deletions and insertions, we did not miss by much, we just check if we need to go left or right and move until we find the correct slice. It turns out this is also very fast, as predictions are good enough.

Furthermore something helps us here. If we do no deletions and insertions then the slices are in perfect condition and each prediction will be a hit. So we have an O(2) operation always. Tests will demonstrate that. If we have degradation it is slightly worse, but not by much, and we gained a lot on insertions and deletions. The key here is we take care of degradation. I do this by merging slices if the count falls under half the slice size and I split the slice if the count rises to 2x the slice size. This way the structure is never to degraded.

Ok you may say, but all this comes into play with a lot of elements. For small arrays it does not matter. True, that is why default slice size is 10.000 elements and under that you basically have a single dynamic array with one look-up pointer. I also do a check before doing indexed access and if only one slice is present I just jump directly to that one, no searching is done. The only real downside of the structure is a quite large complexity of it and the algorithm to manage it. Also it will never be truly as fast for random indexed search, if nothing else, because I just call more code opposed to returning n-th element from array.

Probably you won’t believe everything on my word, so here are the tests made on Delphi 2010. First one is for small number of elements (5.000) so we only have one slice and the other is for (1.000.000) elements. Here we have 100 slices (as default size is 10.000).  I tested my IAnyArray that now uses this structure for data storage, TList<TAnyValue> as this was a fair reference point for dynamic array comparison. Then I also tested TList<Variant> and TList<Integer> for the fun of it 🙂

 

Small array test (5.000e lements) – times are in ms

SlicedArray_Small

 

 

Large array test (1.000.000 elements) – times are in ms

SlicedArray_Large

 

 

Just some clarifications.

  • Deletes and inserts are 100 times faster. It makes sense. I have to move 100 times less elements in memory because I have 100 slices
  • Adding elements is also faster. Simply because I just create new slice with 10.000 elements while dynamic array (TList) has to re-size the whole memory space
  • Linear iteration by index or local access is as fast as with dynamic array because of the trick with the last accessed slice pointer
  • True random access by index is slower by a magnitude 2-3. But it depends on the number of slices
  • enumeration is very fast, especially by pointers
  • I did also a “raw” access comparison where I work with PAnyValue so I avoid record copies.

 

It is a very long post, sorry for that but I really want to hear your honest opinion on this solution. Is it viable, what do you think of it? I find it very promising and very flexible in all usage scenarios. The only downside I see is complexity. Do you find truly random access to slow? Remember that was 1.000.000 access operations over 1.000.000 elements.

There is also a big update of code on my download section:

  • Latest TAnyValue with array support, name-value, pairs etc….
  • Hashing classes have been improved and are 2x faster now
  • TTaskPool has been improved and is faster with less locking, also TThreadQueue. TLockFreeStack has been rewritten
  • IPC is now faster because of all the changes. It went under 0.1 ms for Client->Server->Client cycle with full payload

TAnyValue and array handling

by Iztok Kacin in Coding

Until now, I was only focusing on speed improvements and memory consumption for TAnyValue. But if any of you think this is all I wanted to do with it, then you are wrong. All along I had plans to extend the functionality of TAnyValue and make a true powerhouse out of it. My goal is to make it an efficient all around data carrier and to make a powerful interface on top of it. Something that will blow Variant type and TValue away.

The first major add-on is array handling. Any value now has premium array handling support 🙂 I will cut this short and just go to examples. Let me just mention that not only array is now supported but I added support for many basic types that were still missing and also support for Variants. But there is more to come in the future. Ok to the examples then.

I have defined a new record type in Cromis.AnyValue.pas. Its called TAnyArray and it is a powerful wrapper around TAnyValues which is an dynamic array of TAnyValue and also new. TAnyArray can be used as a standalone variable with no problems, but it is also a part of TAnyValue. Lets say you need a new array stored inside TAnyValue (the type is avtArray). You simply write it like this:

  AnyValue.EnsureAsArray.Create([5, '5', 1.22, AnyValues([nil, MyObject, 6])]);

This created an array with values and additional array inside the first one. Yes arrays can be nested with no limitations. Also each array is again a TAnyValue. Actually the Create does not create the array it just initializes it. The creation comes from EnsureAsArray. For those familiar with my SimpleStorage you know that Ensure returns something or creates a new instance if not already there. In this case it ensures that an array is returned. Also if AnyValue contained some other data type, it clears it and sets the type to avtArray.

Now how do you access the array values. Simple, and actually you can do it two ways. Lets say you want a second element:

  AnyValue[1].AsString;
  AnyValue.GetAsArray.Item[1].AsString;

You can treat AnyValue as TAnyArray. The default property handles that for you. But if you want the more advanced stuff, you can call AnyValue.GetAsArray, which returns the array, or raises exception if AnyValue is not of type avtArray. Array also has a neat function called GetAsString. It returns the whole array as string and can be very useful. Lets say we have an empty array and we want to push some new items into it:

  AnyValue.GetAsArray.Push([5, '4.5', AnyValues([7, '5', 3, AnyValues([1.2, 3, '5'])])]);
  AnyValue.GetAsArray.GetAsString;

The result will be:

5,4.5,[7,5,3[1.2,3,5]]

You can also chose the delimiter. If you want tab for delimiter just call it like this:

  AnyValue.GetAsArray.GetAsString(#9);

Now lets see what other candy the TAnyArray holds

  MaValue := AnyValue.GetAsArray.Pop;
  MyClone := AnyValue.GetAsArray.Clone;
  AnyValue.GetAsArray.Reverse;
  MySlice := AnyValue.GetAsArray.Slice(2,6);
  AnyValue.GetAsArray.Sort
  (
    function(Item1, Item2: PAnyValue): Integer
    begin
      Result := StrToInt64Def(Item1.AsString, -1) - StrToInt64Def(Item2.AsString, -1);
    end
  );
  AnyValue.GetAsArray.Clear;
  AnyValue.GetAsArray.SetCapactiy(10000);
  AnyValue.GetAsArray.IndexOf(5);
  AnyValue.GetAsArray.Contains(5);
  AnyValue.GetAsArray.Delete(5, True);
  AnyValue.GetAsArray.Delete[5, '4', nil];
  AnyValue.GetAsArray.SaveToStream(MyStream);
  AnyValue.GetAsArray.LoadFromStream(MyStream);

As you can see, you can save and load from / to stream without being aware of types, arrays, nested depth, it just works. You can delete all instances of an item, you can delete many items at once, you can slice, clone etc…. All that comes in a lightweight package of a record with support for any type you wish to store in it.

And for the finish let me show named values support which is handled with arrays but each such value is a unique TAnyValue instance with type avtNamedValue and a very flexible hash list also with TAnyValue support.

If you want to have name-value type of variables all you have to do is.

  AnyValue['Delphi'] := 'XE3';

or

  AnyValue.GetAsArray.AddNamed('Delphi', 'XE3');

Both are again the same. Named item is added to the array if it does not yet exist. If it does, only the value is assigned. Value is TAnyValue. Also for all arrays, you have enumeration support:

  for Value in AnyValue do
    // do somehing

or

  for Value in AnyValue.GetAsArray do
    // do somehing

And as a final example, the hash that is much more convenient then the old hash where you could only add pointers and could only have strings for keys. Here both the key and the value are TAnyValue. Furthermore it is very easy to make new type of hashes with different hashing functions. You only derive from TCustomHashTable and override some functions. Lets see how easy it was to make string and cardinal key hashing classes.

  // hash table that uses cardinals as keys
  TCardinalHashTable = class(TCustomHashTable)
  protected
    function DoCalculateHash(const Key: TAnyValue): Cardinal; override;
  public
    constructor Create(const Size: Cardinal = cDefaultHashSize);
  end;
 
  // hash table that uses strings as keys
  TStringHashTable = class(TCustomHashTable)
  protected
    function DoCalculateHash(const Key: TAnyValue): Cardinal; override;
  public
    constructor Create(const Size: Cardinal = cDefaultHashSize);
  end;

The whole implementation is here

{ TCardinalHashTable }
 
constructor TCardinalHashTable.Create(const Size: Cardinal);
begin
  inherited Create(Hash_SimpleCardinalMod, Size);
end;
 
function TCardinalHashTable.DoCalculateHash(const Key: TAnyValue): Cardinal;
var
  KeyData: Cardinal;
begin
  KeyData := Key.AsCardinal;
  Result := FHashFunction(@KeyData, FSize);
end;
 
{ TStringHashTable }
 
constructor TStringHashTable.Create(const Size: Cardinal);
begin
  inherited Create(Hash_SuperFastHash, Size);
end;
 
function TStringHashTable.DoCalculateHash(const Key: TAnyValue): Cardinal;
var
  KeyData: string;
begin
  KeyData := Key.AsString;
 
  if KeyData &lt;&gt; '' then
    Result := FHashFunction(@KeyData[1], Length(KeyData) * SizeOf(Char)) mod FSize
  else
    Result := 0;
end;

Simple isn’t it. You can make whatever type of hash table you want. Also the already available string one uses SuperFastHash from “Davy Landman” You can look the licence for more details.

The use of string hash for example is trivial

  MyHashTable.Add('Delphi', 'XE3');
  MyHashTable.Add('Delphi', 2010);
  MyHashTable.Add('Delphi', nil);
  MyHashTable.Add(2013, 'Integer');
  MyHashTable.Add(4.5, 'Float');
  MyHashTable.Add([4.5, '2'], 'Even arrays would work');

All of this just works, the key is always taken as string and the value is TAnyValue. You see how much power there is in TAnyValue, all in a fast and memory friendly package. And you still retain type safety, every TAnyValue instance carries the value type inside and if you try to convert to something you cannot, an exception is raised.

If you want to see the demo for the arrays and named-values support you can download it here along with the speed test. Cromis.Hashing which contains the hash table classes is already available if you download the whole Cromis Library. I will add TAnyValue and all the new features soon as a separate download, but I still have to write some demo programs and tests to be sure everything works as it should. Meanwhile all constructive criticism and ideas are very welcome.

TAnyValue, an attempt to make the best “variable” data container

by Iztok Kacin in Coding

And the saga continues :). If you don’t know what the hell I am talking about read the first three articles in the series on this subject:

The latest resulting TAnyValue implementation presented in the last article was good, fast and had resonable memory size (12 bytes). But two things bothered me to no end:

  1. I could not finalize the record when it goes out of scope, so I can’t properly clean up, resulting in non-optimal solutions for data fields (dynamic array, IInterface…)
  2. I had to use three fields for data. One dynamic record, one IInterface and one dynamic array. This was so because I needed to store strings and interfaces along with extended type and have all that clean on its own when record falls out of scope. Yes I could only use IInterface for all non trivial data types but the speed goes down the drain. Having three fields was fine on 32 bit, resulting in 12 bytes size of the record, but on 64x is goes up to 24 bytes. I was not happy at all.

I don’t give up so easily so I pondered what are my options. I really do not understand why in all these years Delphi development team did not make records with destructors and constructors available. It would just take them one simple call to a procedure for each. Maybe I am not seeing something, but it would be trivial for them to make it. Doing that would open vast possibilities for developers. So I had to find a way to fix that shortcoming.  The only solution I could see working, was hooking some of the procedures in System.pas. The main targets would be:

  • _FinalizeRecord
  • _CopyRecord
  • _AddRefRecord

The most important here is _FinalizeRecord which is called by the compiler when record goes out of scope and such a record contains an open array or IInterface or a string variable. Basically each variable that needs finalization triggers the call to _FinalizeRecord. The problem is that this is compiler magic. Compiler knows at compile time what the record holds and calls the procedure if needed, when the record goes out of scope and is freed. So I had to hook it and make sure that it gets called for TAnyValue. I don’t like hooks, I see them as last resort to solve a particular problem. Here this was a last resort and it was not a system wide hook (which should be used only in very, very special cases). Before I would use the hook, I needed to see if certain conditions were meet.

  1. The hook needs to be stable under 32 and 64 bit.
  2. There should be minimal to no overhead from the hook
  3. The hook should not interfere with other hooks and with original procedure
  4. You should avoid hooking procedures or functions with very high call frequency

To answer that:

  1. I used KOLDetours.pas which will become part of the Cromis library. The unit is very stable and work equally well under 32 and 64 bit. Also the licence is very liberal and I can easily include it in my library. Naturally I will retain the licence and made perfectly clear who is the author and who the credit goes to.
  2. The overhead is basically non existent.  All I do is one simple pointer comparison and then I either call my finalize or the original finalize.
  3. Because this is not a simple patch but a detour I just call the original _FinalizeRecord if the record is not TAnyValue. Also this technique allows for multiple hooks to coexist.
  4. _FinalizeRecord and _CopyRecord have not such a high frequency as GetMem or FreeMem and similar. They are frequently called but because of no overhead that is not an issue. I did not hook _AddRefRecord because I believe that it is not even ever called for the solution I made.

Ok to the implementation then. How does it all work? First we hook the procedures and get the pointer to the TAnyValue TypeInfo.

initialization
  vTypeInfo := TypeInfo(TAnyValue);
  OldCopyRecord := InterceptCreate(GetCopyRecordAddress, @CustomCopyRecord);
  OldFinalizeRecord := InterceptCreate(GetFinalizeRecordAddress, @CustomFinalizeRecord);

Here we need the addresses of functions to hook. We can’t get them from pascal, because they are protected, compiler won’t see them that way. But we can get them with assembler.

initialization
function GetFinalizeRecordAddress: Pointer;
asm
{$IFDEF CPUX64}
  mov rcx, offset System.@FinalizeRecord;
  mov @Result, rcx;
{$ELSE}
  mov @Result, offset System.@FinalizeRecord;
{$ENDIF}
end;

You can get any address this way. Now that we have the addresses, we can write our improved and specialized _FinalizeRecord and _CopyRecord for TAnyValue

procedure FinalizeAnyValue(p : PAnyValue);
begin
  if p.ValueType &lt;&gt; avtNone then
  begin
    case p.ValueType of
      avtString, avtAnsiString, avtWideString: string(PValueData(@p.ValueData).VPointer) := '';
      avtInterface: IInterface(PValueData(@p.ValueData).VPointer) := nil;
    {$IFNDEF CPUX64}
      avtFloat: FreeMem(PValueData(@p.ValueData).VPointer);
    {$ENDIF}
    end;
 
    // set type to none and erase all data
    PValueData(@p.ValueData).VPointer := nil;
    p.ValueType := avtNone;
  end;
end;
 
procedure CopyAnyValue(dest, source : PAnyValue);
var
  dstData: PValueData;
  srcData: PValueData;
begin
  dstData := PValueData(@dest.ValueData);
  srcData := PValueData(@source.ValueData);
 
  if dest.ValueType &lt;&gt; source.ValueType then
  begin
    FinalizeAnyValue(dest);
    dest.ValueType := source.ValueType;
 
  {$IFNDEF CPUX64}
    case dest.ValueType of
      avtFloat: GetMem(dstData.VPointer, SizeOf(Extended));
    end;
  {$ENDIF}
  end;
 
  case source.ValueType of
  {$IFNDEF CPUX64}
    avtFloat: PExtended(dstData.VPointer)^ := PExtended(srcData.VPointer)^;
  {$ENDIF}
    avtInterface: IInterface(dstData.VPointer) := IInterface(srcData.VPointer);
    avtString, avtAnsiString, avtWideString: string(dstData.VPointer) := string(srcData.VPointer);
  else
    dstData^ := srcData^;
  end;
end;

All that is left are the detour functions.

procedure CustomCopyRecord(Dest, Source, TypeInfo: Pointer);
begin
  if vTypeInfo = typeInfo then
    CopyAnyValue(PAnyValue(Dest), PAnyValue(Source))
  else
    OldCopyRecord(Dest, Source, typeInfo);
end;
 
procedure CustomFinalizeRecord(p: Pointer; typeInfo: Pointer);
begin
  if vTypeInfo = typeInfo then
    FinalizeAnyValue(PAnyValue(p))
  else
    OldFinalizeRecord(p, typeInfo);
end;

See how little overhead there is. Basically none, I just compare two pointers and that is all. Also for other records then TAnyValue I just call the original functions. Here I have to say that Eric Grange who jumped in to the discussion at the last article helped me a lot with hooks and with record structure. So a public thanks to him for all the ideas and solutions he helped me provide with.

Ok the dirty hooking details are behind us. Now lets see how we did with the record structure. Because we can now cleanup properly, we can be much more creative with how we structure our record. I made the following structure.

  PAnyValue = ^TAnyValue;
  TAnyValue = packed record
  private
    ValueData: IInterface;
  {$HINTS OFF}
    {$IFNDEF CPUX64}
      Padding : array [0..3] of Byte;
    {$ENDIF}
  {$HINTS ON}
    ValueType: TValueType;
    function GetAsInt64: Int64; inline;
    ...

You may now wonder why the IInterface variable in the record. Well it is there for two purposes. First we need to trigger the _FinalizeRecord by the compiler and IInterface does just that. On the other hand it provides us with 4 bytes of space we can use. Now you probably know, why the additional array [0..3] of Byte. I need additional 4 bytes on 32 bit to store Int64, Double etc… directly without calling GetMem. On 64 bit IInterface itself is 8 bytes so no padding is needed. Quite a neat solution. It only takes 8 bytes on 32 bit and 64 bit per single record (plus one byte for type enumeration). Lets just look how data is stored and read. For most types its just like that

procedure TAnyValue.SetAsInteger(const Value: Integer);
begin
  if ValueType &lt;&gt; avtInteger then
  begin
    Self.Clear;
    ValueType := avtInteger;
  end;
 
  // assign the actual value
  PValueData(@ValueData).VInteger := Value;
end;
 
function TAnyValue.GetAsInteger: Integer;
begin
  if ValueType = avtInteger then
    Result := PValueData(@ValueData).VInteger
  else
    Result := GetAsIntegerWithCast;
end;
 
function TAnyValue.GetAsIntegerWithCast: Integer;
begin
  case ValueType of
    avtBoolean: Result := Integer(GetAsBoolean);
    avtString: Result := StrToInt(GetAsString);
    avtAnsiString: Result := StrToInt(string(GetAsAnsiString));
    avtWideString: Result := StrToInt(string(GetAsWideString));
    else
      raise Exception.Create('Value cannot be converted to Integer');
  end;
end;

Separate getters are because of the inlining and exceptions. Strings on the other hand are store like this

procedure TAnyValue.SetAsString(const Value: string);
begin
  Self.Clear;
  ValueType := avtString;
  string(PValueData(@ValueData).VPointer) := Value;
end;
 
function TAnyValue.GetAsString: string;
begin
  if ValueType = avtString then
    Result := string(PValueData(@ValueData).VPointer)
  else
    Result := GetAsStringWithCast;
end;
 
function TAnyValue.GetAsStringWithCast: string;
begin
  case ValueType of
    avtNone: Result := '';
    avtBoolean: Result := BoolToStr(AsBoolean, True);
    avtCardinal: Result := IntToStr(AsCardinal);
    avtInteger: Result := IntToStr(AsInteger);
    avtInt64: Result := IntToStr(AsInt64);
    avtFloat: Result := FloatToStr(AsFloat);
    avtDateTime: Result := DateTimeToStr(AsDateTime);
  {$IFDEF UNICODE}
    avtAnsiString: Result := string(AsAnsiString);
  {$ENDIF}
    avtWideString: Result := AsWideString;
    else
      raise Exception.Create('Value cannot be converted to string');
  end;
end;

This way we have reference counting which is very important. Same goes for interfaces. The only special one is Extended. It is like this.

procedure TAnyValue.SetAsFloat(const Value: Extended);
begin
  if ValueType &lt;&gt; avtFloat then
  begin
    Self.Clear;
    ValueType := avtFloat;
  {$IFNDEF CPUX64}
    GetMem(PValueData(@ValueData).VPointer, SizeOf(Extended));
  {$ENDIF}
  end;
 
{$IFNDEF CPUX64}
  PExtended(PValueData(@ValueData).VPointer)^ := Value;
{$ELSE}
  PValueData(@ValueData).VDouble := Value;
{$ENDIF}
end;
 
function TAnyValue.GetAsFloat: Extended;
begin
  if ValueType = avtFloat then
  begin
  {$IFNDEF CPUX64}
    Result := PExtended(PValueData(@ValueData).VPointer)^
  {$ELSE}
    Result := PValueData(@ValueData).VDouble
  {$ENDIF}
  end
  else
    Result := GetAsFloatWithCast;
end;

Because extended is 10 bytes on 32 bit systems, I refused to use that 2 additional bytes. It messes with aligning and its just not worth the trouble. The speed difference is negligible anyway. You might wonder why not just use Double? As you know Extended is mapped to Double on 64 bit anyway. I do use double. I have special getter and setter for Double type. The reason for using extended is twofold:

  1. Some people may need extended sometimes. The reasons may wary but still, they may need it.
  2. The compiler resolves I / 5 to extended type. So direct assignment fails.

This won’t compile if you have only Double on the implicit operators

  AnyValue := I / 5;

You have to write it like this

  AnyValue.AsDouble := I / 5;

The final tests, taken again, only on 32 bit XE3 are

Delphi XE3 test (times are in ms for 10000000 operations):

Type Variants TValue TAnyValue TOmniValue TVariableRec
j := I 149 266 87 270 62
j := I/5 236 316 101 4151 222
j := IntToStr(I) 3291 4856 2185 5776 1648
ALL 3933 5681 2462 12689 2205

The results show how close to the best speed, that a full blown record has, we came. Very close to the limit. I thinks this is really a very good variable container for data holding and transfer, way better then variants. And it can be further developed almost without limitations now that we can finalize. I also think that hooking is so stable and has no overhead that this is a perfectly viable solution. I will test this further and when the code is clean and well tested it will become the official TAnyValue.

You can download the current code along with the test here. It has to be added that the test only tests assigning variables and reading them. It does not address memory allocations / releases. I have a new test for that, but that is already way beyond the scope of this article.