TValue and other “Variants” like implementation tests – finale

by Iztok Kacin in Coding

I had no intention of writing any more posts about TAnyValue. I thought I had said more than enough and had closed the subject in my mind. But then Stefan Glienke jumped in with great comments and tests of his own. As I love such discussions and challenges, I had to reopen the case. I had already mentioned that TValue is a lot faster under XE, XE2 and XE3. But now it seemed it was actually faster than TAnyValue for integer and float types. Something was not right, and with Stefan’s help I upgraded and tweaked the unit until I got something that I feel is truly fast and uses little memory out of the box. The comments and previous post can be found here. What we did was:

  • Conditional defines were changed. Out of the box, TAnyValue now uses very little memory (less than variants, for instance, and the same as TOmniValue), and far less than TValue. It trades a little bit of speed for that, but very little indeed, as you will see in the tests. You can enable a speed boost by defining “AnyValue_UseLargeNumbers”, which uses more memory but is faster. It is advisable to enable this if you mainly use Int64, TDateTime or Extended types. You can disable interfaces as before and so save even more memory.
  • Stefan proposed removing inline from the implicit class operators.
  • Instead of using Move to copy Extended and Int64 values to the byte array, Stefan proposed the following solution:
procedure TAnyValue.SetAsFloat(const Value: Extended);
begin
  FValueType := avtFloat;
{$IFDEF AnyValue_UseLargeNumbers}
  FSimpleData.VExtended := Value;
{$ELSE}
  if Length(FComplexData) <> SizeOf(Extended) then
    SetLength(FComplexData, SizeOf(Extended));
  PExtended(@FComplexData[0])^ := Value;
{$ENDIF}
end;
 
function TAnyValue.GetAsFloat: Extended;
begin
  case FValueType of
    avtInt64: Result := GetAsInt64;
    avtInteger: Result := GetAsInteger;
    avtCardinal: Result := GetAsCardinal;
    avtBoolean: Result := Integer(GetAsBoolean);
    avtString: Result := StrToFloat(GetAsString);
    avtAnsiString: Result := StrToFloat(string(GetAsAnsiString));
    avtWideString: Result := StrToFloat(string(GetAsWideString));
    avtFloat:
      begin
        {$IFDEF AnyValue_UseLargeNumbers}
          Result := FSimpleData.VExtended;
        {$ELSE}
          Result := PExtended(@FComplexData[0])^;
        {$ENDIF}
      end
    else
      raise Exception.Create('Value cannot be converted to Extended');
  end;
end;
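To make the difference between the two copying styles concrete, here is a minimal sketch that stores an Extended in a dynamic byte array, once with Move and once through a typed pointer as in the code above. The names TRawBytes, StoreWithMove, StoreWithCast and LoadExtended are made up for this example and are not part of TAnyValue. It only shows that both approaches store the same bytes; it does not measure speed.

```pascal
program StoreExtendedDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical stand-in for the FComplexData field
  TRawBytes = array of Byte;

// Copies the value through the general-purpose Move routine
procedure StoreWithMove(var Buffer: TRawBytes; const Value: Extended);
begin
  if Length(Buffer) <> SizeOf(Extended) then
    SetLength(Buffer, SizeOf(Extended));
  Move(Value, Buffer[0], SizeOf(Extended));
end;

// Writes the value directly through a typed pointer into the buffer
procedure StoreWithCast(var Buffer: TRawBytes; const Value: Extended);
begin
  if Length(Buffer) <> SizeOf(Extended) then
    SetLength(Buffer, SizeOf(Extended));
  PExtended(@Buffer[0])^ := Value;
end;

// Reads the value back through the same typed pointer
function LoadExtended(const Buffer: TRawBytes): Extended;
begin
  Result := PExtended(@Buffer[0])^;
end;

var
  A, B: TRawBytes;
begin
  StoreWithMove(A, 1.25);
  StoreWithCast(B, 1.25);
  // Both buffers must hold the same value
  Assert(LoadExtended(A) = LoadExtended(B));
  Writeln(FloatToStr(LoadExtended(A)), ' = ', FloatToStr(LoadExtended(B)));
end.
```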

Even after all that, my version was still considerably slower. Stefan claimed he got down to 200ms on the Extended tests, while I could not drop below 700ms. Something smelled fishy, so I asked him to send me the exact unit he used. When I got the unit I compared it with mine using KDiff3 (a great tool, by the way). I noticed he had removed inline only from the following class operator

  class operator Implicit(const Value: Extended): TAnyValue;

but left it on this one

  class operator Implicit(const Value: TAnyValue): Extended; inline;

I did the same and, sure enough, the time dropped to 200ms. I have no idea why inline makes such a difference at that particular spot. Maybe someone more at home with assembler and the inline mechanism can shed some light on the matter. When I saw the impact, I removed other inline directives and gained a lot for strings as well. So without further delay, here are the complete tests for 2010 and XE3. It is also worth mentioning that Stefan made a wrapper for the 2010 TValue implementation that brings it on par with XE3 speed. Maybe he will share it for those who have to use 2010 and TValue.
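For readers who have not used class operators on records, here is a minimal sketch of the mechanism the tests exercise. TMiniValue is a made-up record for illustration only, not TAnyValue itself. It declares the two Extended operators the same way as described above: the assignment direction without inline, the read direction with inline.

```pascal
program ImplicitOperatorSketch;
{$APPTYPE CONSOLE}

type
  // Hypothetical record, only to show how implicit operators are declared
  TMiniValue = record
  private
    FValue: Extended;
  public
    // Called for "V := 2.5": no inline directive here
    class operator Implicit(const Value: Extended): TMiniValue;
    // Called for "X := V": inline directive kept
    class operator Implicit(const Value: TMiniValue): Extended; inline;
  end;

class operator TMiniValue.Implicit(const Value: Extended): TMiniValue;
begin
  Result.FValue := Value;
end;

class operator TMiniValue.Implicit(const Value: TMiniValue): Extended;
begin
  Result := Value.FValue;
end;

var
  V: TMiniValue;
  X: Extended;
begin
  V := 2.5;  // Extended -> TMiniValue
  X := V;    // TMiniValue -> Extended
  Assert(X = 2.5);
  Writeln(X:0:2);
end.
```

Whether inline helps or hurts depends on the compiler version and the call site, so it is worth measuring each operator separately, as was done here.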

Delphi 2010 test (times are in ms for 10000000 operations):

Type              Variants  TValue  TAnyValue  TOmniValue  TVariableRec
j := I                 157    3176         69          82            65
j := I/5               184    3308        214        3410           190
j := IntToStr(I)      4562   10610       3857        6856          2917
ALL                   5471   18915       5025       10640          3167

Delphi XE3 test (times are in ms for 10000000 operations):

Type              Variants  TValue  TAnyValue  TOmniValue  TVariableRec
j := I                 166     176         81         412            62
j := I/5               302     450        235        4007           328
j := IntToStr(I)      3429    5184       1701        6082          1945
ALL                   4063    5731       3016       11407          2083

You can see how much I gained for strings under XE3 by removing that inline. Why, I don’t know, as I already said. I will wrap it up here. I have already bothered you too much with details and spent too much time on this myself. But hey, it was worth it.

The test application can be downloaded here.

TValue and other “Variants” like implementation tests – revised

by Iztok Kacin in Coding

Recently I got an inquiry about the speed of my TAnyValue implementation. For the record, TAnyValue is a TValue-like implementation using records and implicit operators. It lacks features compared to TValue, but it is faster and it works on older Delphi versions (Delphi 2005 and up). I also like to tinker with things like this, so it was fun to play around with it. I was inspired by TOmniValue, which is part of OmniThreadLibrary. My goal was to eventually make it the fastest “variant” type implementation out there.

In 2010 I did the initial tests with my initial version of TAnyValue. You can find the results here. Back then TAnyValue was more or less on par with TOmniValue. It was considerably slower than Variants and a simple variant record (which is as fast as it gets and is a good yardstick for how fast you really are). TValue was catastrophic back then, being by far the slowest solution because of the generics it used internally. I then quietly improved my implementation over time. I have come a long way, and today my solution is the fastest out there. But it must be said that all solutions today are fast; the differences only matter if you perform a great many of these assignments per second and absolutely need the speed.

When doing such an implementation, the tricky part is to get a good balance between memory consumption and speed. The simplest and crudest solution would be to store every possible data type you want to handle as its own field in the record. This way you don’t have to explicitly allocate memory or copy bytes. The compiler does it all, and this is the fastest possible way. But it is also terrible with regard to memory consumption. I will make the case using my TAnyValue and the types it supports at this moment. If I went for the simplest and fastest solution, I would cover these types (I assume a 32 bit compiler here):

  • Int64 (64)
  • Integer (32)
  • Extended (80)
  • WideString (variable)
  • AnsiString (variable)
  • String (variable)
  • Boolean (8)
  • TObject (32)
  • Pointer (32)
  • Cardinal (32)
  • TDateTime (64)
  • IInterface (32)

OK, this is a really dumb approach, but I wanted an upper limit for memory consumption. An application with 10000000 such records, holding one integer each, consumed a whopping 706.244 KB of memory. A lot! On the other side of the spectrum you have two different solutions. You can use variants for inner data storage, but I really wanted to avoid them, because what would be the point of using them 🙂 Another very sleek approach is what TOmniValue has done. It uses one Int64 field for most of the data types, with very smart assignments, and one IInterface field for Extended and string types, basically for all types that need finalization, as you cannot have a destructor in a record. The approach TOmniValue uses is good, but if you work with strings and floating points a lot, it will be slow, as interfaces are notoriously slow. I wanted something fast that is still not too hard on memory consumption. So I came to this:

  TSimpleData = record
    case Byte of
      atInteger:   (VInteger: Integer);
      atCardinal:  (VCardinal: Cardinal);
      atBoolean:   (VBoolean: Boolean);
      atObject:    (VObject: TObject);
      atPointer:   (VPointer: Pointer);
      atClass:     (VClass: TClass);
      atWideChar:  (VWideChar: WideChar);
      atChar:      (VChar: AnsiChar);
    {$IFDEF AnyValue_UseLargeNumbers}
      atInt64:     (VInt64: Int64);
      atExtended:  (VExtended: Extended);
    {$ENDIF}
  end;
 
  TAnyValue = packed record
  private
    FValueType: TValueType;
  {$IFNDEF AnyValue_NoInterfaces}
    FInterface: IInterface;
  {$ENDIF}
    FSimpleData: TSimpleData;
    FComplexData: array of Byte;
    ...
  end;

I use three fields. I use IInterface only for interfaces, hence the conditional define, so you can easily turn them off if you don’t need them and save memory. The trick here is the variant record. The good side of this record is that it only takes as much memory as its largest field. In my case this is 32 bits, unless “AnyValue_UseLargeNumbers” is defined, in which case it is 80 bits. This way I cut down the size dramatically, by 48 bits per record. Finally, there is a dynamic array of bytes for strings, and for floating point values if “AnyValue_UseLargeNumbers” is not defined. So let’s look at the memory consumption compared to the others (again holding 32 bit integers and 10000000 records):

TAnyValue (no defines): 128.960 KB
TAnyValue (AnyValue_NoInterfaces): 89.812 KB
TAnyValue (AnyValue_UseLargeNumbers): 246.376 KB
TAnyValue (AnyValue_NoInterfaces and AnyValue_UseLargeNumbers): 207.228 KB
TOmniValue: 128.960 KB
TValue: 246.376 KB (wow not only is TValue slow but takes a lot of memory)
Variants: 158.308 KB

The only problem when not using “AnyValue_UseLargeNumbers” is that if you then actually use floating point types, you will consume more memory than with that directive defined. And you will be a little bit slower. You can still use floating points, but they should not be in the majority. So you can tweak TAnyValue to fit the task at hand. I should also add that the numbers would naturally be different if another data type were used. But for the overall picture a 32 bit Integer is just fine.
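The memory saving of the variant part in TSimpleData can be checked with SizeOf. The following is a small sketch with two made-up records (TPlainData and TOverlaidData are not taken from the TAnyValue unit): one lists the fields one after another, the other overlays them with case, so that it is only as large as its largest field.

```pascal
program VariantRecordSize;
{$APPTYPE CONSOLE}

type
  // Fields stored one after another: the sizes add up
  TPlainData = packed record
    VInteger: Integer;
    VCardinal: Cardinal;
    VBoolean: Boolean;
    VPointer: Pointer;
    VInt64: Int64;
  end;

  // Variant part: all fields start at the same offset and share memory
  TOverlaidData = packed record
    case Byte of
      0: (VInteger: Integer);
      1: (VCardinal: Cardinal);
      2: (VBoolean: Boolean);
      3: (VPointer: Pointer);
      4: (VInt64: Int64);
  end;

begin
  Writeln('Plain record:    ', SizeOf(TPlainData), ' bytes');
  Writeln('Overlaid record: ', SizeOf(TOverlaidData), ' bytes');
  // The overlaid record is exactly as large as its largest field
  Assert(SizeOf(TOverlaidData) = SizeOf(Int64));
  Assert(SizeOf(TPlainData) > SizeOf(TOverlaidData));
end.
```

The exact size of the plain record depends on the pointer size of the target platform, which is why the sketch prints the sizes instead of hard-coding them.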

Now let’s look at the speed results. The test application is the same as last time.

Type              Variants  TValue  TAnyValue  TOmniValue  TVariableRec
j := I                 178    3431         62         108            68
j := I/5               187    3968        230        4054           199
j := IntToStr(I)      5491   10593       4342        6632          2728
ALL                   4479   19541       4158       10966          3140

As you can see, TAnyValue gained a lot of speed, while everything else is the same as it was back then. I also ran the full test on XE3 to see if TValue had improved in any way. The results are below:

Type  Variants  TValue  TAnyValue  TOmniValue  TVariableRec
ALL       3846    6497       3318       10613          2062

It is clear that they worked on TValue, which is now fast enough for use. In fact it is very fast. Combined with its flexibility, it is a powerful tool.

Probably a lot of you wonder why even bother. I bother because I can. I like to tweak the code and see if I can make it even better. So if any of you have ideas how to make it even smaller in regards to memory consumption and retain the speed throne, please let me know 🙂

 

P.S.
If you are looking for TAnyValue you can find it as a part of Cromis Library in the downloads section.

Cromis.IPC and Cromis.IMC updated

by Iztok Kacin in Coding

Small but very important updates were added to Cromis.IPC and Cromis.IMC units.

Both units now have the same syntax and follow the same usage pattern, and both share the same data packet format. Cromis.IPC now also uses Connect to connect to the server side. This saves some time if you send more than one packet, but not much. It also ensures that your worker thread on the other side will be waiting for you until you disconnect, so the thread pool does not have to assign a new thread for each request.

WARNING. This is a breaking change for Cromis.IPC. The syntax stays the same, so all your old code will compile and work without problems. But as the protocol is slightly different, due to connect and disconnect, a new client won’t work with an old server and vice versa. So you have to recompile both sides. There is also a fix for 100% client CPU consumption when the server was not available. For other changes look at the change log. I have updated the downloads pages and examples accordingly.

Also, as you have noticed, I unfortunately had to change the template. I updated WordPress and my beloved old minimalistic template finally stopped working 🙁 This is a temporary one until I find a good, simple new template with a fluid layout.

OK, the old template is still working, but I am afraid that with every WordPress version there is a bigger chance it will stop working properly. Well, I will stick with it to the bitter end 🙂

Anyway, I will be blogging about cross platform HTML5 development for smart phones, if anybody is interested in that. I have worked on it a lot lately and have some good information to share. If the community is not interested, I may start that in a new blog. Some feedback on that would be nice.

I will also blog about how I see a good development setup at your home / office. I aim at VMWare, multi monitors, NAS etc… Is anybody interested in that?

Cromis Library updated

by Iztok Kacin in Coding

A fairly big update was just committed. The main focus is on 64 bit compatibility.

  1. Cromis IMC added: IMC stands for inter machine communication. Just as my IPC is inter process oriented, this aims at easy, message oriented communication between machines. Forget about TCP/IP, Indy, Synapse, ICS or any other technology. You want to send a message with data from one machine to another and not worry about how to do that technically. IMC offers just that. It’s fast, it’s easy to use, and it abstracts the communication layer away from you. Another good thing is that it uses exactly the same message carrier as IPC does. This basically means all the code you used in IPC for preparing the messages will work here. You can also chain data from IPC to IMC. The code uses Indy as the TCP layer, as that guarantees it will work on any new Delphi version. For now it is Indy 10 only, but if there is demand I can make it Indy 9 compatible.
  2. Cromis IPC:

    Change Log

    • 1.3.1
      • Added error description for the client
    • 1.3.0
      • 64 bit compiler compatible
    • 1.2.2
      • Improved wait for ERROR_IO_PENDING
      • Usage of CommTimeouts
  3. Cromis Threading:

    Change Log

    • 1.5.0
      • 64 bit compiler compatible
    • 1.4.3
      • Added StopAllTasks for TTaskPool
    • 1.4.2 (breaking change)
      • TTaskQueue is now only available as ITaskQueue interface
  4. Cron Scheduler:

    Change Log

    • 2.1.0
      • 64 bit compiler compatible
  5. Cron Scheduler:

    Change Log

    • 1.1.0
      • 64 bit compiler compatible
      • Fixed thread termination bug

 

Cromis.Threading

by Iztok Kacin in Coding

I have received quite a few mails recently, from people telling me, how I made their life easier using my code. I am really glad some of you find my code useful and easy enough to use. I made it public in case someone finds it useful.

I also made demo applications for most parts of the Cromis library, but one unit has almost no documentation and a lot of hidden content. This is the Cromis.Threading unit. Part of the reason is that this unit was made as a helper unit for Cromis.IPC. It contains the task (thread) pool that is used by Cromis.IPC. I needed my own lightweight implementation of a task pool, so I wrote one. Over time, other threading functionality found its way into this unit, mostly because I needed it here or there. The side effect is that this functionality is not documented and probably not so easy to use for someone who is not very familiar with the code. The e-mails I got recently just prove that. So I decided to quickly write a few examples of how to use the code and show all of the functions that this unit provides.

TTaskPool

This is the most obvious class. It gives you control over a pool of tasks (threads). You start using it like this:

procedure TfMain.btnStartClick(Sender: TObject);
var
  Task: ITask;
begin
  FTaskPool.DynamicSize := cbDynamicPoolSize.Checked;
  FTaskPool.MinPoolSize := StrToInt(ePoolSize.Text);
  FTaskPool.OnTaskMessage := OnTaskMessage;
  FTaskPool.Initialize;
 
  tmPoolStatus.Enabled := False;
  btnStart.Enabled := False;
  btnStop.Enabled := True;
  FTerminate := False;
 
  while not FTerminate do
  begin
    Task := FTaskPool.AcquireTask(OnTaskExecute, 'RandomTask');
    Task.Values.Ensure('RandomNumber').AsInteger := Random(tbThreadTimeout.Position);
    Task.Run;
 
    pbPoolSize.Position := FTaskPool.PoolSize - FTaskPool.FreeTasks;
    stFreeThreadsValue.Caption := IntToStr(FTaskPool.FreeTasks);
    stPoolSizeValue.Caption := IntToStr(FTaskPool.PoolSize);
    Sleep(Random(tbCreationTimeout.Position));
    Application.ProcessMessages;
  end;
end;

You have two important properties here that I will explain:

DynamicSize:

This boolean property controls whether the size of the pool is dynamic. Let me explain. If you start with a MinPoolSize of 20 and DynamicSize is FALSE, then once all 20 threads are in use, the pool will create a new thread for each additional request. So it will adjust to the peak load of the pool. But it will then stay at that peak number of threads. If your peak is at 60, it will stay there even if the load then drops. But if DynamicSize is TRUE, it will destroy unneeded threads until you are back at 20 (MinPoolSize) threads. In other words, it will dynamically adjust to the load. Each may have its uses.

MinPoolSize:

This one is simple. It is the number of threads you start with. You cannot have fewer than MinPoolSize threads in the pool.
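The interplay of the two properties can be written down as a toy model. The function NextPoolSize below is hypothetical and is not part of Cromis.Threading; it only restates the rules described above as code, assuming the pool grows to the number of busy tasks and, when dynamic, shrinks back no further than MinPoolSize.

```pascal
program PoolSizeModel;
{$APPTYPE CONSOLE}

uses
  Math;

// Hypothetical model of the sizing rules, not the library's implementation
function NextPoolSize(PoolSize, MinPoolSize, BusyTasks: Integer;
  DynamicSize: Boolean): Integer;
begin
  if BusyTasks > PoolSize then
    Result := BusyTasks                    // grow to cover the load
  else if DynamicSize then
    Result := Max(MinPoolSize, BusyTasks)  // shrink, but never below the minimum
  else
    Result := PoolSize;                    // static pool stays at its peak
end;

begin
  // Load rises to 60 busy tasks: the pool grows from 20 to 60
  Assert(NextPoolSize(20, 20, 60, False) = 60);
  // Load drops to 5: the static pool stays at 60, the dynamic pool returns to 20
  Assert(NextPoolSize(60, 20, 5, False) = 60);
  Assert(NextPoolSize(60, 20, 5, True) = 20);
  Writeln('Static pool after peak:  ', NextPoolSize(60, 20, 5, False));
  Writeln('Dynamic pool after peak: ', NextPoolSize(60, 20, 5, True));
end.
```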

OK, now let’s look at the other parts of the pool. The first is the handler that runs when each task is executed:

procedure TfMain.OnTaskExecute(const Task: ITask);
var
  Interval: Integer;
begin
  Interval := Task.Values.Get('RandomNumber').AsInteger;
  try
    Task.Message.Ensure('Result').AsInteger := Interval;
    Sleep(Interval);
  finally
    Task.SendMessageAsync;
  end;
end;

And the second is processing the messages that tasks send back:

procedure TfMain.OnTaskMessage(const Msg: ITaskMessage);
var
  Interval: Integer;
begin
  Inc(FTaskCounter);
  Interval := Msg.Values.Get('Result').AsInteger;
  stThreadsFinishedValue.Caption := IntToStr(FTaskCounter);
end;

As you can see all is very straightforward. Before the task is run, you fill in the values of the task and then run it. You write the code for each task and each task can send back messages to the main thread. You can do that in two ways:

Task.SendMessageAsync;
Task.SendMessageSync;

Each one speaks for itself.

Let me be clear here. This is a simple implementation of a task pool, built for my internal needs. Some find it useful, and that is great. But it is in no way comparable to the OmniThreadLibrary.

TThreadSafeQueue

This is a simple implementation of a thread safe queue that uses locking. It is very spartan and fast. Gabr wrote about it some time ago, when he was doing tests:

http://www.thedelphigeek.com/2011/05/lock-free-vs-locking.html
http://www.thedelphigeek.com/2011/06/lock-free-vs-locking-rematch.html

The usage is very straightforward so no need to write about that.
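For readers who have not seen the locking approach before, here is a minimal sketch of the general idea: a queue whose operations are guarded by a critical section. The class TLockedIntQueue and its methods are made up for this example and do not reflect the actual TThreadSafeQueue interface. The sketch also assumes Delphi 2009 or later, because it uses Generics.Collections.

```pascal
program LockingQueueSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, SyncObjs, Generics.Collections;

type
  // Hypothetical class for illustration; not the Cromis TThreadSafeQueue API
  TLockedIntQueue = class
  private
    FLock: TCriticalSection;
    FItems: TQueue<Integer>;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Enqueue(const Value: Integer);
    function TryDequeue(out Value: Integer): Boolean;
  end;

constructor TLockedIntQueue.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
  FItems := TQueue<Integer>.Create;
end;

destructor TLockedIntQueue.Destroy;
begin
  FItems.Free;
  FLock.Free;
  inherited;
end;

procedure TLockedIntQueue.Enqueue(const Value: Integer);
begin
  FLock.Acquire;
  try
    FItems.Enqueue(Value);
  finally
    FLock.Release;  // always release, even if Enqueue raises
  end;
end;

function TLockedIntQueue.TryDequeue(out Value: Integer): Boolean;
begin
  FLock.Acquire;
  try
    // The check and the removal happen under the same lock
    Result := FItems.Count > 0;
    if Result then
      Value := FItems.Dequeue;
  finally
    FLock.Release;
  end;
end;

var
  Queue: TLockedIntQueue;
  Item: Integer;
begin
  Queue := TLockedIntQueue.Create;
  try
    Queue.Enqueue(1);
    Queue.Enqueue(2);
    while Queue.TryDequeue(Item) do
      Writeln(Item);  // items come out in insertion order
  finally
    Queue.Free;
  end;
end.
```

Checking for an item and removing it inside one locked section is the important design point: separate Count and Dequeue calls could be interleaved by another thread.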

TLockFreeStack

This class is a simple wrapper around the windows API and it enables the use of lock free stack.

 

ITaskQueue

This is a task queue that enables you to queue tasks even if they are run in multiple threads. This ensures that your tasks will be executed in the order you want. The usage is very simple:

You create it like this:

  FTaskQueue := AcquireTaskQueue;

Then you enqueue

  FTaskQueue.EnqueueTask.WaitFor;

and dequeue

  FTaskQueue.DequeueTask;