Is it possible to build a better dynamic array? Presenting sliced array.
From the last post about array handling in TAnyValue (and in general), I was looking to make the IAnyArray as powerful and useful as I could. As I told you, my goal is to make a strong infrastructure around and on top of TAnyValue. And so I thought about it. Simple TList clone with more features was not something I was very thrilled about. So I looked the web about improving upon dynamic array structure. To my surprise, I found very little and even that was not what I wanted. I presume most of you know what dynamic array is and know its strengths and weaknesses. But lets just summarize.
1. Dynamic array
Very simple structure. It uses contiguous chunk of memory of same type of elements (integers, string, record, object…). Its strengths:
- Adding element to the end: O(1)
- Accessing n-th element by index: O(1)
- Removing element from the end: O(1)
- Good caching, fast indexed walk-through because of caching.
And its weaknesses:
- Removing element from the middle: O(n)
- Adding element to the middle: O(n)
- Can have problems with resizing at large sizes because of memory fragmentation
2. Single linked list
Also very simple structure. It uses pointers to link each record in one way (double linked lists do it both ways). Its strengths:
- Adding element to the end: O(1)
- Removing element from the middle: O(1)
- Adding element to the middle: O(1)
- Removing element from the end: O(1)
- Easily re-sized, because we only need to allocate memory for one new element at a time.
And its weaknesses:
- Accessing n-th element by index: O(n)
- Bad caching, elements can be all over the memory
Also found this nice table comparing them both:
| Operation | Array | Singly Linked List |
| Read (any where) | O(1) | O(n) |
| Add/Remove at end | O(1) | O(1) |
| Add/Remove in the interior | O(n) | O(1) |
| Resize | O(n) | N/A |
| Find By position | O(1) | O(n) |
| Find By target (value) | O(n) | O(n) |
So as you see each has its strengths and its weaknesses. If we could just take best of each one and make an efficient hybrid, all would be fine. Well that is not so easy to do. If you go one way, you suffer on the other end and vice verse. But after a lot of thinking and testing, I found a good implementation of something i now call sliced array. It is a hybrid, that tries to take best of both worlds and I think it mainly succeeds at that. It has a lot of strengths and very, very little weaknesses. It looks like this:
I “sliced” the large array into several small arrays and then turned everything into a double linked list. I also added a look-up control array which is very important as you will see later. So how does it all work. Lets look at how a single slice is implemented:
TArraySlice = record Count: Int64; Lookup: PSliceLookup; ArrayData: TAnyValues; NextSlice: PArraySlice; PrevSlice: PArraySlice; end; |
Simple actually. It holds an array of TAnyValue, pointers to previous and next slice, pointer to look-up record and count of elements it holds. Look-up record looks like this
TSliceLookup = record Index: Integer; SumCount: Int64; Slice: PArraySlice; end; |
This is even simpler. It holds the index in the look-up array, pointer to slice and count of all elements in all the slices to this one + elements in this slice that the look-up points to. Now for the goods and bads of the sliced array.
Its strengths:
- Adding element to the end: O(1)
- Removing element from the middle: O(SliceSize)
- Adding element to the middle: O(SliceSize)
- Removing element from the end: O(1)
- Good caching. Slices are contiguous in memory and so it supports locality of access, which is very important
- Easily re-sized because we only need to allocate memory for one new slice. No need to re-size whole memory space used by the structure.
And its weaknesses:
- Accessing n-th element by index: Hard to tell without analyzing mathematically, but its still fast
So we only have one weakness, but that could be a huge weakness because random access by index is very, very important. If we can’t do it very, fast then the structure is not worth the trouble. After all we want a general purpose structure, a workhorse like TList.
To make it, I implemented a few tricks. I can use the look-up structure. The dumbest way would be to just go over the look-up array and check if our index we are searching falls inside the lower and upper index bound of the slice. Once the slice is found we just do the same as for classic dynamic array. So that would be O(SliceCount + 1). Not to shabby if we don’t have to many slices but not near good enough. Specially because the more slices we have, the more we trim down internal insert and delete. I used two tricks to improve the search for the correct slice. First one is a helper pointer that always points to the last slice accessed. So if we do linear iteration from 0 to Count -1, we do not search for the correct slice, ever. We just use the pointer to the last accessed slice and do the offset to the correct element. When we come to the end of the slice we just jump to the next one. Simple but very efficient. This way also the locality of access is supported if slices are big enough.
The next trick is (for truly random access) to calculate where the index should be (it does not mean it is there). If we have 100 elements in each slice and we want to access 350-th element we can assume that it is in the 4-th slice. So we jump directly there. If we missed, and we can miss as slices get degraded from deletions and insertions, we did not miss by much, we just check if we need to go left or right and move until we find the correct slice. It turns out this is also very fast, as predictions are good enough.
Furthermore something helps us here. If we do no deletions and insertions then the slices are in perfect condition and each prediction will be a hit. So we have an O(2) operation always. Tests will demonstrate that. If we have degradation it is slightly worse, but not by much, and we gained a lot on insertions and deletions. The key here is we take care of degradation. I do this by merging slices if the count falls under half the slice size and I split the slice if the count rises to 2x the slice size. This way the structure is never to degraded.
Ok you may say, but all this comes into play with a lot of elements. For small arrays it does not matter. True, that is why default slice size is 10.000 elements and under that you basically have a single dynamic array with one look-up pointer. I also do a check before doing indexed access and if only one slice is present I just jump directly to that one, no searching is done. The only real downside of the structure is a quite large complexity of it and the algorithm to manage it. Also it will never be truly as fast for random indexed search, if nothing else, because I just call more code opposed to returning n-th element from array.
Probably you won’t believe everything on my word, so here are the tests made on Delphi 2010. First one is for small number of elements (5.000) so we only have one slice and the other is for (1.000.000) elements. Here we have 100 slices (as default size is 10.000). I tested my IAnyArray that now uses this structure for data storage, TList<TAnyValue> as this was a fair reference point for dynamic array comparison. Then I also tested TList<Variant> and TList<Integer> for the fun of it
Small array test (5.000e lements) – times are in ms
Large array test (1.000.000 elements) - times are in ms
Just some clarifications.
- Deletes and inserts are 100 times faster. It makes sense. I have to move 100 times less elements in memory because I have 100 slices
- Adding elements is also faster. Simply because I just create new slice with 10.000 elements while dynamic array (TList) has to re-size the whole memory space
- Linear iteration by index or local access is as fast as with dynamic array because of the trick with the last accessed slice pointer
- True random access by index is slower by a magnitude 2-3. But it depends on the number of slices
- enumeration is very fast, especially by pointers
- I did also a “raw” access comparison where I work with PAnyValue so I avoid record copies.
It is a very long post, sorry for that but I really want to hear your honest opinion on this solution. Is it viable, what do you think of it? I find it very promising and very flexible in all usage scenarios. The only downside I see is complexity. Do you find truly random access to slow? Remember that was 1.000.000 access operations over 1.000.000 elements.
There is also a big update of code on my download section:
- Latest TAnyValue with array support, name-value, pairs etc….
- Hashing classes have been improved and are 2x faster now
- TTaskPool has been improved and is faster with less locking, also TThreadQueue. TLockFreeStack has been rewritten
- IPC is now faster because of all the changes. It went under 0.1 ms for Client->Server->Client cycle with full payload




Leonardo Herrera wrote,
I did something like this in the past, but for records stored in files (needed a fast method to retrieve list of objects associated to a single ID in a several-terabytes distributed storage.) The only problem I found is that these kind of structures need grooming from time on time. We resorted to a weekly process of defragmenting files.
My indexes were more complex (needed to know which files (your segments) contained records belonging to each ID) and for that I used a splay tree, if memory serves me right.
In summary, I think these kind of structures are not only good, but necessary for certain problems. If you manage to hide its complexity, well, everything is okay (we use Generics all the time, and they are pretty complex on the inside.)
Link | March 1st, 2013 at 3:38 pm
Iztok Kacin wrote,
Hi Leonardo.
I did hide the complexity. On the outside it behaves just as an ordinary dynamic array. I also once programmed a Radix trie (http://en.wikipedia.org/wiki/Radix_tree) for very fast text search directly on the hard drive. You could search gigabytes of text very fast.
But my spliced array does not need a lot of grooming. When I delete or insert an item I just check if the slice degraded to much and merge / divide it. I have no special “garbage” collection outside of that. And it is not needed. Some degradation does not hurt it to much.
Link | March 1st, 2013 at 3:51 pm
ObjectMethodology.com wrote,
Seen these before in the DB world. But, cool and thx.
Link | March 1st, 2013 at 5:27 pm
Iztok Kacin wrote,
@ObjectMethodology
If you have any links, please do share. I am glad you like it
Link | March 1st, 2013 at 5:42 pm
Eric wrote,
Performance of these is really dependent a lot on what you do with them, arrays that are accessed by index but rarely changed take a significant hit, even if the overhead is limited, it can hurt because it affects the most common operations.
Also for enumeration without a random index (ie. via GetEnumerator or similar), you have to factor the overhead of the enumerator, and the copy of the elements if they are managed or structured and you don’t want to use pointers.
Last aspect is that for sequential scans on large arrays (larger than the CPU cache), the slicing can defeat the predictive CPU prefetchers.
Link | March 1st, 2013 at 6:08 pm
Iztok Kacin wrote,
@Eric
That is exactly my concern. How much does the performance suffer for heavy index accessed based problems. It still fast yes, 200ms (thats the best I can do with all the trics) for 1.000.000 is very fast, but on the other hand it is 2-3 times slower then pure dynamic array. But as soon as you throw the deletes and inserts into the mix this structure is the definitive winner.
I am not worried about sequential scan. I think that it should work well enough. I even see the benefit because you do not have one huge memory block but several large ones instead.
I will probably add one property where you can simply say you just want one slice (eg. classic dynamic array) and it should be as fast as the classic dynamic array.
Link | March 1st, 2013 at 8:38 pm
Eric wrote,
You could try different random index patterns, such as performing a quicksort, reverting the order of the elements, performing a binary search, etc. ie. try to use the array in realistic heavy-duty situations.
When there are many inserts or deletes, a sliced array won’t be competing against a monolithic arrays, but against trees, hashes, etc. which are structures that further sacrifice random access, but can gain more on other aspects.
Link | March 1st, 2013 at 11:29 pm
Pol wrote,
This reminds me alghoritms classes
We were implementing List-like structure (in terms of flexibility) in C using arrays. Great memories
Link | March 1st, 2013 at 11:38 pm
xor wrote,
Nice.. Btw, did you try to compile the other units in Cromis Library with this version of TAnyValue?
It won’t compile due to not finding Value.Data (i.e TAnyValue.Data) which is passed to TElementNode.Create (which expects a IAnyValue) in Cromis.SimpleStorage.Builder.pas.
Is that an old exposed field, or similar?
Link | March 2nd, 2013 at 1:45 am
Iztok Kacin wrote,
@xor
Thanks for the info. Yes SimpleStorage.Builder did no compile due to some internal changes to TAnyValue.I have already uploaded the fixd and updated code. Thanks.
Link | March 2nd, 2013 at 9:02 am
Jarto wrote,
Have you tested this with multiple threads? That helper pointer for last slice sounds like it requires a TCriticalSection.
Link | March 3rd, 2013 at 10:59 am
Iztok Kacin wrote,
@Jarto
You are correct it would require a critical section which is not good for multiple thread reading. But that is obsolete right now. I am already working on an improved version of sliced array, which has faster indexed access and does not need helper pointer anymore. Stay tuned
Link | March 3rd, 2013 at 11:57 am
Tommi Prami wrote,
Hello,
These has been very interesting articles to read.
I was thinking that it would be also valuable and interesting to have some kind of (I would prefer SVN) repository. (For the Cromis library).
Would be much easier to follow the changes that way (IMHO)…
Anyways, keep up the good work, as you can seen you have been noticed
Link | March 4th, 2013 at 9:46 am
Iztok Kacin wrote,
@Tommi
Thanks. I am glad you liked the articles
I already have an internal SVN for all my code and projects. I could never work without one. I have two options here. I can expose my SVN to public, but only the Cromis code. Or I can put the Cromis code on a public SVN and somehow synchronize them. I have thought about this, because it would also be easier for me to push out new versions.
Link | March 4th, 2013 at 12:36 pm
A. Bouchez wrote,
Nice article, with pretty clear content and diagrams.
I suspect that if you want to insert some in-place sorted data, TAnyArray.CheckSliceSize method may be called often. I mean, if you add items at the end, it will be very fast. But if all your inserts shall be done within the main content (e.g. you want to insert random data in a sorted way), your code will add some slices at the middle of the slices. Using an external sorted index will solve the problem, but with some additional storage, and only to some extend. But I guess the average benefit will be much better than using one dynamic array. It is better to have TAnyArray.CheckSliceSize moved the data once you reach the FSliceSize*2 limit, than moving the data for each insertion, as a dynamic array does.
Code is very easy to follow, by the way.
Nice work!
Do you have any regression tests included?
Such a library would benefit to publish the regression tests, with extended code coverage and as much border-cases available as possible.
By the way, as other wrote, a SVN/git/Fosssil repository is a need, if you want to share your great code with ease!
Link | March 6th, 2013 at 9:14 am
Iztok Kacin wrote,
@A. Bouchez
Thanks for the kind words. Your fame preeceds you
You do nice work on Synopse and SO. As for the sliced array, I acutally made a new version that is even better. The problem of the implementation in the post is, that the slices are of variable sizes. And even if I find the correct one pretty fast, that is still not fast enough. I now have all the slices to be of constant size. I solved the problem by making each slice a little larger then FSliceSize. If I have 10.000 elements in a slice, I now make a slice actually 10.500 or 11.000 elements large. This way each slice has a constant size, so I can access the elements by index in O(1). Well actually it is still sligtly slower then pure indexed access as I have to make a function call and a div and mod calculation, but it is so very close. For instance before in my test random access for 1.000.000 elements was around 270 ms (bear in mind there are actually Random calls for each access). Now I came down to 100-120ms. Classic TList is around 90ms. And I retained the around 120-150ms for insert and delete. I made this with that spare space. I only have to reposition the whole slice when I run out of buffer space in a slice. It works great, I actually think I made a structure that is better then classic dynamic array for most common operations.
About the tests, yes I know I need them, but my time is very limited and I tried to do as much as I can. If someone is willing to make some tests they think would be nice I would be gratefull. Also when I post the latest array code I will make a competition to see how fast people can make it go. I know it can be even faster
I will make code available on public SVN (it will be SVN as I already have it internally for all my code) as soon as I find the time to do it.
Sorry for the long reply
Link | March 6th, 2013 at 9:44 am
From Zero To One » Blog Archive » Improved Sliced Array implementation wrote,
[...] you don’t know what sliced array is, I suggest that you first read the previous post, where I presented the sliced array. There I presented the data structure that is similar to [...]
Link | March 13th, 2013 at 8:57 am