Array based semi-mutable sets

Persistent sets, on which my semi-mutable sets are based, are usually implemented as trees and are relatively inefficient in both insertion costs and memory usage. They are particularly inefficient when inserting many items in a single transaction as each insert requires the creation of a new path to the newly inserted item. This comes with all the associated memory allocation costs and subsequent garbage collection overhead. This problem can severe when deserializing large sets. Unless of course the data structure supports a merge function.

To illustrate this I have performed some quick and dirty tests on a few .Net data structures:

  • array
  • HashSet
  • FSharpSet

Where the FSharpSet is the only persistent data structure and which I use in my semi-mutable set. The first test consist of populating the data structures with 1’000’000 random longs:

    long[]    12 ms
    long[]   112 ms    and sorted
    HashSet  180 ms    added one at a time
    FSharp  4990 ms    added one at a time

Nothing surprising here, Note that creating the FSharpSet set from an array of longs isn’t faster than adding one item at at a time. A quick look at the source code indicates that the constructor actually adds these values one at a time.

The next tests consist of measuring the enumeration speed of the above structures. 100 enumerations of 1’000’000 elements:

    long[]    820 ms
    HashSet  1400 ms    
    FSharp  14000 ms

Again here, there is no real surprise, except maybe that the HashSet is doing rather well when compared to the array. Note that as the enumeration is executed 100 times in loop, the results might well be different when executed only once due to the caching of the data.

Finally memory usage is again unsurprising:

    long[]    7 MB
    HashSet  37 MB
    FSharp   34 MB

Arrays obviously win, by a factor of four. For small objects such as a long, the difference is massive, with an overhead over the actual data of a factor of 3 for longs. 28 MB of overhead for 1’000’000 elements is about 28 bytes per element, which becomes almost negligible when dealing with large objects.

So, the idea is to combine the advantages of both array and persistent sets in a single structure that minimizes batch insertion time and memory consumption with maximum enumeration speed. We create a persistent set of arrays of values. At its simplest every new batch of values is inserted in the semi-mutable set as an ordered array of values.

Here is a short list of some of the disadvantages of this approach

  • it is expensive to enumerate all the element in order as one will have to continuously hop through all the arrays, unless all the arrays are disjoint.
  • adding or removing a few elements requires the copying of a complete array
  • there is nothing to stop having multiple copies of the same elements in a given sets

However, since the arrays are ordered, it is relatively cheap to enumerate (not in order) all elements in a given range. Though it will still be slower than when working with a fully ordered set as all the arrays have to be searched (unless again they don’t overlap).

In the next post I shall illustrate the implementation and use of this data structure with a fairly realistic example. At that point we will a system with a very large number of relatively small array and we will see how our arrays can be consolidated into fewer, larger ones.

Model comparison

A long time overdue post…

One of my preferred slides when presenting my transactional model compares different locking models. I start with the most restrictive model, a single global lock:

model1

where I consider four threads named A,B,C,D. Writes are amber coloured, reads are in blue. Dashed sections represent blocked threads waiting to either write or read. Since there is a single global lock, the execution is serialized, there is no concurrent execution and there is always one active thread and three blocked ones. Long running “transactions” have a direct impact on the system’s throughput.

Things improve “dramatically” when we swap our single global lock for a reader-writer lock:

model2

Hey it’s concurrent now! Well as long as we read we have concurrency, but as soon as we want to write all the reader threads are blocked. But still a huge improvement over having a single global lock. Also the writer thread has to wait for all reader threads to terminate before it can acquire the writer lock. The reader threads cannot be allowed to last too long if we want to have to give the writer thread a chance to update the system. So there are still some serious constraints. On the positive side there is one important property: since only one writer is allowed, with no concurrent reader, there is no need for any transactional mechanism, all writes performed can only be read after the writer lock has been released. So no partial writes can be observed by any reader. Long running “transactions” still have a direct impact on the system’s throughput: long running read transaction prevents writes and vice-versa.

Next comes the versioned system:

model3

which uses a single, global writer lock. As before we can have any number of concurrent readers; the novelty is that since existing versions are strictly immutable, the reader threads can be executed concurrently with a writer thread. So in practice a reader thread can last as long as needed (at the cost of having to keep some versions in memory). We still have a single writer lock, so a thread that wants to perform writes has to wait for the lock to become available. So we are still single threaded in terms of writes. As a consequence there is still no need for any transactional mechanism as partial writes cannot be observed. Long running read-only transactions are possible, we can also have long running write transactions, but at the cost of blocking other would-be writers. The cost of moving to this higher level of concurrency is an increase in memory usage as all accessible versions must be preserved. Note that write transactions cannot fail when we are dealing with essentially single threaded writer systems. Additionally the thread can directly modify the versioned variables, no need to delay the writes to the commit phase. This model can also be fully single threaded in which case we get:

model4

The last model is the one I have described in this blog:

model5b

Here we are adding support for long-running concurrent write transactions. In terms of locking we have the single global version lock which is very short (increment a number and copy it) and the longer per transactional box/variable lock that must be taken for the duration of the validation and update phase. In the above diagram we have thread A waiting on thread D to acquire a box lock and later we have thread C waiting on thread D. We have seen before that this can be sometimes mitigated by partitioning the data.

This last model comes at a cost: registration of reads and writes during the execution of the body of the transaction, validation stage and commit stage. Concurrent writer transactions also introduce the risk of conflicts, but we have seen that leveraging the semantics of data structures can greatly reduce this risk.

Any system with a decent level of concurrency will, most likely have to use at least versioned variables. We must remember that server processors can have up to 18 Intel cores (expensive, $4115  ) or 16 AMD cores (cheaper, $719) You can use dual socket, 8 cores per processor, systems in the cloud, that’s 32 hyper threaded cores at a relatively low cost.

I expect the difference between the single threaded writer model and the concurrent writer model to be, in terms of ease of implementation and in terms of performance to be absolutely massive. However, when dealing with concurrent, long-running transactions (I shall come back to discussing what a transaction can be considered a long-running one) write transactions there is no substitute to the full model.

Piecewise immutable programming

Copyright © 2011 Gabriel Zs. K. Horvath

Although not a programming language expert at all, I will try to muster all my limited knowledge to briefly discuss software transactions in terms of programming model. Be aware that this is all very tentative…

On the one hand the level of isolation and immutability is very much reminiscent of functional programming, on the other hand the mutability of the versioned variables is essentially imperative programming. To better understand this relationship let’s see, very tentatively, how my model can be used to code in a “pure” imperative way or in a more functional one.

Read more of this post

Awelon Blue

Thoughts on Programming Experience Design

Joe Duffy's Blog

Adventures in the High-tech Underbelly

Design Matters

Furniture design blog from Design Matters author George Walker

Shavings

A Blog for Woodworkers by Gary Rogowski

randallnatomagan

Woodworking, life and all things between