Model comparison

A long time overdue post…

One of my preferred slides when presenting my transactional model compares different locking models. I start with the most restrictive model, a single global lock:

model1

where I consider four threads named A,B,C,D. Writes are amber coloured, reads are in blue. Dashed sections represent blocked threads waiting to either write or read. Since there is a single global lock, the execution is serialized, there is no concurrent execution and there is always one active thread and three blocked ones. Long running “transactions” have a direct impact on the system’s throughput.

Things improve “dramatically” when we swap our single global lock for a reader-writer lock:

model2

Hey it’s concurrent now! Well as long as we read we have concurrency, but as soon as we want to write all the reader threads are blocked. But still a huge improvement over having a single global lock. Also the writer thread has to wait for all reader threads to terminate before it can acquire the writer lock. The reader threads cannot be allowed to last too long if we want to have to give the writer thread a chance to update the system. So there are still some serious constraints. On the positive side there is one important property: since only one writer is allowed, with no concurrent reader, there is no need for any transactional mechanism, all writes performed can only be read after the writer lock has been released. So no partial writes can be observed by any reader. Long running “transactions” still have a direct impact on the system’s throughput: long running read transaction prevents writes and vice-versa.

Next comes the versioned system:

model3

which uses a single, global writer lock. As before we can have any number of concurrent readers; the novelty is that since existing versions are strictly immutable, the reader threads can be executed concurrently with a writer thread. So in practice a reader thread can last as long as needed (at the cost of having to keep some versions in memory). We still have a single writer lock, so a thread that wants to perform writes has to wait for the lock to become available. So we are still single threaded in terms of writes. As a consequence there is still no need for any transactional mechanism as partial writes cannot be observed. Long running read-only transactions are possible, we can also have long running write transactions, but at the cost of blocking other would-be writers. The cost of moving to this higher level of concurrency is an increase in memory usage as all accessible versions must be preserved. Note that write transactions cannot fail when we are dealing with essentially single threaded writer systems. Additionally the thread can directly modify the versioned variables, no need to delay the writes to the commit phase. This model can also be fully single threaded in which case we get:

model4

The last model is the one I have described in this blog:

model5b

Here we are adding support for long-running concurrent write transactions. In terms of locking we have the single global version lock which is very short (increment a number and copy it) and the longer per transactional box/variable lock that must be taken for the duration of the validation and update phase. In the above diagram we have thread A waiting on thread D to acquire a box lock and later we have thread C waiting on thread D. We have seen before that this can be sometimes mitigated by partitioning the data.

This last model comes at a cost: registration of reads and writes during the execution of the body of the transaction, validation stage and commit stage. Concurrent writer transactions also introduce the risk of conflicts, but we have seen that leveraging the semantics of data structures can greatly reduce this risk.

Any system with a decent level of concurrency will, most likely have to use at least versioned variables. We must remember that server processors can have up to 18 Intel cores (expensive, $4115  ) or 16 AMD cores (cheaper, $719) You can use dual socket, 8 cores per processor, systems in the cloud, that’s 32 hyper threaded cores at a relatively low cost.

I expect the difference between the single threaded writer model and the concurrent writer model to be, in terms of ease of implementation and in terms of performance to be absolutely massive. However, when dealing with concurrent, long-running transactions (I shall come back to discussing what a transaction can be considered a long-running one) write transactions there is no substitute to the full model.

Runtime scopes

Complexity

The project I am working on at the moment contains approximately 1000 interfaces and 4000 classes. We use Autofac for dependency injection, but not all the project uses dependency injection. Within the solution these types are grouped by folder/namespaces and this helps find related types. The only real organization is the use of one Autofac scope to manage sessions. So the types are grouped in two scopes, the global scope and the session scope.

I was expecting the scope separation between classes to be clearly visible on a diagram that represents the dependencies between types. In fact, producing such a diagram turns out to be much harder than I thought. This is a topic I will discuss further in another post. But I am hoping that a well organized diagram would clearly show a natural grouping of classes based on their dependencies.

My goal here is to somewhat formalize this grouping of classes. For this I am proposing to make use of the concept of scoping which plays an important role in many areas of computer science.

Scopes

Scopes are so pervasive that we use them pretty much unconsciously. In C# we have for example the following scopes:

  • statement block, delineated by curly brackets
  • using statement scope
  • method scope
  • class scope
  • namespace scope

Some characteristics and properties of scopes:

  • Always
  • Nested  or side-by-side
  • Non crossing
  • Isolation
    • Outer scopes have no access to the content of inner scopes
    • Inner scopes might or might not have access to the content of outer scopes
    • It is also good practice for inner scopes not to modify the state of outer scopes
    • Sharing
      • Content of outer scopes might be available to inner scopes
      • Entry/exit
        • Data passed in on entry
        • Data returned on exit
        • Creation on entry
        • Disposal on exit
        • Restoring state on exit
        • Lifetime of object inside scope

Generally speaking, scope is considered to be a source code level concept. Scopes are enforced and resolved at compile time. Some aspects are also important at runtime, e.g. call stack management for method calls, disposal of objects when leaving a using scope. But I should emphasize that scopes are always bound to the source code. So the source code of an inner scope is defined within the code of the outer scope. This characteristic represents a strong constraint and limits the usefulness of lexical scopes.

Runtime scope

By contrast, a runtime scope defines the scoping of types declaratively rather than lexically. As such types defined in different namespaces or assemblies can live in the same scope. Note that runtime scopes are limited to classes and interfaces. Moreover I will only discuss runtime scopes in the context of (constructor) dependency injection. I will stick to Autofac as I am most familiar with it and as it already provides some support for scopes.

Declarative scoping

Every interface must define a scope (for the sake of simplicity lets limit ourselves, for now, to a single scope per interface) using a generic marking interface: interface IScoped<TScope> where TScope : IScope.

Scopes themselves are defined as interfaces that derive from the IScope interface.

Finally, scopes can be either singletons (only one can exist within a given scope at any time) or concurrent. I will consider defining other semantics as required, e.g. one-shot singleton or contiguous singletons. The marking interfaces are ISingleScope and IConcurrentScope.

Naturally scopes themselves are scoped.

With Autofac, one registers an implementation of an interface with a class non-declaratively, e.g. by calling the RegisterType method of the builder instance. This has two disadvantages, firstly as the sharing of the class is defined at the registration site rather than declaratively with the interface or class one has to repeatedly navigate between to the registration site and the interface\class implementations. It also makes the analysis of type dependencies by a tool much harder to perform unless one can use Roslyn. Going for a declarative approach solves these two problems.

Sharing, that is whether an instance is shared or not, or in other words whether an instance is a singleton or not can be defined declaratively using either marking interfaces or attributes. In my first version I use marking interfaces: ISingleton (the easy one) and the more dubious IManifold (please, please do suggest a better name).

Semantics

Let’s have a look at the semantics of runtime scopes.

  1. A class or scope in TScope can only be resolved in a TScope or a sub-scope of TScope.
  2. All injected arguments of all constructors of a class (i.e. all its dependencies) in TScope must be in either TScope or a super-scope of TScope.

Condition (1) means that the content of inner scopes is not available to outer scopes, but the content of outer scopes is visible to inner scopes. This last condition could easily be restricted so as to fully isolate the inner scopes. Condition (2) states that all dependencies must come from the same scope or from an outer scope. Here again we could restrict this to only allow dependencies from the same scope.

New scopes are created by existing scopes. A scope can only create a direct inner scope.

Disposal of scopes

Ever scope can be closed, i.e. disposed of. Thanks to Autofac, every class in the scope will also be disposed of if it implements the IDisposable interface.

Returned results

Just as functions return one or more values on exit, a scope can return one or more values when it is closed. The current mechanism is to instantiate a class in its outer scope using Autofac, so that this shared instance can then be retrieved by the outer scope. I am not really convinced by this mechanism and it doesn’t work well with concurrent scopes. I am investigating other solutions, e.g. continuation scopes or have the closed scope expose the result.

Model specification

Copyright © 2011 Gabriel Zs. K. Horvath

This is my first attempt at giving a more formal definition of my software transactions model. Undoubtedly, a post which will be frequently updated.

Consider a system composed of the following:

  • A single transaction manager holding a unique global version number
  • Transactions
  • Threads
  • Semi-mutable versioned variables managed by the transaction manager
  • Non-versioned transaction-local immutable variables
  • External resources, these can be either read from or written to
  • A garbage collector

The program consists in the execution of transactions in one or more concurrent threads.

Read more of this post

Joining transactions

Copyright 2011 © Gabriel Zs. K. Horvath

Up to now all the transactions have been running in independent threads. There hasn’t been any mention of sub-transactions or the possibility of joining threads.

Sub-transactions

Since all variables within a transaction are immutable, it is meaningless to think in terms of sub-transactions which commit changes. Instead we can have multiple threads running concurrently within a single transaction and merge the recorded write operations of the various threads when joing the threads. Since the threads within the same transaction must still register all the reads and writes independently of the main transaction they are equivalent to normal transactions which happen to share the same version number as the main transaction. So far my model has no concept of sub transactions.

Read more of this post

Garbage collection


Copyright 2010-2011 © Gabriel Zs. K. Horvath

So far all the versions of the versioned variables have been kept in the variables’ history. This is obviously not a viable option and an unnecessary one as most of the versions are likely to become quickly unreachable. Let me first define what a reachable version is. Consider the following versioned variables:

Read more of this post

Contiguous transactions and notifications

Copyright 2009-2011 © Gabriel Zs. K. Horvath

Contiguous transactions

Up to now atomicity has consisted in ensuring that writes to a number of versioned variables were “performed instantly”. The idea is that there is no observable gap between when the variables involved in the transaction are modified. So we have no gap between writes, but what about gaps between transactions? After a transaction has committed one might need to start a new transaction in the state the previous transaction left it in. One could start a new transaction straightaway, hoping to catch the system in the state the previous transaction left it in. But of course there always a risk that another transaction commits in the meantime and modifies the state of the system before our follow-on transaction starts. A contiguous transaction is one which is guaranteed to see the system in the state its parent transaction put it in.

Read more of this post

Composition and silent reads

Copyright 2010-2011 © Gabriel Zs. K. Horvath

So far all the read operations performed in the atomic block were being recorded, so as to be re-executed at commit time. We will see in this post that there are circumstances where one does not want the reads to be recorded. I will call these silent reads.

Composition

One of the most important and powerful concept in software engineering is the one of composition. We want to be able to compose existing data structures together to build new ones. Or we want to add new methods to existing ones without having to perform open heart surgery on that component. So let’s look at the concrete example of trying to implement the Last method on top of the set data structure:

public IEnumerable<T> Last(this IEnumerable<T> that) {
    var enumerator = that.GetEnumerator()
    if (enumerator.MoveNext()) {
        T t = enumerator.Current;
        while (enumerator.MoveNext()) {
            t = enumerator.Current;
        }
        yield t;
    }
}

Read more of this post

Eliminating spurious conflicts

Copyright 2009-2011 © Gabriel Zs. K. Horvath

In my previous post I introduced software transactions with semi-mutable versioned variables. I also gave an example which created spurious conflicts in concurrent transactions. As promised I will demonstrate in this post how these conflicts can be eliminated.

Semantics

Let’s go back to the concurrent debit/credit example of our previous post:

Read more of this post

Software transactions with semi-mutable versioned variables

Copyright 2009-2011 © Gabriel Zs. K. Horvath

In this post I will introduce software transactions with semi-mutable versioned variables.

Introduction

Transactions have been around for a long time. They are typically associated with databases, but also commonly used in other areas such as source control systems and installers. Database transactions are the inspiration of memory transactions, either with hardware support or as software transactional memory. Every transactional system has its own variation; however they all share the fundamental concept of atomicity and provide some level of isolation. Transactions allow the concurrent execution of multiple execution threads while preserving the illusion of serial execution within each thread and preserving consistency.

Transactional systems tend to suffer from spurious conflicts which unnecessary fail transactions. These spurious conflicts are conflicts which have no valid semantic or logical origin. We will see how software transactions can help reduce or sometimes entirely remove these spurious conflicts.

Read more of this post

Awelon Blue

Thoughts on Programming Experience Design

Joe Duffy's Blog

Adventures in the High-tech Underbelly

Design Matters

Furniture design blog from Design Matters author George Walker

Shavings

A Blog for Woodworkers by Gary Rogowski

Paul Sellers' Blog

A Lifestyle Woodworker

randallnatomagan

Woodworking, life and all things between