Software transactions with semi-mutable versioned variables

May 8, 2011 by Gabriel Horvath Leave a comment

In this post I will introduce software transactions with semi-mutable versioned variables.

Introduction

Transactions have been around for a long time. They are typically associated with databases, but also commonly used in other areas such as source control systems and installers. Database transactions are the inspiration of memory transactions, either with hardware support or as software transactional memory. Every transactional system has its own variation; however they all share the fundamental concept of atomicity and provide some level of isolation. Transactions allow the concurrent execution of multiple execution threads while preserving the illusion of serial execution within each thread and preserving consistency.

Transactional systems tend to suffer from spurious conflicts which unnecessary fail transactions. These spurious conflicts are conflicts which have no valid semantic or logical origin. We will see how software transactions can help reduce or sometimes entirely remove these spurious conflicts.

Versioned variable

A versioned variable is a variable which keeps a history of all its past, committed, values. Whenever a new value is committed the value held in the versioned variable is not overwritten with the new one, but instead is kept in a historical record. Every value is associated with a version number. Think of this version number as a time stamp, or even more simply as the time; that is the time at which the value was created. The following diagram illustrates this with two examples of versioned variables; the first one holds an integer value, the second one a set of integer values:

The integer variable has two versions, the current one (6) with a version number of 3, and a previous one (4) which was committed with a version number of 1. The other versioned variable got its first value with version 2; it was then updated with a new version 3, and again with another version 5. As explained earlier, these version numbers are akin to time, so we can say that at time 1, the integer variable was holding a value of 4, at time 2 it was still holding a value of 4, but since time 3 it holds a value of 6. Similarly for the set variable, at time 2, it holds {1, 2}, at time 3 and 4 it contains {1}, and since time 5 it contains {5, 6}. Most importantly all the versioned data is strictly immutable; the only way a versioned variable can ever be modified is by adding a new version to it. In the time analogy picture, this simply states that the past cannot be changed.

Transactions

A transaction is defined as a set of instructions contained in an atomic block. A new transaction is created at the start of an atomic block. The statements contained in the atomic block are then executed sequentially in the context of this transaction. The transaction will then attempt to commit the changes performed during the execution of the atomic block. A fixed version number, or time stamp, is associated to every transaction. This version number is the value of the global version number when the transaction is created.

Writing to a semi-mutable versioned variable

Contrary to mutable variables, semi-mutable variables are immutable during a transaction. The effect of write operations on a semi-mutable variable is not observable even within the transaction. The writes are simply recorded so they can be executed at commit time. This immutability increases further the level of isolation within a transaction as readers are isolated from even local writes. This results in the immutability of all variables within a transaction.

Reading from a semi-mutable versioned variable

The code executed in the context of the running transaction will access all existing versioned variables using this transaction’s version number. Since this number is fixed and the versioned variable version history is immutable, the reads will always return the same value. To illustrate this point, consider a transaction started with version 2 accessing the versioned variables of our first diagram, a read of the integer box will always return a value of 4, while the set will always return {1,2}. This ensures that the transaction is isolated from other transactions. From the point of view of the code executed in the atomic block and using the time analogy, time has stopped. The resulting level of isolation between transactions is much higher than that observed in more traditional transactional systems and guarantees a consistent view of the system.

Closing a transaction

A transaction can be terminated either by cancelling it or by closing it. If a transaction is simply cancelled, the transaction and all the recorded write operations are discarded. Closing a transaction consists of committing or publishing, i.e. exposing its changes to the outside. The changes consist of the write operations made in the atomic block. But before this can be done, a transaction must be validated.

Validation

Up to now we have implicitly assumed that all input in a transaction come from reads on versioned variables. As explained before, the version of the variable might well have changed, in which case the value returned by the modified variable might be different at validation time from the value returned when the atomic block was executed. If all the read operations return the same value now as they did during the execution of the atomic block, then we can be certain that the writes performed on the variables will be the same. This of course requires the atomic block to be a pure function of the versioned variable. So the validation phase consists in re-executing the read operations and comparing the result of these with the result of the same reads performed during the transaction.

Commit

If the transaction is successfully validated we can commit the changes. This consists in applying the recorded write operations against the current state of the variables, which results in the creation of a new version for all the modified versioned variables.

Before discussing this transactional model any further let’s looks at two special kinds of transaction: read-only and write-only transactions.

Read-only transactions

In this particular case, versioned variables are not modified by the atomic section, they are only read. If no versioned variable is modified, there are no writes to perform in the commit phase, hence the transaction doesn’t even need to be validated! What we gain here is a consistent view of the state of the system, because the transaction is completely isolated from any external change. Consider the case of a bank which wants to know the total balance of all the customer’s accounts. With our transactional system, we would start a new read-only transaction. This transaction would accumulate the balance of all customer accounts. Since the transaction has a consistent view of the system, e.g. it never sees a partial money transfer; the total balance will be exact. To be more accurate, the total balance will be exact for the time at which the transaction started. It will of course soon be invalid, but that’s not a problem, one can just start another transaction. The crucial point about this read-only transaction is that it doesn’t stop any other transaction from carrying on; we didn’t have to block any accounts activity to get a consistent total balance.

Write-only transactions

Write-only transactions don’t read any versioned variable, so they do not depend on the state of the system, hence the validation stage always succeeds and they always commit their changes. In plain English we would say that a write-only transaction doesn’t care about the state of the system, it is a blind write. Crediting your bank account is an example of a write-only transaction; we don’t care how much there is in the account, we only want to add some more. What we have here is basically a last write wins approach to dealing with writes. This is just the natural consequence of our consistency condition.

Mixed read-write transactions

Mixed read-write transactions link the read and write operations; in principle in the same way as cause and effect are linked. Consider the following example where two transactions access concurrently a bank account. Transaction A withdraws 50 from the account if balance is larger than 50 while transaction B concurrently increases balance by 100. In this example, both transactions start with the same version number, but transaction B commits before transaction A. When transaction A is first executed, balance will be 100 and the debit operation will be performed; but when the transaction A reaches the validation stage the value of balance will have changed to 200 and when the read on balance is re-executed, the returned value (200) will be different from what it was before (100). So transaction A will have detected a conflict and it will correctly fail to commit its change to balance.

Although this example illustrates a dependency between a write and a read operation it also contains some code which will cause spurious conflicts. Firstly there is no reason to fail a debit operation on an account because a credit operation is performed concurrently, actually quite the opposite! Secondly there shouldn’t be a need to read the balance in order to credit or debit it. We will see in the next post how the above example can be made to work without introducing any spurious conflict.

IO

So far we have not considered any IO operations. IO operations can be either parameterized read or write operations. Both read and write operations can be either checked or unchecked. Checked reads are re-executed at validation time and will fail the validation stage if they return a result different from the value read during the transaction. Unchecked reads are not validated. For example, prompting a user for input should be implemented as a unchecked read, while an access to the file system could be implemented as a checked one.

Just as for semi-mutable versioned boxes, IO writes are recorded during the execution of the atomic section and only applied at commit time. This solves nicely the missile problem which will only be launched if no conflict is detected. Checked writes require the validation stage to be performed, while unchecked writes do not.

A transaction which contains only reads and unchecked writes will not require to be validated. A transaction which contains only write operations and unchecked reads won’t have to be validated either.

Local variables

Data might also be stored in local variables; these are immutable and bound to a transaction.

Transactions everywhere

My model assumes the transactions everywhere approach, indeed

variables are either versioned or transaction local, there are no other variables.
versioned variables can only be accessed in the context of a transaction
all code runs in the context of a transaction
cross transactions communication can only be made through IO operations or via versioned variables

Filed under immutable, software transactions Tagged with concurrency, shared state, STM, transactions

Software Transactions

Software transactions with semi-mutable versioned variables

Introduction

Versioned variable

Transactions

Writing to a semi-mutable versioned variable

Reading from a semi-mutable versioned variable

Closing a transaction

Validation

Commit

Read-only transactions

Write-only transactions

Mixed read-write transactions

IO

Local variables

Transactions everywhere

Leave a comment Cancel reply

Tags

Categories

Blogs I Follow

Copyright © Gabriel Zs. K. Horvath

Software Transactions

Software transactions with semi-mutable versioned variables

Introduction

Versioned variable

Transactions

Writing to a semi-mutable versioned variable

Reading from a semi-mutable versioned variable

Closing a transaction

Validation

Commit

Read-only transactions

Write-only transactions

Mixed read-write transactions

IO

Local variables

Transactions everywhere

Share this:

Related

Leave a comment Cancel reply

Tags

Categories

Blogs I Follow

Copyright © Gabriel Zs. K. Horvath