Traced execution


I will present in this post a proof of concept of an idea I got nearly a year ago while trying to remote the execution of some code and got tired of the limitations of C# in general and of Linq.Expressions in particular. Run-time reflection in C# 4.0 is limited to type and Linq.Expressions, here I demonstrate a form of reflection that can be performed on a small subset of C# statements.

Basically the idea is to record the execution of the code. If one is careful enough and not too demanding this recording will capture the actual code or at least its important aspects. Practically, the code gets intercepted and diverted to be recorded instead of being simply executed. The recording is performed by intercepting the code execution and by passing around not the result of the normal execution but an encapsulation of the computation. Consider the following assignment:

record.Length = users.Count;

instead of passing an integer across the assignment, we pass a recording of the fact that the Count property of users was accessed; then instead of assigning this to the Length property, we extend the received recording to include the fact that users.Count was assigned to record.Length. This way we have fully recorded this assignment operation.

This process is of course reminiscent of continuation-passing style, but this is somehow the opposite process as we are recording and passing around the previous computation instead of passing forward the next step of the computation. I suppose that I can also safely quote Gilad Bracha here: “Some kind soul will doubtless point out to me how you can view actors  this tracing as monads or some such.

The actual technique to achieve this is to intercept everything by wrapping up all the objects, functions and operators which need intercepting. So this will only work with objects or values the user can wrap. This sounds like a drastic constraint, but it is actually still powerful enough to help with a large class of problems. When it comes to primitive types one has to wrap them up and re-implement all their operators. For the classes we want to capture, all primitive types must be replaced with their wrapped counterparts and the properties and methods of the classes themselves need to be modified.

Limitations

Since there is no way of intercepting assignments to variables, these cannot be traced. Assigning a trace/recording to a variable can be used to store a computation as illustrated in the example below. The same limitation applies to fields. Properties however are just what we need, as they naturally provide the required interception mechanism through their getter and setter methods.

Obviously conditional statements cannot be trivially traced. But there are (somewhat contorted) ways around, for example

Trace.If(root.Enabled);
    // then code
Trace.Else;
    // else code
Trace.End;

One could also use a few methods and lambdas. Most C# conditionals and loops could be implemented this way. Conveniently the most commonly used loop statement in C#, the foreach statement can be fairly easily traced. Obviously there is no problem capturing Linq queries.

Example

I will illustrate this by demonstrating how the following code can be captured:

TracedInt j = 6;

root.Count = j;
root.Count = 1 + 2 + root.Item.Length + root.Count + 4 + 5;

foreach (var i in root.Items) {
    foreach (var k in i.Integers) {
        root.Count += k;
    };
};
root.Items.Add(new Item());
root.Item.Call(root.Item);
var filtered = root.Items.Where(i => i.Length == 0);
root.Items.AddMany(filtered);

The recording, output as a string, gives:

root.Count = 6;
root.Count = 3 + root.Item.Length + root.Count + 4 + 5;
foreach (var item in root.Items) {
    foreach (var i in item.Integers) {
        root.Count = root.Count + i;

    };
};
root.Items.Add(new Item());
root.Item.Call(root.Item);
root.Items.AddMany(root.Items.Where(i => i.Length == 0));

where, as expected, we have lost a few bits during translation, i.e. 1 + 2 has been reduced to 3 as neither of the first constants are implicitly converted to TracedInt. The local variables filtered and j have been lost and the iterator variables have had their names changed. Note that Linq.Expressions would replace both 1 + 2 and 4 + 5 with their sums. Well, thinking about it, this is very similar to what happens with list comprehensions, the same idea of propagating a representation of the computation rather than the result of the computation. Although limited, the set of constructs which can be used, especially with the inclusion of Linq should be sufficient for this technique to be easily and efficiently used in a real world scenario.

Implementation

Let’s have a quick look at the actual implementation. First of all we define the two classes used in the example, Item and Root:

public class Item : Traced {

    public Item() : base() { }

    public TracedList<TracedInt> Integers {
        get { return TraceGetTracedList<TracedList<TracedInt>>("Integers", "i"); }
    }

    public void Call(Item item) {
        TraceMethodCall("Call", item);
    }

    public TracedInt Length {
        get { return TraceGetProperty<TracedInt>("Length"); }
        set { TraceSetProperty("Length", value); }
    }
}
public class Root : Traced {

    public Root() {
        this.Expression = Expression.Variable(this.GetType(), "root");
    }

    public TracedList<Item> Items {
        get { return TraceGetTracedList<TracedList<Item>>("Items", "item"); }
    }

    public TracedList<Item> FilteredItems {
        get { return TraceGetTracedList<TracedList<Item>>("FilteredItems", "item"); }
    }

    public Item Item {
        get { return TraceGetProperty<Item>("Item"); }
        set { TraceSetProperty("Item", value); }
    }

    public TracedInt Count {
        get { return TraceGetProperty<TracedInt>("Count"); }
        set { TraceSetProperty("Count", value); }
    }
}

where the interception code is inserted into every property and method. These make use of the Traced base class:

/// <summary>
/// Base class for all  traced objects
/// Contains a few helper routines
/// </summary>
public class Traced {

    /// <summary>
    /// This one contains the Linq expression associated with this object
    /// e.g. could contain an AST of how the current value has been computed
    /// </summary>
    public Expression Expression { get; set; }

    public Traced() {
        this.Expression = Expression.New(this.GetType());
    }

    protected void TraceMethodCall(string methodName, params Traced[] arguments) {
        Context.Current.Statements.Add(Expression.Call(this.Expression, this.GetType().GetMethod(methodName), arguments.Select(d => d.Expression)));
    }

    protected T TraceMethodCall<T>(string methodName, params Expression[] arguments)  where T : TracedEnumerable, new() {
        return new T() { Expression = Expression.Call(this.Expression, this.GetType().GetMethod(methodName), arguments)};
    }

    protected T TraceGetTracedList<T>(string propertyName, string iteratorVariableName) where T : TracedEnumerable, new() {
        return new T() {
            Expression = Expression.MakeMemberAccess(this.Expression, this.GetType().GetProperty(propertyName)),
            iteratorVariableName = iteratorVariableName,
        };
    }

    protected T TraceGetProperty<T>(string propertyName) where T : Traced, new() {
            return new T() { Expression = Expression.MakeMemberAccess(this.Expression, this.GetType().GetProperty(propertyName))};
    }

    protected void TraceSetProperty<T>(string propertyName, T value) where T : Traced, new() {
        Context.Current.Statements.Add(Expression.Assign(Expression.MakeMemberAccess(this.Expression, this.GetType().GetProperty(propertyName)), value.Expression));
    }
}

The Traced class hosts the Expression field which holds the actual recorded code as a Linq.Expression instance. Note that the name of the iterator variable in the foreach loop is defined by the enumerable property as it cannot be captured.

The integer type is wrapped in the TracedInt class which overrides the required operators:

public class TracedInt : Traced {
    public int value;

    public TracedInt() : base() {
        this.Expression = Expression.Constant(this);
    }

    public TracedInt(int value) : this() {
        this.value = value;
    }

    public static implicit operator int(TracedInt wrapper) {
        return wrapper.value;
    }

    public static implicit operator TracedInt(int value) {
        return new TracedInt(value) { Expression = Expression.Constant(new TracedInt(value)) };
    }

    public static TracedInt operator + (TracedInt lhs, TracedInt rhs) {
        var result = new TracedInt(0);
        result.Expression = Expression.Add(lhs.Expression, rhs.Expression);
        return result;
    }
}

The implicit converters between the int and TracedInt facilitate the conversion from the traced code to normal code and vice-versa.

I also implemented support for lists and foreach loops which can even be nested:

public class TracedEnumerator : Traced {
    public TracedEnumerable enumerable;
    public BlockExpression Block;
    public string iteratorVariableName;
}

public class TracedEnumerator<T> : TracedEnumerator, IEnumerator<T>, IDisposable where T : Traced, new() {

    public TracedEnumerator() { }

    private T iteratorVariable;
    public TracedEnumerator(TracedEnumerable<T> enumerable, string iteratorVariableName) {
        this.iteratorVariableName = iteratorVariableName;
        previousStatements = Context.Current.Statements;
        Context.Current.Statements = new List<Expression>();
        this.enumerable = enumerable;
    }
    private List<Expression> previousStatements;
    public void Dispose() {
        this.Block = Expression.Block(Context.Current.Statements);
        Context.Current.Statements = previousStatements;
        Context.Current.Statements.Add(new ForeachStatement() { Enumerator = this, IteratorVariable = iteratorVariable });
    }

    private bool doneFirst;
    public T Current {
        get {
            doneFirst = true;
            iteratorVariable = new T();
            iteratorVariable.Expression = Expression.Variable(typeof(T), iteratorVariableName);
            return iteratorVariable;
        }
    }

    object IEnumerator.Current {
        get { throw new NotImplementedException(); }
    }

    public bool MoveNext() {
        return !doneFirst;
    }

    public void Reset() {
        throw new NotImplementedException();
    }
}

A new foreach loop is detected when the code calls GetEnumerator which returns a new instance of the TracedEnumerator class. This one returns just one value, the iterator variable which can be referenced in the loop’s body. The end of the loop’s block is detected when the Dispose method is called. Note the use of the custom ForeachStatement class to host the foreach statement.

The actual AST is built using Linq.Expression classes. So in principle the resulting block of code could be compiled. Also one might want to distinguish between recording some code and consuming it. This could be done using the Context class, which also holds the list of recorded statements. This can then be used to extract real values from the traced objects rather than recordings.

public class Context : IDisposable {

    [ThreadStatic]
    private static Context currentContext;

    public static Context Current {
        get { return currentContext; }
    }

    public static Context Traced {
        get {
            currentContext = new Context();
            return currentContext;
        }
    }

    private Context() { }

    public void Dispose() {
        // implement what needs to be done when exiting the using block
    }

    public List<Expression> Statements = new List<Expression>();
}

Type Providers

The overhead of wrapping up these types and modifying their implementation could be avoided either by generating the traceable classes at run-time in a dynamic assembly or potentially by using the new F# type providers.

Applications

So far I have mostly thought of using this for remote execution.  In any multi-tiered system, network latency can lead to extremely slow response times and have a serious impact on the scalability of any system, especially ones based on SQL databases. Consider the case of a loop which iterates over a large number of elements, assuming a round-trip latency of say 20 ms, that would take 20 seconds per thousand elements. This is not something anybody would want to execute in the context of a database transaction. Even if you are using software transactions as described in my posts on the topic, the length of the transaction makes it much more likely to fail to commit. The best solution in these cases is to move the code closer to the data. For a SQL database one could imagine translating the C# code into SQL, LINQ-to-SQL style. But for systems based on software transactions the variables just need to be bound to the database instances and it should be possible to simply compile the LINQ.Expression to get executable code.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Awelon Blue

Thoughts on Programming Experience Design

Joe Duffy's Blog

Adventures in the High-tech Underbelly

Design Matters

Furniture design blog from Design Matters author George Walker

Shavings

A Blog for Woodworkers by Gary Rogowski

Paul Sellers' Blog

– A Lifestyle Woodworker –

randallnatomagan

Woodworking, life and all things between

%d bloggers like this: