A simple expression parser


In my previous post I gave a brief introduction to my parser generator and its grammar definition language REBNF. Here I will give a complete example of a grammar for parsing simple arithmetic expressions:

namespace -> "Example.Expressions"; 

goal = Expressions; 
Expressions = Expression/";"...; 
Expression : Term | Terms; 
Terms = Term\AddOp...; 
AddOp = "+" | "-"; 
Term : Factor | Factors; Factors = Factor\MulOp...; 
MulOp = "*" | "/"; 
Factor : constant | SubExpression; 
SubExpression = "(" Expression ")"; 
decimalDigit :: ['0'-'9']; 
constant == decimalDigit+; 
newLine == U+000D | U+000A | U+000D U+000A | U+2028 | U+2029; 
whitespaceChar :: [' ', U+0009, U+000B, U+000C]; 
whitespace == whitespaceChar+;

ignore -> whitespace; 
ignore -> newLine; 

This grammar contains also lexical definitions for three terminal tokens: constant, newLine and whitespace. The last two lines of the grammar are options which indicate that the whitespace and newLine tokens should be ignored by the parser. The option on the first line defines the namespace to use for the generated classes.

A class is generated for every terminal, here is the one for the constant token:

public partial class constant : ITerminal, IFactor {
    public constant(IToken token) {
        Token = token;
    }
    public IToken Token { get; private set; } }

All terminals implement the following ITerminal interface:

public interface ITerminal {
    IToken Token { get; }
}

So every terminal class exposes the token returned by the lexer. An empty interface for every alternative rule is also generated. These are IExpression, ITerm and IFactor. IExpression is inherited by the ITerm interface and implemented by the Terms class. Similarly for ITerm is inherited by the IFactor interface and implemented by the Factors class. Finally IFactor is implemented by the constant and SubExpression classes. So both constant and SubExpression classes implement all three interfaces.

The big advantage of this grammar is that an expression which consists of a constant only will contain just one instance of the constant class, rather than, as in more usual approaches, of a hierarchy of expression, term, factor and constant tree nodes.

I will now demonstrate how this can be used to implement an Evaluate function for an expression. Firstly I extend the IExpression interface (all generated classes and interfaces are marked as partial) with the following method:

public partial interface IExpression {
    double Evaluate();
}

Now all classes which implement this interface will have to implement this method, so we have the following:

public partial class Terms : ZsK.G5.Syntax.IExpression {
    public double Evaluate() {
        var result = Terms.First().Evaluate();
        foreach (var pair in Terms.Skip(1).Zip(AddOps, (factor, addOp) => new { op = addOp, val = factor.Evaluate() })) {
            if (pair.op.Minus == null) {
                result += pair.val;
            } else {
                result -= pair.val;
            }
        }
        return result;
    }
}
public partial class Factors {
    public double Evaluate() {
        var result = Factors.First().Evaluate();
        foreach (var pair in Factors.Skip(1).Zip(MulOps, (factor, mulOp) => new { op = mulOp, val = factor.Evaluate() })) {
            if (pair.op.Times == null) {
                result /= pair.val;
            } else {
                result *= pair.val;
            }
        }
        return result;
    }
}
public partial class constant : ITerminal, IFactor {
    public double Evaluate() {
        return int.Parse(this.Token.Value);
    }
}
public partial class SubExpression : IFactor {
    public double Evaluate() {
        return this.Expression.Evaluate();
    }
}

The processing on the lists of terms and factors could be improved by using better list processing primitives. Otherwise the code consists mostly in mapping lexical terms to actual values and operations. I think that this example illustrates nicely the advantages of mapping alternative rules to interfaces.

Walking the syntax tree

There are two ways one can walk the syntax tree, in the first case one’s code gets called at every step of the walk, in the other one, one can override specific method of a generated class. Let me start with the first one:

class Walker : WalkContext {
    public override void OnEnter(object node) {
        ...
    }
    public override void OnExit(object node) {
        ...
    }
    public override void OnLeaf(object leaf) {
        ...
    }
}

The following implementation of the OnLeaf method lets you reconstrcut the original source:

public override void OnLeaf(object leaf) {
    if (leaf != null) {
        IToken t = (leaf as dynamic).Token;
        Debug.Write(t.Value);
        if (t.Next != null) {
            if (t.Next.Symbol.Name == "whitespace" || t.Next.Symbol.Name == "newLine") {
                Debug.Write(t.Next.Value);
            }
        }
    }
}

Remember that we directed the grammar to ignore all whitespaces and new lines, so these tokens are not referenced anywhere in the syntax tree, however there are still stored in singly linked list of tokens as illustrated in the code. The following Walker class is also generated where I have only expanded just one of the generated method for illustration purposes:

public class Walker {
    public virtual void OnEnter(object node) {...}
    public virtual void OnExit(object node) {...}
    public virtual void OnLeaf(global::ZsK.G5.Interfaces.ITerminal terminal) {...}
    public virtual void Walk(AddOp rule) {...}
    public virtual void Walk(constant item) {...}
    public virtual void Walk(Expressions rule) {
        this.OnEnter(rule);
        int Expression_i = 0;
        if ((rule.Expression != null)) {
            for (Expression_i = 0; (Expression_i < (rule.Expression.Count - 0)); Expression_i = (Expression_i + 1)) {
                this.Walk(rule.Expression[Expression_i]);
                this.Walk(rule.SemiColon[Expression_i]);
            }
        }
        this.OnExit(rule);
    }
    public virtual void Walk(Factors rule) {...}
    public virtual void Walk(goal rule) {...}
    public virtual void Walk(IExpression item) {...}
    public virtual void Walk(IFactor item) {...}
    public virtual void Walk(ITerm item) {...}
    public virtual void Walk(Minus item) {...}
    public virtual void Walk(MulOp rule) {...}
    public virtual void Walk(newLine item) {...}
    public virtual void Walk(Plus item) {...}
    public virtual void Walk(RoundBra item) {...}
    public virtual void Walk(RoundKet item) {...}
    public virtual void Walk(SemiColon item) {...}
    public virtual void Walk(SubExpression rule) {...}
    public virtual void Walk(Terms rule) {...}
    public virtual void Walk(Times item) {...}
    public virtual void Walk(UpSlash item) {...}
    public virtual void Walk(whitespace item) {...}
}

This approach offers more flexibility than the first option.

I am in the process of writing the C# 4.0 grammar in REBNF. This is a really useful exercise (hopefully it will be more than that) as it is showing up already plenty of problems with the current form of REBNF. So changes are likely to happen in the next version of REBNF. Rev-ing the grammar itself is usually a rather painful process, and I have usually been waiting until I have collected enough annoying issues in the generator itself to move forward. I should specify that the current REBNF parser was generated using the previous version of REBNF, which in turn was generated using its preceding one, etc. The current REBNF stands at version 5.

Not quite sure what the next post on this topic will cover, I might discuss the C# grammar, or give more details about REBNF, or even add some functionality to this post’s expression parser example.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Awelon Blue

Thoughts on Programming Experience Design

Joe Duffy's Blog

Adventures in the High-tech Underbelly

Design Matters

Furniture design blog from Design Matters author George Walker

Shavings

A Blog for Woodworkers by Gary Rogowski

Paul Sellers' Blog

– A Lifestyle Woodworker –

randallnatomagan

Woodworking, life and all things between

%d bloggers like this: