Introduction to REBNF


My previous post was about a year old idea, this post is about a 20 years old one! It is basically a parser and scanner generator. Originally written in Turbo Pascal, then in Delphi Pascal, I eventually ported it to C# 2.0 and 3.0. This project has spent most of its life in a dormant state, and was painfully revived every couple of years.

This scanner and parser generator has the following features:

  • Combines scanner and parser definitions in a single grammar definition
  • Maps grammar rules to C# classes
  • Maps alternatives to C# interfaces
  • EBNF style grammar is restricted to ease mapping to classes
  • Supports repetition with separators
  • Syntax tree is strongly typed
  • Syntax tree contains all source tokens
  • Normally ignored tokens such as whitespaces and comments are also available in the syntax tree
  • Two types of syntax tree walk mechanisms are generated
  • Generates LR1 parsers

Here is a simplified version of REBNF in REBNF:

Grammar = Rule…;
Rule : ConcatenationRule | AlternativesRule;
AlternativesRule = identifier ‘:’ Alternatives ‘;’;
ConcatenationRule = identifier ‘=’ Expression ‘;’;
Expression = Term…;
Term : Option | Sequence | Alternatives;
Option = ‘[‘ Expression ‘]’ [‘!’ | ‘!!’ | ‘!!!’];
Sequence = SimpleFactor [‘/’ | ‘\’ | ‘/\’ SimpleFactor] ‘…’;
Alternatives = SimpleFactor\’|’…;
SimpleFactor : identifier | string;

Apart from the list and exclamation marks elements, this should be fairly self-explanatory. Note that contrary to EBNF, the alternative operator | has the highest precedence in REBNF. Since my grammar is more restrictive than EBNF, I am calling it Restricted EBNF.

The following examples illustrate the list feature:

"A"… matches A, AA, AAA, etc 
"A"\"B"… matches A, ABA, ABABA, etc 
"A"/"B"… matches AB, ABAB, ABABAB, etc 
"A"/\"B"… matches A, AB, ABA, ABAB, etc 

This construct is specifically aimed at repetitions with a separator or an operator. It also deals with the special case of the optional last separator.

The exclamation marks allow to link disparate optional factor:

["A"]! "B" ["C"]! matches B and ABC but won't match AB or BC.
["A"]! ["B"]!! C ["D"]! ["E"]!! matches C, ACD, BCE and ABCDE

I shall give a more detailed definition of REBNF in a future post.

Mapping to classes and interfaces

Production rules are mapped to classes or interfaces. An alternatives production rule (defined with a colon) maps to an interface. All simple factors listed on the right-hand-side must implement that interface. Note that the generated interfaces are empty. I will illustrate how these interfaces can be used in the next post.

A concatenation production rule (defined with an equal sign) maps to a class. Every factor defined on the right-hand-side maps to a property in the generated class. Sequences map to one or two generic lists (second list is for the separators). A constructor is generated for every possible reduction of the production rule.

I will use the following production rule to illustrates this mapping:

Rule = “private”|”public” “class” identifier [“:” identifier\”,”…];

Results in the following class being generated

// Rule = "private"|"public" "class" identifier [":" identifier\","...];
public partial class Rule {
    // Rule = "private" "class" identifier;
    public Rule(@private @private, @class @class, identifier @identifier) {...}

    // Rule = "public" "class" identifier;
    public Rule(@public @public, @class @class, identifier identifier) {...}

    // Rule = "private" "class" identifier ":" identifier\","...;
    public Rule(@private @private, @class @class, identifier identifier, Colon colon, DoubleList<identifier,Coma> listOfIdentifierAndComa) {...}

    // Rule = "public" "class" identifier ":" identifier\","...;
    public Rule(@public @public, @class @class, identifier identifier, Colon colon, DoubleList<identifier,Coma> listOfIdentifierAndComa) {...}

    public @private @private { get; private set; }
    public @public @public  { get; private set; }
    public @class @class  { get; private set; }
    public identifier identifier  { get; private set; }
    public Colon Colon  { get; private set; }
    public List<identifier> identifiers { get; private set; }
    public List<Coma> Comas  { get; private set; }
}

The generator will first generate all the classes and interfaces. These classes are then compiled into a parser. The output of the parser is an instance of the grammar’s root class. The parser itself is LR1 and at every reduction a new instance of the class corresponding to the production is created using the matching constructor. For historical reasons, the current implementation doesn’t fully leverage all the existing features of C# and relies far too heavily on generated source code. I hope to discuss some of the issues and advantages of using generated classes in a future post.

My next post will present a small but complete example and will illustrate how to take advantage of the interfaces associated to an alternative rule.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Awelon Blue

Thoughts on Programming Experience Design

Joe Duffy's Blog

Adventures in the High-tech Underbelly

Design Matters

Furniture design blog from Design Matters author George Walker

Shavings

A Blog for Woodworkers by Gary Rogowski

randallnatomagan

Woodworking, life and all things between

%d bloggers like this: