Here is a simple language that we can use to understand the difference between
parsing an operator as left or right associative.

    L = { a, a+a, a+a+a, a+a+a+a, ... }


-----------------------------------------------------------------------------


1.) Here is an ambiguous BNF grammar for language L.


BNF:    expr -> expr '+' expr | 'a'


Using this grammar you can parse the string "a+a+a" two different ways
and you can parse the string "a+a+a+a" five different ways (make sure
you can draw the AST for every one of these parses).
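
As a way to check your drawings, here is a small Java sketch that counts the
parse trees for a string with n operands. The names ParseCounter and
countParses are made up for this example. The idea is that any one of the '+'
signs can be the root of the AST, and then the part to its left and the part
to its right are parsed independently of each other.

```java
// Counts the parse trees that the ambiguous grammar
//     expr -> expr '+' expr | 'a'
// gives to a string with n operands (n >= 1).
class ParseCounter {
    static long countParses(int n) {
        if (n < 1) throw new IllegalArgumentException("need at least one operand");
        long[] count = new long[n + 1];
        count[1] = 1;                     // the string "a" has exactly one parse
        for (int k = 2; k <= n; k++) {
            // Choose the '+' at the root: i operands go left, k - i go right.
            for (int i = 1; i < k; i++) {
                count[k] += count[i] * count[k - i];
            }
        }
        return count[n];
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " operands: " + countParses(n) + " parse tree(s)");
        }
    }
}
```

Run it for n = 3 and n = 4 and compare the results with the trees you drew.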


-----------------------------------------------------------------------------


2.) Here is an unambiguous, "right associative", BNF grammar for language L
and two EBNF grammars derived from it. All three of these grammars have
just a single production.


BNF:    expr -> 'a' '+' expr | 'a'   // notice that this is right recursive


EBNF:   expr -> 'a' [ '+' expr ]     // notice that this is still right recursive

EBNF:   expr -> 'a' ( '+' 'a' )*     // we removed (or "eliminated") the recursion


Using the BNF grammar, or the first EBNF grammar, we can parse the
string "a+a+a+a" only one way, as if it were "grouped" as a+(a+(a+a)).


Let us look at how the right recursion is removed from the BNF grammar to derive
the second EBNF grammar.

Here is the sequence of sentential forms (after the start symbol expr itself) in
the derivation of "a + a + a + a + a" from the BNF grammar.

        a + expr
        a + a + expr
        a + a + a + expr
        a + a + a + a + expr
        a + a + a + a + a

Notice how the "expression part" is moving to the right, and in each step
it grows the expression by concatenating the string "+ a" onto what had
been to its left. So that means we can think of growing a string in the
language by starting with the string "a" and then concatenating on the right
as many "+ a" strings as we like. Hence, the description
       expr -> 'a' ( '+' 'a' )*
which uses iteration (the Kleene star) in place of recursion.

Here is how we can write the getExpr() function that implements a
recognizing parser for this language. We will use the right-recursive
EBNF production (not the iterative EBNF production).

    expr -> 'a' [ '+' expr ]  // this is a right recursive production

void getExpr(tokens)
{
   if (! tokens.match("a") )
      throw new ParseException();
   if ( tokens.hasToken() )
   {
       if (! tokens.match("+") )
          throw new ParseException();
       getExpr(tokens);     // recursion
   }
}
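
If you want to run the recognizer, it needs a token stream to work with. The
following Java sketch supplies one. The Tokens class, the unchecked
ParseException class, and the recognizes() helper are stand-ins invented for
this example (the notes do not define them); match() is assumed to consume a
token only when it equals the expected one.

```java
import java.util.Arrays;
import java.util.List;

class RightRecursiveRecognizer {
    // Hypothetical exception type (unchecked, to keep the example short).
    static class ParseException extends RuntimeException {
        ParseException(String msg) { super(msg); }
    }

    // Hypothetical token stream over an already tokenized input.
    static class Tokens {
        private final List<String> tokens;
        private int pos = 0;
        Tokens(String... tokens) { this.tokens = Arrays.asList(tokens); }
        boolean hasToken() { return pos < tokens.size(); }
        // Consume the next token if it equals expected; otherwise consume nothing.
        boolean match(String expected) {
            if (hasToken() && tokens.get(pos).equals(expected)) { pos++; return true; }
            return false;
        }
    }

    // expr -> 'a' [ '+' expr ]
    static void getExpr(Tokens tokens) {
        if (!tokens.match("a"))
            throw new ParseException("expected 'a'");
        if (tokens.hasToken()) {
            if (!tokens.match("+"))
                throw new ParseException("expected '+'");
            getExpr(tokens);     // recursion
        }
    }

    // Convenience wrapper: true if the whole token sequence is in the language.
    static boolean recognizes(String... input) {
        try { getExpr(new Tokens(input)); return true; }
        catch (ParseException e) { return false; }
    }
}
```

For example, recognizes("a", "+", "a") accepts its input, while
recognizes("a", "+") rejects it, because the recursive call finds no 'a' after
the '+'.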

What about writing parser code for the other production, the one
with the Kleene star?

    expr -> 'a' ( '+' 'a' )*

We will consider this production below.

-----------------------------------------------------------------------------


3.) Here is an unambiguous, "left associative", BNF grammar for language L
and three EBNF grammars derived from it. All four of these grammars have
just a single production.


BNF:    expr ->  expr '+' 'a' | 'a'  // notice that this is left recursive


EBNF:   expr -> [ expr '+' ] 'a'     // factor out 'a', this is still left recursive

EBNF:   expr -> ( 'a' '+' )* 'a'     // we remove (or "eliminate") the recursion

EBNF:   expr -> 'a' ( '+' 'a' )*     // a better way to eliminate the recursion

Using the BNF grammar, or the first EBNF grammar, we can parse the
string "a+a+a+a" only one way, as if it were "grouped" as ((a+a)+a)+a.

Notice that the two grammars that use the Kleene star don't really tell
us how we should parse a string (we will see below why not).


Notice, from the last two BNF grammars, that, in essence, a left-associative
operator needs a left-recursive grammar production that captures the
"big stuff" on the left side of the operator and a "little thing"
on the right side of the operator, e.g.,
       expr -> expr '+' 'a'
and a right-associative operator needs to have a right-recursive grammar
production that captures "big stuff" on the right side of the operator
and a "little thing" on the left side of the operator, e.g.,
       expr -> 'a' '+' expr



Let us look at how the left recursion is removed from the BNF grammar to
derive the second and third EBNF grammars.

Here is the sequence of sentential forms (after the start symbol expr itself) in
the derivation of "a + a + a + a + a" from the BNF grammar.

                 expr + a
             expr + a + a
         expr + a + a + a
     expr + a + a + a + a
        a + a + a + a + a

Notice how the "expression part" is moving to the left, and in each step
it grows the expression by concatenating the string "a +" onto what had
been to its right. So that means we can think of growing a string in the
language by starting with the string "a" and then concatenating on the left
as many "a +" strings as we like. Hence, the description
       expr -> ( 'a' '+' )* 'a'
which uses iteration (the Kleene star) in place of recursion.

But we can just as reasonably look at the final string
      a + a + a + a + a
and say that we grow this string by starting with the string "a" and then
concatenating on the right(!) as many "+ a" strings as we like. Hence, we
can also derive this description of the language.
       expr -> 'a' ( '+' 'a' )*
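
Since neither of these two descriptions uses recursion, each one can be written
as a regular expression, and that gives us a quick way to test that they
describe the same set of strings. The Java sketch below (the class and method
names are made up for this example) compares the two patterns on every string
over the characters 'a' and '+' up to a given length. This is only evidence
for short strings, not a proof.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SameLanguage {
    static final Pattern LEFT_GROWN  = Pattern.compile("(a\\+)*a");  // ( 'a' '+' )* 'a'
    static final Pattern RIGHT_GROWN = Pattern.compile("a(\\+a)*");  // 'a' ( '+' 'a' )*

    // True if both patterns give the same verdict on the whole string s.
    static boolean agreeOn(String s) {
        return LEFT_GROWN.matcher(s).matches() == RIGHT_GROWN.matcher(s).matches();
    }

    // Checks every string over {a, +} whose length is at most maxLen.
    static boolean agreeUpToLength(int maxLen) {
        List<String> current = new ArrayList<>();
        current.add("");                       // the only string of length 0
        for (int len = 0; len <= maxLen; len++) {
            List<String> next = new ArrayList<>();
            for (String s : current) {
                if (!agreeOn(s)) return false;
                next.add(s + "a");             // build the strings of length len + 1
                next.add(s + "+");
            }
            current = next;
        }
        return true;
    }
}
```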

So the left recursion in this BNF
       expr ->  expr '+' 'a' | 'a'
can be factored out using either this EBNF
       expr -> ( 'a' '+' )* 'a'
or this EBNF
       expr -> 'a' ( '+' 'a' )*
The second form of EBNF is preferable since it is easy to translate it
into a while-loop that parses the language. But the second EBNF is also the
EBNF that we derived from the right-recursive grammar! Since right-recursion
gives us right-associativity, and left-recursion gives us left-associativity,
and the last EBNF can be derived from either the right- or left-recursive BNF,
how can the EBNF grammar determine associativity? Well, it doesn't. The
associativity of the operator will be determined by how we write the parser,
not by how we wrote the (EBNF) grammar. Let's look at an example.

Below is a (non-recursive) piece of code that parses this grammar

    expr -> 'a' ( '+' 'a' )*

void getExpr(tokens)
{
   tokens.match("a");
   while ( tokens.hasToken() )   // iteration instead of recursion
   {
       tokens.match("+");
       tokens.match("a");
   }
}

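This is a recognizing parser. It doesn't do anything but parse. To keep the
code short, the results of the match() calls are not checked here; a complete
version should throw an exception when a stream of tokens doesn't parse, as
the recursive getExpr() above does.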

We need to modify this parser so that it builds a parse tree (or an
abstract syntax tree). To motivate how to modify the above code, let
us consider the example string "a+a+a+a+a+a".

Here is the sequence of (left associative) AST's that we get as we build
up "a+a+a+a+a+a" by starting with "a" and then iterating the concatenation
of "+a" on the right.

string: "a"   "a+a"      "a+a+a"     "a+a+a+a"     "a+a+a+a+a"    "a+a+a+a+a+a"

  tree:  a      +           +            +              +                +
               / \         / \          / \            / \              / \
              a   a       +   a        +   a          +   a            +   a
                         / \          / \            / \              / \
                        a   a        +   a          +   a            +   a
                                    / \            / \              / \
                                   a   a          +   a            +   a
                                                 / \              / \
                                                a   a            +   a
                                                                / \
                                                               a   a

Notice that as we move to the right from string to string, the trees grow in
a very specific way. The next tree in the sequence of trees always has the
previous tree as the left branch of its root.

               next tree  -->   +
                               / \
                      previous     a
                      tree

This is the hint that we need to write the code that builds these abstract
syntax trees. Be sure to carefully compare this version of getExpr() to the
previous version.

Tree getExpr(tokens)
{
   tokens.match("a");
   Tree currentTree = new Tree("a");

   while ( tokens.hasToken() )   // iteration instead of recursion
   {
       tokens.match("+");
       tokens.match("a");
       currentTree = new Tree("+", currentTree, new Tree("a"));   // left associative
   }
   return currentTree;
}

Follow this code as it parses the string "a+a+a+a" (IMPORTANT: really
do follow this code as it parses the string). It parses the string
into a left-associative parse tree.
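
After you have traced the code by hand, you can compare your trace with a
runnable version. The Java sketch below fills in the pieces that the notes
leave out; the Tree and Tokens classes are stand-ins invented for this example,
and match() is assumed to throw when the next token is not the expected one.
Tree.toString() writes a tree as a fully parenthesized string so the grouping
is visible.

```java
import java.util.Arrays;
import java.util.List;

class LeftAssocParser {
    // Hypothetical AST node: a leaf has no children, an operator node has two.
    static class Tree {
        final String label;
        final Tree left, right;
        Tree(String label) { this(label, null, null); }
        Tree(String label, Tree left, Tree right) {
            this.label = label; this.left = left; this.right = right;
        }
        @Override public String toString() {
            return left == null ? label : "(" + left + label + right + ")";
        }
    }

    // Hypothetical token stream; match() throws on a mismatch or at end of input.
    static class Tokens {
        private final List<String> tokens;
        private int pos = 0;
        Tokens(String... tokens) { this.tokens = Arrays.asList(tokens); }
        boolean hasToken() { return pos < tokens.size(); }
        void match(String expected) {
            if (!hasToken() || !tokens.get(pos).equals(expected))
                throw new IllegalStateException("expected " + expected);
            pos++;
        }
    }

    // expr -> 'a' ( '+' 'a' )*
    static Tree getExpr(Tokens tokens) {
        tokens.match("a");
        Tree currentTree = new Tree("a");
        while (tokens.hasToken()) {   // iteration instead of recursion
            tokens.match("+");
            tokens.match("a");
            // The previous tree becomes the LEFT branch of the new root.
            currentTree = new Tree("+", currentTree, new Tree("a"));
            System.out.println(currentTree);   // trace: one line per iteration
        }
        return currentTree;
    }
}
```

Calling getExpr(new Tokens("a", "+", "a", "+", "a", "+", "a")) prints the
intermediate trees (a+a) and ((a+a)+a) and returns the tree (((a+a)+a)+a).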

But now modify the code this way.

Tree getExpr(tokens)
{
   tokens.match("a");
   Tree currentTree = new Tree("a");

   while ( tokens.hasToken() )
   {
       tokens.match("+");
       tokens.match("a");
       currentTree = new Tree("+", new Tree("a"), currentTree);   // right associative?
   }
   return currentTree;
}


Again, follow this code as it parses the string "a+a+a+a" (IMPORTANT: really
do follow this code as it parses the string). Now it parses the string
into (what seems to be) a right-associative parse tree. But there's a problem.


Modify the parser once again (so that it can parse strings with variables
other than "a").

Tree getExpr(tokens)
{
   String tk = tokens.nextToken();
   Tree currentTree = new Tree(tk);

   while ( tokens.hasToken() )
   {
       tokens.match("+");
       tk = tokens.nextToken();
       currentTree = new Tree("+", new Tree(tk), currentTree);   // right associative
   }
   return currentTree;
}

Now follow the above parser as it parses the string "a+b+c+d". You will
see that it is not really parsing the expression as right-associative.
It's not even parsing the string correctly.

But if you tokenize the string "a+b+c+d" from right-to-left, so the token
stream is
    ["d", "+", "c", "+", "b", "+", "a"]
and then you once again follow the parser as it parses this token stream,
then you should get a correct, right-associative, parse tree.
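
To check both of your traces, here is a runnable Java sketch of the last
parser. As before, the Tree and Tokens classes are stand-ins invented for this
example, nextToken() and match() are assumed to throw at the end of the input
or on a mismatch, and Tree.toString() writes a tree as a fully parenthesized
string.

```java
import java.util.Arrays;
import java.util.List;

class PrependParser {
    // Hypothetical AST node: a leaf has no children, an operator node has two.
    static class Tree {
        final String label;
        final Tree left, right;
        Tree(String label) { this(label, null, null); }
        Tree(String label, Tree left, Tree right) {
            this.label = label; this.left = left; this.right = right;
        }
        @Override public String toString() {
            return left == null ? label : "(" + left + label + right + ")";
        }
    }

    // Hypothetical token stream over an already tokenized input.
    static class Tokens {
        private final List<String> tokens;
        private int pos = 0;
        Tokens(String... tokens) { this.tokens = Arrays.asList(tokens); }
        boolean hasToken() { return pos < tokens.size(); }
        String nextToken() {
            if (!hasToken()) throw new IllegalStateException("unexpected end of input");
            return tokens.get(pos++);
        }
        void match(String expected) {
            String tk = nextToken();
            if (!tk.equals(expected))
                throw new IllegalStateException("expected " + expected + " but found " + tk);
        }
    }

    static Tree getExpr(Tokens tokens) {
        String tk = tokens.nextToken();
        Tree currentTree = new Tree(tk);
        while (tokens.hasToken()) {
            tokens.match("+");
            tk = tokens.nextToken();
            // The previous tree becomes the RIGHT branch of the new root.
            currentTree = new Tree("+", new Tree(tk), currentTree);
        }
        return currentTree;
    }

    public static void main(String[] args) {
        // Left-to-right token stream for "a+b+c+d".
        System.out.println(getExpr(new Tokens("a", "+", "b", "+", "c", "+", "d")));
        // Right-to-left token stream for the same string.
        System.out.println(getExpr(new Tokens("d", "+", "c", "+", "b", "+", "a")));
    }
}
```

The first line printed is (d+(c+(b+a))), which has the operands in the wrong
order. The second line is (a+(b+(c+d))), the right-associative tree for
"a+b+c+d".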


The last several examples show that the EBNF grammar

       expr -> 'a' ( '+' 'a' )*

DOES NOT determine any associativity for the operator. It doesn't really tell
us how to parse. But we can use the grammar as a guide to implement parsers
for either a left-associative operator or a right-associative operator (but
the right-associative parser needs a right-to-left tokenizer!).

Of course, if we really want a right-associative operator, we should use this
right recursive EBNF grammar

       expr -> 'a' [ '+' expr ]

and write the recursive descent parser for this production, and use a
left-to-right tokenizer.
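
Here is one way that recursive descent parser might look when it builds a
tree. This is a sketch under a few assumptions: the Tree and Tokens classes
are stand-ins invented for this example, and, as in the earlier modification,
any token is accepted as an operand (not just "a") so that the grouping is
easy to see in the output.

```java
import java.util.Arrays;
import java.util.List;

class RightAssocParser {
    // Hypothetical AST node: a leaf has no children, an operator node has two.
    static class Tree {
        final String label;
        final Tree left, right;
        Tree(String label) { this(label, null, null); }
        Tree(String label, Tree left, Tree right) {
            this.label = label; this.left = left; this.right = right;
        }
        @Override public String toString() {
            return left == null ? label : "(" + left + label + right + ")";
        }
    }

    // Hypothetical left-to-right token stream over an already tokenized input.
    static class Tokens {
        private final List<String> tokens;
        private int pos = 0;
        Tokens(String... tokens) { this.tokens = Arrays.asList(tokens); }
        boolean hasToken() { return pos < tokens.size(); }
        String nextToken() {
            if (!hasToken()) throw new IllegalStateException("unexpected end of input");
            return tokens.get(pos++);
        }
        void match(String expected) {
            String tk = nextToken();
            if (!tk.equals(expected))
                throw new IllegalStateException("expected " + expected + " but found " + tk);
        }
    }

    // expr -> operand [ '+' expr ]     (operand generalizes 'a')
    static Tree getExpr(Tokens tokens) {
        Tree left = new Tree(tokens.nextToken());   // the "little thing" on the left
        if (tokens.hasToken()) {
            tokens.match("+");
            Tree right = getExpr(tokens);           // the "big stuff" on the right
            return new Tree("+", left, right);
        }
        return left;
    }
}
```

With the token stream ["a", "+", "b", "+", "c", "+", "d"], getExpr() returns
the tree (a+(b+(c+d))). The recursive call finishes building the whole right
operand before the '+' node is created, and that is where the right
associativity comes from.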



Question: What does the following EBNF grammar give you? Notice
that this production mixes (left) recursion with iteration.

EBNF:   expr -> ( expr '+' )* 'a'