<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet href='paper/document.xsl' type='text/xsl'?>
<!DOCTYPE document SYSTEM 'paper/document.dtd'>

<!-- examples tested with
     Mac OS X 10.5.3   examples  JSM boot  Mini preprocess 
       Camino 1.5.5       ok         ok       no
       Firefox 3.0        ok         ok       ok
       Mozilla 1.7.13     ok         ok       no       misaligns navigation
                                                       no close confirmation
       Navigator 9.0.0.1  ok         ok       no       no close confirmation
       OmniWeb 5.6        ok         no       no       no close confirmation
       Opera 9.27         ok         ok       ok       no close confirmation
       RealPlayer 11.0.0  -          -        -        cannot handle XML
       Safari 3.1.1       ok         no       no
       SeaMonkey 1.1.9    ok         ok       no       no close confirmation
       
     Windows XP SP2 / VMWare Fusion 1.1.3
       Firefox 3.0        ok         ok       ok
       IE 7.0.5730.13     some       -        -        does not process XML
       Safari 3.0.4       ok         no       no
  -->
<document controller='paper/paper.js' stylesheet='paper/style.css'>
  <title id='title'>
    Monadic Parsing using JavaScript
  </title>

  <author>
    Axel T. Schreiner,
    Department of Computer Science,
    Rochester Institute of Technology
  </author>

  <section>
    <title> Abstract </title>
    
    <p>
      <i>Monad</i> is a class in Haskell which is fundamental to encapsulating side effects. A monadic type can be used, e.g., to maintain and manipulate state alongside another computation or to bypass sequential execution and recover from failure. A significant problem domain is parsing: support for monadic parsers exists for Haskell, Python, and other languages.
    </p>
    <p>
      This web page describes monadic LL(n) parsing with JavaScript, complete with a base class for monadic classes which wrap state functions, a notation to embed monadic computations in JavaScript (i.e., an equivalent to the <tt>do</tt> notation in Haskell), a preprocessor to translate the notation into JavaScript, a scanner generator based on regular expressions, a factory for classes to represent parse trees, and the implementation of a little language with exception handling as another example of a monadic computation. The preprocessor is implemented using the monadic parser which it supports.
    </p>
  </section>

  <section>
    <title id='s-technical'> Technical Information </title>
    
    <p>
      All source code in this web page is live and can be edited and executed interactively. A number of examples have been set up for execution in this web page. They are usually followed by suggestions for further investigations which involve editing and re-executing the examples. The page has been tested with numerous browsers in Mac OS X and Windows XP -- only Internet Explorer had issues with preprocessing the examples. (Details are recorded in the comments in this page.)
    </p>
    <p>
      The source code is organized in namespaces, i.e., global collections; each editable area is labelled with the namespace it belongs to and has a reset button to restore the original content. Two more pages are opened behind this page which contain the remaining code of two namespaces (<tt>JSM</tt>, the preprocessor for the monadic notation, and <tt>Mini</tt>, the implementation of a small programming language) of which only excerpts are presented in this web page. Using the links in <ref value='Namespaces'/> below, the original content of each namespace can be displayed on a separate web page.
    </p>
    <p>
      Monadic notation needs to be translated to JavaScript before it can be executed; this is accomplished using the preprocess button following each editable area with monadic notation. Any preprocess button can be (ab-)used to translate arbitrary code inserted into the preceding editable area; the reset button will restore the original content.
    </p>
    <p>
      A read-only (preprocess or output) area can be expanded or contracted by clicking into the area. The following buttons can be used to preprocess all code (except where the preprocessed source is part of the web page to begin with), execute all JavaScript examples, interpret all little language examples, and expand the size of all read-only areas.
    </p>
    <p><all/></p>
  </section>
  
  <chapter>
    <title id='c-introduction'> Introduction </title>
    
    <p>
      In Haskell <ref value='r-Haskell'/> a data type is a set of values. A data type can be defined as an instance of one or more classes; a class declares polymorphic operators and methods which a type instance must implement. <i>Monad</i> is a class with operations which control sequential execution; the <i>do</i> notation is a very convenient way to express computations in a seemingly imperative style as long as they involve values from types which are instances of <i>Monad</i>. Implementing the <i>Monad</i> operations makes it possible to abandon sequential execution, e.g., there can be a reaction to failure, and state can be maintained separately from sequential execution, e.g., to trace operations, support a form of assignment, or perform input and output. Specifically, Haskell uses the <i>IO</i> monad to separate stateful input/output from stateless, "pure" functional programming.
    </p>
    <p>
      Many imperative languages such as C# <ref value='r-CSharp'/>, Groovy <ref value='r-Groovy'/>, Java <ref value='r-Java-Closures'/>, JavaScript <ref value='r-JavaScript'/>, Python <ref value='r-Python'/>, Ruby <ref value='r-Ruby'/>, and of course all variants of Lisp <ref value='r-Scheme'/>, support functions as first-class values. Some of these languages, e.g., JavaScript and Lisp, are dynamically typed, others, like Haskell or C#, support some form of type inference; either variant of typing greatly simplifies the use of functions. However, apart from Haskell, most languages do not integrate monadic types with special language constructs.
    </p>
    <p>
      If a programming language supports assignment, input/output, and some kind of global storage, there is no pressing need for monads because state can be maintained rather explicitly. However, this web page shows that monads can be created given only functions as first-class values. The web page illustrates that -- at least in the area of parsing -- monads are a very useful way of structuring computation. Specifically, this web page shows how to program recursive descent parsers directly from LL(n) grammars based on an infrastructure which requires only regular expressions and functions as first-class values. It also extends JavaScript with a notation similar to Haskell's <i>do</i> notation which can be viewed as the input language for a parser generator, but which can be used for other monadic computations as well. There is a short discussion of how the notation is used to convert itself into JavaScript, i.e., how the parser generator is implemented. Finally, there is an implementation of an interpreter for a very small, conventional programming language which serves as an example for other monadic computations.
    </p>
    <p>
      The web page provides tools and examples for those who need to implement small languages using JavaScript, and it suggests by example how the tools can be quickly implemented in other languages which support regular expressions and functions as first-class values.
    </p>
  </chapter>
  
  <chapter>
    <title id='c-monad'> What's in a Monad? </title>
    <p>
      This section describes <tt>Monad</tt>, a base class for monadic classes in JavaScript. For the system described here a monadic value is an object which contains a state function:
    </p>
    
    <javascript namespace='Monad' seq='a'><![CDATA[
      function Monad (stateFunction) { this.stateFunction = stateFunction; }
    ]]></javascript>
    
    <p id='apply'>
      Given a monadic value, its state function can be applied to a state value:
    </p>

    <javascript namespace='Monad' seq='b'><![CDATA[
      Monad.prototype.apply = function (state) {
        return this.stateFunction(state);
      };
    ]]></javascript>
  
    <p>
      Unlike Haskell, JavaScript is a dynamically typed language, i.e., neither the type of <tt>state</tt> nor the return type of the state function has to be specified; therefore, <tt>Monad</tt> can provide the functionality of all state-function-based monadic classes. By convention in this system, on success a state function will return a collection with two properties, the new <tt>state</tt> and the <tt>value</tt> encapsulated by the monadic value which contains the state function; on failure a state function will return a collection with a <tt>fail</tt> property, e.g., with an error message. With this convention <tt>Monad</tt> combines the capabilities of Haskell's <i>Either</i> and state-function-based monads.
    </p>
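
    <p>
      As a minimal sketch of this convention, the following fragment restates <tt>Monad</tt> and <tt>apply</tt> so that it can run on its own and wraps two hand-written state functions. The names <tt>sketchConvention</tt>, <tt>next</tt>, and <tt>never</tt> are assumptions made for this illustration only and are not part of the system described here.
    </p>

    <javascript namespace='sketchConvention'><![CDATA[
```javascript
var sketchConvention = (function () {
  // Restated from above so that this fragment runs on its own.
  function Monad (stateFunction) { this.stateFunction = stateFunction; }
  Monad.prototype.apply = function (state) {
    return this.stateFunction(state);
  };

  // Hypothetical value: succeeds, delivers the counter, and increments it.
  var next = new Monad(function (state) {
    return { value: state, state: state + 1 };
  });

  // Hypothetical value: always fails and reports the incoming state.
  var never = new Monad(function (state) {
    return { fail: 'cannot continue at ' + state };
  });

  return {
    ok: next.apply(41),    // { value: 41, state: 42 }
    bad: never.apply(41)   // { fail: 'cannot continue at 41' }
  };
})();
```
    ]]></javascript>

    <p>
      A caller can distinguish the two outcomes with the test <tt>'fail' in result</tt>, which is the test used by the methods described next.
    </p>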

    <p id='orElse'>
      Given two monadic values, <tt>orElse</tt> creates a new monadic value in the same class as the receiver. The new value contains a state function which will apply the receiver's state function or -- only in case of failure -- the state function contained in the argument value for <tt>orElse</tt>:
    </p>

    <javascript namespace='Monad' seq='c'><![CDATA[
      Monad.prototype.orElse = function (b) {
        var a = this;                          // for closure
        return new this.constructor(           // receiver's class
          function (state) {
            var result = a.apply(state);
            return 'fail' in result ? b.apply(state) : result;
          }
        );
      };
    ]]></javascript>

    <p>
      Loosely speaking, monadic values "are" state functions and <tt>orElse</tt> combines them for alternative execution. It should be noted that either state function is applied to the same incoming state. In the terminology of Haskell, <tt>orElse</tt> is the <i>mplus</i> operation of the <i>MonadPlus</i> class.
    </p>
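
    <p>
      The following sketch, under the assumption of a numeric state, illustrates that the alternative is applied to the original incoming state and only after a failure. <tt>sketchOrElse</tt>, <tt>evenOnly</tt>, and <tt>anything</tt> are hypothetical names; <tt>Monad</tt>, <tt>apply</tt>, and <tt>orElse</tt> are restated so that the fragment is self-contained.
    </p>

    <javascript namespace='sketchOrElse'><![CDATA[
```javascript
var sketchOrElse = (function () {
  // Restated from above so that this fragment runs on its own.
  function Monad (stateFunction) { this.stateFunction = stateFunction; }
  Monad.prototype.apply = function (state) {
    return this.stateFunction(state);
  };
  Monad.prototype.orElse = function (b) {
    var a = this;
    return new this.constructor(function (state) {
      var result = a.apply(state);
      return 'fail' in result ? b.apply(state) : result;
    });
  };

  var applied = 0;   // counts how often the alternative is applied

  // Hypothetical value: succeeds only for an even state.
  var evenOnly = new Monad(function (n) {
    return n % 2 === 0 ? { value: 'even', state: n + 1 }
                       : { fail: n + ' is odd' };
  });

  // Hypothetical value: always succeeds.
  var anything = new Monad(function (n) {
    ++ applied;
    return { value: 'any', state: n + 1 };
  });

  var either = evenOnly.orElse(anything);
  var first = either.apply(2);    // evenOnly succeeds, anything is not applied
  var appliedAfterFirst = applied;
  var second = either.apply(3);   // evenOnly fails, anything receives 3

  return { first: first, second: second,
           appliedAfterFirst: appliedAfterFirst, applied: applied };
})();
```
    ]]></javascript>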
  
    <p id='andThen'>
      Intuitively, the <i>bind</i> operation, denoted as <tt>>>=</tt> in Haskell's <i>Monad</i> class, combines two monadic values, i.e., state functions, for sequential execution (and creates a scope for a result value). However, sequential execution has to be controllable: it must be guaranteed that the first state function is executed first, and it should be possible to suppress executing the second state function if the first one fails. Additionally, and unlike <tt>orElse</tt>, if both state functions succeed the final state and value should depend on both functions.
    </p>
  
    <p>
      Therefore, the method <tt>andThen</tt> does not combine two monadic values directly. Instead, it accepts as an argument a function which is expected to accept a <tt>value</tt> property produced by the receiver's state function and return the monadic value which <tt>andThen</tt> is to combine with the receiver:
    </p>

    <javascript namespace='Monad' seq='d'><![CDATA[
      Monad.prototype.andThen = function (b) {
        var a = this;                           // for closure
        return new this.constructor(            // receiver's class
          function (state) {
            var result = a.apply(state);
            return 'fail' in result ? result
              : b(result.value).apply(result.state);
          }
        );  
      };
    ]]></javascript>

    <p>
      The method <tt>andThen</tt> returns a monadic value in the same class as the receiver containing a state function which first applies the receiver's state function. If successful, the <tt>result.value</tt> is used to produce the second monadic value and its state function is applied to the <tt>result.state</tt>. Loosely speaking, the final value results from composing both state functions: the incoming state is sent only to the first state function, and its result state is sent to the second state function.
    </p>
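
    <p>
      A sketch of this threading of state, again assuming a numeric state: two values are drawn in sequence and added, and a failing first step suppresses the second one. The names <tt>sketchAndThen</tt>, <tt>next</tt>, <tt>stop</tt>, and <tt>sum</tt> are hypothetical; the restated methods only serve to make the fragment self-contained.
    </p>

    <javascript namespace='sketchAndThen'><![CDATA[
```javascript
var sketchAndThen = (function () {
  // Restated from above so that this fragment runs on its own.
  function Monad (stateFunction) { this.stateFunction = stateFunction; }
  Monad.prototype.apply = function (state) {
    return this.stateFunction(state);
  };
  Monad.prototype.andThen = function (b) {
    var a = this;
    return new this.constructor(function (state) {
      var result = a.apply(state);
      return 'fail' in result ? result
        : b(result.value).apply(result.state);
    });
  };

  // Hypothetical values: deliver and increment a counter, or fail.
  var next = new Monad(function (n) { return { value: n, state: n + 1 }; });
  var stop = new Monad(function (n) { return { fail: 'stop at ' + n }; });

  // x and y are scoped by the functions passed to andThen.
  var sum = next.andThen(function (x) {
    return next.andThen(function (y) {
      return new Monad(function (state) {
        return { value: x + y, state: state };
      });
    });
  });

  var called = false;   // set if the second step is ever constructed
  var stopped = stop.andThen(function (x) { called = true; return next; });

  return {
    sum: sum.apply(10),          // x is 10, y is 11: { value: 21, state: 12 }
    stopped: stopped.apply(10),  // { fail: 'stop at 10' }
    called: called               // evaluated before stopped.apply, see below
  };
})();
```
    ]]></javascript>

    <p>
      The function passed to <tt>andThen</tt> is only called while a state function is being applied, and not at all if the receiver fails; this is what makes sequential execution controllable.
    </p>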
    
    <p id='subclass'>
      <tt>Monad.subclass</tt> is a convenience function to create constructors for new monadic classes which inherit the methods described thus far. (Strictly speaking, there is no inheritance among classes in JavaScript; however, this web page uses the terminology of Java. In this parlance <tt>Monad.subclass</tt> is a class method -- which is not inherited by the new monadic classes.) 
    </p>
    
    <javascript namespace='Monad' seq='ea'><![CDATA[
      Monad.subclass = function () {
        var result = function () {         // the new constructor
          Monad.call(this, arguments[0]);  //   chained to Monad's constructor
        };

        result.prototype = new Monad();
        delete result.prototype.stateFunction;
        result.prototype.constructor = result;
    ]]></javascript><javascript namespace='Monad' seq='ez'><![CDATA[
        // ... method definitions ...
        return result;
      };
    ]]></javascript>

    <p>
      <tt>Monad</tt> is the superclass of all monadic classes. Therefore, the new constructor <tt>result</tt> is chained to the <tt>Monad</tt> constructor and a <tt>Monad</tt> object is set up as the <tt>prototype</tt> of the new class. The purpose of the <tt>prototype</tt> is to inherit <tt>andThen</tt>, <tt>apply</tt> and <tt>orElse</tt>; therefore, the <tt>stateFunction</tt> is deleted from the <tt>prototype</tt>. Finally <tt>result</tt> is added as <tt>constructor</tt> to the <tt>prototype</tt> and returned.
    </p>
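
    <p>
      The effect of setting <tt>constructor</tt> can be sketched with a condensed stand-in for <tt>Monad.subclass</tt> which omits the class methods. The free function <tt>subclass</tt> and the names <tt>sketchSubclass</tt>, <tt>A</tt>, and <tt>B</tt> are assumptions made for this illustration.
    </p>

    <javascript namespace='sketchSubclass'><![CDATA[
```javascript
var sketchSubclass = (function () {
  // Restated from above so that this fragment runs on its own.
  function Monad (stateFunction) { this.stateFunction = stateFunction; }
  Monad.prototype.apply = function (state) {
    return this.stateFunction(state);
  };
  Monad.prototype.orElse = function (b) {
    var a = this;
    return new this.constructor(function (state) {
      var result = a.apply(state);
      return 'fail' in result ? b.apply(state) : result;
    });
  };

  // Condensed stand-in for Monad.subclass without the class methods.
  function subclass () {
    var result = function (stateFunction) { Monad.call(this, stateFunction); };
    result.prototype = new Monad();
    delete result.prototype.stateFunction;
    result.prototype.constructor = result;
    return result;
  }

  var A = subclass(), B = subclass();   // two unrelated monadic classes
  var a = new A(function (state) { return { fail: 'no' }; });
  var b = new B(function (state) { return { value: 1, state: state }; });
  var c = a.orElse(b);                  // created by a.constructor, i.e., A

  return {
    isA: c instanceof A,                                // true
    isB: c instanceof B,                                // false
    isMonad: c instanceof Monad,                        // true
    cleanPrototype: !('stateFunction' in A.prototype),  // true
    result: c.apply('s')                                // { value: 1, state: 's' }
  };
})();
```
    ]]></javascript>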
    <p id='succeed'>
      A monadic class should contain some monadic values. Therefore, <tt>subclass</tt> creates a few class methods for the new class which create monadic values. <tt>succeed</tt> creates an instance of the new class for its argument value, i.e., it returns a monadic value containing a state function which will return a collection with the argument value of <tt>succeed</tt> and the incoming state:  
    </p>
    
    <javascript namespace='Monad' seq='eb'><![CDATA[
        result.succeed = function (value) {
          return new result(
            function (state) {
              return {value: value, state: state, toString: result.dump };
            }
          );
        };
    ]]></javascript>

    <p id='dump'>
      <tt>dump</tt> serializes a collection and is connected as <tt>toString</tt> method for the result of the state function defined for <tt>succeed</tt>, but it can also be used as a class method to display its argument:
    </p>
       
    <javascript namespace='Monad' seq='ec'><![CDATA[
        result.dump = function dump () {
          var s = '',
            arg = arguments && arguments.length ? arguments[0] : this;
          if (arg === null) return 'null';
          if (typeof arg == 'undefined') return 'undefined';
          if (typeof arg != 'object') return arg.toString();
          for (var key in arg)
            if (arg.hasOwnProperty(key) && key != 'toString')
              s += ', '+key+': '+dump(arg[key]);
          return s ? '{'+s.substring(2)+'}' : '';
        };
    ]]></javascript>

    <p id='fail'> 
      The class method <tt>fail</tt> creates a monadic value with a state function which will return a collection with the argument of <tt>fail</tt>:
    </p>

    <javascript namespace='Monad' seq='ed'><![CDATA[
        result.fail = function (message) {
          return new result(
            function (state) { return { fail: message, toString: result.dump }; }
          );
        };
    ]]></javascript>

    <p>
      <tt>succeed</tt> and <tt>fail</tt> are the <i>return</i> and <i>fail</i> operations required by Haskell's <i>Monad</i> class. <tt>fail</tt> is also the operation <i>mzero</i> of <i>MonadPlus</i>.
    </p>
    <p id='get'>
      Finally, two more kinds of monadic values will turn out to be useful and are defined as value of a class variable and results of a class method, respectively. <tt>get</tt> is a monadic value containing a state function which returns the incoming state as both <tt>value</tt> and <tt>state</tt>; it is used to expose the current state within a monadic computation:
    </p>

    <javascript namespace='Monad' seq='ee'><![CDATA[
        result.get =
          new result(
            function (state) {
              return { value: state, state: state, toString: result.dump };
            }
          );
    ]]></javascript>

    <p id='put'>
      <tt>put</tt> returns a monadic value containing a state function which ignores the incoming state and returns the arguments of <tt>put</tt> as the new <tt>value</tt> and <tt>state</tt>; it is used to set the current state from within a monadic computation:
    </p>

    <javascript namespace='Monad' seq='ef'><![CDATA[
        result.put = function (value, state) {
          return new result(
            function () {
              return { value: value, state: state, toString: result.dump };
            }
          );
        };
    ]]></javascript>

    <p>
      To simplify debugging, all result values are connected to <tt>dump</tt>.
    </p>
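
    <p>
      As a sketch of how <tt>get</tt> and <tt>put</tt> might be combined, the following fragment builds a counter: <tt>tick</tt> delivers the current count and stores the next one. It relies on a condensed stand-in for <tt>Monad.subclass</tt> which omits <tt>dump</tt> and <tt>fail</tt>; the free function <tt>subclass</tt> and the names <tt>sketchCounter</tt>, <tt>Counter</tt>, <tt>tick</tt>, and <tt>twice</tt> are hypothetical.
    </p>

    <javascript namespace='sketchCounter'><![CDATA[
```javascript
var sketchCounter = (function () {
  // Restated from above so that this fragment runs on its own.
  function Monad (stateFunction) { this.stateFunction = stateFunction; }
  Monad.prototype.apply = function (state) {
    return this.stateFunction(state);
  };
  Monad.prototype.andThen = function (b) {
    var a = this;
    return new this.constructor(function (state) {
      var result = a.apply(state);
      return 'fail' in result ? result
        : b(result.value).apply(result.state);
    });
  };

  // Condensed stand-in for Monad.subclass: succeed, get, and put only.
  function subclass () {
    var result = function (stateFunction) { Monad.call(this, stateFunction); };
    result.prototype = new Monad();
    delete result.prototype.stateFunction;
    result.prototype.constructor = result;
    result.succeed = function (value) {
      return new result(function (state) {
        return { value: value, state: state };
      });
    };
    result.get = new result(function (state) {
      return { value: state, state: state };
    });
    result.put = function (value, state) {
      return new result(function () {
        return { value: value, state: state };
      });
    };
    return result;
  }

  var Counter = subclass();

  // Deliver the current count as value and store count + 1 as state.
  var tick = Counter.get.andThen(function (n) {
    return Counter.put(n, n + 1);
  });

  // Two ticks in sequence; the pair of counts is the final value.
  var twice = tick.andThen(function (a) {
    return tick.andThen(function (b) {
      return Counter.succeed([a, b]);
    });
  });

  return {
    once: tick.apply(5),    // { value: 5, state: 6 }
    twice: twice.apply(5)   // { value: [5, 6], state: 7 }
  };
})();
```
    ]]></javascript>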
  </chapter>
  
  <chapter>
    <title id='c-axioms'> Monad axioms </title>
    <p>
      Operations of <i>Monad</i> and <i>MonadPlus</i> should satisfy certain axioms which can now be tested using <tt>Monad</tt>. The tests are cumulative and can be edited and executed below. With a stand-alone implementation of JavaScript such as <i>Rhino</i> <ref value='r-Rhino'/> or <i>SpiderMonkey</i> <ref value='r-Spidermonkey'/> the examples can be executed interactively once the <tt>Monad</tt> definition from the previous section is loaded.
    </p>
    <p>
      <tt>Axioms</tt> is a monadic class and <tt>m</tt> is a monadic value for which the axioms will be tested; <tt>Axioms</tt> is also used as a namespace to avoid global clutter:
    </p>

    <javascript namespace='Axioms' out='out'><![CDATA[
      var Axioms = Monad.subclass();
      with (Axioms) {
        Axioms.m = succeed('hello');
        dump(m);
      }
    ]]></javascript>
    
    <p>
      Used as a class method above, <tt>dump</tt> shows that <tt>m</tt> contains a state function which will return the argument <tt>'hello'</tt> used to create <tt>m</tt> and the incoming state. JavaScript cannot show that the free variable <tt>value</tt> of the state function has in fact closed over the argument given to <tt>succeed</tt> when <tt>m</tt> was created.
    </p>
    <p>
      Implicitly used to convert to a string below, <tt>dump</tt> shows that the result of applying the state function contained in <tt>m</tt> combines <tt>'hello'</tt> and the incoming state <tt>'s'</tt>.
    </p>
    
    <javascript namespace='Axioms' out='out'><![CDATA[
      with (Axioms) {
        m.apply('s');
      }
    ]]></javascript>

    <eg>
      <tt>fail('message')</tt>, <tt>get</tt>, and <tt>put('value',</tt> <tt>'state')</tt> can be used to obtain other monadic values. Change <tt>m</tt> above, or introduce additional variables, and examine the results above and when testing the axioms below.
    </eg>
    
    <p>
      Loosely speaking, if one considers monadic values as a set with an operation <tt>andThen</tt>, the function <tt>succeed</tt> has to act as a right unit:
    </p>

    <javascript namespace='Axioms' out='out'><![CDATA[
      with (Axioms) {
        m.andThen(succeed).apply('s');
      }
    ]]></javascript>
    
    <p>
      This shows that the two monadic values <tt>m</tt> and <tt>m.andThen(succeed)</tt> exhibit the same behavior. However, they contain very different state functions:
    </p>

    <javascript namespace='Axioms' out='out'><![CDATA[
      with (Axioms) {
        dump(m.andThen(succeed));
      }
    ]]></javascript>
    
    <eg>
      Note that the results of the two invocations of <tt>apply</tt> equal each other, independent of which monadic value you bind to <tt>m</tt> because <tt>succeed</tt> acts as a right unit for <tt>andThen</tt>. Also note that the state function immediately above does not change because it is the result of <tt>andThen</tt> itself.
    </eg>
    <p>
      Similarly, a monadic value constructed with <tt>succeed</tt> acts as a left unit. Given some function <tt>f</tt> which returns a monadic value and some argument <tt>'hello'</tt>:
    </p>

    <javascript namespace='Axioms' out='out'><![CDATA[
      with (Axioms) {
        Axioms.f = function (v) { return get; }
        f('hello').apply('world!');
      }
    ]]></javascript>

    <p>
      The monadic value <tt>f('hello')</tt> exhibits the same behavior above as the monadic value resulting from combining <tt>succeed('hello').andThen(f)</tt> below:
    </p>

    <javascript namespace='Axioms' out='out'><![CDATA[
      with (Axioms) {
        succeed('hello').andThen(f).apply('world!');
      }
    ]]></javascript>

    <eg>
      Again, the monadic value in the definition of <tt>f</tt> above can be changed to check the behavior for another function. Define <tt>f</tt> so that the result involves <tt>hello</tt> <tt> world!</tt>.
    </eg>
    <p>
      The third axiom requires <tt>andThen</tt> to be an associative operation. Continuing the example here is an illustration which involves a monadic value constructed with <tt>succeed</tt>:
    </p>
    
    <javascript namespace='Axioms' out='out'><![CDATA[
      with (Axioms) {
        Axioms.g = function (v) { return succeed(v+' world!'); }
        m.andThen(f).andThen(g).apply('hello');
      }
    ]]></javascript>

    <p>
      The original definition of <tt>f</tt> uses <tt>get</tt> which copies the incoming state <tt>'hello'</tt> as value and <tt>g</tt> uses <tt>succeed</tt> to append <tt>' world!'</tt> to it. The chain above is executed from left to right, but the functions can be combined to change this:
    </p>
      
    <javascript namespace='Axioms' out='out'><![CDATA[
      with (Axioms) {
        m.andThen(function (x) { return f(x).andThen(g); }).apply('hello');
      }
    ]]></javascript>
      
    <p>
       The example shows that (at least for these specific monadic values and functions) the behavior does not depend on whether <tt>m</tt> is first combined with <tt>f</tt> and the result is combined with <tt>g</tt> (association from the left), or <tt>m</tt> is combined with the result of combining <tt>f</tt> with <tt>g</tt> (association from the right).
    </p>
    <eg>
      Change the definitions of <tt>m</tt>, <tt>f</tt>, and <tt>g</tt> to investigate other cases for the third axiom.
    </eg>
    
    <p>
      <i>MonadPlus</i> operations are also expected to satisfy certain axioms: <i>mzero</i> should act as a right and left zero in combinations with <tt>>>=</tt> and as a right and left unit in combinations with <i>mplus</i>, i.e., <tt>fail</tt>, <tt>andThen</tt> and <tt>orElse</tt> should act just like <i>false</i> combined with short-circuit <i>and</i> and <i>or</i> operations in a programming language.
    </p>
    <p>
      <i>x</i> and <i>false</i> is <i>false</i>:
    </p>

    <javascript namespace='Axioms' out='out'><![CDATA[
      with (Axioms) {
        get.andThen(function (v) { return fail('fail'); }).apply('hello');
      }
    ]]></javascript>

    <p>
      <i>x</i> or <i>false</i> is <i>x</i>:
    </p>

    <javascript namespace='Axioms' out='out'><![CDATA[
      with (Axioms) {
        get.orElse(fail('fail')).apply('hello');
      }
    ]]></javascript>

    <p>
      <i>false</i> and <i>x</i> is <i>false</i>:
    </p>

    <javascript namespace='Axioms' out='out'><![CDATA[
      with (Axioms) {
        fail('fail').andThen(g).apply('hello');
      }
    ]]></javascript>

    <p>
      <i>false</i> or <i>x</i> is <i>x</i>:
    </p>

    <javascript namespace='Axioms' out='out'><![CDATA[
      with (Axioms) {
        fail('fail').orElse(get).apply('hello');
      }
    ]]></javascript>
    
    <eg>
      Introduce a variable <tt>x</tt> in each axiom and bind different monadic values to <tt>x</tt> to confirm the axioms for other values. Change the function <tt>g</tt> and check again.
    </eg>
    <p>
      It should be stressed that the examples do not <i>prove</i> that the methods and values inherited from <tt>Monad</tt> satisfy the axioms. The examples just illustrate that the axioms are observed for a specific monadic value and for the function <tt>g</tt> defined earlier; many values and functions can be inserted to check again.
    </p>
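
    <p>
      Such checks could be automated over a small table of cases, as in the following sketch. It compares results by their JSON serialization, which assumes that values and states can be serialized that way; <tt>sketchAxioms</tt>, <tt>same</tt>, and the free functions <tt>succeed</tt> and <tt>fail</tt> are hypothetical stand-ins for the class methods described earlier. Like the examples above, it does not prove anything beyond the cases it enumerates.
    </p>

    <javascript namespace='sketchAxioms'><![CDATA[
```javascript
var sketchAxioms = (function () {
  // Restated from above so that this fragment runs on its own.
  function Monad (stateFunction) { this.stateFunction = stateFunction; }
  Monad.prototype.apply = function (state) {
    return this.stateFunction(state);
  };
  Monad.prototype.andThen = function (b) {
    var a = this;
    return new this.constructor(function (state) {
      var result = a.apply(state);
      return 'fail' in result ? result
        : b(result.value).apply(result.state);
    });
  };

  // Stand-ins for the class methods succeed and fail and the value get.
  function succeed (value) {
    return new Monad(function (state) { return { value: value, state: state }; });
  }
  function fail (message) {
    return new Monad(function () { return { fail: message }; });
  }
  var get = new Monad(function (state) { return { value: state, state: state }; });

  // Two monadic values behave alike for a state if the results serialize alike.
  function same (x, y, state) {
    return JSON.stringify(x.apply(state)) === JSON.stringify(y.apply(state));
  }

  var values = [succeed('hello'), fail('oops'), get];
  var functions = [
    function (v) { return succeed(v + '!'); },
    function (v) { return get; },
    function (v) { return fail('no ' + v); }
  ];
  var states = ['s', 42];
  var checks = 0, violations = 0;

  function check (ok) { ++ checks; if (!ok) ++ violations; }

  values.forEach(function (m) {
    states.forEach(function (s) {
      check(same(m.andThen(succeed), m, s));                 // right unit
      functions.forEach(function (f) {
        check(same(succeed('x').andThen(f), f('x'), s));     // left unit
        functions.forEach(function (g) {                     // associativity
          var left = m.andThen(f).andThen(g);
          var right = m.andThen(function (x) { return f(x).andThen(g); });
          check(same(left, right, s));
        });
      });
    });
  });

  return { checks: checks, violations: violations };
})();
```
    ]]></javascript>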
  </chapter>
  
  <chapter>
    <title id='c-parser'> What's in a Parser? </title>

    <p>
      Ideally, a parser examines an input string and produces a value, e.g., the input string might contain an arithmetic expression and the value is a tree representing the expression, or even the value of the expression. A parser is often unable to complete the task -- it might understand just an initial part of the input or even fail completely. Therefore, it makes sense to require that a parser function accept input and return a value and the remaining input if successful, and a string with an error message if not. In other words a parser function is a state function and the state is the input to be examined.
    </p>
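
    <p>
      As a sketch of this requirement, a parser function for a single decimal digit could be written by hand as follows. The names <tt>sketchDigit</tt> and <tt>digit</tt> and the wording of the error message are assumptions made for this illustration; the parsers described below are generated from regular expressions instead.
    </p>

    <javascript namespace='sketchDigit'><![CDATA[
```javascript
var sketchDigit = (function () {
  // Hypothetical parser function: accepts one leading decimal digit.
  function digit (input) {
    var c = input.charAt(0);   // '' if the input is empty
    return c >= '0' && c <= '9'
      ? { value: c, state: input.substring(1) }
      : { fail: 'expecting a digit' };
  }

  return {
    partial: digit('42+1'),   // { value: '4', state: '2+1' }
    failed: digit('+1'),      // { fail: 'expecting a digit' }
    empty: digit('')          // { fail: 'expecting a digit' }
  };
})();
```
    ]]></javascript>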
    <p>
      This section describes a class <tt>Parser</tt> with monadic values containing parser functions:
    </p>
    
    <javascript namespace='Parser' seq='a'><![CDATA[
      var Parser = Monad.subclass();
    ]]></javascript>
  
    <p>
      This section owes much to Hutton's excellent book <ref value='r-Hutton'/>. He starts with parsers which accept a single character or even nothing at all and builds up to parsers which accept numbers or identifiers optionally surrounded by white space. However, JavaScript supports regular expressions which simplify the description of low-level building blocks for complex parsers. For example, the following collection suffices to describe the pieces which can be combined to form arithmetic expressions:
    </p>
  
    <javascript namespace='arithmetic'><![CDATA[
      var arithmetic = {
        skip:    /\s+/,     // white space
        number:  /[0-9]+/,  // digit sequence
        symbol:  /./,       // operators, parentheses, etc.
        eof:     /$/        // end of file
      };
    ]]></javascript>
  
    <p>
      A collection with a few more properties can already describe a small programming language:
    </p>
  
    <javascript namespace='language'><![CDATA[
      var language = {
            skip:    /^(\s|\/\/.*)+/,            // space and comments
            number:  /^[0-9]+(\.[0-9]*)?/,       // decimal value
            symbol:  /^(<=|>=|<>|.)/,            // operators and comparison
            word:    /^[a-zA-Z_][a-zA-Z_0-9]*/,  // identifier et al.
            quoted:  /^"([^"\\\n]|\\.|\\\n)*"/,  // Java string
            eof:     /^$/                        // end of file
          };
      language.word.reserved = ['if', 'else', 'while'];  // reserved words
    ]]></javascript>

    <p>
      The property <tt>skip</tt> has special significance as discussed below; all other property names can be chosen at will. Every property has a regular expression as a value and all but <tt>skip</tt> can themselves have a property <tt>reserved</tt> with a list of exceptions to indicate matches with a special meaning. A parser always examines the initial part of the input; therefore, the regular expressions are more efficient if they are anchored with <tt>^</tt>.
    </p>
    <p>
      <tt>reserved</tt> makes it possible to create parsers for specific symbols and for all other symbols matching a pattern. For example, many programming languages use the same conventions for user-defined identifiers and language-specific reserved words. Therefore, the regular expression specified for a property like <tt>word</tt> is assumed to match both kinds of input. If a match of <tt>word</tt> is contained in <tt>word.reserved</tt> it is accepted only by a parser generated for a reserved word, not by the parser which accepts anything else matched by the regular expression specified for <tt>word</tt>.
    </p>
    
    <p id='Parser.Factory'>
      <tt>Parser.Factory</tt> is a class of objects which can produce monadic <tt>Parser</tt> values. A factory object is constructed with a collection of properties as described above. Optionally, a trace flag can be specified:
    </p>
    
    <javascript namespace='Parser' seq='b'><![CDATA[
      Parser.Factory = function (table, trace) {
        var self = this;                          // for closure
        if (table.skip)                           // create skip parser
          self.skip = new Parser(
            function (input) {
              return self.scan(input, table.skip, 'skip');
            });
    ]]></javascript><javascript namespace='Parser' seq='d'><![CDATA[
        // create other parsers if any...
      };
    ]]></javascript>
    
    <p>
      The actual analysis of input, i.e., the body of the parser function, can be delegated to a common method <tt>scan</tt> which accepts input and a regular expression and returns a collection with the matched text as <tt>value</tt> and the remaining input as <tt>state</tt>:
    </p>
    
    <javascript namespace='Parser' seq='e'><![CDATA[
      Parser.Factory.prototype.scan = function (input, re, e) {
        if (typeof input == 'string')                          // initialize line count
          input = {lno: 1, text: input};
        var m = re.exec(input.text);
        if (!m || m.index)
          return this.error(input, 'expecting "'+e+'"');
        var nls = m[0].match(/\n/g);                           // # newlines in match
        return {value: m[0],
                state: {lno: input.lno + (nls ? nls.length : 0),
                        text: input.text.substring(m[0].length)},
                toString: Parser.dump};
      };
    ]]></javascript>

    <p>
      The original input can be a string, but the remaining input will be a collection with the remaining string as <tt>text</tt> and the line number where this string begins as <tt>lno</tt>. If there is no match, or if the match is not at the beginning of the input, the result is an error message which includes the current line number in the input:
    </p>

    <javascript namespace='Parser' seq='f'><![CDATA[
      Parser.Factory.prototype.error = function (input, message) {
        return { fail: '('+input.lno+') '+message, toString: Parser.dump };
      };
    ]]></javascript>

    <p>
      Either result is connected to <tt>dump</tt> to simplify debugging, for example:
    </p>

    <javascript namespace='Scanners' out='out'><![CDATA[
      var Scanners = {};  // namespace
      with (Scanners) {
        Scanners.af = new Parser.Factory(arithmetic);
        af.skip.apply(' \n foo bar');
      }
    ]]></javascript>
    
    <eg>
      The pattern <tt>arithmetic.skip</tt> matches one or more white space characters. Remove the leading white space from the input to see an error message when <tt>skip</tt> fails. Note the different line numbers.
    </eg>
    <eg>
      Change <tt>arithmetic</tt> so that a newline character is recognized as a <tt>symbol</tt> rather than white space, i.e., the example above should fail as soon as the single leading space is removed from the input. Apply <tt>symbol()</tt> to recognize the leading newline.
    </eg>
    <p>
      The factory instance variable <tt>skip</tt> references a parser. For every other property in the collection a method is added to the factory which will create a parser. If there is a <tt>skip</tt> property, these parsers will discard a match of <tt>skip</tt> before they consider the rest of their input:
    </p>

    <javascript namespace='Parser' seq='c'><![CDATA[
        for (var p in table)                // all other properties
          if (p != 'skip') {           
            self[p] = (function (p) {       // property name for closure
              return function (expected) {  // generator method
                if (!expected || !self[p].reserved  // own properties only, not toString etc.
                    || Object.prototype.hasOwnProperty.call(self[p].reserved, expected))
                  return new Parser(        // created by generator method
                    function (input) {      // parser function
                      if (self.skip) {      // discard match of skip if any
                        var skipped = self.skip.apply(input);
                        if (!('fail' in skipped))
                          input = skipped.state;
                      }                     // match property's expression
                      var result = self.scan(input, table[p], p);
                      if (!('fail' in result))
                        if (expected) {     // must equal argument if any
                          if (result.value != expected)
                            result = self.error(input, 'expecting "'+expected+'"');
                        } else              // or must not be in reserved list if any
                          if (self[p].reserved && Object.prototype.hasOwnProperty.call(self[p].reserved, result.value))
                            result = self.error(input, '"'+result.value+'" is reserved');
                      if (trace && !('fail' in result))
                        print('('+result.state.lno+') '+p+': '+result.value);
                      return result;
                    });
                else                        // argument must be in reserved list if any
                  throw new Error('"'+expected+'" is not reserved');
              };
            })(p);
            if (table[p].reserved) {        // create reserved list
              self[p].reserved = {};
              for (var n = 0; n < table[p].reserved.length; ++ n)
                self[p].reserved[table[p].reserved[n]] = true;
            }
          }
    ]]></javascript>

    <p>
      For example, the parser factory created from the <tt>arithmetic</tt> collection is sufficient to deal with arithmetic expressions:
    </p>

    <javascript namespace='Scanners' out='out'><![CDATA[
      with (Scanners) {
        Scanners.a = af.number().apply(' 12 + 34 ');
        Scanners.b = af.symbol('+').apply(a.state);
        af.number(34).apply(b.state);
      }
    ]]></javascript>
    
    <eg>
      Each parser picks one item from the input; invoke <tt>print</tt> to display the successive values above.
    </eg>
    <p>
      If the <tt>scan</tt> method finds a match, the resulting value can be filtered by passing an argument to the generator method: the parser then fails unless the value equals the argument.
    </p>
    <eg>
      Change the argument of <tt>number</tt> to see the third parser fail. Then change the original input so that the parser succeeds again.
    </eg>
    <p>
      If there is a <tt>reserved</tt> list for a property the corresponding method creates a parser only if the argument, if any, is in the list. The <tt>language</tt> collection includes reserved words:
    </p>

    <javascript namespace='Scanners' out='out'><![CDATA[
      with (Scanners) {
        Scanners.lf = new Parser.Factory(language);
        Scanners.lf.identifier = lf.word();
        Scanners.lf.number = lf.number();
        Scanners.lf.ifToken = lf.word('if');
        Scanners.lf.emptyString = lf.quoted('""');
        
        Scanners.a = lf.ifToken.apply(' if a // comment \n "" + 10 ');
        Scanners.b = lf.identifier.apply(a.state);
        Scanners.c = lf.emptyString.apply(b.state);
        Scanners.d = lf.symbol('+').apply(c.state);
        Scanners.e = lf.number.apply(d.state);
        lf.eof().apply(e.state);
      }
    ]]></javascript>
    
    <eg>
      Again, invoke <tt>print</tt> to display how the successive items in the original input are accepted; i.e., print <tt>a</tt>, <tt>b</tt>, <tt>c</tt>, <tt>d</tt>, and <tt>e</tt>.
    </eg>
    <p>
      Parsers are objects which can be applied to accept input. Parsers can be created by generator methods of a parser factory. The example shows that frequently used parsers can be saved, e.g., as instance variables of the parser factory; the instance variable can even replace the generator method.
    </p>
    <p>
      If there is no <tt>reserved</tt> list for a property, a parser can be generated without an argument to accept any input matching the corresponding regular expression, or it can be generated with an argument to accept only input which must match the regular expression and must equal the argument. If there is a <tt>reserved</tt> list, a parser can be generated with an argument from the list to accept input equal to the argument; if it is generated without an argument it accepts any match of the regular expression which is not in the <tt>reserved</tt> list. In any case, if there is a <tt>skip</tt> expression, input matching <tt>skip</tt> is first discarded.
    </p>
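    <p>
      The cases described above can be tried in isolation. The following sketch is a simplified stand-in and not the factory code shown earlier: the function <tt>makeGenerator</tt> and the patterns chosen for <tt>word</tt> and <tt>number</tt> are assumptions made for illustration, and neither <tt>skip</tt> nor line numbers are handled.
    </p>

    <pre><![CDATA[
```javascript
// Hypothetical, simplified generator: no skip handling, no line numbers.
function makeGenerator(pattern, reservedWords) {
  var own = Object.prototype.hasOwnProperty;
  var reserved = null;
  if (reservedWords) {                        // build the lookup table
    reserved = {};
    for (var n = 0; n < reservedWords.length; ++n)
      reserved[reservedWords[n]] = true;
  }
  return function (expected) {                // generator method
    if (expected && reserved && !own.call(reserved, expected))
      throw new Error('"' + expected + '" is not reserved');
    return function (text) {                  // parser function
      var m = pattern.exec(text);
      if (!m) return { fail: 'no match' };
      if (expected) {                         // must equal the argument
        if (m[0] != expected)
          return { fail: 'expecting "' + expected + '"' };
      } else if (reserved && own.call(reserved, m[0]))
        return { fail: '"' + m[0] + '" is reserved' };
      return { value: m[0], state: text.substring(m[0].length) };
    };
  };
}

var word = makeGenerator(/^[a-z]+/, ['if', 'else']);  // with a reserved list
var number = makeGenerator(/^[0-9]+/);                // without a reserved list

var r1 = word()('foo bar');     // any word which is not reserved
var r2 = word()('if x');        // fails: "if" is reserved
var r3 = word('if')('if x');    // accepts exactly the reserved word
var r4 = number('34')('34+1');  // must match and equal the argument
var r5 = number('34')('35+1');  // fails: expecting "34"
```
    ]]></pre>

    <p>
      In this sketch <tt>r1</tt>, <tt>r3</tt>, and <tt>r4</tt> succeed, <tt>r2</tt> and <tt>r5</tt> carry a <tt>fail</tt> message, and <tt>word('foo')</tt> would throw an error because <tt>foo</tt> is not in the reserved list.
    </p>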
    <p>
      In summary, regular expressions are used to partition input, <tt>skip</tt> indicates input to be discarded, generator arguments restrict what partitioned input is acceptable, and each <tt>reserved</tt> list describes special cases to be considered once input has been partitioned. 
    </p>
    
  </chapter>

  <chapter>
    <title id='c-combining'> Combining parsers </title>
    
    <p>
      This section discusses how simple parsers can be combined to create parsers which accept input with a more complicated structure. Consider a grammar for arithmetic expressions:
    </p>
    
    <pre>
      term:    <i>number</i> | '(' sum ')';
      product: term '*' product | term '/' product | term;
      sum:     product '+' sum | product '-' sum | product;
    </pre>
    
    <p>
      A <tt>Parser.Factory</tt> based on the <tt>arithmetic</tt> collection introduced in the preceding section produces parsers for <tt><i>number</i></tt> and the literal operator symbols used in this grammar. Parsers are monadic values and can be combined with <tt>andThen</tt> for sequential execution and <tt>orElse</tt> for alternatives, i.e., those two operations are sufficient to construct parsers for the nonterminals <tt>term</tt>, <tt>product</tt>, and <tt>sum</tt> in this grammar, simply by translating the grammar:
    </p>
    
    <javascript namespace='Expr' out='out'><![CDATA[
      var Expr = {};  // namespace
      with (Expr) {
        Expr.af = new Parser.Factory(arithmetic);

        Expr.term =            af.number()             .orElse(
                               af.symbol('(')       .andThen(
          function () { return sum             .andThen(
          function () { return af.symbol(')'); }       ); } ) );

        term.apply(' 10 ;');
      }
    ]]></javascript>
    
    <eg>
      Use a different <tt><i>number</i></tt> as input to <tt>term</tt>. Then try to input <tt>(10)</tt>.
    </eg>
    <p>
      Each terminal symbol of the grammar such as <tt><i>number</i></tt> or <tt>'('</tt> is translated into a parser generated from a pattern collection by <tt>Parser.Factory</tt>.
    </p>
    <p>
      Each nonterminal symbol such as <tt>term</tt> is translated into a parser combined from other parsers: a sequence of items in the grammar is combined using <tt>andThen</tt>, alternatives are combined using <tt>orElse</tt>. The example above shows that a parser can often be applied before all other parsers have been defined: <tt>sum</tt> is referenced inside a function passed to <tt>andThen</tt> and is not looked up until that function is called.
    </p>
    
    <javascript namespace='Expr' out='out'><![CDATA[
      with (Expr) {
        Expr.product =         term                .andThen(
          function () { return af.symbol('*') .andThen(
          function () { return product; }             ); } )   .orElse(
                               term                .andThen(
          function () { return af.symbol('/') .andThen(
          function () { return product; }             ); } ) .orElse(
                               term                                 ) );
    
        Expr.sum =             product             .andThen(
          function () { return af.symbol('+') .andThen(
          function () { return sum; }                 ); } )   .orElse(
                               product             .andThen(
          function () { return af.symbol('-') .andThen(
          function () { return sum; }                 ); } ) .orElse(
                               product                              ) );
        
        term.apply(' (1 + 2*3) ; ');
      }
    ]]></javascript>

    <eg>
      Input some other arithmetic expressions. Also, specify an invalid expression such as <tt>-23</tt>. Change the definition of <tt>term</tt> to permit a single (or even more than one) minus sign.
    </eg>
    <p>
      With a preprocessor-supported notation as described below it will be much easier to specify this nest of function calls. Nevertheless, the result is a functioning parser for arithmetic expressions: it produces a <tt>value</tt> and the remaining input for a correct phrase as shown above, and an error message in case of failure:
    </p>
    
    <javascript namespace='Expr' out='out'><![CDATA[
      with (Expr) {
        Expr.expr =            sum      .andThen(
          function () { return af.eof(); }      );

        expr.apply(' (1 + 2*3) \n ; ');
      }
    ]]></javascript>

    <p>
      <tt>expr</tt> expects that the input string contains only numbers, operators, and parentheses in the proper order, and optionally white space, nothing else. In the example above the expression is followed by a semicolon on a second line to demonstrate that an error message contains the current line number.
    </p>
    <eg>
      Remove the semicolon from the input to see <tt>expr</tt> succeed. Input other expressions.
    </eg>
    <p>
      The order of the alternatives is important. <tt>orElse</tt> does not implement backtracking: it activates the second parser only if the first one fails, and it never reconsiders an alternative which has succeeded. For example, the alternatives of <tt>sum</tt> could be reordered:
    </p>
    
    <pre>
      sum:     product | product '+' sum | product '-' sum; 
    </pre>
    
    <p>
      which translates into
    </p>
    
    <javascript namespace='Expr_badSum' out='out'><![CDATA[
      with (Expr) {
        Expr.badSum =          product                         .orElse(
                               product             .andThen(
          function () { return af.symbol('+') .andThen(
          function () { return sum; }                 ); } ) .orElse(
                               product             .andThen(
          function () { return af.symbol('-') .andThen(
          function () { return sum; }                 ); } )        ) );

        af.error = function (input, message) {
          var result = Parser.Factory.prototype.error(input, message);
          print(result); return result;
        };
        
        badSum.apply(' 1 + 2 ');
        // delete af['error']
      }
    ]]></javascript>
    
    <eg>
      How much of the input above does <tt>badSum</tt> accept? What happens if the input is enclosed in parentheses?
    </eg>
    <p>
      The <tt>error</tt> method is (temporarily) overridden above so that it produces output as soon as there is any failure. Execution with the original input shows that only the operators of <tt>product</tt> are searched. The two failures happen when <tt>product</tt> tries to find a multiplication operator before it settles for a simple <tt>term</tt>. While <tt>badSum</tt> succeeds, it does not accept the entire expression because the first alternative succeeds with a simple <tt>product</tt>. Therefore, when a grammar is translated into parsers, the longer alternatives have to be specified first.
    </p>
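    <p>
      The effect of the order can be reproduced with a much smaller grammar. The sketch below does not use the <tt>Parser</tt> class from this page; <tt>Mini</tt> is a hypothetical stand-in with just enough of <tt>apply</tt>, <tt>andThen</tt>, and <tt>orElse</tt>, assumed to behave as described above, and <tt>digit</tt> and <tt>plus</tt> are illustrative parsers.
    </p>

    <pre><![CDATA[
```javascript
// Hypothetical stand-in for the Parser class (an assumption): a parser wraps
// a function from input text to { value, state } or { fail }.
function Mini(fn) { this.fn = fn; }
Mini.prototype.apply = function (input) { return this.fn(input); };
Mini.prototype.andThen = function (next) {       // sequence
  var self = this;
  return new Mini(function (input) {
    var r = self.apply(input);
    return 'fail' in r ? r : next(r.value).apply(r.state);
  });
};
Mini.prototype.orElse = function (other) {       // alternative
  var self = this;
  return new Mini(function (input) {
    var r = self.apply(input);
    return 'fail' in r ? other.apply(input) : r; // second parser only on failure
  });
};
Mini.match = function (pattern) {                // pattern must be anchored with ^
  return new Mini(function (input) {
    var m = pattern.exec(input);
    return m ? { value: m[0], state: input.substring(m[0].length) }
             : { fail: 'expecting ' + pattern };
  });
};

var digit = Mini.match(/^[0-9]/), plus = Mini.match(/^\+/);

// sum: digit | digit '+' digit;    short alternative first
var shortFirst = digit.orElse(
                       digit .andThen(
  function () { return plus  .andThen(
  function () { return digit; } ); } ) );

// sum: digit '+' digit | digit;    long alternative first
var longFirst =
                       digit .andThen(
  function () { return plus  .andThen(
  function () { return digit; } ); } )
  .orElse(digit);

var a = shortFirst.apply('1+2');  // stops after 1, the state is still '+2'
var b = longFirst.apply('1+2');   // accepts the entire input
var c = longFirst.apply('1');     // falls back to the short alternative
```
    ]]></pre>

    <p>
      Both parsers succeed for <tt>1+2</tt>, but only <tt>longFirst</tt> leaves an empty <tt>state</tt>.
    </p>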
    <p>
      The example also suggests an inefficiency: when a failure happens in the middle of an alternative, <tt>orElse</tt> starts over with the original input, i.e., in this case <tt>1</tt> is parsed by <tt>number</tt> and <tt>term</tt> a total of three times before <tt>product</tt> finally succeeds. The grammar can be changed to avoid this:
    </p>

    <pre>
      betterProduct: term mulDivs;
      mulDivs:       '*' betterProduct | '/' betterProduct | <i>/* empty */</i>; 
    </pre>
    
    <p>
      An empty alternative succeeds without accepting input, i.e., without changing state:
    </p>
  
    <javascript namespace='Expr' out='out'><![CDATA[
      with (Parser) with (Expr) {
        Expr.betterProduct =   term    .andThen(
          function () { return mulDivs; }      );

        Expr.mulDivs =         af.symbol('*') .andThen(
          function () { return betterProduct; }       )   .orElse(
                               af.symbol('/') .andThen(
          function () { return betterProduct; }       ) .orElse(
                               succeed('')                     ) );
                               
        betterProduct.apply(' 2 * 3 ');
      }
    ]]></javascript>
    
    <p>
      <tt>with</tt> <tt>(Parser)</tt> is needed in this code because <tt>Parser</tt> provides the method <tt>succeed</tt>.
    </p>
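    <p>
      The saving can be counted on a reduced, non-recursive version of the two grammars. This is only a sketch: <tt>Mini</tt> is a hypothetical stand-in for the <tt>Parser</tt> class, assumed to behave as described in this section, and the counter <tt>calls</tt> is added for illustration.
    </p>

    <pre><![CDATA[
```javascript
// Hypothetical stand-in for the Parser class (an assumption).
function Mini(fn) { this.fn = fn; }
Mini.prototype.apply = function (input) { return this.fn(input); };
Mini.prototype.andThen = function (next) {
  var self = this;
  return new Mini(function (input) {
    var r = self.apply(input);
    return 'fail' in r ? r : next(r.value).apply(r.state);
  });
};
Mini.prototype.orElse = function (other) {
  var self = this;
  return new Mini(function (input) {
    var r = self.apply(input);
    return 'fail' in r ? other.apply(input) : r;  // starts over with the original input
  });
};
Mini.succeed = function (value) {                 // accepts no input
  return new Mini(function (input) { return { value: value, state: input }; });
};
Mini.match = function (pattern) {
  return new Mini(function (input) {
    var m = pattern.exec(input);
    return m ? { value: m[0], state: input.substring(m[0].length) }
             : { fail: 'expecting ' + pattern };
  });
};

var calls = 0;
var digit = new Mini(function (input) {           // counts its applications
  ++calls;
  return /^[0-9]/.test(input)
    ? { value: input.charAt(0), state: input.substring(1) }
    : { fail: 'expecting digit' };
});
var star = Mini.match(/^\*/), slash = Mini.match(/^\//);

// product: digit '*' digit | digit '/' digit | digit;
var product =
  digit.andThen(function () { return star.andThen(function () { return digit; }); })
  .orElse(
  digit.andThen(function () { return slash.andThen(function () { return digit; }); })
  .orElse(digit));

// betterProduct: digit mulDivs;   mulDivs: '*' digit | '/' digit | /* empty */;
var mulDivs =
  star.andThen(function () { return digit; })
  .orElse(slash.andThen(function () { return digit; })
  .orElse(Mini.succeed('')));
var betterProduct = digit.andThen(function () { return mulDivs; });

calls = 0; product.apply('1');        var plainCalls = calls;     // 3
calls = 0; betterProduct.apply('1');  var factoredCalls = calls;  // 1
```
    ]]></pre>

    <p>
      For the input <tt>1</tt> the first version applies <tt>digit</tt> once per alternative, whereas the factored version applies it once and lets the empty alternative succeed.
    </p>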
    <eg>
      Insert <tt>badSum</tt> above and combine it with <tt>betterProduct</tt>. Does this improve performance? Does it reduce the number of failures if the input is enclosed in parentheses?
    </eg>
    <p>
      Extended BNF, pioneered by Niklaus Wirth for Pascal <ref value='r-Pascal'/>, introduces constructs to describe a grammar and avoid recursion. A popular style for these constructs is the one commonly used in Internet RFCs <ref value='r-RFC'/>: items can be grouped with parentheses, optional items are marked with a suffix <tt>?</tt>, and the suffixes <tt>*</tt> and <tt>+</tt> indicate that an item can be repeated zero or more and one or more times, respectively. The grammar for arithmetic expressions and the translation to parsers could be changed as follows:
    </p>
    
    <pre>
      betterSum: product summands?;
      summands:  ('+' product | '-' product)+;
    </pre>

    <p>
      This translates into:
    </p>
    
    <javascript namespace='Expr' out='out'><![CDATA[
      with (Expr) {
        Expr.betterSum =       product        .andThen(
          function () { return summands.optional(); } );

        Expr.summands =        af.symbol('+') .andThen(
          function () { return product; }             ) .orElse(
                               af.symbol('-') .andThen(
          function () { return product; }             )        ).some();
        
        betterSum.apply(' 2 + 3 - 4 ');
      }
    ]]></javascript>
      
    <eg>
      Add more summands to the input and observe the resulting value.
    </eg>
    <p>
      Often, the parentheses need not be translated explicitly; here they are implied by the fact that method application is left-associative. The implementation uses two of the <tt>Parser</tt> methods <tt>optional</tt>, <tt>some</tt>, and <tt>many</tt> corresponding to the suffixes <tt>?</tt>, <tt>+</tt>, and <tt>*</tt>,  respectively, which can be applied to any <tt>Parser</tt> value:
    </p>
    
    <javascript namespace='Parser' seq='g'><![CDATA[
      Parser.prototype.optional = function (value) {
        with (Parser)
          return this.orElse(succeed(arguments && arguments.length > 0 ? value : ''));
      };
         
      Parser.prototype.some = function () {
        var self = this; // for closure
        with (Parser)
                                return self                                .andThen(
          function (fromSelf) { return self.many()                     .andThen(
          function (fromMany) { return succeed([fromSelf].concat(fromMany)); } ) } );      
      };
      
      Parser.prototype.many = function () {
        with (Parser)
          return this.some().orElse(succeed([]));
      };
    ]]></javascript>
    
    <p id='optional'>
      Given a parser, <tt>optional</tt> creates a new parser with a parser function which will execute the receiver's parser function and return the result if successful; otherwise it will return the argument <tt>value</tt> (or an empty string) and the original input as <tt>state</tt>.
    </p>
    <p id='some'>
      Given a parser, <tt>some</tt> creates a new parser with a parser function which will execute the receiver's parser function one or more times and return a list with the results as <tt>value</tt> and the remaining input as <tt>state</tt>; the parser function fails if the receiver function does not succeed at least once.
    </p>
    <p id='many'>
      Given a parser, <tt>many</tt> creates a new parser with a parser function which will execute the receiver's parser function zero or more times and return a list with the results as <tt>value</tt> and the remaining input as <tt>state</tt>; the parser function will always succeed, but the resulting <tt>value</tt> list is empty and the input <tt>state</tt> is unchanged if the receiver's parser function does not succeed at least once.
    </p>
    <p>
      The implementation of <tt>some</tt> uses the fact that <tt>andThen</tt> combines a parser and a function and passes the <tt>value</tt> resulting from successful execution of the receiver's parser function as argument to the function. In the chain of <tt>andThen</tt> above the function definitions are nested so that all parameter scopes extend to the end of the innermost function, i.e., any parameter can be used from the point where it is introduced and in all the nested functions.
    </p>
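    <p>
      The behavior described for the three methods can be observed in a small stand-alone sketch. <tt>Mini</tt> is again a hypothetical stand-in for the <tt>Parser</tt> class, assumed to behave as described in this section, and the three methods are restated without <tt>with</tt> so that the fragment runs on its own.
    </p>

    <pre><![CDATA[
```javascript
// Hypothetical stand-in for the Parser class (an assumption).
function Mini(fn) { this.fn = fn; }
Mini.prototype.apply = function (input) { return this.fn(input); };
Mini.prototype.andThen = function (next) {
  var self = this;
  return new Mini(function (input) {
    var r = self.apply(input);
    return 'fail' in r ? r : next(r.value).apply(r.state);
  });
};
Mini.prototype.orElse = function (other) {
  var self = this;
  return new Mini(function (input) {
    var r = self.apply(input);
    return 'fail' in r ? other.apply(input) : r;
  });
};
Mini.succeed = function (value) {
  return new Mini(function (input) { return { value: value, state: input }; });
};
Mini.match = function (pattern) {
  return new Mini(function (input) {
    var m = pattern.exec(input);
    return m ? { value: m[0], state: input.substring(m[0].length) }
             : { fail: 'expecting ' + pattern };
  });
};

// The three methods from the text, restated for Mini.
Mini.prototype.optional = function (value) {
  return this.orElse(Mini.succeed(arguments.length > 0 ? value : ''));
};
Mini.prototype.some = function () {
  var self = this;  // for closure
  return self.andThen(function (fromSelf) {
    return self.many().andThen(function (fromMany) {
      return Mini.succeed([fromSelf].concat(fromMany));
    });
  });
};
Mini.prototype.many = function () {
  return this.some().orElse(Mini.succeed([]));
};

var digit = Mini.match(/^[0-9]/);
var r1 = digit.some().apply('123+');      // list of three digits, state '+'
var r2 = digit.some().apply('+');         // fails: no digit at all
var r3 = digit.many().apply('+');         // empty list, state unchanged
var r4 = digit.optional('0').apply('+');  // default value '0', state unchanged
var r5 = digit.optional().apply('7+');    // value '7', state '+'
```
    ]]></pre>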
    <p>
      The implementations of the methods above suggest that a parser function need not always return a string as <tt>value</tt>. In fact, <tt>some</tt> and <tt>many</tt> arrange to return lists and <tt>optional</tt> can arrange to return an arbitrary value. This can be exploited, e.g., to interpret an arithmetic expression as it is parsed. Here is a grammar expressed with extended BNF:
    </p>

    <pre>
      term:    <i>number</i> | '(' sum ')';
      product: term ('*' term | '/' term)*;
      sum:     product ('+' product | '-' product)*;
      expr:    sum <i>eof</i>;
    </pre>
    
    <p>
      This translates into the following interpreter:
    </p>
    
    <javascript namespace='Eval' out='out'><![CDATA[
      var Eval = {};  // namespace
      with (Parser) with (Eval) {
        Eval.af = new Parser.Factory(arithmetic);
      
        Eval.term =             af.number()                 .orElse(
                                af.symbol('(')           .andThen(
          function ()  { return sum                 .andThen(
          function (s) { return af.symbol(')') .andThen(
          function ()  { return succeed(s); }          ); } ); } ) );

        Eval.product =           term                                           .andThen(
          function (l)  { return af.symbol('*')  .andThen(
          function ()   { return term       .andThen(
          function (r)  { return succeed(
             function (x) { return Number(x) * Number(r); }
                                        ); }        ); } ) .orElse(
                                 af.symbol('/')  .andThen(
          function ()   { return term       .andThen(
          function (r)  { return succeed(
             function (x) { return Number(x) / Number(r); }
                                       ); }        ); } )         ).many() .andThen(
          function (rs) { return succeed(foldl(l, rs)); }                          ); } );
  
        Eval.sum =               product                                        .andThen(
          function (l)  { return af.symbol('+')  .andThen(
          function ()   { return product    .andThen(
          function (r)  { return succeed(
             function (x) { return Number(x) + Number(r); }
                                        ); }        ); } ) .orElse(
                                 af.symbol('-')  .andThen(
          function ()   { return product    .andThen(
          function (r)  { return succeed(
             function (x) { return Number(x) - Number(r); }
                                       ); }         ); } )        ).many() .andThen(
          function (rs) { return succeed(foldl(l, rs)); }                          ); } );
  
        Eval.expr =             sum            .andThen(
          function (s) { return af.eof()  .andThen(
          function ()  { return succeed(s); }     ); } );

        expr.apply(' 10 - 20*30 / (40+50) ');
      }
    ]]></javascript>
    
    <p>
      Where appropriate, functions passed to <tt>andThen</tt> have been given a parameter to bind the value produced by the preceding parser. In addition, another <tt>andThen</tt> has been added to each sequence (except to the first alternative of <tt>term</tt>) with a function which uses <tt>succeed</tt> to return a value for the recognized expression phrase.
    </p>
    <p id='foldl'>
      The parsers created using <tt>many</tt> above return lists of curried functions, e.g., the phrase <tt>-</tt> <tt>10</tt> will return
    </p>
    
    <pre>
      function (x) { return Number(x) - 10; }
    </pre>
    
    <p>
      and <tt>many</tt> creates a list of such functions. <tt>foldl</tt> is a generally useful class method of <tt>Parser</tt> which takes a value and a (possibly empty) list of curried functions and applies them left to right to accumulate the result value, i.e., <tt>foldl</tt> interprets left-associative operation sequences:
    </p>

    <javascript namespace='Parser' seq='h'><![CDATA[
      Parser.foldl = function (l, rs) {
        for (var n = 0; n < rs.length; ++ n)
          l = rs[n](l);
        return l;
      };
    ]]></javascript>
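
    <p>
      As a worked example, assume that the parsers have already produced curried functions for the phrases <tt>- 4</tt> and <tt>- 3</tt>. The helper <tt>minus</tt> below is hypothetical and merely builds such functions; <tt>foldl</tt> is restated as a plain function so that the fragment runs on its own.
    </p>

    <pre><![CDATA[
```javascript
// foldl as defined above, restated as a plain function.
function foldl(l, rs) {
  for (var n = 0; n < rs.length; ++n)
    l = rs[n](l);
  return l;
}

// Hypothetical helper: the curried function a parser would return for "- r".
function minus(r) {
  return function (x) { return Number(x) - Number(r); };
}

// 10 - 4 - 3 is evaluated from left to right as (10 - 4) - 3.
var leftToRight = foldl('10', [minus('4'), minus('3')]);

// An empty list leaves the initial value untouched (here: still a string).
var unchanged = foldl('10', []);
```
    ]]></pre>

    <p>
      Here <tt>leftToRight</tt> is <tt>3</tt>; a right-associative interpretation, <tt>10 - (4 - 3)</tt>, would have produced <tt>9</tt>.
    </p>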
    
    <eg>
      Input some other expressions.
    </eg>
    <eg>
      Remove <tt>Number</tt> from all functions and evaluate the original input as well as <tt>1+2</tt> and <tt>3*4</tt>. Add code to process the result of <tt>af.number()</tt> so that expressions are again evaluated correctly.
    </eg>
    
  </chapter>

  <chapter>
    <title id='c-sugar'> Syntactic sugar </title>
    
    <p id='notation'>
      The interpreter for arithmetic expressions shown in the preceding section works, but the implementation is rather error-prone due to the excessive use of nested functions as required by <tt>andThen</tt>. Haskell's <i>do</i> notation hugely simplifies specifying computations with monadic values and this section discusses something similar which is needed to make monadic values palatable in JavaScript. The grammar for arithmetic expressions 
    </p>
    
    <pre>
      term:    <i>number</i> | '(' sum ')';
      product: term ('*' term | '/' term)*;
      sum:     product ('+' product | '-' product)*;
      expr:    sum <i>eof</i>;
    </pre>
    
    <p>
      can be translated into an interpreter much more literally using a notation for monadic values:
    </p>
    
    <jsm namespace='EvalM' out='out'><source><![CDATA[
      var EvalM = {};  // namespace
      with (Parser) with (EvalM) {
        EvalM.af = new Parser.Factory(arithmetic);
      
        EvalM.term =
          {{{
              af.number();
          |||
              af.symbol('(');
              s <- sum;
              af.symbol(')');
              succeed(s);
          }}};
          
        term.apply(' 10 ;');
      }
    ]]></source></jsm>
    
    <p>
      <tt>{{{</tt> and <tt>}}}</tt> enclose a computation which will return a monadic value. The computation consists of one or more alternatives, separated by <tt>|||</tt>. An alternative consists of one or more pieces of JavaScript code. Each piece must deliver a monadic value and must be terminated with a semicolon. (Depending on context JavaScript allows a newline to act as a statement terminator but this is not supported here.)
    </p>
    <p>
      Each piece of JavaScript code may be preceded by an identifier and <tt>&lt;-</tt>. In this case the value <i>wrapped</i> by the monadic value produced by the JavaScript code is bound to the identifier, e.g.,
    </p>
    
    <pre>
      <i>x</i> &lt;- succeed(<i>y</i>);
    </pre>
    
    <p>
      will bind the value <tt><i>y</i></tt> to the identifier <tt><i>x</i></tt>. The scope of the identifier extends from the <i>next</i> monadic value to the end of the alternative (i.e., <tt><i>x</i></tt> above cannot be used in place of <tt><i>y</i></tt> but it can be used beyond <tt>succeed</tt>); the identifier may be shadowed within its scope.
    </p>
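    <p>
      The scope rule follows from the nesting of functions used with <tt>andThen</tt> in the preceding section. The following sketch shows a computation in monadic notation as a comment and a hand-written equivalent; it is an illustration and not necessarily the exact output of the preprocessor. <tt>Mini</tt> is a hypothetical stand-in for the <tt>Parser</tt> class, and <tt>digit</tt>, <tt>plus</tt>, and <tt>add</tt> are illustrative names.
    </p>

    <pre><![CDATA[
```javascript
// Hypothetical stand-in for the Parser class (an assumption).
function Mini(fn) { this.fn = fn; }
Mini.prototype.apply = function (input) { return this.fn(input); };
Mini.prototype.andThen = function (next) {
  var self = this;
  return new Mini(function (input) {
    var r = self.apply(input);
    return 'fail' in r ? r : next(r.value).apply(r.state);
  });
};
Mini.prototype.orElse = function (other) {
  var self = this;
  return new Mini(function (input) {
    var r = self.apply(input);
    return 'fail' in r ? other.apply(input) : r;
  });
};
Mini.succeed = function (value) {
  return new Mini(function (input) { return { value: value, state: input }; });
};
Mini.match = function (pattern) {
  return new Mini(function (input) {
    var m = pattern.exec(input);
    return m ? { value: m[0], state: input.substring(m[0].length) }
             : { fail: 'expecting ' + pattern };
  });
};

var digit = Mini.match(/^[0-9]/), plus = Mini.match(/^\+/);

// {{{
//     a <- digit;
//     plus;
//     b <- digit;
//     succeed(Number(a) + Number(b));
// |||
//     succeed(0);
// }}}
var add =
                        digit .andThen(
  function (a) { return plus  .andThen(    // a is visible from here on
  function ()  { return digit .andThen(
  function (b) { return Mini.succeed(Number(a) + Number(b)); } ); } ); } )
  .orElse(Mini.succeed(0));

var r1 = add.apply('3+4');  // first alternative: value 7
var r2 = add.apply('x');    // second alternative: value 0, input untouched
```
    ]]></pre>

    <p>
      Each identifier bound with <tt>&lt;-</tt> becomes the parameter of the function which contains the rest of the alternative; this is why it can be used in all subsequent pieces but not in the piece which produces it.
    </p>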
    <p>
      The notation can be nested and suffixes such as <tt>optional</tt>, <tt>many</tt>, and <tt>some</tt> can be applied. This helps to translate the rest of the grammar into monadic notation:
    </p>
    
    <jsm namespace='EvalM' out='out'><source><![CDATA[
      with (Parser) with (EvalM) {
        EvalM.product =
          {{{
              l <- term;
              rs <-
                {{{
                    af.symbol('*');
                    r <- term;
                    succeed(function (x) { return x * r; });
                |||
                    af.symbol('/');
                    r <- term;
                    succeed(function (x) { return x / r; });
                }}}.many();
              succeed(foldl(l, rs));
          }}};
                
        EvalM.sum =
          {{{
              l <- product;
              rs <-
                {{{
                    af.symbol('+');
                    r <- product;
                    succeed(function (x) { return x + r; });
                |||
                    af.symbol('-');
                    r <- product;
                    succeed(function (x) { return x - r; });
                }}}.many();
              succeed(foldl(l, rs));
          }}};
        
        EvalM.expr =
          {{{
              s <- sum;
              af.eof();
              succeed(s);
          }}};

        expr.apply(' 10 - 20*30 / (40+50) ');
      }
    ]]></source></jsm>
    
    <p>
      The preprocessor described in a subsequent section converts this notation into the code developed in the previous section.
    </p>
    <eg>
      The resulting numerical value is not correct. Use the inputs <tt>1+2</tt> and <tt>3*4</tt>, preprocess, execute, and then amend (and preprocess) the first alternative of <tt>term</tt> above to produce the correct result.
    </eg>
    <eg>
      Add <tt>%</tt> as an operator returning the remainder after division. Add a minus sign operation.
    </eg>
    <eg>
      What happens if there is a division by zero? What happens if a division by zero returns <tt>fail('zero')</tt>?
    </eg> 
    <p>
      Some browsers (most notably Apple's Safari) and the JavaScript interpreter <i>SpiderMonkey</i> <ref value='r-Spidermonkey'/> seem to restrict the JavaScript call stack depth and cannot preprocess much larger examples. Fortunately, Firefox <ref value='r-Firefox'/> and the JavaScript interpreter <i>Rhino</i> <ref value='r-Rhino'/> do not appear to be restricted.
    </p>
  </chapter>

  <chapter>
    <title id='c-trees'> Building trees </title>

    <p>
      It is often convenient to represent input as a tree for further processing. For example, the code from the preceding section can be changed slightly to produce a tree for an arithmetic expression:
    </p>

    <jsm namespace='Tree' out='out'><source><![CDATA[
      var Tree = { Leaf:0, Add:0, Sub:0, Mul:0, Div:0 };
      
      with (Parser) with (Tree) {
        Parser.makeTreeClasses(Tree);
        Tree.af = new Parser.Factory(arithmetic);
      
        Tree.term =
          {{{
              n <- af.number();
              succeed(new Tree.Leaf(n));
          |||
              af.symbol('(');
              s <- sum;
              af.symbol(')');
              succeed(s);
          }}};
          
        term.apply(' 10 ;').value;
      }
    ]]></source></jsm>

    <p>
      <tt>term</tt> remains almost unchanged: a <tt><i>number</i></tt> is represented as a <tt>Leaf</tt> node and whatever <tt>sum</tt> computes is the result of <tt>term</tt>.
    </p>
    
    <jsm namespace='Tree' out='out'><source><![CDATA[
      with (Parser) with (Tree) {
        Tree.product =
          {{{
              l <- term;
              rs <-
                {{{
                    af.symbol('*');
                    r <- term;
                    succeed(function (x) { return new Tree.Mul(x, r); });
                |||
                    af.symbol('/');
                    r <- term;
                    succeed(function (x) { return new Tree.Div(x, r); });
                }}}.many();
              succeed(foldl(l, rs));
          }}};
                
        Tree.sum =
          {{{
              l <- product;
              rs <-
                {{{
                    af.symbol('+');
                    r <- product;
                    succeed(function (x) { return new Tree.Add(x, r); });
                |||
                    af.symbol('-');
                    r <- product;
                    succeed(function (x) { return new Tree.Sub(x, r); });
                }}}.many();
              succeed(foldl(l, rs));
          }}};
        
        Tree.expr =
          {{{
              s <- sum;
              af.eof();
              succeed(s);
          }}};

        expr.apply(' 10 - 20*30 / (40+50) ').value;
      }
    ]]></source></jsm>
    
    <p>
      The functions created in <tt>product</tt> and <tt>sum</tt> are changed to create tree nodes and connect their descendants rather than immediately evaluate the various operations. (The preprocessing area shows the entire tree builder, not just the translation of <tt>expr</tt>, <tt>sum</tt>, and <tt>product</tt>.)
    </p>
    <eg>
      Add or remove parentheses and otherwise change the arithmetic expression to see how the tree changes, e.g., to reflect precedence. Do not forget to preprocess whenever the arithmetic expression is changed.
    </eg>
    <p id='makeTreeClasses'>
      Tree nodes are so simple that a static factory method <tt>Parser.makeTreeClasses</tt> can arrange for each property of a collection such as <tt>Tree</tt> above to be a JavaScript object constructor; by convention, only properties whose names start with an upper-case letter are turned into tree classes:
    </p>
    
    <javascript namespace='Parser' seq='i'><![CDATA[
      Parser.makeTreeClasses = function (collection, trace) {
        for (var c in collection)
          if (/^[A-Z]/.test(c)) {          // change property into tree class
            // constructor
            collection[c] = function () {
              this.content = arguments;
              if (trace) print(this);
            };
            // class name
            collection[c].prototype.className = c;
            // toString
            collection[c].prototype.toString = Parser.dumpTree;
          }
      };
    ]]></javascript>
    
    <p>
      <tt>makeTreeClasses</tt> installs a constructor which simply saves its <tt>arguments</tt> as <tt>content</tt>. If <tt>makeTreeClasses</tt> is called with a second argument which evaluates to <tt>true</tt>, the constructor will immediately display the new object; this is quite helpful to debug a tree builder. 
    </p>
    <p id='dumpTree'>
      <tt>makeTreeClasses</tt> stores the <tt>className</tt> in the prototype of each class, i.e., as a variable shared by all instances. This makes it possible to implement a static <tt>dumpTree</tt> function which is connected as <tt>toString</tt> for each of the generated classes:
    </p>
    
    <javascript namespace='Parser' seq='j'><![CDATA[
      Parser.dumpTree = function () {
        var indent = '  ' +
            (arguments.length > 0 && 
             typeof arguments[0] == 'string' ? arguments[0] : ''),
          result = this.className + '\n';
        for (var a = 0; a < this.content.length; ++ a)
          if (this.content[a] == null)
            result += indent + 'null\n';
          else if (typeof this.content[a] != 'object')
            result += indent + this.content[a] + '\n';
          else if (this.content[a] instanceof Array) {
            result += indent + '[ ]\n';
            for (var n = 0; n < this.content[a].length; ++ n)
              result += indent+'  ' +
                this.content[a][n].toString(indent+'  ') + '\n';
          } else
            result += indent + this.content[a].toString(indent) + '\n';
        return result.replace(/\n$/, '');
      };
    ]]></javascript>

    <p>
      <tt>dumpTree</tt> returns the class name followed by an indented list of the values in <tt>content</tt>, if any. The list accounts for nested arrays and nested objects, but nested objects are expected to implement <tt>toString</tt> in a compatible fashion.
    </p>
    <eg>
      Add <tt>%</tt> as an operator returning the remainder after division. Add a minus sign operation.
    </eg>

  </chapter>
  
  <chapter>
    <title id='c-preprocessor'> Converting monadic notation </title>
    
    <p>
      The monadic notation is embedded in JavaScript and a JavaScript interpreter could be extended to deal with the notation directly. However, especially for prototyping purposes, it is much simpler to convert the notation into JavaScript prior to interpretation. This section discusses the significant parts of a preprocessor implementation; the complete code of the preprocessor is available for editing (and self-preprocessing) on a separate edit page.
    </p>
    <p>
       The monadic notation can be described by a grammar which concentrates on the monadic computations and obscures most of JavaScript: 
    </p>
    
    <pre>
      jsm:      term+ <i>eof</i>;
      term:     monad | <i>blanks</i> | <i>word</i> | <i>quoted</i>
          |     '(' term* ')' | '[' term* ']' | '{' term* '}' | <i>symbol</i>;
      monad:    '{{{' mvalues ('|||' mvalues)* '}}}';
      mvalues:  mvalue+;
      mvalue:   <i>blanks</i>? (<i>word</i> <i>blanks</i>? '&lt;-')? term+ ';' <i>blanks</i>?;
    </pre>
    
    <p>
      The grammar shows which parsers (such as <tt><i>blanks</i></tt>) have to be created with <tt>Parser.Factory</tt> and which literal symbols have to be accepted by one or more of these parsers. Describing the parser factory is the first step to implementing the preprocessor.
    </p>
    <p>
      <tt>jsm</tt> is JavaScript code and can specify monadic values using <tt>monad</tt> phrases. The preprocessor normalizes white space, i.e., it replaces comments and multiple blanks by single blanks and it preserves all newline characters for the benefit of JavaScript so that there is a trivial correspondence between input and output lines. Therefore the parser factory description does not contain a <tt>skip</tt> property:
    </p>
    
    <pre><![CDATA[
      JSM.scanner = {
        blanks:   /^(\s|\/\/.*|\/\*([^*]|\*+[^\/*])*\*+\/)+/,
        word:     /^[a-zA-Z_][a-zA-Z_0-9]*/,
        quoted:   /^("([^"\\\n]|\\.)*"|'([^'\\\n]|\\.)*'|\(\/([^\/\\\n]|\\.)*\/\))/,
        symbol:   /^(\{\{\{|\|\|\||\}\}\}|<-|[^a-zA-Z_])/,
        eof:      /^$/
      };
    ]]></pre>
    
    <p>
      <tt>blanks</tt> describes the white space and comments which have to be normalized for output. <tt>word</tt> describes identifiers to which the monadic notation can bind wrapped values. <tt>quoted</tt> describes strings and regular expressions so that their content cannot be mistaken for symbols. Finally, <tt>symbol</tt> describes the significant multi-character symbols and all single characters which cannot start an identifier -- this does include single digits which make up numbers.
    </p>
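    <p>
      The interplay of the four patterns can be tried outside the page. The following is only a minimal sketch: <tt>tokenize</tt> is a hypothetical helper which is not part of <tt>Parser.Factory</tt>, it assumes that the patterns are tried in the fixed order <tt>blanks</tt>, <tt>word</tt>, <tt>quoted</tt>, <tt>symbol</tt>, and it uses <tt>console.log</tt> for output:
    </p>
    <pre><![CDATA[
```javascript
// Patterns copied from JSM.scanner above (eof is not needed here).
var scanner = {
  blanks: /^(\s|\/\/.*|\/\*([^*]|\*+[^\/*])*\*+\/)+/,
  word:   /^[a-zA-Z_][a-zA-Z_0-9]*/,
  quoted: /^("([^"\\\n]|\\.)*"|'([^'\\\n]|\\.)*'|\(\/([^\/\\\n]|\\.)*\/\))/,
  symbol: /^(\{\{\{|\|\|\||\}\}\}|<-|[^a-zA-Z_])/
};

// Hypothetical helper, not part of Parser.Factory: splits the input into
// [kind, text] pairs by trying the patterns in a fixed (assumed) order.
function tokenize (input) {
  var order = [ 'blanks', 'word', 'quoted', 'symbol' ],
    tokens = [];
  while (input.length > 0) {
    var found = null;
    for (var k = 0; k < order.length && !found; ++ k) {
      var m = scanner[order[k]].exec(input);
      if (m && m[0].length > 0) found = [ order[k], m[0] ];
    }
    if (!found) throw new Error('cannot scan: ' + input);
    tokens.push(found);
    input = input.substring(found[1].length);
  }
  return tokens;
}

var tokens = tokenize("{{{ n <- f('x;'); }}}");
for (var t = 0; t < tokens.length; ++ t)
  console.log(tokens[t][0] + ': ' + JSON.stringify(tokens[t][1]));
```
    ]]></pre>
    <p>
      In this sketch the string literal <tt>'x;'</tt> arrives as a single <tt>quoted</tt> token, so the semicolon inside the string is never seen as a <tt>symbol</tt>.
    </p>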
    <eg>
      Add digit sequences as <tt><i>number</i></tt> to the <tt>scanner</tt> description and to the <tt>term</tt> function discussed below.
    </eg>
    <p>
      A reserved list is specified to distinguish symbols which are significant for the grammar from the other symbols which the pattern matches:
    </p>
  
    <pre><![CDATA[
      JSM.scanner.symbol.reserved =
        [ '{{{', '|||', '}}}', '<-', ';', '(', ')', '[', ']', '{', '}' ];
    ]]></pre>

    <p>
      Unfortunately, just like strings, regular expressions may contain other characters which then lose their syntactic meaning, but regular expressions are harder to detect than strings because the leading slash may also appear alone as a division operator. To simplify the problem, the preprocessor requires that a regular expression be enclosed in parentheses without intervening blanks.
    </p>
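    <p>
      The effect of this convention can be checked directly against the <tt>quoted</tt> pattern. The following small sketch copies the pattern from the scanner description; the variable names are chosen for illustration only:
    </p>
    <pre><![CDATA[
```javascript
// The quoted pattern, copied from JSM.scanner.
var quoted = /^("([^"\\\n]|\\.)*"|'([^'\\\n]|\\.)*'|\(\/([^\/\\\n]|\\.)*\/\))/;

var enclosed  = quoted.test('(/a;b/)');  // parenthesized, no blanks
var bare      = quoted.test('/a;b/');    // leading slash alone
var withFlags = quoted.test('(/a/g)');   // closing slash not followed by ')'

console.log(enclosed);   // true
console.log(bare);       // false
console.log(withFlags);  // false
```
    ]]></pre>
    <p>
      As written, the pattern expects the closing parenthesis immediately after the closing slash, so a literal with flags is not matched by <tt>quoted</tt> in this sketch.
    </p>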
    
    <p>
      The grammar can now be translated into the notation which it describes using methods such as <tt>many</tt>, etc., for the suffixes. For example:
    </p>
    
    <pre><![CDATA[
      with (Parser) with (JSM)
        // mvalue: blanks? (word blanks? '<-')? term+ ';' blanks?
        JSM.mvalue =
          {{{
              pf.blanks().optional();
              {{{
                  pf.word();
                  pf.blanks().optional();
                  pf.symbol('<-');
              }}}.optional();
              term(true).some();
              pf.symbol(';');
              pf.blanks().optional();
          }}};
    ]]></pre>
    
    <p>
      Once a preprocessor is available, this code can be converted to JavaScript and executed to check if input conforms to the grammar. Unfortunately, for the initial version of the preprocessor this code had to be hand-translated using <tt>andThen</tt>, etc., in the style shown before.
    </p>
    <p>
      The code fragment hints at a complication. <tt>term</tt> is used three times in the grammar: at the level of JavaScript code in <tt>jsm</tt>, at the level of monadic code in <tt>mvalue</tt> as shown above, and recursively, enclosed by various parentheses, in <tt>term</tt> itself:
    </p>
    
    <pre>
      term: monad | <i>blanks</i> | <i>word</i> | <i>quoted</i>
          | '(' term* ')' | '[' term* ']' | '{' term* '}' | <i>symbol</i>;
    </pre>
    
    <p>
      In <tt>mvalue</tt> a semicolon is significant but elsewhere it is not, i.e., the wildcard <tt><i>symbol</i></tt> cannot always match a semicolon. Fortunately, <tt>term</tt> is encoded as a monadic value, i.e., as a data item which can be created by a function with a parameter which controls whether or not a semicolon should be recognized when the data item is applied:
    </p>

    <pre><![CDATA[
      with (Parser) with (JSM)
        JSM.term = function (noSemicolon) {
          return (
            {{{
                monad;
            |||
                pf.blanks();
            |||
                // ...
            |||
                pf.symbol(';');
                noSemicolon ? fail('semicolon not expected')
                            : succeed(';');
            |||
                pf.symbol('(');
                term(false).many(); 
                pf.symbol(')');
            |||
                // ...
            |||
                pf.symbol(); // all non-reserved symbols
            }}});
        };
    ]]></pre>

    <p>
      JavaScript's handling of newlines can result in ambiguities -- this is why <tt>blanks</tt> are explicit in this grammar so that newlines can be passed from monadic notation to preprocessed JavaScript. Specifically, if <tt>return</tt> and an argument are separated by a newline, JavaScript will treat <tt>return</tt> alone as a statement! As the code above shows, this is easily circumvented by enclosing a <tt>return</tt> argument in parentheses and inserting a newline, if any, only <i>after</i> the leading parenthesis.
    </p>
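    <p>
      The newline problem can be reproduced in plain JavaScript without any monadic code. In the following sketch the function names <tt>broken</tt> and <tt>fixed</tt> are chosen for illustration, and <tt>console.log</tt> is assumed for output:
    </p>
    <pre><![CDATA[
```javascript
// A newline directly after 'return' ends the statement: the function
// returns undefined and the expression on the next line is never reached.
function broken () {
  return
    1 + 2;
}

// A newline after the opening parenthesis is harmless.
function fixed () {
  return (
    1 + 2);
}

console.log(broken());  // undefined
console.log(fixed());   // 3
```
    ]]></pre>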
    <p>
      Once a grammar has been translated into monadic notation -- and executed to recognize some input if at all possible -- it needs to be extended with code which will represent or interpret the input. The preprocessor uses the factory method <tt>makeTreeClasses</tt> discussed in the previous section; the problem of translating the input is thus delegated to adding appropriate methods to these classes later. The necessary classes can be discovered top-down, simply by adding identifiers to bind interesting values returned by parsers and adding constructor calls in <tt>succeed</tt> calls to represent the interesting values. The monadic notation is ideally suited to this approach:
    </p>
    
    <pre><![CDATA[
      with (Parser) with (JSM)
        JSM.term = function (noSemicolon) {
          return (
            {{{
                monad;
            |||
                b <- pf.blanks();
                succeed(new Blank(b));
            |||
                // ...
            |||
                pf.symbol(';');
                noSemicolon ? fail('semicolon not expected')
                            : succeed(new Text(';'));
            |||
                      pf.symbol('(');
                ts <- term(false).many(); 
                      pf.symbol(')');
                succeed(new Paren('(', ts, ')'));
            |||
                // ...
            |||
                s <- pf.symbol(); // all non-reserved symbols
                succeed(new Text(s));
            }}});
        };
    ]]></pre>
    
    <p>
      <tt>Paren</tt> and <tt>Text</tt> can be implemented using <tt>makeTreeClasses</tt> but <tt>Blank</tt> is best coded manually because white space is normalized before it is passed through:
    </p>

    <pre><![CDATA[
      JSM.Blank = function (value) {
        this.value = value.replace(/./g, '');  // remove all but newlines
        if (!this.value) this.value = ' ';     // if no newlines: single blank
        if (trace) print(this);
      };

      JSM.Blank.prototype.toString = function () {
        switch (this.value) {
        case ' ':  return 'blank';
        case '\n': return 'newline';
        default:   return this.value.length+' newlines';
        }
      };
    ]]></pre>
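    <p>
      The normalization relies on the fact that <tt>.</tt> in a JavaScript regular expression does not match a newline. The following sketch repeats the two steps of the constructor as a plain function; the name <tt>normalize</tt> is hypothetical and the <tt>trace</tt> output is left out:
    </p>
    <pre><![CDATA[
```javascript
// Same two steps as the JSM.Blank constructor above, as a plain function.
function normalize (value) {
  var result = value.replace(/./g, '');  // '.' does not match '\n'
  return result ? result : ' ';          // if no newlines: single blank
}

console.log(JSON.stringify(normalize('  /* note */  ')));       // " "
console.log(JSON.stringify(normalize(' // one\n\t// two\n')));  // "\n\n"
```
    ]]></pre>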
    
    <p>
      The grammar and the monadic notation can be changed to perform some input rewriting or validation which cannot be conveniently expressed in the grammar itself. For example, the original rule
    </p>
    
    <pre>
      mvalue:  <i>blanks</i>? (<i>word</i> <i>blanks</i>? '&lt;-')? term+ ';' <i>blanks</i>?;
    </pre>

    <p>
      can be rewritten as
    </p>

    <pre>
      mvalue:  ( <i>blanks</i> <i>word</i> <i>blanks</i>? '&lt;-'
               |        <i>word</i> <i>blanks</i>? '&lt;-'
               | <i>blanks</i>?                 )  term+ ';' <i>blanks</i>?;
    </pre>
    
    <p>
      because this makes it simple to combine the first two optional pieces of white space before handing them to the constructor. Moreover, <tt>term</tt> can be <tt><i>blanks</i></tt> and this is reasonable wherever <tt>term</tt> is used in this grammar -- empty parentheses or braces, no JavaScript code at all, etc. -- except in the situation shown above: a monadic value cannot be empty. This needs to be checked before <tt>mvalue</tt> is accepted:
    </p>
    
    <pre><![CDATA[
      with (Parser) with (JSM)
        JSM.mvalue =
          {{{
              bw <- {{{
                        b1 <- pf.blanks();
                        w  <- pf.word();
                        b2 <- pf.blanks().optional(null);
                        pf.symbol('<-');
                        succeed([new Blank(b1 + (b2 ? b2 : '')), w]);
                    |||
                        w  <- pf.word();
                        b2 <- pf.blanks().optional(null);
                        pf.symbol('<-');
                        succeed([b2 ? new Blank(b2) : null, w]);
                    |||
                        b1 <- pf.blanks().optional(null);
                        succeed([b1 ? new Blank(b1) : null, null]);
                    }}};
              ts <- term(true).some();
              pf.symbol(';');
              b3 <- pf.blanks().optional(null);
              (function () {
                for (var n = 0; n < ts.length; ++ n)
                  if (!(ts[n] instanceof Blank))
                    return succeed(new Mvalue(bw[0], bw[1], ts,
                                     b3 ? new Blank(b3) : null));
                return fail('expecting monadic value');
              })();
          }}};
    ]]></pre>
    
    <p>
      <tt>bw</tt> is bound to a list containing <tt>null</tt> or <tt>Blank</tt> for the first two pieces of white space and <tt>null</tt> or a string for the identifier; <tt>ts</tt> is bound to a list of terms which will not include semicolons; finally, <tt>b3</tt> is bound to <tt>null</tt> or a string with the last piece of white space, if any. All that is left is a loop over the <tt>term</tt> list to see if it contains a non-<tt>Blank</tt> and if so, the <tt>mvalue</tt> parser succeeds and the <tt>Mvalue</tt> constructor can be called with a canonical representation of the possible descendants.
    </p>
    <p>
      Programming languages often distinguish between statements and expressions. The monadic notation requires that each computation produce a monadic value so that it can be unwrapped and bound to an identifier if one is specified. This means that a computation has to be a JavaScript expression, i.e., it can use conditional evaluation with <tt>?</tt> <tt>:</tt> but it cannot use a loop. However, the code above shows that one can always insert a parameterless, anonymous function containing statements and call the function immediately following the definition. Among the statements must be at least one <tt>return</tt> to create the required value.
    </p>
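    <p>
      The technique itself does not depend on monadic values. As a minimal sketch in plain JavaScript, with an array of strings standing in for the list of terms and with hypothetical names <tt>ts</tt> and <tt>verdict</tt>:
    </p>
    <pre><![CDATA[
```javascript
// A loop is a statement, but the call of an anonymous function is an
// expression, so it can appear wherever a value is required.
var ts = [ ' ', 'x', '\n' ];

var verdict = (function () {
  for (var n = 0; n < ts.length; ++ n)
    if (ts[n] != ' ' && ts[n] != '\n')
      return 'found ' + ts[n];
  return 'only white space';
})();

console.log(verdict);  // found x
```
    ]]></pre>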
    <p>
      Finally, given a tree representing a JavaScript program with embedded <tt>monad</tt> phrases, code generation is implemented as a method <tt>gen</tt> in each of the classes from which the tree is built:
    </p>
    
    <pre><![CDATA[
      JSM.Blank.prototype.gen = function () {
        return this.value;
      };

      JSM.Paren.prototype.gen = function () {
        var content = this.content[1],
          result = '';
        for (var n = 0; n < content.length; ++ n)
          result += content[n].gen();
        return this.content[0] + result + this.content[2];
      };
    ]]></pre>
    
    <p>
      <tt>Text</tt> and <tt>Blank</tt> simply emit text or normalized white space, respectively. <tt>Paren</tt> emits the delimiters and uses a loop to generate code for the <tt>content</tt> between the delimiters. The hard part of the conversion is accomplished by <tt>Monad</tt>, <tt>Mvalues</tt>, and <tt>Mvalue</tt>. A <tt>Monad</tt> tree node contains a non-empty list of <tt>Mvalues</tt> and uses a loop to connect their code with <tt>orElse</tt> if needed:
    </p>

    <pre>
      <i>// Monad: Mvalues+
      //   Mvalues </i>.orElse( <i>Mvalues</i> )<i> ...</i>
      JSM.Monad.prototype.gen = function () {
        var content = this.content[0], 
          result = content[0].gen();
        for (var n = 1; n &lt; content.length; ++ n)
          result += '.orElse(' + content[n].gen() + ')';
        return result;
      };
    </pre>

    <p>
      <tt>Mvalues</tt> contains a non-empty list of <tt>Mvalue</tt> objects and starts a loop implemented by recursion, so that each <tt>Mvalue</tt> can supply the appropriate function parameter name and connect its code to the next with <tt>andThen</tt> if needed:
    </p>

    <pre>
      <i>// Mvalues: Mvalue+</i>
      JSM.Mvalues.prototype.gen = function () {
        var content = this.content[0];
        return content[0].gen('', content, 1);
      };

      <i>// Mvalue: Blank? word? (Blank|Monad|Paren|Text)+ Blank?
      //   Blank? term+ Blank? </i>.andThen(function (<i>word?</i>) {<i> return ... </i>})<i>?</i>
      JSM.Mvalue.prototype.gen = function (head, content, next) {
        if (this.content[0]) head += this.content[0].gen();
        for (var n = 0; n &lt; this.content[2].length; ++ n)
          head += this.content[2][n].gen();
        if (this.content[3]) head += this.content[3].gen();
        if (next &lt; content.length) {
          head += '.andThen(function (';
          if (this.content[1]) head += this.content[1];
          head += ') { return (';
          head = content[next].gen(head, content, next+1);
          head += '); })';
        }
        return head;
      };
    </pre>
    
    <p>
      Normally the constructor calls are issued by the <tt>succeed</tt> clauses in the monadic notation; however, for testing purposes a tree can be built, displayed, and converted interactively:
    </p>
    
    <javascript namespace='JSM_Tree' out='out'><![CDATA[
      with (Parser) with (JSM) {
        new Monad([
            new Mvalues([
                new Mvalue(null, 'n', [
                    new Text('number')
                  ], null),
                new Mvalue(null, null, [
                    new Text('succeed'), 
                    new Paren('(', [new Text('n-0')], ')')
                  ], null)
              ])
          ]) // .gen()
      }
    ]]></javascript>
  
    <eg>
      Apply <tt>gen</tt> as indicated to see the generated code.
    </eg>
    <p>
      Apart from white space, this example generates the same code as the monadic notation:
    </p>   
    
    <jsm namespace='JSM_Tree'><source><![CDATA[
      {{{
          n <- number;
          succeed(n-0);
      }}}
    ]]></source></jsm>
    
    <p>
      The tree display is very useful when designing code generation methods. It is instructive to take the tree output above and walk through the <tt>gen</tt> methods for the various preprocessor tree classes.
    </p>
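    <p>
      Such a walk-through can also be run outside the page. The following sketch makes two assumptions: the hypothetical helper <tt>node</tt> stands in for <tt>makeTreeClasses</tt> and only saves the constructor arguments as <tt>content</tt>, and <tt>Text.prototype.gen</tt>, which is not shown in this section, is assumed to return its text unchanged. The other <tt>gen</tt> methods follow the code shown earlier in this section:
    </p>
    <pre><![CDATA[
```javascript
// Hypothetical stand-in for makeTreeClasses: each generated constructor
// saves its arguments as content.
function node () {
  return function () { this.content = arguments; };
}
var Text = node(), Paren = node(), Mvalue = node(),
  Mvalues = node(), Monad = node();

// Assumption: Text emits its text unchanged.
Text.prototype.gen = function () { return this.content[0]; };

Paren.prototype.gen = function () {
  var content = this.content[1], result = '';
  for (var n = 0; n < content.length; ++ n)
    result += content[n].gen();
  return this.content[0] + result + this.content[2];
};

Monad.prototype.gen = function () {
  var content = this.content[0], result = content[0].gen();
  for (var n = 1; n < content.length; ++ n)
    result += '.orElse(' + content[n].gen() + ')';
  return result;
};

Mvalues.prototype.gen = function () {
  var content = this.content[0];
  return content[0].gen('', content, 1);
};

Mvalue.prototype.gen = function (head, content, next) {
  if (this.content[0]) head += this.content[0].gen();
  for (var n = 0; n < this.content[2].length; ++ n)
    head += this.content[2][n].gen();
  if (this.content[3]) head += this.content[3].gen();
  if (next < content.length) {
    head += '.andThen(function (';
    if (this.content[1]) head += this.content[1];
    head += ') { return (';
    head = content[next].gen(head, content, next+1);
    head += '); })';
  }
  return head;
};

// The tree from the interactive example above.
var code = new Monad([
    new Mvalues([
        new Mvalue(null, 'n', [ new Text('number') ], null),
        new Mvalue(null, null, [
            new Text('succeed'),
            new Paren('(', [ new Text('n-0') ], ')')
          ], null)
      ])
  ]).gen();

console.log(code);
// number.andThen(function (n) { return (succeed(n-0)); })
```
    ]]></pre>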
    
  </chapter>

  <chapter>
    <title id='c-interpreter'> A monadic interpreter </title>
    
    <p>
      This section discusses the implementation of a little imperative programming language. The implementation uses monadic classes and consists of a parser to build a tree to represent the target program and of an interpreter which evaluates the tree. The complete code of the system is available for editing and preprocessing on a separate edit page.
    </p>
    <p>
      The following program implements Euclid's algorithm to compute the greatest common divisor of two natural numbers. It uses typical features of such a little imperative programming language:
    </p>
    
    <language interpreter='Mini'><![CDATA[
      // greatest common divisor

      { x = 36
        y = 54
        while x <> y do
          if x > y then
            x = x - y
          else
            y = y - x
        print x
      }
    ]]></language>
        
    <p>
      The following function controls the processing of a source string written in the little language:
    </p>

    <javascript namespace='Mini' seq='d'><![CDATA[
      with (Mini) with (Memory) {

        Mini.interpret = function (source) {
          // compile
          var tree = prog.apply(source);
          if ('fail' in tree)
            print(tree.fail);

          else {
            // uncomment to see the tree
            // print(tree.value);

            // make interpreter
            var monad = tree.value.eval();

            // uncomment to see the interpreter
            // print(dump(monad));

            // execute, produce final environment
            // try new ClonedHash()
            return monad.apply(new Hash());
          }
        };
      }
    ]]></javascript>

    <eg>
      Change the little language example above to obtain the greatest common divisor of other pairs of natural numbers.
    </eg>
    <eg>
      Introduce an error, e.g., add a semicolon after <tt>36</tt>.
    </eg>
    <eg>
      Extend the example to compute the least common multiple of the two numbers.
    </eg>
    
    <p>
      The parser and tree builder for the little language are implemented just like the preprocessor discussed in the previous section. The grammar is an extension of the grammar for arithmetic expressions presented earlier:
    </p>
    <pre>
      term:        <i>number</i> | <i>word</i> | '(' sum ')';
      product:     term ('*' term | '/' term)*;
      sum:         product ('+' product | '-' product)*;
      comparison:  sum ('&lt;' sum | '&lt;=' sum
                |  '>' sum | '>=' sum | '&lt;>' sum | '=' sum);
      stmt:        sum | <i>word</i> '=' sum | 'print' sum
          |        'if' comparison 'then' stmt ('else' stmt)?
          |        'while' comparison 'do' stmt
          |        '{' stmt* '}'
          |        'try' stmt 'catch' (<i>word</i> ':')? stmt
          |        'raise' <i>quoted</i>;
      prog:        stmt <i>eof</i>;
    </pre>
    
    <p>
      The grammar can be translated into monadic notation as soon as the collection of regular expressions for the <tt>Parser.Factory</tt> has been defined:
    </p>
    <pre><![CDATA[
      Mini.scanner = {
        skip:    /^(\s|\/\/.*|\/\*([^*]|\*+[^\/*])*\*+\/)+/,  // Java comments
        word:    /^[a-zA-Z_][a-zA-Z_0-9]*/,                   // identifiers
        number:  /^[0-9]+/,                                   // integers
        quoted:  /^'[^'\n]*'/,                                // simple strings
        symbol:  /^(<=|>=|<>|.)/,
        eof:     /^$/
      };
      Mini.scanner.word.reserved = [
        'catch', 'do', 'else', 'if', 'print', 'raise', 'then', 'try', 'while'
      ];
    ]]></pre>
    
    <p>
      Once the grammar is translated, it is easy to see which classes are needed to represent a program. They can all be generated using <tt>makeTreeClasses</tt>:
    </p>    
    <pre><![CDATA[
      var Mini = {
        // arithmetic
        Add: 1, Div: 1, Mul: 1, Sub: 1,            // left right
        Leaf: 1,                                   // number
        Name: 1,                                   // string
        // comparisons
        Eq: 1, Ge: 1, Gt: 1, Le: 1, Lt: 1, Ne: 1,  // left right
        // statements
        Assign: 1,                                 // string sum
        Block: 1,                                  // stmt*
        Expr: 1,                                   // sum
        If: 1,                                     // comparison stmt stmt?
        Print: 1,                                  // sum
        While: 1,                                  // comparison stmt
        // exception handling
        Raise: 1,                                  // string
        Try: 1                                     // stmt name? stmt
      };
    ]]></pre>

    <eg>
      Change <tt>interpret</tt> above to display the tree which represents Euclid's algorithm. Execute the example and compare the tree to the <tt>Mini</tt> collection. Dump the interpreter.
    </eg>
    <eg>
      On the edit page, change <tt>scanner</tt> and <tt>stmt</tt> so that a block must be enclosed with words such as <tt>begin</tt> and <tt>end</tt>. Change the example and test.
    </eg>
    <eg>
      Test nested <tt>if</tt> statements. What happens to a "dangling" <tt>else</tt> clause? Use the edit page for <tt>Mini</tt> and make <tt>else</tt> mandatory.
    </eg>
    
    <p>
      The interpreter for the little language can be implemented with the <i>visitor</i> design pattern or by adding <tt>eval</tt> methods to the tree classes. An interpreter maps the features of the target language to the host language and it has to simulate missing features; e.g., Haskell does not support assignment and exception handling, i.e., if an interpreter is written in Haskell those features have to be simulated using monadic values.
    </p>
    <p>
      JavaScript supports both global assignment and exception handling, so there is no obvious need for simulation. However, monadic values make it possible to separate host and target behavior. The interpreter discussed here uses a monadic class <tt>Memory</tt> which holds an environment of current variable names and values accessible through <tt>fetch</tt> and <tt>store</tt> methods as the evaluation proceeds. The environment can be implemented as a collection:
    </p>
    <pre><![CDATA[
      // collection-based environment.
      Mini.Hash = function () { };

      Mini.Hash.prototype.fetch = function (name) {
        return this[name];
      };

      Mini.Hash.prototype.store = function (name, value) {
        this[name] = value;
        return this;
      };
    ]]></pre>
    <p>
      All evaluation methods must return monadic values, for example:
    </p>
    <pre><![CDATA[
      // monad with an environment
      Mini.Memory = Monad.subclass();
      
      with (Mini) with (Memory)
        Leaf.prototype.eval = function () {
          var value = this.content[0];  // number
          return succeed(Number(value));
        };
    ]]></pre>
    <p>
      Evaluation methods for binary operators can be implemented with a factory function <tt>bin</tt> which receives the actual operation as a parameter and creates the evaluation method with monadic notation, for example:
    </p>
    <pre><![CDATA[
      Mini.bin = function (op) {       // operation as a function
        return function () {
          var left = this.content[0],  // left operand tree
            right = this.content[1];   // right operand tree
          return (
            {{{
                l <- left.eval();
                r <- right.eval();
                succeed(op(l, r));
            }}});
        };
      };
      
      with (Mini) with (Memory)
        Add.prototype.eval = bin(function (l, r) { return l + r; });
    ]]></pre>

    <p>
      <tt>Memory</tt> maintains an environment with <tt>fetch</tt> and <tt>store</tt> operations as state. Access to the state is combined with these operations to implement variable reference and assignment:
    </p>    
    <pre><![CDATA[
      with (Mini) with (Memory) {
        Name.prototype.eval = function () {
          var name = this.content[0];  // variable name
          return (
            {{{
                env <- get;
                succeed(env.fetch(name));
            }}});
        };
        
        Assign.prototype.eval = function () {
          var name = this.content[0],  // variable name
            sum = this.content[1];     // expression tree
          return (
            {{{
                value <- sum.eval();
                env <- get;
                put(undefined, env.store(name, value));
            }}});
        };
      }
    ]]></pre>
    <p>
      Once monadic values are employed, <tt>andThen</tt> must be used to sequence execution at the statement level. <i>if</i> can be mapped to conditional evaluation; a missing <i>else</i> part has to be fudged:
    </p>
    <pre><![CDATA[
      with (Mini) with (Memory)
        If.prototype.eval = function () {
          var c = this.content[0],  // condition tree
            t = this.content[1],    // then tree
            e = this.content[2];    // else tree
          return (
            {{{
                cond <- c.eval();
                cond ? t.eval() : (e ? e.eval() : succeed(undefined));        
            }}});
        };
    ]]></pre>
    <p>
      A sequence of statements requires a loop which in turn requires a recursive implementation:
    </p>
    <pre><![CDATA[
      with (Mini) with (Memory)
        Block.prototype.eval = function (_n) {
          var self = this,              // for closure
            n = _n ? _n : 0,            // for tail recursion
            content = this.content[0];  // the statement list
          if (n >= content.length)
            return succeed(undefined);  // completed block has no value
          else
            return (
              {{{
                  content[n].eval();    // sequencing must be by andThen
                  self.eval(n+1);
              }}});
        };
    ]]></pre>
    <p>
      Similarly, <i>while</i> must also be based on recursion:
    </p>
    <pre><![CDATA[
      with (Mini) with (Memory)
        While.prototype.eval = function () {
          var self = this,              // for closure
            c = this.content[0],        // condition tree
            body = this.content[1];     // body tree
          return (
            {{{
                cc <- c.eval();
                (function () {
                  if (!cc) return succeed(undefined);
                  return (
                    {{{
                        body.eval();
                        self.eval();
                    }}});
                })();      
            }}});
        };
    ]]></pre>
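    <p>
      The recursion can be observed in isolation with a tiny stand-in for <tt>Memory</tt>. Everything in the following sketch is an assumption made for illustration: the stand-in class, the result shape (<tt>value</tt> and <tt>state</tt> on success, <tt>fail</tt> on failure), and the hypothetical function <tt>countdown</tt>; the sketch uses <tt>andThen</tt> directly rather than the monadic notation, and the state is a plain number rather than an environment:
    </p>
    <pre><![CDATA[
```javascript
// Hypothetical stand-in for the Memory class: wraps a state function which
// returns { value, state } on success or { fail } on failure (assumed shape).
function Memory (fn) { this.fn = fn; }

Memory.prototype.apply = function (state) { return this.fn(state); };

Memory.prototype.andThen = function (next) {
  var self = this;
  return new Memory(function (state) {
    var result = self.apply(state);
    return 'fail' in result ? result
                            : next(result.value).apply(result.state);
  });
};

function succeed (value) {
  return new Memory(function (state) {
    return { value: value, state: state };
  });
}

function put (value, state) {
  return new Memory(function () {
    return { value: value, state: state };
  });
}

var get = new Memory(function (state) {
  return { value: state, state: state };
});

// while state > 0 do state = state - 1
// The loop is expressed by recursion, as in While.prototype.eval.
function countdown () {
  return get.andThen(function (n) {
    return (n > 0
      ? put(undefined, n - 1).andThen(function () { return countdown(); })
      : succeed('done'));
  });
}

var result = countdown().apply(3);
console.log(result.value, result.state);  // done 0
```
    ]]></pre>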
    <p>
      For a host language like JavaScript, which has all the features of the target language, this approach looks too complicated. However, a monadic class can be used to implement exception handling very elegantly and with a choice of semantics:
    </p>
    <language interpreter='Mini'><![CDATA[
      // division by zero
      
      // try
        1/0
      // catch 2
      // catch e: print e
    ]]></language>
    <eg>
      Uncomment <tt>try</tt> and the first <tt>catch</tt>.
    </eg>
    <eg>
      Uncomment <tt>try</tt> and the second <tt>catch</tt>.
    </eg>
    
    <p>
      Exception handling is based on <tt>fail</tt> and <tt>orElse</tt>, i.e., an operation such as division can use <tt>fail</tt> and cause an exception which a little language program can catch:
    </p>
    
    <pre><![CDATA[
      with (Mini) with (Memory)
        Div.prototype.eval = function () {
          var left = this.content[0],  // left operand tree
            right = this.content[1];   // right operand tree
          return (
            {{{
                l <- left.eval();
                r <- right.eval();
                r ? succeed(l / r) : fail('division by zero');
            }}});
        };
    ]]></pre>
    <p>
      A failure can be caught with <tt>orElse</tt>, i.e., with <tt>|||</tt> in monadic notation:
    </p>
    <pre><![CDATA[
      with (Mini) with (Memory)
        Try.prototype.eval = function () {
          var a = this.content[0],  // try body tree
            n = this.content[1],    // catch variable name if any
            b = this.content[2];    // catch body tree
          if (!n)  // no catch variable name
            return (
              {{{
                  a.eval();
              |||
                  b.eval();
              }}});
    ]]></pre>
    <p>
      The <tt>try</tt> body <tt>a</tt> is evaluated. If there is a failure, the <tt>catch</tt> body <tt>b</tt> is evaluated. Both use the same initial state (environment).
    </p>
    <p>
      It is even possible to make the error message from <tt>fail</tt> available as a variable value in the catch body. This is accomplished by re-implementing <tt><ref value='orElse'/></tt> with a function argument in the style of <tt>andThen</tt>:
    </p>
    <pre><![CDATA[
          function onError (a, b) { // re-implement orElse with a parameter
            return new Memory(
              function (state) {
                var result = a.apply(state);
                return 'fail' in result ? b(result.fail).apply(state) : result;
              }
            );
          }
    ]]></pre>
    <p>
      The alternative <tt>b</tt> must be wrapped as a function in order to provide the parameter scope for whatever value should be passed into the alternative. With <tt>onError</tt>, exception handling can be implemented so that the exception handler has access to the failure message: 
    </p>
    <pre><![CDATA[
          return onError(a.eval(),
            function (value) {
              return (
                {{{
                    env <- get;
                    put(undefined, env.store(n, value));
                    b.eval();
                }}});
            });
        };      
    ]]></pre>
    <p>
      The <tt>try</tt> body <tt>a</tt> is evaluated unconditionally. If it fails, the failure message is bound to the parameter <tt>value</tt> and from there to the variable name <tt>n</tt> in the environment before the <tt>catch</tt> body <tt>b</tt> is evaluated, i.e., <tt>b</tt> has access to the failure message with the variable name <tt>n</tt>.
    </p>
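    <p>
      The behavior of <tt>onError</tt> can be tried without the interpreter. The following sketch uses a hypothetical stand-in for <tt>Memory</tt> with an assumed result shape (<tt>value</tt> and <tt>state</tt> on success, <tt>fail</tt> on failure); only <tt>onError</tt> itself is taken from the code above, and <tt>handler</tt> is a name chosen for illustration:
    </p>
    <pre><![CDATA[
```javascript
// Hypothetical stand-in for the Memory class (assumed result shape).
function Memory (fn) { this.fn = fn; }

Memory.prototype.apply = function (state) { return this.fn(state); };

function succeed (value) {
  return new Memory(function (state) {
    return { value: value, state: state };
  });
}

function fail (message) {
  return new Memory(function (state) {
    return { fail: message };
  });
}

// onError as shown above: b receives the failure message.
function onError (a, b) {
  return new Memory(function (state) {
    var result = a.apply(state);
    return 'fail' in result ? b(result.fail).apply(state) : result;
  });
}

var handler = function (message) {
  return succeed('caught: ' + message);
};

var ok = onError(succeed(42), handler).apply({});
var recovered = onError(fail('division by zero'), handler).apply({});

console.log(ok.value);         // 42
console.log(recovered.value);  // caught: division by zero
```
    ]]></pre>
    <p>
      The handler is only called if the first computation fails, and it is applied to the same initial state which was handed to the first computation.
    </p>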
    <p>
      Perhaps the most interesting aspect of this implementation of exception handling is that the interaction between assignment and exception handling can be influenced by the implementation of the environment.
    </p>
    <language interpreter='Mini'><![CDATA[
      // exception handling
      
      try { 
        foo = 10
        bar = 20
        raise 'error'
      } catch
        foobar = 30
    ]]></language>
    <p>
      This example produces a final environment which contains values for <tt>foo</tt>, <tt>bar</tt>, <i>and</i> <tt>foobar</tt>, i.e., <tt>catch</tt> seems to have operated on the state at the point of <tt>raise</tt> and not -- as should be the case for a monad-based implementation -- on the state at <tt>try</tt>. This behavior is caused by <tt>Hash</tt>, the implementation of the environment: As shown above, <tt>catch</tt> is implemented using <tt>orElse</tt>, i.e., <tt>try</tt> and <tt>catch</tt> do start out with the same environment. However, <tt>Hash</tt> is a simple JavaScript collection, i.e., all changes are persistent. The persistence can be removed, e.g., by cloning the environment whenever it is changed (there are less expensive ways):
    </p>
    <pre><![CDATA[
      Mini.ClonedHash = function () { };

      Mini.ClonedHash.prototype.fetch = function (name) {
        return this[name];
      };

      Mini.ClonedHash.prototype.store = function (name, value) {
        var result = new Mini.ClonedHash();
        for (var n in this)
          if (this.hasOwnProperty(n))
            result[n] = this[n];
        result[name] = value;
        return result;
      };
    ]]></pre>
    <eg>
      Change <tt>interpret</tt> near the beginning of this section to use a <tt>ClonedHash</tt> environment and execute the example again.
    </eg>
    <p>
      If the environment is not persistent, the assignments in the <tt>try</tt> block are undone if there is an exception. While the implementation is expensive, the fact that this happens invisibly might make this variant of a <tt>try</tt> statement quite attractive, e.g., in a backtracking context.
    </p>
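    <p>
      The difference between the two kinds of environment can be shown in isolation. The following stand-alone sketch does not use the interpreter; <tt>MutableEnv</tt> is a hypothetical stand-in which assumes that <tt>Hash</tt> changes the receiver in place, <tt>ClonedEnv</tt> copies like <tt>ClonedHash</tt> above, and <tt>tryThenCatch</tt> and <tt>names</tt> are helper names invented for this illustration.
    </p>
    <pre><![CDATA[
      // Hypothetical stand-in for Hash: store() changes the receiver in place.
      function MutableEnv() { }
      MutableEnv.prototype.store = function (name, value) {
        this[name] = value;
        return this;
      };

      // Same copying strategy as Mini.ClonedHash: store() returns a changed copy.
      function ClonedEnv() { }
      ClonedEnv.prototype.store = function (name, value) {
        var result = new ClonedEnv();
        for (var n in this)
          if (this.hasOwnProperty(n))
            result[n] = this[n];
        result[name] = value;
        return result;
      };

      // Model the example: the try body assigns foo and bar and then fails,
      // so the catch body continues with the environment which was current at try.
      function tryThenCatch(atTry) {
        atTry.store('foo', 10).store('bar', 20);  // try body, result lost by raise
        return atTry.store('foobar', 30);         // catch body
      }

      // List the variable names in an environment in sorted order.
      function names(env) {
        var result = [];
        for (var n in env)
          if (env.hasOwnProperty(n))
            result.push(n);
        return result.sort().join(' ');
      }

      var shared = names(tryThenCatch(new MutableEnv()));  // 'bar foo foobar'
      var cloned = names(tryThenCatch(new ClonedEnv()));   // 'foobar'
    ]]></pre>
    <p>
      With the in-place environment the assignments from the failed <tt>try</tt> body remain visible to <tt>catch</tt>; with the copying environment only <tt>foobar</tt> survives.
    </p>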
  </chapter>

  <chapter>
    <title id='c-summary'> Summary </title>
    
    <p>
      This web page discusses a functional programming style suggested by the <i>Monad</i> and <i>MonadPlus</i> classes of Haskell and introduces a notation to facilitate coding in this style in JavaScript. A significant problem domain is parsing: support for monadic parsers exists for Haskell <ref value='r-Parsec'/>, Python <ref value='r-Pysec'/>, and other languages <ref value='r-OtherMonads'/>.
    </p>
    <p>
      This web page describes monadic LL(n) parsing with JavaScript. It contains the implementation of <tt>Monad</tt>, a base class for monadic JavaScript classes which wrap state functions. Subclasses can be created with the class method <tt><ref value='subclass'/></tt>. A subclass has the shared monadic value <tt><ref value='get'/></tt> to fetch the current state; the class methods <tt><ref value='put'/></tt> to create a monadic value to store a new value and state, <tt><ref value='succeed'/></tt> and <tt><ref value='fail'/></tt> to create monadic values reporting success and failure, and <tt><ref value='dump'/></tt> to serialize collections; and the methods <tt><ref value='apply'/></tt> to apply a monadic value, and <tt><ref value='andThen'/></tt> and <tt><ref value='orElse'/></tt> to create monadic values which apply monadic values sequentially or as alternatives.
    </p>
    <p>
      This web page contains the implementation of <tt>Parser</tt>, a <tt>Monad</tt> subclass of monadic parsers which can be combined to represent LL(n) grammars. <tt>Parser</tt> provides the additional combinators <tt><ref value='optional'/></tt>, <tt><ref value='some'/></tt>, and <tt><ref value='many'/></tt> to support representing EBNF grammars. <tt>Parser</tt> has a number of class methods. <tt><ref value='foldl'/></tt> supports left-associative assembly of a list of functions and <tt><ref value='makeTreeClasses'/></tt> converts certain property names in a collection into classes to represent trees which are connected to <tt><ref value='dumpTree'/></tt> for serialization. <tt><ref value='Parser.Factory'/></tt> is a class which is constructed with a collection of regular expressions and has methods to construct parsers which accept input matching the regular expressions.
    </p>
    <p>
      This web page contains a preprocessor <tt><ref value='c-preprocessor'/></tt> which extends JavaScript with a <ref value='notation'/> which simplifies specifying computations which involve monadic values. In particular, the notation facilitates translating LL(n) grammars into combinations of <tt>Parser</tt> values. From this perspective, <tt>JSM</tt> is a parser generator for JavaScript and the monadic notation extends JavaScript as input language for the parser generator. The web page contains three larger examples which use the preprocessor: a tree builder for a small programming language, an interpreter for the trees which implements error recovery and a functional approach to assignment, and finally the preprocessor itself. 
    </p>
    <p>
      <tt>Monad</tt> only requires functions as first-class values; <tt>Parser</tt> additionally benefits from regular expressions. Dynamic typing is convenient but not essential. Therefore, this web page can serve as a blueprint for quickly implementing parser generators in other programming languages where these features are available or can be simulated.
    </p>
    <p>
      Finally, all code in this web page can be edited and used interactively. Therefore, the support files of this web page can be used as a framework for interactive, literate programming in JavaScript.
    </p>
  </chapter>
  
  <files>
    <title id='Files'> Download Files </title>

    <file file='code/jsm.js'>
      command line program to run the preprocessor. </file>
      
    <file file='code/mini.js'>
      command line program to compile and execute the little language. </file>
      
    <file file='code/getStdin.js'>
      an attempt at reading source from standard input. </file>
      
    <file file='paper/'>
      infrastructure for interactive, literate JavaScript tutorials. </file>
      
  </files>

  <namespaces>
    <title id='Namespaces'> Download Namespaces </title>
    
    <namespace name='Monad'>
      abstract base class for monadic classes. </namespace>
    
    <namespace name='Axioms' requires='Monad'>
      examples illustrating the monad axioms. </namespace>
    
    <namespace name='Parser' requires='Monad'>
      monadic parser. </namespace>
    
    <namespace name='arithmetic'>
      patterns for arithmetic expressions. </namespace>
    
    <namespace name='language'>
      patterns for a small programming language. </namespace>
    
    <namespace name='Scanners' requires='arithmetic language Monad Parser'>
      examples illustrating parsing. </namespace>
    
    <namespace name='Expr' requires='arithmetic Monad Parser'>
      recognize arithmetic expressions. </namespace>
    
    <namespace name='Expr_badSum' requires='arithmetic Monad Parser Expr'>
      recognize arithmetic expressions (defective). </namespace>

    <namespace name='Eval' requires='arithmetic Monad Parser'>
      evaluate arithmetic expressions. </namespace>
    
    <namespace name='EvalM' requires='arithmetic Monad Parser'>
      evaluate arithmetic expressions (monadic notation). </namespace>
    
    <namespace name='Tree' requires='arithmetic Monad Parser'>
      represent arithmetic expressions as trees. </namespace>
    
    <namespace name='JSM' requires='Monad Parser'>
      preprocessor to convert monadic notation to JavaScript. </namespace>

    <namespace name='JSM_Tree' requires='Monad Parser JSM'>
      illustrate tree for monadic notation. </namespace>

    <namespace name='Mini' requires='Monad Parser'>
      interpreter for a small programming language. </namespace>

  </namespaces>

  <references>
    <title id='References'> References </title>
  
    <item id='r-Haskell'>
      <i>Haskell - HaskellWiki</i>,
      <url date='June 30, 2008'> http://haskell.org/ </url>.
    </item>

    <item id='r-CSharp'>
      <i>The C# Language</i>,
      <url date='June 30, 2008'> 
        http://msdn.microsoft.com/en-us/vcsharp/aa336809.aspx </url>.
    </item>

    <item id='r-Groovy'>
      <i>Groovy - Home</i>,
      <url date='June 30, 2008'> http://groovy.codehaus.org/ </url>.
    </item>

    <item id='r-Java-Closures'>
      <i>Closures for the Java Programming Language</i>,
      <url date='June 30, 2008'> http://javac.info/ </url>.
    </item>

    <item id='r-JavaScript'>
      <i>ECMAScript Language Specification</i>,
      Edition 3,
      <url date='June 30, 2008'>
        http://www.mozilla.org/js/language/E262-3.pdf </url>.
    </item>

    <item id='r-Python'>
      Guido van Rossum,
      <i>Python Reference Manual</i>,
      <url date='June 30, 2008'> http://docs.python.org/ref/ </url>.
    </item>

    <item id='r-Ruby'>
      <i>Ruby Home Page</i>,
      <url date='June 30, 2008'> http://www2.ruby-lang.org/en/ </url>.
    </item>

    <item id='r-Scheme'>
      <i>The Scheme Programming Language</i>,
      <url date='June 30, 2008'>
        http://www-swiss.ai.mit.edu/projects/scheme/ </url>.
    </item>

    <item id='r-Rhino'>
      <i>Rhino - JavaScript for Java</i>,
      <url date='June 30, 2008'> http://www.mozilla.org/rhino </url>.
    </item>

    <item id='r-Spidermonkey'>
      <i>SpiderMonkey (JavaScript-C) Engine</i>,
      <url date='June 30, 2008'> http://www.mozilla.org/js/spidermonkey </url>.
    </item>

    <item id='r-Hutton'>
      Graham Hutton,
      <i>Programming in Haskell</i>,
      Cambridge University Press, 2007.
    </item>

    <item id='r-Pascal'>
      Niklaus Wirth,
      <i>The Programming Language Pascal</i> (Revised Report),
      <url date='June 30, 2008'>
         http://www.standardpascal.org/The_Programming_Language_Pascal_1973.pdf
      </url>.
    </item>

    <item id='r-RFC'>
      <i>RFC-Editor Webpage</i>,
      <url date='June 30, 2008'> http://www.rfc-editor.org/ </url>.
    </item>

    <item id='r-Firefox'>
      <i>Firefox web browser</i>,
      <url date='June 30, 2008'> http://www.mozilla.com/firefox/ </url>.
    </item>

    <item id='r-Parsec'>
      Daan Leijen,
      <i>Parsec, a fast combinator parser</i>,
      <url date='June 30, 2008'>
        http://research.microsoft.com/users/daan/download/parsec/parsec.pdf
      </url>.
    </item>
  
    <item id='r-Pysec'>
      <i>Pysec: Monadic Combinatoric Parsing in Python</i>,
      April 1, 2008,
      <url date='June 30, 2008'>
        http://www.valuedlessons.com/2008/02/pysec-monadic-combinatoric-parsing-in.html
      </url>.
    </item>

    <item id='r-OtherMonads'>
      Monad (functional programming),
      <i>Wikipedia</i>,
      <url date='June 30, 2008'>
        http://en.wikipedia.org/wiki/Monads_in_functional_programming#External_links
      </url>.
    </item>

  </references>  
  
</document>