# 9. Evaluator

24 Aug 2013

As of March 2020, School of Haskell has been switched to read-only mode.

Now that we know how to parse arithmetic expressions, let's talk about evaluating our parse trees. First, let's implement the simplified evaluator which ignores symbolic variables. This turns out to be a very simple operation: you traverse the expression tree and evaluate every node. Our leaf nodes are literal numbers -- evaluation means extracting the number. For non-leaf nodes we first recurse into their children and then combine the results. Notice how recursive traversal fits very well with the recursive definition of the treee.

## The Easy Evaluator

The evaluator is implemented as a function `evaluate` that takes an expression tree and returns a number:

``evaluate :: Tree -> Double``

`Tree` has several constructors, so we'll pattern match on each of them. Let's start with the number node -- its value is just the number that was passed to its constructor:

``evaluate (NumNode x) = x``

The recursive nodes follow the following pattern: We call `evaluate` on each child to get the numbers and then perform a node-specific operation on them.

Here it is (You can run this code and do simple arithmetic. Try it!):

``````import Data.Char

data Operator = Plus | Minus | Times | Div
deriving (Show, Eq)

data Token = TokOp Operator
| TokAssign
| TokLParen
| TokRParen
| TokIdent String
| TokNum Double
| TokEnd
deriving (Show, Eq)

operator :: Char -> Operator
operator c | c == '+' = Plus
| c == '-' = Minus
| c == '*' = Times
| c == '/' = Div

tokenize :: String -> [Token]
tokenize [] = []
tokenize (c : cs)
| elem c "+-*/" = TokOp (operator c) : tokenize cs
| c == '='  = TokAssign : tokenize cs
| c == '('  = TokLParen : tokenize cs
| c == ')'  = TokRParen : tokenize cs
| isDigit c = number c cs
| isAlpha c = identifier c cs
| isSpace c = tokenize cs
| otherwise = error \$ "Cannot tokenize " ++ [c]

identifier :: Char -> String -> [Token]
identifier c cs = let (name, cs') = span isAlphaNum cs in
TokIdent (c:name) : tokenize cs'

number :: Char -> String -> [Token]
number c cs =
let (digs, cs') = span isDigit cs in
TokNum (read (c : digs)) : tokenize cs'

---- parser ----

data Tree = SumNode Operator Tree Tree
| ProdNode Operator Tree Tree
| AssignNode String Tree
| UnaryNode Operator Tree
| NumNode Double
| VarNode String
deriving Show

accept :: [Token] -> [Token]
accept [] = error "Nothing to accept"
accept (t:ts) = ts

expression :: [Token] -> (Tree, [Token])
expression toks =
let (termTree, toks') = term toks
in
(TokOp op) | elem op [Plus, Minus] ->
let (exTree, toks'') = expression (accept toks')
in (SumNode op termTree exTree, toks'')
TokAssign ->
case termTree of
VarNode str ->
let (exTree, toks'') = expression (accept toks')
in (AssignNode str exTree, toks'')
_ -> error "Only variables can be assigned to"
_ -> (termTree, toks')

term :: [Token] -> (Tree, [Token])
term toks =
let (facTree, toks') = factor toks
in
(TokOp op) | elem op [Times, Div] ->
let (termTree, toks'') = term (accept toks')
in (ProdNode op facTree termTree, toks'')
_ -> (facTree, toks')

factor :: [Token] -> (Tree, [Token])
factor toks =
(TokNum x)     -> (NumNode x, accept toks)
(TokIdent str) -> (VarNode str, accept toks)
(TokOp op) | elem op [Plus, Minus] ->
let (facTree, toks') = factor (accept toks)
in (UnaryNode op facTree, toks')
TokLParen      ->
let (expTree, toks') = expression (accept toks)
in
then error "Missing right parenthesis"
else (expTree, accept toks')
_ -> error \$ "Parse error on token: " ++ show toks

parse :: [Token] -> Tree
parse toks = let (tree, toks') = expression toks
in
if null toks'
then tree
else error \$ "Leftover tokens: " ++ show toks'

---- evaluator ----
-- show

evaluate :: Tree -> Double
evaluate (SumNode op left right) =
let lft = evaluate left
rgt = evaluate right
in
case op of
Plus  -> lft + rgt
Minus -> lft - rgt

evaluate (ProdNode op left right) =
let lft = evaluate left
rgt = evaluate right
in
case op of
Times -> lft * rgt
Div   -> lft / rgt

evaluate (UnaryNode op tree) =
let x = evaluate tree
in case op of
Plus  -> x
Minus -> -x

evaluate (NumNode x) = x

-- dummy implementation
evaluate (AssignNode str tree) = evaluate tree

-- dummy implementation
evaluate (VarNode str) = 0

main = (print . evaluate . parse . tokenize) "x1 = -15 / (2 + x2)"``````

I implemented `main` as a pipeline of functions, each passing its result to the next one. Notice that functions compose right to left reflecting the fact that they take arguments on the right and return results on the left.

How are we going to implement symbolic variables in our calculator? It seems obvious that `evaluate` acting on an assignment node must have a side effect -- it must modify the value of a variable. And this new value must be visible during the rest of the session, whenever the corresponding `VarNode` accesses it.

In functional programming we shun side effects so we have to use a different implementation strategy: we will thread the symbol table through every call to `evaluate`.

``evaluate :: Tree -> SymTab -> (Double, SymTab)``

When `evaluate` needs to modify the symbol table (add new or modify an existing variable), it constructs a new copy of the symbol table, with the modification included. It then returns that copy. We've seen this idiom already in the implementation of the parser -- we were threading the token list in a similar manner.

If you're worried about the performance impact of copying potentially large data structures, know that a properly implemented symbol table has its copy/modify operation comparable in cost with a modification of a mutable symbol table. I mentioned earlier that in functional languages you use persistent data structures, which can be cloned by modifying just a handful of pointers, internally. Incidentally, the use of persistent data structures has spread to imperative languages as well because they can be easily made thread safe. We'll use such a persistent data structure for our symbol table. (Keep in mind, though, that if you are 100% sure that the table is the bottleneck of some algorithm, you can always switch to `Data.HashTable` library.)

We'll be using an efficient implementation of a balanced binary tree for our symbol table, taken from the `Data.Map` library. There is a slight problem when using this library: it defines a function `lookup` that has the same name as the Prelude function that operates on lists. Since we want to call `lookup` with a map, we have to resolve this name conflict. The standard way to do this is to use `Data.Map` as a namespace by importing it as `qualified`:

``import qualified Data.Map``

To access `lookup` from `Data.Map`, we would have to qualify it as `Data.Map.lookup`. Since this is pretty verbose, we'll provide an alias `M` for this namespace:

``import qualified Data.Map as M``

Any name from `Data.Map` will now have to be prefixed with `M.`. In particular, we'll be able to call `M.lookup` without the fear of conflict.

Our symbol table is a `Map` that is specialized for strings as keys and doubles as values (notice how we qualify it as `M.Map`).

``type SymTab = M.Map String Double``

By the way, `Map` requires that the keys be comparable (they have to be instances of the `Cmp` type class, of which later). It so happens that our keys are `String`s, which are (lexicographically) comparable.

You may create an empty `Map` by calling the function `M.empty` or, if you want to pre-fill it, use the `M.fromList` instead:

``````M.fromList [("pi", pi), ("e", exp 1)]
``````

As you can see, `fromList` accepts a list of key/value pairs. The value `pi` and the function `exp` are defined in the Prelude.

Notice that no constructors of `Map` are available to the client; only functions `empty`, `fromList`, and `singleton` (the latter to create a single-element list). That's because the Haskell module system (of which soon) supports data hiding, which lets the creator of the library select what to export and what to keep private (here, the constructors of `Map`).

The symbol table must not only be threaded through all recursive invocations of `eval`. It must also be passed between invocations of the main tokenize/parse/evaluate loop. We want user-defined variables to store values throughout the session. Here's the modified top-level of our calculator that ensures the longevity of the symbol table:

``````main = do
loop (M.fromList [("pi", pi)])

loop symTab = do
str <- getLine
if null str
then
return ()
else
let toks = tokenize str
tree = parse toks
(val, symTab') = evaluate tree symTab
in do
print val
loop symTab'``````

We pass `symTab` to `evaluate` and obtain a new symbol table `symTab'`, together with the result. We then pass this modified symbol table, with user defined and modified symbolic variables, to the next iteration of the loop.

### Symbol Table Lookup

For uniformity, we will encapsulate both the symbol table lookup and insertion in functions that have a very similar signature to `evaluate` itself:

``evaluate :: Tree -> SymTab -> (a, SymTab)``

For generality, I replaced `Double` with a type parameter `a`. This will come in handy when defining `addSymbol`.

Here's the lookup (we substituted `String` for `Tree` in the signature):

``````lookUp :: String -> SymTab -> (Double, SymTab)
lookUp str symTab =
case M.lookup str symTab of
Just v -> (v, symTab)
Nothing -> error \$ "Undefined variable " ++ str``````

We are calling the function `M.lookup` from the `Map` package. This function may fail if the key is not present in the map. So let's talk about failure.

Programmers come up with lots of ways of implementing failure -- often very hacky. For instance you may return a "special" value, like -1, an empty string, or a null pointer. The problem with special values is that it's easy to forget to check for them in the calling code. And cases of inadvertently dereferencing a null pointer or accessing the minus-first element of an array are endemic.

In Haskell there are many ways of dealing with failure or, as it's sometimes called, with partial functions -- functions that are not defined for all possible arguments. The simplest such method is to return a `Maybe` value. `Maybe` is defined in the Prelude as:

``data Maybe a = Nothing | Just a``

It works for any type `a`, so it can be used to encapsulate any result. What's important is that you can hardly forget to check the result of a function returning a `Mabye`, because, in order to get at the value inside it, you are virtually forced to do pattern matching. We've done that in the implementation of `lookUp`: we pattern matched on the constructor `Just` to retrieve the value of the type `Double`; and we pattern-matched on `Nothing` to display an error. We'll implement a more sophisticated error checking later but for now, exiting with an error message will do.

You may find versions of `Mabye` in other languanges as well: `boost::optional` in C++, nullable in C#, `option` in ML, and so on.

### Symbol Table Update

Here's the function `addSymbol`, which inserts a new string/value pair into the symbol table. The function `M.insert` will overwrite the previous pair, if the key already exists in the map. That's why we can use `addSymbol` for both creating new symbolic variables and modifying them later.

``````addSymbol :: String -> Double -> SymTab -> ((), SymTab)
let symTab' = M.insert str val symTab
in ((), symTab')``````

Notice that, since there is no interesting value to be returned from `addSymbol`, I replaced `a` in the generic type signature with unit `()` (a.k.a., `void` in C-like languages).

As I explained before, the symbol table is not mutable: `insert` returns a new copy of it, which we then return from `addSymbol`, so it can be threaded through the rest of the code.

Why did I choose the symbol-table-threading interface for `lookUp` and `addSymbol`? It's because I have bigger unification plans for them. You might have noticed that the code is becoming -- seemingly unnecessarily -- complicated because of Haskell's insistence on purity and immutability. Why can't Haskell compromise like other functional languages? THe answer is that Haskell doesn't have to. I'm only introducing this complexity -- this imperative rot -- to later show you the ways you can get rid of it. In Haskell you can have the cake and eat it too -- It's called monads. In particular, this kind of threading of state from function to function can be easily hidden from view by using the appropriate monad. But monads are not magic! By the time we are through with the complex implementation, you'll know exactly what a monad hides under the hood. So bear with me a little longer.

### Implementing evaluate

The rest of the implementation of the evaluator is pretty straightforward. You just have to remember that every call might return a modified symbol table, and that you have to pass it to subsequent calls. Here's `evaluate` for the `SumNode`:

``````evaluate :: Tree -> SymTab -> (Double, SymTab)

evaluate (SumNode op left right) symTab =
let (lft, symTab')  = evaluate left symTab
(rgt, symTab'') = evaluate right symTab'
in
case op of
Plus  -> (lft + rgt, symTab'')
Minus -> (lft - rgt, symTab'')``````

The evaluators for `VarNode` and the `AssignNote` are easily implemented in terms of the primitives `lookUp` and `addSymbol`:

``````evaluate (VarNode str symTab) = lookUp str symTab

evaluate (AssignNode str tree) symTab =
let (v, symTab')  = evaluate tree symTab
(_, symTab'') = addSymbol str v symTab'
in (v, symTab'')``````

The complete implementation of the calculator is at the end of this tutorial.

## Module System

The calculator project is small enough to fit in a single file. Nevertheless, it's a good idea to split it into multiple modules. I have posted the multi-module version of the project on my github, where you can view or clone it. In that version I put lexer, parser, and evaluator in separate modules/files. There is also a main module with `main` and the main `loop`.

So far we've been only seeing library modules through `import` statements. Now we can see how these modules are created. A module corresponds to a file -- the name of the file (minus the extension .hs) is also the name of the module -- the compiler enforces it, except for single-file projects. The contents of each file starts with a module header, which contains the name of the module and an optional list of exports. If the exports are not listed, all top-level names are automatically exported.

Here's the header of the `Lexer` module from the file `Lexer.hs`.

``module Lexer (Operator(..), Token(..), tokenize, lookAhead, accept) where``

In parentheses, you see the list of exported definitions of data types and functions. When exporting a data type, you have the option to export none, some, or all of its constructors. For instance, I could have exported only `Plus` and `Minus` constructors of `Operator`, and no constructors of `Token`:

``module Lexer (Operator(Plus, Minus), Token, tokenize, lookAhead, accept) where``

If constructors are not exported, you can still use the type in other modules that import it, but you can't create or pattern match values of that type (that's exactly what to module `Data.Map` did).

It's common to export all constructors of a given type, and that's what the double-dot shortcut is for. Here we have `Operator(..)` and `Token(..)`.

When another module imports the module `Lexer`, all exported definitions become visible (unless you use selective imports, as below). Importing is not transitive, though. If `Parser` imports `Lexer` and `Main` imports `Parser`, the definitions from `Lexer` are not visible in `Main`. That's why the module `Main` had to import all of `Parser`, `Lexer`, and `Evaluator`:

``````module Main where

import qualified Data.Map as M
import Lexer (tokenize)
import Parser (parse)
import Evaluator (evaluate)``````

In `Main`, we only need one function from each module so we use selective imports by specifying a (comma-separated, if more than one) lists of imported declarations.

The module system of Haskell is much cleaner than, for instance, that of C++ or Java. Also, you don't need separate header files -- Haskell parser can quickly scan files for exports and imports.

## Expression Problem

In this particular project we've been dealing, so far, with multiple expression nodes and just one function, `evaluate`. In principle, though, we could come up with many other functions acting on nodes (see Ex 2). You can imagine a two-dimensional matrix of possibilites: the nodes on one axis, the functions on another. An interesting question is, how easy it is to extend this matrix. How easy it is to add more node types, and how easy it is to define new functions.

If we were implementing the evaluator in an object-oriented language, chances are we would define an interface `Node` with a virtual function `evaluate`. All specific nodes would be described by classes that inherit from `Node`. Our `evaluate` would then be overridden separately for each type of node.

Suppose you want to write an extensible library based on this scheme. In Haskell it would be easy for the clients of our library to implement new functions acting on nodes. Adding new node constructors, on the other hand, would require modifying the definition of `Tree` and adding a new pattern to all existing functions that operate on `Tree`, including the ones defined in the library module. Our library would be open to new functions but closed to new variations of data.

In an object oriented language, the opposite is true: adding a function requires the modification of the top `Node` and all its descendants -- a new virtual function has to be added to `Node`, and all its overrides in existing nodes have to be implemented. You can't do that without touching the library. However, adding a new node is easy: you just define a new class that inherits from `Node` and implement its virtual functions. You don't need to touch the library in order to add new nodes. Object-oriented libraries are open to new data types, but closed to new functions.

This tradeoff is a symptom of the well known expression problem faced by library writes. Fortunately, Haskell offers an elegant solution to the expression problem, which I will describe in the next tutorial (it requires type classes).

## Exercises

Ex 1. Implement function `translate`, which takes a dictionary and a list of strings and returns a list of translated strigs. If a string is not in a dictionary, it should be replaced with "whatchamacallit". For bonus points, try using the higher order `map` function from the Prelude, and the `where` clause. Remember that a function defined inside `where` has access to the arguments of the outer function.

``````import qualified Data.Map as M

type Dict = M.Map String String

translate :: Dict -> [String] -> [String]
translate = undefined

testTranslation :: Dict -> IO ()
testTranslation dict = do
print \$ translate dict ["where", "is", "the", "colosseum"]

testInsertion :: Dict -> IO Dict
testInsertion dict = do
return \$ M.insert "colosseum" "colosseo" dict

main =
let dict = M.fromList [("where", "dove"), ("is", "e"), ("the", "il")]
in do
testTranslation dict
dict'  <- testInsertion dict
testTranslation dict'
putStrLn "The original dictionary is unchanged:"
testTranslation dict``````

Ex 2. Implement function `paren` that takes an expression tree and turns it into a string with fully parenthesized expression. For instance, when acting on `testExpr` it should produce the string `(x = ((2.0 * (y = 5.0)) + 3.0))`

``````data Operator = Plus | Minus | Times | Div
deriving (Show, Eq)

data Tree = SumNode Operator Tree Tree
| ProdNode Operator Tree Tree
| AssignNode String Tree
| UnaryNode Operator Tree
| NumNode Double
| VarNode String
deriving Show

paren :: Tree -> String
paren = undefined

-- x = 2 * (y = 5) + 3
testExpr = AssignNode "x" (SumNode Plus
(ProdNode Times
(NumNode 2.0)
(AssignNode "y" (NumNode 5)))
(NumNode 3))

main = print \$ paren testExpr``````

Ex 3. The code below creates a frequency map of words in a given text. Fill in the implementation of `indexWords`, which counts the frequency of each word in a list of words; and `splitWords`, which splits a string into (lowercased) words, removing punctuation and newlines in the process. You might want to use the function `findWithDefault` from `Data.Map` and the function `words` from the Prelude. For bonus points try using `map` and `foldl`.

``````{-# START_FILE main.hs #-}
import qualified Data.Map as M
import Data.Char (toLower)
import Data.List (sortBy)

type Index = M.Map String Int

indexWords ::  Index -> [String] -> Index
indexWords = undefined

splitWords :: String -> [String]
splitWords = undefined

mostFrequent :: [String] -> [(String, Int)]
mostFrequent wrds =
let index = indexWords M.empty wrds
in take 9 (sortBy cmpFreq (M.toList index))
where
cmpFreq :: (String, Int) -> (String, Int) -> Ordering
cmpFreq (w1, n1) (w2, n2) = compare n2 n1

main = do
print \$ mostFrequent (splitWords text)

{-# START_FILE moby.txt #-}
Call me Ishmael. Some years ago - never mind how long precisely - having little
or no money in my purse, and nothing particular to interest me on shore, I thought
I would sail about a little and see the watery part of the world. It is a way
I have of driving off the spleen, and regulating the circulation. Whenever I find
myself growing grim about the mouth; whenever it is a damp, drizzly November
in my soul; whenever I find myself involuntarily pausing before coffin warehouses,
and bringing up the rear of every funeral I meet; and especially whenever
my hypos get such an upper hand of me, that it requires a strong moral principle
to prevent me from deliberately stepping into the street, and methodically
knocking people's hats off - then, I account it high time to get to sea as soon as I can.``````

## The Symbolic Calculator So Far

Below is the source code for the whole project:

``````{-# START_FILE Main.hs #-}
module Main where

import qualified Data.Map as M
import Lexer (tokenize)
import Parser (parse)
import Evaluator (evaluate)

main = do
loop (M.fromList [("pi", pi), ("e", exp 1.0)])

loop symTab = do
str <- getLine
if null str
then
return ()
else
let toks = tokenize str
tree = parse toks
(val, symTab') = evaluate tree symTab
in do
print val
loop symTab'
{-# START_FILE Lexer.hs #-}
module Lexer (Operator(..), Token(..), tokenize, lookAhead, accept) where

import Data.Char

data Operator = Plus | Minus | Times | Div
deriving (Show, Eq)

data Token = TokOp Operator
| TokAssign
| TokLParen
| TokRParen
| TokIdent String
| TokNum Double
| TokEnd
deriving (Show, Eq)

accept :: [Token] -> [Token]
accept [] = error "Nothing to accept"
accept (t:ts) = ts

tokenize :: String -> [Token]
tokenize [] = []
tokenize (c : cs)
| elem c "+-*/" = TokOp (operator c) : tokenize cs
| c == '='  = TokAssign : tokenize cs
| c == '('  = TokLParen : tokenize cs
| c == ')'  = TokRParen : tokenize cs
| isDigit c = number c cs
| isAlpha c = identifier c cs
| isSpace c = tokenize cs
| otherwise = error \$ "Cannot tokenize " ++ [c]

identifier :: Char -> String -> [Token]
identifier c cs = let (name, cs') = span isAlphaNum cs in
TokIdent (c:name) : tokenize cs'

number :: Char -> String -> [Token]
number c cs =
let (digs, cs') = span isDigit cs in
TokNum (read (c : digs)) : tokenize cs'

operator :: Char -> Operator
operator c | c == '+' = Plus
| c == '-' = Minus
| c == '*' = Times
| c == '/' = Div
{-# START_FILE Parser.hs #-}
module Parser (Tree(..), parse)where

import Lexer

data Tree = SumNode Operator Tree Tree
| ProdNode Operator Tree Tree
| AssignNode String Tree
| UnaryNode Operator Tree
| NumNode Double
| VarNode String
deriving Show

parse :: [Token] -> Tree
parse toks = let (tree, toks') = expression toks
in
if null toks'
then tree
else error \$ "Leftover tokens: " ++ show toks'

expression :: [Token] -> (Tree, [Token])
expression toks =
let (termTree, toks') = term toks
in
(TokOp op) | elem op [Plus, Minus] ->
let (exTree, toks'') = expression (accept toks')
in (SumNode op termTree exTree, toks'')
TokAssign ->
case termTree of
VarNode str ->
let (exTree, toks'') = expression (accept toks')
in (AssignNode str exTree, toks'')
_ -> error "Only variables can be assigned to"
_ -> (termTree, toks')

term :: [Token] -> (Tree, [Token])
term toks =
let (facTree, toks') = factor toks
in
(TokOp op) | elem op [Times, Div] ->
let (termTree, toks'') = term (accept toks')
in (ProdNode op facTree termTree, toks'')
_ -> (facTree, toks')

factor :: [Token] -> (Tree, [Token])
factor toks =
(TokNum x)     -> (NumNode x, accept toks)
(TokIdent str) -> (VarNode str, accept toks)
(TokOp op) | elem op [Plus, Minus] ->
let (facTree, toks') = factor (accept toks)
in (UnaryNode op facTree, toks')
TokLParen      ->
let (expTree, toks') = expression (accept toks)
in
then error "Missing right parenthesis"
else (expTree, accept toks')
_ -> error \$ "Parse error on token: " ++ show toks

{-# START_FILE Evaluator.hs #-}
module Evaluator (evaluate) where

import Lexer
import Parser
import qualified Data.Map as M

type SymTab = M.Map String Double

evaluate :: Tree -> SymTab -> (Double, SymTab)

evaluate (SumNode op left right) symTab =
let (lft, symTab') = evaluate left symTab
(rgt, symTab'') = evaluate right symTab'
in
case op of
Plus  -> (lft + rgt, symTab'')
Minus -> (lft - rgt, symTab'')

evaluate (ProdNode op left right) symTab =
let (lft, symTab') = evaluate left symTab
(rgt, symTab'') = evaluate right symTab'
in
case op of
Times -> (lft * rgt, symTab)
Div   -> (lft / rgt, symTab)

evaluate (UnaryNode op tree) symTab =
let (x, symTab') = evaluate tree symTab
in case op of
Plus  -> (x, symTab')
Minus -> (-x, symTab')

evaluate (NumNode x) symTab = (x, symTab)

evaluate (VarNode str) symTab = lookUp str symTab

evaluate (AssignNode str tree) symTab =
let (v, symTab')  = evaluate tree symTab
(_, symTab'') = addSymbol str v symTab'
in (v, symTab'')

lookUp :: String -> SymTab -> (Double, SymTab)
lookUp str symTab =
case M.lookup str symTab of
Just v -> (v, symTab)
Nothing -> error \$ "Undefined variable " ++ str

addSymbol :: String -> Double -> SymTab -> ((), SymTab)