Now that we are done with the [preliminaries](https://www.fpcomplete.com/user/bartosz/basics-of-haskell/3-pure-functions-laziness-io), I'd like to show you how to design and develop a small application -- a symbolic calculator. It's a console application: the user types in expressions that are evaluated, and the results are displayed. To make it more interesting, the calculator supports symbolic variables that can be assigned and re-assigned and used in expressions. Here's an example of a user session: ![Calculator session](http://www.bartosz.com/images/SoH/Calc.png) In fact you can run the calculator right here, on the spot: ``` active haskell import Text.Parsec import Text.Parsec.String import Text.Parsec.Token import Text.Parsec.Expr import Text.Parsec.Language import qualified Data.Map as M import qualified Control.Monad.State as S import Control.Monad.Error import Control.Monad.Identity -- Lexer def = emptyDef { identStart = letter , identLetter = alphaNum , opStart = oneOf "+-*/=" , opLetter = oneOf "+-*/=" } lexer :: TokenParser () lexer = makeTokenParser def -- Expression tree data Expression = Constant Double | Identifier String | Addition Expression Expression | Subtraction Expression Expression | Multiplication Expression Expression | Division Expression Expression | Negation Expression | Assignment Expression Expression deriving Show -- Parser parseNumber :: Parser Expression parseNumber = do v <- naturalOrFloat lexer case v of Left i -> return $ Constant $ fromIntegral i Right n -> return $ Constant n parseIdentifier :: Parser Expression parseIdentifier = do i <- identifier lexer return $ Identifier i parseExpression :: Parser Expression parseExpression = (flip buildExpressionParser) parseTerm [ [ Prefix (reservedOp lexer "-" >> return Negation) , Prefix (reservedOp lexer "+" >> return id) ] , [ Infix (reservedOp lexer "*" >> return Multiplication) AssocLeft , Infix (reservedOp lexer "/" >> return Division) AssocLeft ] , [ Infix (reservedOp lexer "+" >> return Addition) AssocLeft , Infix (reservedOp lexer "-" >> return Subtraction) AssocLeft ] , [ Infix (reservedOp lexer "=" >> return Assignment) AssocRight ] ] parseTerm :: Parser Expression parseTerm = parens lexer parseExpression <|> parseNumber <|> parseIdentifier parseInput :: Parser Expression parseInput = do whiteSpace lexer ex <- parseExpression eof return ex -- Evaluator type SymTab = M.Map String Double type Evaluator a = S.StateT SymTab (ErrorT String Identity) a runEvaluator :: Evaluator Double -> SymTab -> Either String (Double, SymTab) runEvaluator calc symTab = runIdentity $ runErrorT $ S.runStateT calc symTab eval :: Expression -> Evaluator Double eval (Constant x) = return x eval (Identifier i) = do symtab <- S.get case M.lookup i symtab of Nothing -> fail $ "Undefined variable " ++ i Just e -> return e eval (Addition eLeft eRight) = do lft <- eval eLeft rgt <- eval eRight return $ lft + rgt eval (Subtraction eLeft eRight) = do lft <- eval eLeft rgt <- eval eRight return $ lft - rgt eval (Multiplication eLeft eRight) = do lft <- eval eLeft rgt <- eval eRight return $ lft * rgt eval (Division eLeft eRight) = do lft <- eval eLeft rgt <- eval eRight return $ lft / rgt eval (Negation e) = do val <- eval e return $ -val eval (Assignment (Identifier i) e) = do val <- eval e S.modify (M.insert i val) return val eval (Assignment _ _) = fail "Left of assignment must be an identifier" defaultVars :: M.Map String Double defaultVars = M.fromList [ ("e", exp 1) , ("pi", pi) ] --runEvaluator returns Either String (Double, SymTab Double) calculate :: SymTab -> String -> (String, SymTab) calculate symTab s = case parse parseInput "" s of Left err -> ("error: " ++ (show err), symTab) Right exp -> case runEvaluator (eval exp) symTab of Left err -> ("error: " ++ err, symTab) Right (val, newSymTab) -> (show val, newSymTab) loop :: SymTab -> IO () loop symTab = do line <- getLine if null line then return () else do let (result, symTab') = calculate symTab line putStrLn result loop symTab' main = loop defaultVars -- show -- Enter expressions, one per line. Empty line to quit -- ``` This is not the implementation I'll be describing. I wrote this one using Haskell libraries such as Parsec (one of the standard parsing libraries) and several monad transformers. In this series of tutorials I'd like to implement the same functionality from scratch, so you'll be able to clearly see each step and learn the language in the process. ## The Design - At the very high level, the calculator is a loop that gets a line of text from the user and then calculates and displays the result. - The calculation is done is three steps: - Lexical analysis: The string is converted to tokens - Parsing: Building an expression tree - Evaluation: The expression is evaluated. In the first phase of implementation we won't worry about error handling and symbolic variables -- we'll add them later. Notice that there's nothing Haskell-specific in this design -- it's just a piece of good old software engineering. Some people worry that programming in Haskell means re-learning everything from scratch. This is based on seriously underestimating the amount of software engineering that is common to all programming tasks -- independent of the language. ## Designing with Types We haven't talked about types yet because, even though Haskell is a strongly typed language, it has a powerful type inference system. This feature makes quick prototyping easy. Sometimes you just want to write a function and not worry about types. The compiler will figure them out. (C++11 introduced a modicum of type inference with the keyword `auto`.) On the other hand, software *design* in Haskell often starts with types. Let's try this approach. ### 1. Lexical analyzer Lexical analyzer is implemented as a function `tokenize` that takes a string (of type `String`) and returns a list of tokens. We'll define the `Token` data type later. A list of tokens has the type `[Token]` -- the square brackets are used to create lists (both list *types*, like `[Int]`, and list *literals*, like `[1, 2, 3]`). Finally, a *function* type is constructed with an arrow `->` between the type of the argument and the type of the result (we'll get to multi-argument functions later). Putting all this together, we can write the Haskell type signature for the function `tokenize` as follows: ``` haskell tokenize :: String -> [Token] ``` This is read as: Tokenize is a function taking a string and returning a list of tokens. The double colon is used to introduce a type signature. Type names must always start with capital letters, as in `String` or `Double` (except for names constructed from special characters, like the list type, `[]`). ### 2. Parser The parsing function takes a list of tokens and produces an expression. We'll define the `Expression` type later. For now, this is the type of `parse`: ``` haskell parse :: [Token] -> Expression ``` ### 3. Evaluator We'll make `evaluate` take an `Expression` and return a value of the built in type `Double` (double precision floating point number). ``` haskell evaluate :: Expression -> Double ``` We can define dummy data types for `Token` and `Expression`, and dummy function bodies; and fire up the compiler to typecheck our design: ``` active haskell data Token data Expression tokenize :: String -> [Token] tokenize = undefined parse :: [Token] -> Expression parse = undefined evaluate :: Expression -> Double evaluate = undefined main :: IO () main = putStrLn "It works, so are we done yet?" ``` You might wonder how `undefined` plays with the type checker. It turns out that the type of `undefined` is the *bottom* of the type hierarchy, which means it can be implicitly converted to any type. For instance, in the definition of `tokenize` the type of `undefined` becomes the function type: `String->[Token]`. The type of `main` is always `IO ()`: the type of `IO` monadic action that produces no result (only side effects). The type `()` itself is called *unit* -- loosely corresponding to `void` in C-like languages. ## Recursion Our design calls for a loop that accepts user input and displays the results. All loops in Haskell are implemented either using recursion or using (higher-order) functions whose implementation uses recursion. You might be concerned about the performance or recursion and the possibility of blowing the stack -- in most cases this is not a problem since the compiler is able to turn most recursions into loops. This is called *tail recursion* optimization, where the recursive call at the very end of a function is simply turned into a goto to the beginning of the function. More serious performance concerns arise occasionally from Haskell's laziness but we'll talk about it later. With that in mind, we are ready to implement the top-level loop of our calculator: ``` active haskell main :: IO () main = do line <- getLine putStrLn line main ``` You can think of `main` as first calling `getLine`, storing the result in the variable `line`, then calling `putStrLn` with that `line`, and then calling itself again. This will create an infinite loop, but no stack will be hurt in the process, since this is a typical case of tail recursion. Of course, what really happens when the program is running is slightly different because of the `IO` monad and general laziness. So it would be more appropriate to say that `main` is an `IO` action that is a sequence of three other actions: the ones returned by `getLine`, `putStrLn`, and `main`. The last action, when the time comes to execute it, will produce three new actions, etc. But everything happens on the need to run basis, so the inner `main` will not be evaluated until the (blocking) action produced by `getLine` delivers its result. Thinking about recursion rather than looping might initially seem unnatural, but it quickly becomes second nature. The main reason Haskell doesn't use loops is because of immutability: Loops in imperative languages usually have some kind of mutable counter or a mutable pointer. It's relatively easy to replace those loops with recursion. But first we need to learn about conditionals: We have to be able to break out of recursion at some point. ### Conditional A conditional in Haskell is just a simple `if`, `then`, `else` construct. The `else` is mandatory. Anything between `if` and `then` is the condition (you don't even have to surround it with parentheses), and it must evaluate to a Boolean. You can pretty much use the familiar equality and comparison operators, `>`, `>=`, `<`, `<=`, `==`, to create Boolean values; except for the not-equal operator which is `/=`. You can also combine them using `&&` and `||` for logical *and* and *or*, and `not` for *not*. However, unlike in imperative languages, the Haskell if/then/else is not a statement but an expression (similar to C's (`?:`) construct). It evaluates to either the `then` or the `else` expression, both of which have to be of the same type. For instance: ``` active haskell main = do putStrLn "Enter a number" str <- getLine print (if read str >= 1 then 1 else 0) ``` Here, the if/then/else expression that is the argument to `print` evaluates to either 1 or 0. You might be wandering about the short-circuitting properties of if/then/else or the binary Boolean operators `&&` and `||`. In most languages the property of not evaluating the branch that is not taken has to be built into the language as a special feature. Otherwise constructs like: ``` x = (p != nullptr) ? *p : 0 ``` wouldn't work properly. In Haskell, short-circuiting is just the side effect of laziness. The branch not taken is never used, so it won't be evaluated. Several explanations are in order: I used the function `read` to turn a string into a value. It's an interesting function -- it's overloaded on the return type. Here, the compiler deduced that an integral value was needed because it was compared to another integral value, 1. (Try experimenting with this code by inputing a floating point number. Then change the 1 in the if clause to 1.0 and see if the behavior changes.) We are now ready to convert a simple imperative loop that prints numbers from 0 to 4 to Haskell. Here's the C++ loop: ``` for (int i = 0; i < 5; ++i) std::cout << i << std::endl ``` And here's its recursive counterpart written in Haskell: ``` active haskell loop :: Int -> IO () loop n = do if n < 5 then do putStrLn (show n) loop (n + 1) else return () main :: IO () main = loop 0 ``` The Haskell code looks straightforward, although it's more verbose than its C++ counterpart. That's because I made it control-driven -- which is closer to the imperative version -- rather than data-driven. In Haskell one should really try to think at a higher abstraction level. Here, the goal is to print a list of integers from 0 to 4, so it would be more natural to start with such a list: `[0, 1, 2, 3, 4]` or, using a handy shorthand, `[0..4]`; and apply a function to it. We'll see examples of this approach later. Let's talk about types: `loop` returns a "void" `IO` action, so both branches of the if must also return an `IO ()` action. The first branch is a sequence of two actions (hence the use of `do` in that branch), the last of which is indeed of the type `IO ()` (that's the result of calling `loop`). The second branch is more interesting. At first sight you might not even notice anything out of the ordinary: Well, it does return a unit value `()`, which is of the type unit `()`. But how does this value become an `IO ()` action? The trick is that `return` is not a built-in keyword, it's actually an important monadic function (every monad has it). The `return` function turns whatever value it's given into a monadic value: here it turns `()` into `IO ()`. It could also turn "Hello!" into `IO String`, etc. We'll see more examples of using `return` to "return" a value from a `do` block in the future. Also notice the use of the `Int` type -- it's a fixed precision integer. In the [next installment](https://www.fpcomplete.com/user/bartosz/basics-of-haskell/5-tokenizer-data-types) we'll start implementing the lexical analyzer and learn more about data types. ## Exercises **Ex 1**. Print squares of numbers from 1 to 10. ``` active haskell loop :: Int -> IO () loop n = undefined main :: IO () main = loop 1 ``` @@@ Show solution ``` active haskell loop :: Int -> IO () loop n = do if n <= 10 then do putStrLn (show (n * n)) loop (n + 1) else return () main :: IO () main = loop 1 ``` @@@ **Ex 2**. No exposition of recursion is complete without factorial. Use the following property: Factorial of n is n times the factorial of (n - 1), and the factorial of 0 is 1. ``` active haskell fact :: Int -> Int fact n = undefined main = print (fact 20) ``` @@@ Show solution ``` active haskell fact :: Int -> Int fact n = if n > 0 then n * fact (n - 1) else 1 main = print (fact 20) ``` @@@ **Ex 3**. The evaluation of factorial starts returning incorrect results right about n = 21 because of the `Int` overflow. Try implementing a version that uses the infinite precision `Integer` instead of `Int`. ``` active haskell fact :: Int -> Int fact n = if n > 0 then n * fact (n - 1) else 1 fullFact :: ... fullFact n = ... main = do print (fact 23) print (fullFact 23) ``` @@@ Show solution ``` active haskell fact :: Int -> Int fact n = if n > 0 then n * fact (n - 1) else 1 fullFact :: Integer -> Integer fullFact n = if n > 0 then n * fullFact (n - 1) else 1 main = do print (fact 23) print (fullFact 23) ``` @@@ **Ex 4**. No exposition of recursion is complete without Fibonacci numbers. (I'm using these mathematical examples because we haven't learned about data structures. In general, Haskell is not just about math.) Use the following property of Fibonacci numbers: The n'th Fibonacci number is the sum of the (n-1)'st and the (n-2)'nd, and the first and second Fibonacci numbers are both 1. ``` active haskell fib :: Int -> Int fib n = undefined main = print (fib 20) ``` @@@ Show solution ``` active haskell fib :: Int -> Int fib n = if n > 2 then fib (n - 1) + fib (n - 2) else 1 main = print (fib 20) ``` @@@