# Introduction

In this tutorial you will learn how to parse log-like files and how to
render a log to a file. Many applications use logs to keep track of some
useful information to be analysed later on. Parsing a log-like file it
is an easy parsing task in comparison with parsing, say, a programming
language, but it is an useful practice for a Haskell parser beginner.
Most of the code in this tutorial is editable and runnable, so take
advantage and experiment with the code yourself.

While log files do not have a specific format, we are going to output
them as CSV tables. An specification of CSV can be found in the
[RFC 4180](http://tools.ietf.org/html/rfc4180).

Among the many parser libraries in Haskell we have chosen
[_attoparsec_](http://hackage.haskell.org/package/attoparsec)
in this tutorial. Why? Firstly, because it is easy to use and secondly
because it is fast. The other popular choice is
[_parsec_](http://hackage.haskell.org/package/parsec).
_Parsec_ has a similar interface to _attoparsec_, but share also some differences.
For example, a parser in _parsec_ can be used as a monad transformer, allowing
you to add custom states. Also, when a parsing error arises, _parsec_
gives you a lot more information than _attoparsec_. The lack of these
features in _attoparsec_ is precisely what makes it faster.

# Writing a parser

Writing a parser involves _teaching_ our computer how to read something.
If a human see the string `"25"` it will quickly concludes that the string
contains a number. In fact, probably you read it as "twenty five" instead of
"two five". However, for the computer it is just a string of characters.
In Haskell, we would have to write a function from `String` (or `Text` or `ByteString`,
depending on the input type) to `Integer` in order to use it as a number.
This is what parsing means. But, how we accomplish such task?
Well, say that an application has sent to us the following `ByteString`:

``` haskell
"131.45.68.123"
```

It is the IP of a user that just connected to our server! In our code,
we have the following type definition:

``` haskell
import Data.Word

data IP = IP Word8 Word8 Word8 Word8 deriving Show
```

It is a type we defined for IP's. The `Word8` type represents 8-bit unsigned integer values.
Now it would be great if we could parse the input `131.45.68.123` to the value
`IP 131 45 68 123`. The first thing we look is how IP's are written. They follow this pattern:

* An 8-bit integer.
* A _dot_.
* An 8-bit integer.
* A _dot_.
* An 8-bit integer.
* A _dot_.
* An 8-bit integer.

When we write a parser in Haskell, what we actually do is following the pattern of the input format
from left to right. In this case, the function `parseIP` defines a parser for our type `IP` following
the pattern we just described. Note that the `decimal` parser succeeds for any unsigned integral number
(`Word8` in this example).

``` active haskell
{-# LANGUAGE OverloadedStrings #-}

-- This attoparsec module is intended for parsing text that is
-- represented using an 8-bit character set, e.g. ASCII or ISO-8859-15.
import Data.Attoparsec.Char8
import Data.Word

-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving Show

parseIP :: Parser IP
parseIP = do
  d1 <- decimal
  char '.'
  d2 <- decimal
  char '.'
  d3 <- decimal
  char '.'
  d4 <- decimal
  return $ IP d1 d2 d3 d4

main :: IO ()
main = print $ parseOnly parseIP "131.45.68.123"
```

Note that the output of `parseOnly`, the function that applies the parser
`parseIP` to the input `"131.45.68.123"` returns a value of type
`Either String IP`. This is because parsing is not a _total_ function, meaning
that not every input has an output. For example, parsing the string
`"foo"` cannot result in any IP. As a consequence, the parser fails.
Each time the parser fails, it will return `Left str`, where `str`
is a value of type `String` describing the error (in _attoparsec_, not
very descriptive actually). If the parser ends successfuly, it will
return `Right x`, where `x` is the parsed value.

As you can see, the approach to define a parser is to use simpler parsers
and combine them write parsers for more complex expressions. In the following
example, you will see how to parse a log file, including IP's. We will re-use
the recently created parser.

# Parsing logs

In this section, we develop a parser for log files that mixes content of
different types. We use an example to guide the process.

## Step 1: Define types

Say we have an online shop where we sell computer items
like mouses, keyboards, monitors and speakers. Each time a product is sold,
our application saves some information in a log file, containing the time
when the product was sold, the IP of the client and the name of the product.
Each log entry may be represented by the following type:

``` haskell
import Data.Time

data Product = Mouse | Keyboard | Monitor | Speakers deriving Show

data LogEntry =
  LogEntry { -- A local time contains the date and the time of the day.
             -- For example: 2013-06-29 11:16:23.
             entryTime :: LocalTime
           , entryIP   :: IP
           , entryProduct   :: Product
             } deriving Show
```

The log file will therefore contain a list of elements of type `LogEntry`.

``` haskell
-- | Type synonym of a list of log entries.
type Log = [LogEntry]
```

## Step 2: Follow the syntax

The log file, or anything that we can parse, follows a specific syntax.
For example, here is our today log:

```
2013-06-29 11:16:23 124.67.34.60 keyboard
2013-06-29 11:32:12 212.141.23.67 mouse
2013-06-29 11:33:08 212.141.23.67 monitor
2013-06-29 12:12:34 125.80.32.31 speakers
2013-06-29 12:51:50 101.40.50.62 keyboard
2013-06-29 13:10:45 103.29.60.13 mouse
```

Each line contains a log entry. The idea is to write a parser for log entries,
and iterate it line by line to get the list of every log entry.
The elements contained in each entry would be of type `LocalTime`, `IP` and
`Product`. We have to write parsers for each one and combine them.
Fortunately, we already have a parser for IP's that we can re-use.
Let's write a parser for the time stamps.

We notice that the format followed in our log is:

```
yyyy-MM-dd hh:mm:ss
```

Following this specification, we can easily write the parser as follows.

``` active haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Time
import Data.Attoparsec.Char8

timeParser :: Parser LocalTime
timeParser = do
  y  <- count 4 digit
  char '-'
  mm <- count 2 digit
  char '-'
  d  <- count 2 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

main :: IO ()
main = print $ parseOnly timeParser "2013-06-30 14:33:29"
```

Note the use of `count` and `digit`. The parser `digit` will get the
following character, in case that this character is a digit, and will fail otherwise.
The combinator `count` repeats a parser a certain number of times. Since in our format,
a year is written with 4 characters, we use `count 4 digit` meaning _read 4 digits from the input_.
The same rationale applies to the rest of the code. At the end, we return a value of type
`LocalTime`.

## Parsing alternatives

Lastly, we need a parser for `Product` values. This one is even easier, but it also have something
new. A product is represented by a word. Each word is different, so there is no single syntax to
read. We have different choices. It is either `keyboard` or `mouse` or `monitor` or `speaker`.
This _or_, separating different alternatives, it is represented in attoparsec by the `<|>` combinator.
The `<|>` operator combines two parsers of the _same type_ in one that first tries to use the first
argument parser. If this one ends without failure, it returns its result. If it fails, it tries with
the second one, returning any result it gives.
This would be the `Product` parser:

``` active haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Char8
import Control.Applicative

data Product = Mouse | Keyboard | Monitor | Speakers deriving Show

productParser :: Parser Product
productParser =
     (string "mouse"    >> return Mouse)
 <|> (string "keyboard" >> return Keyboard)
 <|> (string "monitor"  >> return Monitor)
 <|> (string "speakers" >> return Speakers)

main :: IO ()
main = do
  print $ parseOnly productParser "mouse"
  print $ parseOnly productParser "mouze"
  print $ parseOnly productParser "monitor"
  print $ parseOnly productParser "keyboard"
```

Note that we have to import the `Control.Applicative` module to use the `<|>` combinator.
Also note that when we try to parse `mouze` we get a cryptic error message (_not enough
bytes_) that does not say much about the parsing error. This is one trade-off of attoparsec
in order to get better performance than parsec. The API of parsec is very similar to the
one of attoparsec, but parsec reports much more information when a parsing error arises.

## Step 3: Combine small parsers to build a bigger one

It is time to combine our parsers into one that can read a whole log entry. We only have
to invoke them in order.

``` active haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Word
import Data.Time
import Data.Attoparsec.Char8
import Control.Applicative

-----------------------
-------- TYPES --------
-----------------------

-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving Show

data Product = Mouse | Keyboard | Monitor | Speakers deriving Show

data LogEntry =
  LogEntry { entryTime :: LocalTime
           , entryIP   :: IP
           , entryProduct   :: Product
             } deriving Show

-----------------------
------- PARSING -------
-----------------------

-- | Parser of values of type 'IP'.
parseIP :: Parser IP
parseIP = do
  d1 <- decimal
  char '.'
  d2 <- decimal
  char '.'
  d3 <- decimal
  char '.'
  d4 <- decimal
  return $ IP d1 d2 d3 d4

-- | Parser of values of type 'LocalTime'.
timeParser :: Parser LocalTime
timeParser = do
  y  <- count 4 digit
  char '-'
  mm <- count 2 digit
  char '-'
  d  <- count 2 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

-- | Parser of values of type 'Product'.
productParser :: Parser Product
productParser =
     (string "mouse"    >> return Mouse)
 <|> (string "keyboard" >> return Keyboard)
 <|> (string "monitor"  >> return Monitor)
 <|> (string "speakers" >> return Speakers)
-- show
-- | Parser of log entries.
logEntryParser :: Parser LogEntry
logEntryParser = do
  -- First, we read the time.
  t <- timeParser
  -- Followed by a space.
  char ' '
  -- And then the IP of the client.
  ip <- parseIP
  -- Followed by another space.
  char ' '
  -- Finally, we read the type of product.
  p <- productParser
  -- And we return the result as a value of type 'LogEntry'.
  return $ LogEntry t ip p

----------------------
-------- TEST --------
----------------------

main :: IO ()
main = print $ parseOnly logEntryParser "2013-06-29 11:16:23 124.67.34.60 keyboard"
-- /show
```

In order to read the entire log file, we just need to iterate `logEntryParser` until
the end of the file is reached. The combinator `many` will perform a parser _zero_
or more times, returning a list of continuous successful parsings. It will stop whenever
the given parser fails. For example, `many digit` applied to the string `"123abc"` will
return `"123"` and will leave `"abc"` as remainding input. Also, `many digit` applied to
the string `"abc"` will return the empty list without consuming any input.

In conclusion, here is our log file parser.

``` haskell
type Log = [LogEntry]

logParser :: Parser Log
logParser = many $ logEntryParser <* endOfLine
```

The `endOfLine` parser succeeds only when the remaining input starts with an end
of line. The `<*` combinator applies the parser from the left, then the parser from
the right, and then returns the result of the first parser. We use it to get the result
from `logEntryParser` instead of `endOfLine`, which returns `()`.

## Full log file parser

``` active haskell
{-# START_FILE sellings.log #-}
2013-06-29 11:16:23 124.67.34.60 keyboard
2013-06-29 11:32:12 212.141.23.67 mouse
2013-06-29 11:33:08 212.141.23.67 monitor
2013-06-29 12:12:34 125.80.32.31 speakers
2013-06-29 12:51:50 101.40.50.62 keyboard
2013-06-29 13:10:45 103.29.60.13 mouse

{-# START_FILE Main.hs #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Word
import Data.Time
import Data.Attoparsec.Char8
import Control.Applicative
-- We import ByteString qualified because the function
-- 'Data.ByteString.readFile' would clash with
-- 'Prelude.readFile'.
import qualified Data.ByteString as B

-----------------------
------ SETTINGS -------
-----------------------

-- | File where the log is stored.
logFile :: FilePath
logFile = "sellings.log"

-----------------------
-------- TYPES --------
-----------------------

-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving Show

data Product = Mouse | Keyboard | Monitor | Speakers deriving Show

-- | Type for log entries.
--   Add, remove of modify fields to fit your own log file.
data LogEntry =
  LogEntry { entryTime :: LocalTime
           , entryIP   :: IP
           , entryProduct   :: Product
             } deriving Show

type Log = [LogEntry]

-----------------------
------- PARSING -------
-----------------------

-- | Parser of values of type 'IP'.
parseIP :: Parser IP
parseIP = do
  d1 <- decimal
  char '.'
  d2 <- decimal
  char '.'
  d3 <- decimal
  char '.'
  d4 <- decimal
  return $ IP d1 d2 d3 d4

-- | Parser of values of type 'LocalTime'.
timeParser :: Parser LocalTime
timeParser = do
  y  <- count 4 digit
  char '-'
  mm <- count 2 digit
  char '-'
  d  <- count 2 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

-- | Parser of values of type 'Product'.
productParser :: Parser Product
productParser =
     (string "mouse"    >> return Mouse)
 <|> (string "keyboard" >> return Keyboard)
 <|> (string "monitor"  >> return Monitor)
 <|> (string "speakers" >> return Speakers)

-- | Parser of log entries.
logEntryParser :: Parser LogEntry
logEntryParser = do
  -- First, we read the time.
  t <- timeParser
  -- Followed by a space.
  char ' '
  -- And then the IP of the client.
  ip <- parseIP
  -- Followed by another space.
  char ' '
  -- Finally, we read the type of product.
  p <- productParser
  -- And we return the result as a value of type 'LogEntry'.
  return $ LogEntry t ip p

logParser :: Parser Log
logParser = many $ logEntryParser <* endOfLine

----------------------
-------- MAIN --------
----------------------

main :: IO ()
main = B.readFile logFile >>= print . parseOnly logParser
```

## Changes in the log

After some time logging our sales, we have the idea of adding a new field
to each log entry. We ask each customer how he/she found about us and keep
this information in our log. We happily update the logger but quickly notice
that the parser does not work anymore. Apart from changing the `LogEntry` type
we have to modify the parser to work with the new values. We allow our users
to specify the following options:

``` haskell
data Source = Internet | Friend | NoAnswer deriving Show
```

We would report `NoAnswer` in the case that our customer did not answered.
Quickly we write a parser very similar to `productParser`.

``` active haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Char8
import Control.Applicative

data Source = Internet | Friend | NoAnswer deriving Show

sourceParser :: Parser Source
sourceParser =
      (string "internet" >> return Internet)
  <|> (string "friend" >> return Friend)
  <|> (string "noanswer" >> return NoAnswer)
  
main :: IO ()
main = print $ parseOnly sourceParser "internet"
```

After checking that this parser works, we add it to our `logEntryParser`,
upgrading the type definition of `LogEntry` adding the field `source`.

``` active haskell
{-# START_FILE sellings.log #-}
2013-06-29 16:40:15 154.41.32.99 monitor internet
2013-06-29 16:51:12 103.29.60.13 keyboard internet
2013-06-29 17:13:21 121.95.68.21 speakers friend
2013-06-29 18:20:10 190.80.70.60 mouse noanswer
2013-06-29 18:51:23 102.42.52.64 speakers friend
2013-06-29 19:01:11 78.46.64.23 mouse internet

{-# START_FILE Main.hs #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Word
import Data.Time
import Data.Attoparsec.Char8
import Control.Applicative
-- We import ByteString qualified because the function
-- 'Data.ByteString.readFile' would clash with
-- 'Prelude.readFile'.
import qualified Data.ByteString as B

-----------------------
------ SETTINGS -------
-----------------------

-- | File where the log is stored.
logFile :: FilePath
logFile = "sellings.log"

-----------------------
-------- TYPES --------
-----------------------

-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving Show

data Product = Mouse | Keyboard | Monitor | Speakers deriving Show

data Source = Internet | Friend | NoAnswer deriving Show

-- show
data LogEntry =
  LogEntry { entryTime :: LocalTime
           , entryIP   :: IP
           , entryProduct   :: Product
             -- Addition of the 'Source' field
           , source    :: Source
             } deriving Show
-- /show

type Log = [LogEntry]

-----------------------
------- PARSING -------
-----------------------

-- | Parser of values of type 'IP'.
parseIP :: Parser IP
parseIP = do
  d1 <- decimal
  char '.'
  d2 <- decimal
  char '.'
  d3 <- decimal
  char '.'
  d4 <- decimal
  return $ IP d1 d2 d3 d4

-- | Parser of values of type 'LocalTime'.
timeParser :: Parser LocalTime
timeParser = do
  y  <- count 4 digit
  char '-'
  mm <- count 2 digit
  char '-'
  d  <- count 2 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

-- | Parser of values of type 'Product'.
productParser :: Parser Product
productParser =
     (string "mouse"    >> return Mouse)
 <|> (string "keyboard" >> return Keyboard)
 <|> (string "monitor"  >> return Monitor)
 <|> (string "speakers" >> return Speakers)

sourceParser :: Parser Source
sourceParser =
      (string "internet" >> return Internet)
  <|> (string "friend" >> return Friend)
  <|> (string "noanswer" >> return NoAnswer)

-- show
-- | Parser of log entries.
logEntryParser :: Parser LogEntry
logEntryParser = do
  t <- timeParser
  char ' '
  ip <- parseIP
  char ' '
  p <- productParser
  -- Addition of the 'Source' field
  char ' '
  s <- sourceParser
  --
  return $ LogEntry t ip p s
-- /show

logParser :: Parser Log
logParser = many $ logEntryParser <* endOfLine

----------------------
-------- MAIN --------
----------------------

main :: IO ()
main = B.readFile logFile >>= print . parseOnly logParser
```

### Making the changed parser compatible with the old format

However, this parser only works in the new data, and we do not
want to lose the information we gathered before. The solution
is to add an _optional_ field in the parser and, when no value
is found, return a default value (like `NoAnswer`). The `option`
attoparsec combinators has exactly this purpose.

``` active haskell
{-# START_FILE sellings.log #-}
2013-06-29 11:16:23 124.67.34.60 keyboard
2013-06-29 11:32:12 212.141.23.67 mouse
2013-06-29 11:33:08 212.141.23.67 monitor
2013-06-29 12:12:34 125.80.32.31 speakers
2013-06-29 12:51:50 101.40.50.62 keyboard
2013-06-29 13:10:45 103.29.60.13 mouse
2013-06-29 16:40:15 154.41.32.99 monitor internet
2013-06-29 16:51:12 103.29.60.13 keyboard internet
2013-06-29 17:13:21 121.95.68.21 speakers friend
2013-06-29 18:20:10 190.80.70.60 mouse noanswer
2013-06-29 18:51:23 102.42.52.64 speakers friend
2013-06-29 19:01:11 78.46.64.23 mouse internet

{-# START_FILE Main.hs #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Word
import Data.Time
import Data.Attoparsec.Char8
import Control.Applicative
-- We import ByteString qualified because the function
-- 'Data.ByteString.readFile' would clash with
-- 'Prelude.readFile'.
import qualified Data.ByteString as B

-----------------------
------ SETTINGS -------
-----------------------

-- | File where the log is stored.
logFile :: FilePath
logFile = "sellings.log"

-----------------------
-------- TYPES --------
-----------------------

-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving Show

data Product = Mouse | Keyboard | Monitor | Speakers deriving Show

data Source = Internet | Friend | NoAnswer deriving Show

data LogEntry =
  LogEntry { entryTime :: LocalTime
           , entryIP   :: IP
           , entryProduct   :: Product
           , source    :: Source
             } deriving Show

type Log = [LogEntry]

-----------------------
------- PARSING -------
-----------------------

-- | Parser of values of type 'IP'.
parseIP :: Parser IP
parseIP = do
  d1 <- decimal
  char '.'
  d2 <- decimal
  char '.'
  d3 <- decimal
  char '.'
  d4 <- decimal
  return $ IP d1 d2 d3 d4

-- | Parser of values of type 'LocalTime'.
timeParser :: Parser LocalTime
timeParser = do
  y  <- count 4 digit
  char '-'
  mm <- count 2 digit
  char '-'
  d  <- count 2 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

-- | Parser of values of type 'Product'.
productParser :: Parser Product
productParser =
     (string "mouse"    >> return Mouse)
 <|> (string "keyboard" >> return Keyboard)
 <|> (string "monitor"  >> return Monitor)
 <|> (string "speakers" >> return Speakers)

sourceParser :: Parser Source
sourceParser =
      (string "internet" >> return Internet)
  <|> (string "friend" >> return Friend)
  <|> (string "noanswer" >> return NoAnswer)

-- show
-- | Parser of log entries.
logEntryParser :: Parser LogEntry
logEntryParser = do
  t <- timeParser
  char ' '
  ip <- parseIP
  char ' '
  p <- productParser
  -- Look for the field 'Source' and return
  -- a default value ('NoAnswer') when missing.
  -- The arguments of 'option' are default value
  -- followed by the parser to try.
  s <- option NoAnswer $ char ' ' >> sourceParser
  --
  return $ LogEntry t ip p s
-- /show

logParser :: Parser Log
logParser = many $ logEntryParser <* endOfLine

----------------------
-------- MAIN --------
----------------------

main :: IO ()
main = B.readFile logFile >>= print . parseOnly logParser
```

## Merging data from different logs

Our company is growing fast and we decide to open a new online shop based
in French to extend our customer range to Europe. However, after some time,
we note that our engineer in French is using a different log format.

```
154.41.32.99 29/06/2013 15:32:23 4 internet
76.125.44.33 29/06/2013 16:56:45 3 noanswer
123.45.67.89 29/06/2013 18:44:29 4 friend
100.23.32.41 29/06/2013 19:01:09 1 internet
151.123.45.67 29/06/2013 20:30:13 2 internet
```

It seems that each log entry stores the information in the following order:

* IP.
* Date (in a different format).
* A number representing the product sold.
* The "how you knew from us" field that we called Source before.

Therefore, our new `logEntryParser2` must parse the input in that order.
We note that the date is in a different order (in most Europe countries
is usual to write the day before the month) and is separated by the
`/` symbol instead of `-`. Also, they are using ID's to identify products
instead of writing the whole name.

### Step 1: Write the new parser

Firstly, we write functions to get the ID from a `Product` and viceversa.
Deriving an `Enum` instance for `Product` gives us an automatic implementation of
the methods `toEnum` and `fromEnum`. These functions are a correspondence between
a subset of the integers (type `Int`) and our type (`Product` in this case).
The automatic derivation associates the integer `0` to the first constructor, `1` to the
second, `2` to the third, and so on. Therefore, we can define functions `product(To/From)ID`
as follows.

``` active haskell
-- | Different kind of products are numbered from 1 to 4, in the given
--   order.
data Product = Mouse | Keyboard | Monitor | Speakers deriving (Enum,Show)

productFromID :: Int -> Product
productFromID n = toEnum (n-1)

productToID :: Product -> Int
productToID p = fromEnum p + 1

main :: IO ()
main = do
  print $ productFromID 1
  print $ productFromID 3
  print $ productToID Keyboard
  print $ productToID $ productFromID 4
```

A parser of products would accept a single digit and will apply `productFromID`
to get the `Product` result.

``` active haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Char8
import Control.Applicative

data Product = Mouse | Keyboard | Monitor | Speakers deriving (Enum,Show)

productFromID :: Int -> Product
productFromID n = toEnum (n-1)

-- show
productParser2 :: Parser Product
productParser2 = productFromID . read . (:[]) <$> digit

main :: IO ()
main = print $ parseOnly productParser2 "4"
-- /show
```

The `entryTime` field also needs a new parser. The process, however, is equivalent
to the previous one. We just need to parse the input in a different order and use
the new delimiters.

``` active haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Time
import Data.Attoparsec.Char8

timeParser2 :: Parser LocalTime
timeParser2 = do
  d  <- count 2 digit
  char '/'
  mm <- count 2 digit
  char '/'
  y  <- count 4 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

main :: IO ()
main = print $ parseOnly timeParser2 "29/06/2013 15:32:23"
```

The rest of the fields are unchanged, so we are ready to write the full parser
of the new log entries. Again, this is just invoking the defined parsers in the
correct order.

``` active haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Word
import Data.Time
import Data.Attoparsec.Char8
import Control.Applicative

-----------------------
-------- TYPES --------
-----------------------

-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving Show

data Product = Mouse | Keyboard | Monitor | Speakers deriving (Show,Enum)

productFromID :: Int -> Product
productFromID n = toEnum (n-1)

data Source = Internet | Friend | NoAnswer deriving Show

data LogEntry =
  LogEntry { entryTime :: LocalTime
           , entryIP   :: IP
           , entryProduct   :: Product
           , source    :: Source
             } deriving Show

type Log = [LogEntry]

-----------------------
------- PARSING -------
-----------------------

-- | Parser of values of type 'IP'.
parseIP :: Parser IP
parseIP = do
  d1 <- decimal
  char '.'
  d2 <- decimal
  char '.'
  d3 <- decimal
  char '.'
  d4 <- decimal
  return $ IP d1 d2 d3 d4

timeParser2 :: Parser LocalTime
timeParser2 = do
  d  <- count 2 digit
  char '/'
  mm <- count 2 digit
  char '/'
  y  <- count 4 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

productParser2 :: Parser Product
productParser2 = productFromID . read . (:[]) <$> digit

sourceParser :: Parser Source
sourceParser =
      (string "internet" >> return Internet)
  <|> (string "friend" >> return Friend)
  <|> (string "noanswer" >> return NoAnswer)

-- show
logEntryParser2 :: Parser LogEntry
logEntryParser2 = do
  ip <- parseIP
  char ' '
  t <- timeParser2
  char ' '
  p <- productParser2
  char ' ' 
  s <- sourceParser
  return $ LogEntry t ip p s
  
main :: IO ()
main = print $ parseOnly logEntryParser2 "54.41.32.99 29/06/2013 15:32:23 4 internet"
-- /show
```

Once we have a function to read log entries we do the same as above to iterate the
parser line by line through the log file.

``` haskell
logParser2 :: Parser Log
logParser2 = many $ logEntryParser2 <* endOfLine
```

### Step 2: Merge both logs conserving order

Currently we have two log files, but we want all the data together.
The proposed solution is to parse one file, parse the other file, and
merge both of them. The merging can be done since both parsers have
the same _type_ of output (`Log`). A `Log` is a list of log entries, so
we could just append both lists and we will have all the data together.
However, since both files are sorted by `entryTime`, it would be much nicer
if the merged file is also sorted by `entryTime`.

Given two sorted lists, it is easy to merge them into one sorted list
in _linear time_. This is the procedure used to merge in the _mergesort_ algorithm.

``` active haskell
merge :: Ord a => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) =
  if x <= y
     then x : merge xs (y:ys)
     else y : merge (x:xs) ys

main :: IO ()
main = print $ merge [1,3,5,7] [2,4,6,8]
```

To use `merge`, the elements of the list must be of a type instance of the
`Ord` class. `Log` is a list of `LogEntry`, so we have to write an `Ord`
instance for `LogEntry`. We use `entryTime` as a reference to compare different
log entries, since our interest is to sort log entries by time.

``` haskell
instance Ord LogEntry where
  le1 <= le2 = entryTime le1 <= entryTime le2
```

Now we are ready to merge both log files into one single result of type `Log`.

``` active haskell
{-# START_FILE sellings.log #-}
2013-06-29 11:16:23 124.67.34.60 keyboard
2013-06-29 11:32:12 212.141.23.67 mouse
2013-06-29 11:33:08 212.141.23.67 monitor
2013-06-29 12:12:34 125.80.32.31 speakers
2013-06-29 12:51:50 101.40.50.62 keyboard
2013-06-29 13:10:45 103.29.60.13 mouse
2013-06-29 16:40:15 154.41.32.99 monitor internet
2013-06-29 16:51:12 103.29.60.13 keyboard internet
2013-06-29 17:13:21 121.95.68.21 speakers friend
2013-06-29 18:20:10 190.80.70.60 mouse noanswer
2013-06-29 18:51:23 102.42.52.64 speakers friend
2013-06-29 19:01:11 78.46.64.23 mouse internet

{-# START_FILE sellings2.log #-}
154.41.32.99 29/06/2013 15:32:23 4 internet
76.125.44.33 29/06/2013 16:56:45 3 noanswer
123.45.67.89 29/06/2013 18:44:29 4 friend
100.23.32.41 29/06/2013 19:01:09 1 internet
151.123.45.67 29/06/2013 20:30:13 2 internet

{-# START_FILE Main.hs #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Word
import Data.Time
import Data.Attoparsec.Char8
import Control.Applicative
import qualified Data.ByteString as B

-- show
-----------------------
------ SETTINGS -------
-----------------------
-- | File where the log is stored.
logFile :: FilePath
logFile = "sellings.log"

-- | Second file where the log is stored.
logFile2 :: FilePath
logFile2 = "sellings2.log"
-- /show

-----------------------
-------- TYPES --------
-----------------------

-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving (Eq,Show)

-- | Type for products.
data Product = Mouse | Keyboard | Monitor | Speakers deriving (Eq,Show,Enum)

productFromID :: Int -> Product
productFromID n = toEnum (n-1)

data Source = Internet | Friend | NoAnswer deriving (Eq,Show)

data LogEntry =
  LogEntry { entryTime :: LocalTime
           , entryIP   :: IP
           , entryProduct   :: Product
           , source    :: Source
               -- We derive Eq since is needed to be able
               -- to write an instance of Ord.
             } deriving (Eq, Show)

instance Ord LogEntry where
  le1 <= le2 = entryTime le1 <= entryTime le2

type Log = [LogEntry]

-----------------------
------- PARSING -------
-----------------------

-- | Parser of values of type 'IP'.
parseIP :: Parser IP
parseIP = do
  d1 <- decimal
  char '.'
  d2 <- decimal
  char '.'
  d3 <- decimal
  char '.'
  d4 <- decimal
  return $ IP d1 d2 d3 d4

-- | Parser of values of type 'LocalTime'.
timeParser :: Parser LocalTime
timeParser = do
  y  <- count 4 digit
  char '-'
  mm <- count 2 digit
  char '-'
  d  <- count 2 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

-- | Parser of values of type 'Product'.
productParser :: Parser Product
productParser =
     (string "mouse"    >> return Mouse)
 <|> (string "keyboard" >> return Keyboard)
 <|> (string "monitor"  >> return Monitor)
 <|> (string "speakers" >> return Speakers)

sourceParser :: Parser Source
sourceParser =
      (string "internet" >> return Internet)
  <|> (string "friend" >> return Friend)
  <|> (string "noanswer" >> return NoAnswer)

-- | Parser of log entries.
logEntryParser :: Parser LogEntry
logEntryParser = do
  t <- timeParser
  char ' '
  ip <- parseIP
  char ' '
  p <- productParser
  s <- option NoAnswer $ char ' ' >> sourceParser
  return $ LogEntry t ip p s

logParser :: Parser Log
logParser = many $ logEntryParser <* endOfLine

timeParser2 :: Parser LocalTime
timeParser2 = do
  d  <- count 2 digit
  char '/'
  mm <- count 2 digit
  char '/'
  y  <- count 4 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

productParser2 :: Parser Product
productParser2 = productFromID . read . (:[]) <$> digit

logEntryParser2 :: Parser LogEntry
logEntryParser2 = do
  ip <- parseIP
  char ' '
  t <- timeParser2
  char ' '
  p <- productParser2
  char ' ' 
  s <- sourceParser
  return $ LogEntry t ip p s

logParser2 :: Parser Log
logParser2 = many $ logEntryParser2 <* endOfLine

-----------------------
------- MERGING -------
-----------------------

merge :: Ord a => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) =
  if x <= y
     then x : merge xs (y:ys)
     else y : merge (x:xs) ys

-- show
----------------------
-------- MAIN --------
----------------------

main :: IO ()
main = do
  file1 <- B.readFile logFile
  file2 <- B.readFile logFile2
          -- We are using the Either monad here.
  let r = do xs <- parseOnly logParser  file1
             ys <- parseOnly logParser2 file2
             return $ merge xs ys
  case r of
   Left err -> putStrLn $ "A parsing error was found: " ++ err
   Right log -> mapM_ print log
-- /show
```

## Extracting information from the log file

Once the log file is parsed, we can extract information from it.
Following the previous example, we can check what is the product sold with more frequency
or where most users found our webshop.

Let's calculate the product that has been sold more times. We may create an association list containing
pairs (product,number of sales) for each product. It would have the following type:

``` haskell
type Sales = [(Product,Int)]
```

Given a list like this, we can check how many times a product has been sold.

``` haskell
import Data.Maybe (fromMaybe)

salesOf :: Product -> Sales -> Int
salesOf p xs = fromMaybe 0 $ lookup p xs
```

We can also add one sale more to the list.

``` haskell
addSale :: Product -> Sales -> Sales
-- If we have no sales, we add the product with 1 sale.
addSale p [] = [(p,1)]
addSale p ((x,n):xs) = if p == x then (x,n+1):xs
                                 else (x,n) : addSale p xs
```

Calculating the most sold product can be done using `maximumBy` (from
the `Data.List` module) to compare the elements of the list using the
second component of each pair.

``` haskell
import Data.List (maximumBy)

-- | Given a list of sales, returns the most sold product along with
--   its number of sales.
mostSold :: Sales -> Maybe (Product,Int)
mostSold [] = Nothing
mostSold xs = Just $ maximumBy (\x y -> snd x `compare` snd y) xs
```

We need to use `Maybe` to handle the event when nothing has been sold yet.

The last task remainding is to build a list of type `Sales` from a value of
`Log` type. Since each log entry contains one product, we can use a fold in
the log list using `addSale` for each entry product, adding all these items
to the empty list.

``` haskell
sales :: Log -> Sales
sales = foldr (addSales . entryProduct) []
```

Using now the same data as before, we output the product with more sales.

``` active haskell
{-# START_FILE sellings.log #-}
2013-06-29 11:16:23 124.67.34.60 keyboard
2013-06-29 11:32:12 212.141.23.67 mouse
2013-06-29 11:33:08 212.141.23.67 monitor
2013-06-29 12:12:34 125.80.32.31 speakers
2013-06-29 12:51:50 101.40.50.62 keyboard
2013-06-29 13:10:45 103.29.60.13 mouse
2013-06-29 16:40:15 154.41.32.99 monitor internet
2013-06-29 16:51:12 103.29.60.13 keyboard internet
2013-06-29 17:13:21 121.95.68.21 speakers friend
2013-06-29 18:20:10 190.80.70.60 mouse noanswer
2013-06-29 18:51:23 102.42.52.64 speakers friend
2013-06-29 19:01:11 78.46.64.23 mouse internet

{-# START_FILE sellings2.log #-}
154.41.32.99 29/06/2013 15:32:23 4 internet
76.125.44.33 29/06/2013 16:56:45 3 noanswer
123.45.67.89 29/06/2013 18:44:29 4 friend
100.23.32.41 29/06/2013 19:01:09 1 internet
151.123.45.67 29/06/2013 20:30:13 2 internet

{-# START_FILE Main.hs #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Word
import Data.Time
import Data.Attoparsec.Char8
import Control.Applicative
import qualified Data.ByteString as B
import Data.List (maximumBy)
import Data.Maybe (fromMaybe)

-----------------------
------ SETTINGS -------
-----------------------

-- | File where the log is stored.
logFile :: FilePath
logFile = "sellings.log"

-- | Second file where the log is stored.
logFile2 :: FilePath
logFile2 = "sellings2.log"

-----------------------
-------- TYPES --------
-----------------------

-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving (Eq,Show)

-- | Type for products.
data Product = Mouse | Keyboard | Monitor | Speakers deriving (Eq,Show,Enum)

productFromID :: Int -> Product
productFromID n = toEnum (n-1)

data Source = Internet | Friend | NoAnswer deriving (Eq,Show)

data LogEntry =
  LogEntry { entryTime :: LocalTime
           , entryIP   :: IP
           , entryProduct   :: Product
           , source    :: Source
               -- We derive Eq since is needed to be able
               -- to write an instance of Ord.
             } deriving (Eq, Show)

instance Ord LogEntry where
  le1 <= le2 = entryTime le1 <= entryTime le2

type Log = [LogEntry]

-----------------------
------- PARSING -------
-----------------------

-- | Parser of values of type 'IP'.
parseIP :: Parser IP
parseIP = do
  d1 <- decimal
  char '.'
  d2 <- decimal
  char '.'
  d3 <- decimal
  char '.'
  d4 <- decimal
  return $ IP d1 d2 d3 d4

-- | Parser of values of type 'LocalTime'.
timeParser :: Parser LocalTime
timeParser = do
  y  <- count 4 digit
  char '-'
  mm <- count 2 digit
  char '-'
  d  <- count 2 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

-- | Parser of values of type 'Product'.
productParser :: Parser Product
productParser =
     (string "mouse"    >> return Mouse)
 <|> (string "keyboard" >> return Keyboard)
 <|> (string "monitor"  >> return Monitor)
 <|> (string "speakers" >> return Speakers)

sourceParser :: Parser Source
sourceParser =
      (string "internet" >> return Internet)
  <|> (string "friend" >> return Friend)
  <|> (string "noanswer" >> return NoAnswer)

-- | Parser of log entries.
logEntryParser :: Parser LogEntry
logEntryParser = do
  t <- timeParser
  char ' '
  ip <- parseIP
  char ' '
  p <- productParser
  s <- option NoAnswer $ char ' ' >> sourceParser
  return $ LogEntry t ip p s

logParser :: Parser Log
logParser = many $ logEntryParser <* endOfLine

timeParser2 :: Parser LocalTime
timeParser2 = do
  d  <- count 2 digit
  char '/'
  mm <- count 2 digit
  char '/'
  y  <- count 4 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

productParser2 :: Parser Product
productParser2 = productFromID . read . (:[]) <$> digit

logEntryParser2 :: Parser LogEntry
logEntryParser2 = do
  ip <- parseIP
  char ' '
  t <- timeParser2
  char ' '
  p <- productParser2
  char ' ' 
  s <- sourceParser
  return $ LogEntry t ip p s

logParser2 :: Parser Log
logParser2 = many $ logEntryParser2 <* endOfLine

-----------------------
------- MERGING -------
-----------------------

merge :: Ord a => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) =
  if x <= y
     then x : merge xs (y:ys)
     else y : merge (x:xs) ys

----------------------
------ COUNTING ------
----------------------

type Sales = [(Product,Int)]

salesOf :: Product -> Sales -> Int
salesOf p xs = fromMaybe 0 $ lookup p xs

addSale :: Product -> Sales -> Sales
addSale p [] = [(p,1)]
addSale p ((x,n):xs) = if p == x then (x,n+1):xs
                                 else (x,n) : addSale p xs
                        
-- | Given a list of sales, returns the most sold product along with
--   its number of sales.
mostSold :: Sales -> Maybe (Product,Int)
mostSold [] = Nothing
mostSold xs = Just $ maximumBy (\x y -> snd x `compare` snd y) xs

sales :: Log -> Sales
sales = foldr (addSale . entryProduct) []

----------------------
-------- MAIN --------
----------------------

-- show
main :: IO ()
main = do
  file1 <- B.readFile logFile
  file2 <- B.readFile logFile2
  let r = do xs <- parseOnly logParser  file1
             ys <- parseOnly logParser2 file2
             return $ merge xs ys
  case r of
   Left err -> putStrLn $ "A parsing error was found: " ++ err
   Right log ->
     case mostSold (sales log) of
       Nothing -> putStrLn "We didn't sell anything yet."
       Just (p,n) -> putStrLn $ "The product with more sales is " ++ show p
                  ++ " with " ++ show n ++ " sales."
-- /show
```

# From log file to CSV

CSV (Comma Separated Values) files store tabular data and can be used from a large number of applications.
In fact, one of the advantages of using the CSV format is that data stored in this format can be imported
and exported from very different programs. After gathering all the log file information, we are going to
render a CSV table containing it. Then, we will develop a parser to get the data back into Haskell.

## Rendering to CSV

The process of rendering to CSV is straightforward. Rendering is in general simpler than parsing, and  CSV rendering is not an exception.

We define rendering methods for each type, as we defined parsers for each type.
Sometimes, the renderer looks similar to the parser (see `renderIP` below).

Some functions useful when rendering:

* `<>`: This operator from `Data.Monoid` appends values of types instance of the `Monoid` class.
  `ByteString` is one of them.
* `foldMap`: Apply a function over the elements of a structure instance of the `Foldable` class to
  values of a type instance of the `Monoid` class then append all the results.
* `fromString`: It takes a String and return it as a value of any type in the `IsString` class,
  defined at `Data.String`.

``` active haskell
{-# START_FILE sellings.log #-}
2013-06-29 11:16:23 124.67.34.60 keyboard
2013-06-29 11:32:12 212.141.23.67 mouse
2013-06-29 11:33:08 212.141.23.67 monitor
2013-06-29 12:12:34 125.80.32.31 speakers
2013-06-29 12:51:50 101.40.50.62 keyboard
2013-06-29 13:10:45 103.29.60.13 mouse
2013-06-29 16:40:15 154.41.32.99 monitor internet
2013-06-29 16:51:12 103.29.60.13 keyboard internet
2013-06-29 17:13:21 121.95.68.21 speakers friend
2013-06-29 18:20:10 190.80.70.60 mouse noanswer
2013-06-29 18:51:23 102.42.52.64 speakers friend
2013-06-29 19:01:11 78.46.64.23 mouse internet

{-# START_FILE sellings2.log #-}
154.41.32.99 29/06/2013 15:32:23 4 internet
76.125.44.33 29/06/2013 16:56:45 3 noanswer
123.45.67.89 29/06/2013 18:44:29 4 friend
100.23.32.41 29/06/2013 19:01:09 1 internet
151.123.45.67 29/06/2013 20:30:13 2 internet

{-# START_FILE Main.hs #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Word
import Data.Time
import Data.Attoparsec.Char8
import Control.Applicative
-- show
import Data.ByteString.Char8 (ByteString,singleton)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.String
import Data.Char (toLower)
import Data.Monoid hiding (Product)
import Data.Foldable (foldMap)
-- /show

-----------------------
------ SETTINGS -------
-----------------------

-- | File where the log is stored.
logFile :: FilePath
logFile = "sellings.log"

-- | Second file where the log is stored.
logFile2 :: FilePath
logFile2 = "sellings2.log"

-----------------------
-------- TYPES --------
-----------------------

-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving (Eq,Show)

-- | Type for products.
data Product = Mouse | Keyboard | Monitor | Speakers deriving (Eq,Show,Enum)

productFromID :: Int -> Product
productFromID n = toEnum (n-1)

data Source = Internet | Friend | NoAnswer deriving (Eq,Show)

data LogEntry =
  LogEntry { entryTime :: LocalTime
           , entryIP   :: IP
           , entryProduct   :: Product
           , source    :: Source
               -- We derive Eq since is needed to be able
               -- to write an instance of Ord.
             } deriving (Eq, Show)

instance Ord LogEntry where
  le1 <= le2 = entryTime le1 <= entryTime le2

type Log = [LogEntry]

-----------------------
------- PARSING -------
-----------------------

-- | Parser of values of type 'IP'.
parseIP :: Parser IP
parseIP = do
  d1 <- decimal
  char '.'
  d2 <- decimal
  char '.'
  d3 <- decimal
  char '.'
  d4 <- decimal
  return $ IP d1 d2 d3 d4

-- | Parser of values of type 'LocalTime'.
timeParser :: Parser LocalTime
timeParser = do
  y  <- count 4 digit
  char '-'
  mm <- count 2 digit
  char '-'
  d  <- count 2 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

-- | Parser of values of type 'Product'.
productParser :: Parser Product
productParser =
     (string "mouse"    >> return Mouse)
 <|> (string "keyboard" >> return Keyboard)
 <|> (string "monitor"  >> return Monitor)
 <|> (string "speakers" >> return Speakers)

sourceParser :: Parser Source
sourceParser =
      (string "internet" >> return Internet)
  <|> (string "friend" >> return Friend)
  <|> (string "noanswer" >> return NoAnswer)

-- | Parser of log entries.
logEntryParser :: Parser LogEntry
logEntryParser = do
  t <- timeParser
  char ' '
  ip <- parseIP
  char ' '
  p <- productParser
  s <- option NoAnswer $ char ' ' >> sourceParser
  return $ LogEntry t ip p s

logParser :: Parser Log
logParser = many $ logEntryParser <* endOfLine

timeParser2 :: Parser LocalTime
timeParser2 = do
  d  <- count 2 digit
  char '/'
  mm <- count 2 digit
  char '/'
  y  <- count 4 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

productParser2 :: Parser Product
productParser2 = productFromID . read . (:[]) <$> digit

logEntryParser2 :: Parser LogEntry
logEntryParser2 = do
  ip <- parseIP
  char ' '
  t <- timeParser2
  char ' '
  p <- productParser2
  char ' ' 
  s <- sourceParser
  return $ LogEntry t ip p s

logParser2 :: Parser Log
logParser2 = many $ logEntryParser2 <* endOfLine

-----------------------
------- MERGING -------
-----------------------

merge :: Ord a => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) =
  if x <= y
     then x : merge xs (y:ys)
     else y : merge (x:xs) ys

-- show
-----------------------
------ RENDERING ------
-----------------------

-- | Character that will serve as field separator.
--   It should not be one of the characters that
--   appear in the fields.
sepChar :: Char
sepChar = ','

-- | Rendering of IP's to ByteString.
renderIP :: IP -> ByteString
renderIP (IP a b c d) =
     -- Function @show@ creates a String and
     -- fromString makes it a ByteString.
     fromString (show a)
  <> singleton '.'
  <> fromString (show b)
  <> singleton '.'
  <> fromString (show c)
  <> singleton '.'
  <> fromString (show d)

-- | Render a log entry to a CSV row as ByteString.
renderEntry :: LogEntry -> ByteString
renderEntry le =
     fromString (show $ entryTime le)
  <> singleton sepChar
  <> renderIP (entryIP le)
  <> singleton sepChar
     -- We use @fmap toLower@ to write the product name
     -- in lowercase letters.
  <> fromString (fmap toLower $ show $ entryProduct le)
  <> singleton sepChar
  <> fromString (fmap toLower $ show $ source le)

-- | Render a log file to CSV as ByteString.
renderLog :: Log -> ByteString
renderLog = foldMap $ \le -> renderEntry le <> singleton '\n'

----------------------
-------- MAIN --------
----------------------

main :: IO ()
main = do
  file1 <- B.readFile logFile
  file2 <- B.readFile logFile2
          -- We are using the Either monad here.
  let r = do xs <- parseOnly logParser  file1
             ys <- parseOnly logParser2 file2
             return $ merge xs ys
  case r of
   Left err -> putStrLn $ "A parsing error was found: " ++ err
   Right log -> BC.putStrLn $ renderLog log
-- /show
```
## Parsing from CSV

Again, as with log files, we use _attoparsec_ for parsing.
Note that the CSV format is similar to the log format, except
in how fields are separated. Therefore, we can re-use our
field parsers.

We start defining a parser for rows, and then we iterate it
using `many` exactly as before.

``` active haskell
{-# START_FILE sellings.csv #-}
2013-06-29 11:16:23 , 124.67.34.60  , keyboard , noanswer
2013-06-29 11:32:12 , 212.141.23.67 , mouse    , noanswer
2013-06-29 11:33:08 , 212.141.23.67 , monitor  , noanswer
2013-06-29 12:12:34 , 125.80.32.31  , speakers , noanswer
2013-06-29 12:51:50 , 101.40.50.62  , keyboard , noanswer
2013-06-29 13:10:45 , 103.29.60.13  , mouse    , noanswer
2013-06-29 15:32:23 , 154.41.32.99  , speakers , internet
2013-06-29 16:40:15 , 154.41.32.99  , monitor  , internet
2013-06-29 16:51:12 , 103.29.60.13  , keyboard , internet
2013-06-29 16:56:45 , 76.125.44.33  , monitor  , noanswer
2013-06-29 17:13:21 , 121.95.68.21  , speakers , friend
2013-06-29 18:20:10 , 190.80.70.60  , mouse    , noanswer
2013-06-29 18:44:29 , 123.45.67.89  , speakers , friend
2013-06-29 18:51:23 , 102.42.52.64  , speakers , friend
2013-06-29 19:01:09 , 100.23.32.41  , mouse    , internet
2013-06-29 19:01:11 , 78.46.64.23   , mouse    , internet
2013-06-29 20:30:13 , 151.123.45.67 , keyboard , internet

{-# START_FILE Main.hs #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Word
import Data.Time
import Data.Attoparsec.Char8
import Control.Applicative
import qualified Data.ByteString as B
-- show
-----------------------
------ SETTINGS -------
-----------------------

-- | File where the CSV is stored.
csvFile :: FilePath
csvFile = "sellings.csv"

-- | Character that will serve as field separator.
--   It should not be one of the characters that
--   appear in the fields.
sepChar :: Char
sepChar = ','
-- /show

-----------------------
-------- TYPES --------
-----------------------

-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving (Eq,Show)

-- | Type for products.
data Product = Mouse | Keyboard | Monitor | Speakers deriving (Eq,Show,Enum)

data Source = Internet | Friend | NoAnswer deriving (Eq,Show)

data LogEntry =
  LogEntry { entryTime :: LocalTime
           , entryIP   :: IP
           , entryProduct   :: Product
           , source    :: Source
               -- We derive Eq since is needed to be able
               -- to write an instance of Ord.
             } deriving (Eq, Show)

type Log = [LogEntry]

-- show
-----------------------
------- PARSING -------
-----------------------
-- /show

-- | Parser of values of type 'IP'.
parseIP :: Parser IP
parseIP = do
  d1 <- decimal
  char '.'
  d2 <- decimal
  char '.'
  d3 <- decimal
  char '.'
  d4 <- decimal
  return $ IP d1 d2 d3 d4

-- | Parser of values of type 'LocalTime'.
timeParser :: Parser LocalTime
timeParser = do
  y  <- count 4 digit
  char '-'
  mm <- count 2 digit
  char '-'
  d  <- count 2 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }
                
-- | Parser of values of type 'Product'.
productParser :: Parser Product
productParser =
     (string "mouse"    >> return Mouse)
 <|> (string "keyboard" >> return Keyboard)
 <|> (string "monitor"  >> return Monitor)
 <|> (string "speakers" >> return Speakers)

sourceParser :: Parser Source
sourceParser =
      (string "internet" >> return Internet)
  <|> (string "friend" >> return Friend)
  <|> (string "noanswer" >> return NoAnswer)

-- show
rowParser :: Parser LogEntry
rowParser = do
  -- Parser of field separators. It skips space characters before
  -- and after the CSV separator char.
  -- Characters considered as space are simple whitespaces and tabs.
  let spaceSkip = many $ satisfy $ inClass [ ' ' , '\t' ]
      sepParser = spaceSkip >> char sepChar >> spaceSkip
  -- Skip spaces at the beginning of the line.
  spaceSkip
  t  <- timeParser
  sepParser
  ip <- parseIP
  sepParser
  p  <- productParser
  sepParser
  s  <- sourceParser
  -- Skip remaining spaces at the end of the line
  spaceSkip
  return $ LogEntry t ip p s

csvParser :: Parser Log
csvParser = many $ rowParser <* endOfLine

----------------------
-------- MAIN --------
----------------------

main :: IO ()
main = do
  file <- B.readFile csvFile
  case parseOnly csvParser file of
    Left err -> putStrLn $ "Error while parsing CSV file: " ++ err
    Right log -> mapM_ print log
-- /show
```

## Using CSV across applications

Use `renderLog` and `Data.ByteString.Char8.writeFile` to write a CSV table
using your log information. However, if you are using a character set different
from ASCII or ISO-8859-15, you should consider using the type `Text` instead
of `ByteString`. Almost the only change you have to do
is to change the import of `Data.Attoparsec.Char8` to `Data.Attoparsec.Text`
(both modules export similar interfaces and are interchangeable) and
adapt the types of the renderer.

Once you have written your data in CSV format, import it from another application.
Use the table, make any changes that you may want and modified data back in Haskell
by parsing the CSV output of your application. Make sure your application and the
Haskell parser are using the same column separator.

# Final App: Read several log files, merge data and render it in CSV

We now present a runnable application that read a list of log files, merge them and
return the result as a CSV table. The files may be read from a local file or an URL.

``` active haskell
{-# START_FILE sellings.log #-}
2013-06-29 11:16:23 124.67.34.60 keyboard
2013-06-29 11:32:12 212.141.23.67 mouse
2013-06-29 11:33:08 212.141.23.67 monitor
2013-06-29 12:12:34 125.80.32.31 speakers
2013-06-29 12:51:50 101.40.50.62 keyboard
2013-06-29 13:10:45 103.29.60.13 mouse
2013-06-29 16:40:15 154.41.32.99 monitor internet
2013-06-29 16:51:12 103.29.60.13 keyboard internet
2013-06-29 17:13:21 121.95.68.21 speakers friend
2013-06-29 18:20:10 190.80.70.60 mouse noanswer
2013-06-29 18:51:23 102.42.52.64 speakers friend
2013-06-29 19:01:11 78.46.64.23 mouse internet

{-# START_FILE sellings2.log #-}
2013-06-29 15:32:23 154.41.32.99 speakers internet
2013-06-29 16:56:45 76.125.44.33 monitor noanswer
2013-06-29 18:44:29 123.45.67.89 speakers friend
2013-06-29 19:01:09 100.23.32.41 mouse internet
2013-06-29 20:30:13 151.123.45.67 keyboard internet

{-# START_FILE Main.hs #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Word
import Data.Time
import Data.Attoparsec.Char8
import Control.Applicative
import Data.Either (rights)
import Data.Monoid hiding (Product)
import Data.String
import Data.Char (toLower)
import Data.Foldable (foldMap)
-- ByteString stuff
import Data.ByteString.Char8 (ByteString,singleton)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Lazy (toChunks)
-- HTTP protocol to perform downloads
import Network.HTTP.Conduit

----------------------
------- FILES --------
----------------------

data File = URL String | Local FilePath

-- | Files where the logs are stored.
--   Modify this value to read logs from
--   other sources.
logFiles :: [File]
logFiles =
  [ Local "sellings.log"
  , Local "sellings2.log"
  , URL "http://daniel-diaz.github.io/misc/sellings3.log"
    ]

getFile :: File -> IO ByteString
-- simpleHttp gets a lazy bytestring, while we
-- are using strict bytestrings.
getFile (URL str) = mconcat . toChunks <$> simpleHttp str
getFile (Local fp) = B.readFile fp

-----------------------
-------- TYPES --------
-----------------------

-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving (Eq,Show)

-- | Type for products.
data Product = Mouse | Keyboard | Monitor | Speakers deriving (Eq,Show,Enum)

productFromID :: Int -> Product
productFromID n = toEnum (n-1)

data Source = Internet | Friend | NoAnswer deriving (Eq,Show)

-- | Each log entry in the log file is represented by a value
--   of this type. Modify the fields of 'LogEntry' accordingly
--   to your log file of interest. However, 'entryTime' is a
--   reasonable field and is also used for merging.
data LogEntry =
  LogEntry { entryTime :: LocalTime
           , entryIP   :: IP
           , entryProduct   :: Product
           , source    :: Source
             } deriving (Eq, Show)

instance Ord LogEntry where
  le1 <= le2 = entryTime le1 <= entryTime le2

type Log = [LogEntry]

-----------------------
------- PARSING -------
-----------------------

-- | Parser of values of type 'IP'.
parseIP :: Parser IP
parseIP = do
  d1 <- decimal
  char '.'
  d2 <- decimal
  char '.'
  d3 <- decimal
  char '.'
  d4 <- decimal
  return $ IP d1 d2 d3 d4

-- | Parser of values of type 'LocalTime'.
timeParser :: Parser LocalTime
timeParser = do
  y  <- count 4 digit
  char '-'
  mm <- count 2 digit
  char '-'
  d  <- count 2 digit
  char ' '
  h  <- count 2 digit
  char ':'
  m  <- count 2 digit
  char ':'
  s  <- count 2 digit
  return $
    LocalTime { localDay = fromGregorian (read y) (read mm) (read d)
              , localTimeOfDay = TimeOfDay (read h) (read m) (read s)
                }

-- | Parser of values of type 'Product'.
productParser :: Parser Product
productParser =
     (string "mouse"    >> return Mouse)
 <|> (string "keyboard" >> return Keyboard)
 <|> (string "monitor"  >> return Monitor)
 <|> (string "speakers" >> return Speakers)

sourceParser :: Parser Source
sourceParser =
      (string "internet" >> return Internet)
  <|> (string "friend" >> return Friend)
  <|> (string "noanswer" >> return NoAnswer)

-- | Parser of log entries.
logEntryParser :: Parser LogEntry
logEntryParser = do
  t <- timeParser
  char ' '
  ip <- parseIP
  char ' '
  p <- productParser
  s <- option NoAnswer $ char ' ' >> sourceParser
  return $ LogEntry t ip p s

logParser :: Parser Log
logParser = many $ logEntryParser <* endOfLine

-----------------------
------- MERGING -------
-----------------------

merge :: Ord a => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) =
  if x <= y
     then x : merge xs (y:ys)
     else y : merge (x:xs) ys

-----------------------
------ RENDERING ------
-----------------------

-- | Character that will serve as field separator.
--   It should not be one of the characters that
--   appear in the fields.
sepChar :: Char
sepChar = ','

-- | Rendering of IP's to ByteString.
renderIP :: IP -> ByteString
renderIP (IP a b c d) =
     fromString (show a)
  <> singleton '.'
  <> fromString (show b)
  <> singleton '.'
  <> fromString (show c)
  <> singleton '.'
  <> fromString (show d)

-- | Render a log entry to a CSV row as ByteString.
renderEntry :: LogEntry -> ByteString
renderEntry le =
     fromString (show $ entryTime le)
  <> singleton sepChar
  <> renderIP (entryIP le)
  <> singleton sepChar
  <> fromString (fmap toLower $ show $ entryProduct le)
  <> singleton sepChar
  <> fromString (fmap toLower $ show $ source le)

-- | Render a log file to CSV as ByteString.
renderLog :: Log -> ByteString
renderLog = foldMap $ \le -> renderEntry le <> singleton '\n'

----------------------
-------- MAIN --------
----------------------

main :: IO ()
main = do
  files <- mapM getFile logFiles
  let -- Parsed logs
      logs :: [Log]
      logs = rights $ fmap (parseOnly logParser) files
      -- Merged log
      mergedLog :: Log
      mergedLog = foldr merge [] logs
  BC.putStrLn $ renderLog mergedLog
```

# Conclusion

Parsing is one of the tasks that Haskell is really good at.
The parser code is much clearer and easier to write than in traditional languages and it may run
[faster than a C++ parser](http://newartisans.com/2012/08/parsing-with-haskell-and-attoparsec).
I invite you to try to parse bigger things. Following the
[API reference](http://hackage.haskell.org/package/attoparsec)
it should not be hard. As an example, Bryan O'Sullivan wrote an HTTP parser
[here](https://bitbucket.org/bos/attoparsec/src/tip/examples/RFC2616.hs).
I think it is easy to read once you know [how HTTP is defined](http://tools.ietf.org/html/rfc4180).