# Moore for Less

28 May 2015


I was playing around while stuck on a plane this morning, and realized a few things that had previously escaped me about Moore machines.

In previous posts, I've talked about the notion of an (infinite) Moore machine.

```haskell
data Moore a b = Moore b (a -> Moore a b)
```

Here we have a machine where each state `Moore a b` has a label `b`, and given an input `a` we transition along an edge to a new state. Unlike a traditional Moore machine, we may well have an infinite number of states, which removes all those pesky limitations on what you can recognize with such a machine.

In this form it can be seen to be a `Cofree` comonad.

```haskell
data Cofree f a = Cofree a (f (Cofree f a))
```

We can see that `Moore a b` is isomorphic to `Cofree ((->) a) b`.
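As a quick sanity check, here is a sketch of both directions of that isomorphism (the names `toCofree` and `fromCofree` are mine, and the two data declarations are repeated so the fragment stands alone):

```haskell
-- Both definitions from the text, repeated so this sketch is self-contained.
data Moore a b = Moore b (a -> Moore a b)

data Cofree f a = Cofree a (f (Cofree f a))

-- The isomorphism Moore a b ≅ Cofree ((->) a) b, in both directions.
toCofree :: Moore a b -> Cofree ((->) a) b
toCofree (Moore b k) = Cofree b (toCofree . k)

fromCofree :: Cofree ((->) a) b -> Moore a b
fromCofree (Cofree b k) = Moore b (fromCofree . k)
```

Since both sides are just a label paired with a function to the next state, each direction merely re-wraps the other, layer by layer.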

## Soul of a Nu Machine

An equivalent definition of a Moore machine that I've covered before is to switch to an explicit 'state' type.

We can derive that definition from the more direct definition above by one of several different means.

Probably the most straightforward way to do so is to exploit the fact that `Cofree f a = Fix (Compose ((,) a) f)`, where `Fix f` is the greatest fixed point of `f`, and look at the two different ways to encode the greatest fixed point in Haskell.

The usual definition of `Fix` is given by:

```haskell
data Fix f = In { out :: f (Fix f) }
```

This definition exploits the fact that in Haskell the least and greatest fixed points coincide. But we can also use the direct definition of the greatest fixed point, which we get by writing down the signature of an anamorphism, the tool we use to build a member of the greatest fixed point of `f`:

```haskell
ana :: Functor f => (s -> f s) -> s -> Nu f
```

and just take its type signature as the definition of `Nu`.

```haskell
data Nu f where
  Nu :: (s -> f s) -> s -> Nu f

ana = Nu
```

`Nu` represents the greatest fixed point in a much more portable way than `Fix`. `Fix` only really works as a greatest fixed point due to laziness in Haskell. In a strict language these two definitions are not equivalent.
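To see `Nu` in action, here is a small sketch (the observer `out` and the stream example are mine) that unfolds an infinite stream with `ana` and peels it one layer at a time:

```haskell
{-# LANGUAGE GADTs #-}

data Nu f where
  Nu :: (s -> f s) -> s -> Nu f

ana :: Functor f => (s -> f s) -> s -> Nu f
ana = Nu

-- Observe one layer of the greatest fixed point.
out :: Functor f => Nu f -> f (Nu f)
out (Nu h s) = fmap (Nu h) (h s)

-- An infinite stream of Ints, as the greatest fixed point of ((,) Int).
nats :: Nu ((,) Int)
nats = ana (\n -> (n, n + 1)) 0

-- Pull a finite prefix out of the stream.
takeNu :: Int -> Nu ((,) Int) -> [Int]
takeNu 0 _ = []
takeNu n s = let (a, s') = out s in a : takeNu (n - 1) s'
```

Note that the hidden seed type `s` never escapes: `out` re-wraps it immediately, which is exactly what makes `Nu` portable to strict languages.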

Substituting `Nu` into `Cofree` instead of `Fix` we get:

```haskell
Cofree f b =
  Nu (Compose ((,) b) f) =
  ∃s. (s -> Compose ((,) b) f s, s) =
  ∃s. (s -> (b, f s), s) =
  ∃s. (s -> b, s -> f s, s)
```

In the particular case of a Moore machine:

```haskell
Moore a b =
  Cofree ((->) a) b =
  ∃s. (s -> b, s -> a -> s, s)
```

This corresponds to the data type:

```haskell
data Moore a b where
  Moore :: (s -> b) -> (s -> a -> s) -> s -> Moore a b
```

I've talked about this data type before, as has Gabriel Gonzalez. I have it packaged up in my `folds` package, and he has a version of it in his `foldl` package.
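As a small illustration of this style (the names `runMoore` and `sumM` are mine, not from either package), here is a running-sum machine together with a driver that folds a list through it:

```haskell
{-# LANGUAGE GADTs #-}

-- The explicit-state Moore machine from the text.
data Moore a b where
  Moore :: (s -> b) -> (s -> a -> s) -> s -> Moore a b

-- Feed a whole list to the machine and read off the final label.
runMoore :: Moore a b -> [a] -> b
runMoore (Moore out step s0) = out . foldl step s0

-- A machine whose state is the running sum of its inputs.
sumM :: Num a => Moore a a
sumM = Moore id (+) 0
```

The existential `s` means a consumer can run the machine but never inspect its state directly, only the `b` labels.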

## Distributive Functors, Represent!

Now for something new: In category theory a representable functor `f` is a functor for which there exists an object `x` such that we can equip our functor with a natural isomorphism between `f a` and `(x -> a)`. It describes all of the arrows out of some object `x`.

I have long had this definition split across two packages.

We first have the "Haskell 98" `distributive` package, which provides

```haskell
class Functor g => Distributive g where
  distribute :: Functor f => f (g a) -> g (f a)
  collect :: Functor f => (a -> g b) -> f a -> g (f b)
```

`Distributive` is effectively a co-`Traversable`. Given that the package is Haskell 98 / 2010, it can't supply us `x`. For every `Distributive` functor such an `x` should exist though. In the `adjunctions` package, you can get your hands on it with:

```haskell
class Distributive f => Representable f where
  type Rep f
  tabulate :: (Rep f -> a) -> f a
  index :: f a -> Rep f -> a
```

Here, `x = Rep f`, and `tabulate` and `index` are inverses.
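For a concrete instance beyond functions, consider a two-slot container (the `Pair` functor here is mine, and I've dropped the `Distributive` superclass to keep the sketch standalone): it holds exactly one value per `Bool`, so `Rep Pair = Bool`.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- A stripped-down Representable class, as in the text (superclass elided).
class Functor f => Representable f where
  type Rep f
  tabulate :: (Rep f -> a) -> f a
  index    :: f a -> Rep f -> a

data Pair a = Pair a a

instance Functor Pair where
  fmap f (Pair x y) = Pair (f x) (f y)

-- A pair is a function out of Bool in disguise.
instance Representable Pair where
  type Rep Pair = Bool
  tabulate f = Pair (f False) (f True)
  index (Pair x _) False = x
  index (Pair _ y) True  = y
```

`tabulate` fills both slots from a function, and `index` reads a slot back out; the two are inverses, as the text requires.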

## Cofree Anyone?

There is an obvious representation for `((->) x)`, as `(x -> a)` is clearly isomorphic to `(x -> a)`!

```haskell
instance Representable ((->) x) where
  type Rep ((->) x) = x
  tabulate = id
  index = id
```

With this we can just randomly embellish the definition of a cofree comonad by explicitly choosing `s` to be the representation of some representable functor, and simply not telling anyone what it is.

```haskell
data Cofree f a where
  Cofree :: Representable k => k a -> k (f (Rep k)) -> Rep k -> Cofree f a

instance Functor (Cofree f) where
  fmap f (Cofree k u s) = Cofree (fmap f k) u s

extract (Cofree k _ s) = index k s
duplicate (Cofree k u s) = Cofree (tabulate (Cofree k u)) u s
```

## Representable Machines

Now we're equipped to play with a similarly modified definition of a Moore machine.

```haskell
data Moore a b where
  Moore :: Representable k => k b -> k (a -> Rep k) -> Rep k -> Moore a b
```

Above and beyond what we can say about `Cofree` in general, we know something else in the `Moore` case, namely that both our unknown representable functor `k` and `(->) a` are `Distributive`, so we can freely interchange them in the definition above. This yields the following definition, in terms of which all subsequent instances are defined:

```haskell
data Moore a b where
  Moore :: Representable k => k b -> (a -> k (Rep k)) -> Rep k -> Moore a b
```

Since the `u` argument isn't used at all in the definition of the `Comonad` for `Cofree f` and is just silently passed along, these instances for `Moore a` work either way.

```haskell
instance Functor (Moore a) where
  fmap f (Moore k u b) = Moore (fmap f k) u b

extract (Moore k _ s)   = index k s
duplicate (Moore k u s) = Moore (tabulate (Moore k u)) u s
extend f (Moore k u s)  = Moore (tabulate (f . Moore k u)) u s
```

Due to the `Representable ((->) x)` instance no power is lost, but now some choices of `k` might be suitable for memoization, acting as a trie to hold onto the results rather than recomputing them each time they are asked.
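Here is a minimal sketch of that memoization (everything below — the stripped-down class, the `Pair` trie, and the parity machine — is my own illustration, not code from the post's packages): a two-state machine whose labels and transition tables live in a `Pair`, so they are stored rather than recomputed.

```haskell
{-# LANGUAGE GADTs, TypeFamilies #-}

class Functor f => Representable f where
  type Rep f
  tabulate :: (Rep f -> a) -> f a
  index    :: f a -> Rep f -> a

data Pair a = Pair a a
instance Functor Pair where fmap f (Pair x y) = Pair (f x) (f y)
instance Representable Pair where
  type Rep Pair = Bool
  tabulate f = Pair (f False) (f True)
  index (Pair x _) False = x
  index (Pair _ y) True  = y

-- The representable Moore machine from the text.
data Moore a b where
  Moore :: Representable k => k b -> (a -> k (Rep k)) -> Rep k -> Moore a b

step :: a -> Moore a b -> Moore a b
step a (Moore k u s) = Moore k u (index (u a) s)

label :: Moore a b -> b
label (Moore k _ s) = index k s

-- Parity of the sum of the inputs: two states, stored in a Pair trie.
parity :: Moore Int Bool
parity = Moore (Pair False True)                    -- labels: are we in the odd state?
               (\a -> tabulate (\s -> s /= odd a))  -- transition table for input a
               False
```

For a finite state space like `Bool`, `tabulate` builds the whole transition table for each input up front, and `index` just looks rows up.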

## Doing Two Things at Once

While I've put the `Comonad` for `Moore` to good use in previous posts, much of the original motivation for using `Moore` was to give us the ability to describe how to fuse together multiple passes over the data.

To derive the `Applicative` for our new machine we'll need suitable functors to use to memoize all of our states. Rather than using functions, let's look for some things that are better behaved.

The next simpler instance after the naïve function instance above is:

```haskell
instance Representable Identity where
  type Rep Identity = ()
  tabulate f = Identity (f ())
  index (Identity a) () = a
```

Next, given two representable functors, their composition is also representable:

```haskell
instance (Representable f, Representable g) => Representable (Compose f g) where
  type Rep (Compose f g) = (Rep f, Rep g)
  index (Compose fg) (i,j) = index (index fg i) j
  tabulate = Compose . tabulate . fmap tabulate . curry
```

So let's put these instances to work:

```haskell
instance Applicative (Moore a) where
  pure a = Moore (Identity a) (\_ -> Identity ()) ()
  Moore kf uf sf <*> Moore ka ua sa =
    Moore (Compose ((<$> ka) <$> kf))
          (\x -> Compose $ (\y -> (,) y <$> ua x) <$> uf x)
          (sf, sa)

(<@>) = (<*>)
```

This is just the definition we used to use for the `foldl`-style `Moore` machine, but now instead of using functions from our state, we just use representable functors that have our states as their representations.
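To see what that fusion buys us, here is the classic one-pass mean, sketched in the simpler function-based form from earlier (the `Applicative` below is the standard instance for that form, not the `Compose`-based one above; all names are mine):

```haskell
{-# LANGUAGE GADTs #-}

data Moore a b where
  Moore :: (s -> b) -> (s -> a -> s) -> s -> Moore a b

instance Functor (Moore a) where
  fmap f (Moore out step s) = Moore (f . out) step s

-- Pairing the states runs both machines in a single pass.
instance Applicative (Moore a) where
  pure b = Moore (const b) const ()
  Moore outf stepf sf <*> Moore outa stepa sa =
    Moore (\(s, t) -> outf s (outa t))
          (\(s, t) a -> (stepf s a, stepa t a))
          (sf, sa)

runMoore :: Moore a b -> [a] -> b
runMoore (Moore out step s0) = out . foldl step s0

sumM, lengthM :: Moore Double Double
sumM    = Moore id (+) 0
lengthM = Moore id (\n _ -> n + 1) 0

-- Two folds fused into one traversal of the input.
mean :: Moore Double Double
mean = (/) <$> sumM <*> lengthM
```

The `Compose`-based instance in the text does the same pairing of states, but keeps them tabulated in a representable functor instead of behind functions.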

Finally, as a small but useful aside, a `Moore` machine is a `Profunctor`, so we can map contravariantly over the inputs as well as covariantly over the results.

```haskell
instance Profunctor Moore where
  dimap f g (Moore k u s) = Moore (g <$> k) (u . f) s
```

## Stepping Lightly

So how do we run the machine?

We can feed our machine in one of two ways. We can define

```haskell
step1 :: a -> Moore a b -> b
step1 a (Moore k u s) = index k (index (u a) s)
```

and rely on the ability to `extend (step1 a)` to change out all of the labels on our machine. This unfortunately builds up and tears down a whole representable functor worth of data.

On the other hand, we can simply move the start state and get a whole new machine:

```haskell
step :: a -> Moore a b -> Moore a b
step a (Moore k u s) = Moore k u (index (u a) s)
```

The choice between `step a` and `extend (step1 a)` is indistinguishable to the outside observer except in terms of performance. While `step` isn't a Cokleisli arrow, it is much faster.

## Running a Tab

That said, we don't have to run the machine one step at a time!

Dan Piponi once wrote an article on recognizing a regular language with a monoid. What he did was build a data type to represent the tabulation of transitions in his DFA.

The `a -> f (Rep f)` in the body of our Moore machine suggests what such a tabulation might look like, generically. It takes `a` to a structure that contains references to all the new states.

```haskell
newtype Tab f = Tab { getTab :: f (Rep f) }

instance Representable f => Monoid (Tab f) where
  mempty = Tab $ tabulate id
  mappend (Tab fs) (Tab gs) = Tab (index gs <$> fs)
```

You can view this as a form of `Endo (Rep f)` that happens to be able to memoize the results of each argument to the function if `f` is sufficiently "nice".
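To make the `Endo` analogy concrete, here is `Tab` specialized to the function functor `(->) s`, where `index` and `tabulate` are both `id` (this specialized `Tab` and `runTab` are my own sketch): `mempty` is the identity transition table, and `mappend` composes tables first-argument-first.

```haskell
-- Tab at f = (->) s: a tabulated transition table is just s -> s,
-- and the Monoid from the text degenerates to flipped Endo composition.
newtype Tab s = Tab { getTab :: s -> s }

instance Semigroup (Tab s) where
  Tab fs <> Tab gs = Tab (gs . fs)  -- "index gs <$> fs" at f = (->) s

instance Monoid (Tab s) where
  mempty = Tab id                   -- "tabulate id" at f = (->) s

-- Folding a list of transitions into one table, then applying it once.
runTab :: [s -> s] -> s -> s
runTab us = getTab (foldMap Tab us)
```

The first-argument-first order matters: feeding inputs left to right must compose their transitions in the same order.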

We can now feed our machine a whole `Foldable` container at a time.

```haskell
feed :: Foldable f => f a -> Moore a b -> Moore a b
feed as (Moore k u s) = Moore k u (index (getTab $ foldMap (Tab . u) as) s)
```

## Compressive Parsing

The next trick is finding the right container type to fold over to make use of the memoized internal states.

For this, I'll turn to an old package of mine, `compressed`, which supplies a rather peculiar `LZ78` container type. LZ78 is a compression scheme by Lempel and Ziv that has a number of nice properties for my purposes.

```haskell
data Token a = Token {-# UNPACK #-} !Int a deriving (Eq, Ord)

data LZ78 a
  = Cons {-# UNPACK #-} !(Token a) (LZ78 a)
  | Nil
```

The idea is this:

1.) You start with a dictionary that maps integers to a list of values and which contains a single entry that maps 0 to the empty string.

2.) Now you receive (or generate) a series of `(Int, value)` pairs, where each `Int` represents an existing slot in the dictionary, and the value represents something you want to `snoc` onto the end of it to make a fresh dictionary entry.

More advanced versions of this scheme collect old entries, but we can define a particularly naive LZ78 encoder / decoder very easily.

We can encode using a variety of different constraint types and times depending on how we represent the dictionary during construction.

In O(n²) time we can construct an LZ78 stream using a list internally, requiring no more than an `Eq` constraint on `a`.

```haskell
encodeEq :: Eq a => [a] -> LZ78 a
encodeEq = go [] 1 0 where
  go _ _ _ [] = Nil
  go _ _ p [c] = Cons (Token p c) Nil
  go d f p (c:cs) = let t = Token p c in case List.lookup t d of
    Just p' -> go d f p' cs
    Nothing -> Cons t (go ((t, f):d) (succ f) 0 cs)
```

With a `Map` the time improves to O(n log n), given an `Ord` constraint.

```haskell
encodeOrd :: Ord a => [a] -> LZ78 a
encodeOrd = go Map.empty 1 0 where
  go _ _ _ [] = Nil
  go _ _ p [c] = Cons (Token p c) Nil
  go d f p (c:cs) = let t = Token p c in case Map.lookup t d of
    Just p' -> go d f p' cs
    Nothing -> Cons t (go (Map.insert t f d) (succ f) 0 cs)
```

We can also turn to a `HashMap` if we have `Hashable` inputs.

```haskell
encode :: (Hashable a, Eq a) => [a] -> LZ78 a
encode = go HashMap.empty 1 0 where
  go _ _ _ [] = Nil
  go _ _ p [c] = Cons (Token p c) Nil
  go d f p (c:cs) = let t = Token p c in case HashMap.lookup t d of
    Just p' -> go d f p' cs
    Nothing -> Cons t (go (HashMap.insert t f d) (succ f) 0 cs)
```

But regardless of how it was constructed, we can `decode` with `Foldable`.

```haskell
instance Foldable LZ78 where
  foldMap f = go (Seq.singleton mempty) mempty where
    go _ m Nil = m
    go s m (Cons (Token w c) ws) = m `mappend` go (s |> v) v ws where
      v = Seq.index s w `mappend` f c
```

The key here is that the decompression scheme never actually looks at the values it decodes, so it is possible to decompress directly in any target `Monoid` you want.

When you do so you'll gain some sharing of intermediate values.
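As a quick sanity check of that claim (the `decode` helper is mine; the rest repeats the definitions above so the fragment stands alone), decoding into the list monoid should invert encoding:

```haskell
import qualified Data.List as List
import qualified Data.Sequence as Seq
import Data.Sequence ((|>))

data Token a = Token !Int a deriving (Eq, Ord)
data LZ78 a = Cons !(Token a) (LZ78 a) | Nil

-- The naive Eq-based encoder from the text.
encodeEq :: Eq a => [a] -> LZ78 a
encodeEq = go [] 1 0 where
  go _ _ _ [] = Nil
  go _ _ p [c] = Cons (Token p c) Nil
  go d f p (c:cs) = let t = Token p c in case List.lookup t d of
    Just p' -> go d f p' cs
    Nothing -> Cons t (go ((t, f):d) (succ f) 0 cs)

-- Decompress directly into any target Monoid.
instance Foldable LZ78 where
  foldMap f = go (Seq.singleton mempty) mempty where
    go _ m Nil = m
    go s m (Cons (Token w c) ws) = m `mappend` go (s |> v) v ws where
      v = Seq.index s w `mappend` f c

-- Decoding into the list monoid recovers the original input.
decode :: LZ78 a -> [a]
decode = foldMap (: [])
```

Note that `decode` never compares elements: all the `Eq` work happened at encode time, which is why decompression can target any monoid at all.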

Other compression schemes may also be useful depending on your application.

## The Story So Far

We can open up such a machine and borrow its internal type of tabulations to generate anything that can be generated by such a machine in parallel or incrementally.

It is possible to run a `Representable` `Moore` machine directly on compressed inputs and pay proportionally to the size of the compressed data, not the decompressed data.

In category theory we have the notion of an adjunction.

Given two functors `F : D -> C` and `G : C -> D`, when `F a -> b` is naturally isomorphic to `a -> G b` we describe this situation in one of several equivalent ways: we say that `F -| G`, that `G` is right adjoint to `F`, or that `F` is left adjoint to `G`. If the categories matter, sometimes we'll write `F -| G :: C -> D`.

Different authors have slightly different conventions on the latter and may give the signature for F instead of G.

• The left (or right) adjoint of a functor is unique up to isomorphism if it exists at all.

• All adjunctions `F -| G :: C -> Hask` have the property that `G` is representable and `F ()` represents `G`.

• Since adjoints are unique up to isomorphism, the fact that the right adjoint is isomorphic to `(F () -> a)` lets us go back across the `(,) (F ()) -| (->) (F ())` adjunction and conclude that `F a` is isomorphic to `(,) (F ()) a`. In other words, `F` contains exactly one `a`. On top of that, every left adjoint `F :: Hask -> Hask` looks like `F = (,) x` for some `x`.

• Representations of representable functors are unique up to isomorphism.

With all of these constraints, if we write down a class describing adjunctions from `Hask -> Hask` it is rather poorly inhabited! All instances of this class are isomorphic (for some `s`) to the canonical `(,) s -| (->) s` adjunction that gives rise to the `State` monad and `Store` comonad!

```haskell
class (Functor f, Representable g) => f -| g where
  leftAdjunct :: (f a -> b) -> a -> g b
  rightAdjunct :: (a -> g b) -> f a -> b
```

First consider that if `f` and `g` are `Representable` then `Product f g` is isomorphic to `(->) (Either (Rep f) (Rep g))` and we can also look at a couple of our recently explored instances:

• `Identity` is isomorphic to `(->) ()`

• If `f` and `g` are `Representable` then `Compose f g` is isomorphic to `(->) (Rep f, Rep g)`

If you squint at these a bit the representation looks a lot like the "logarithm" of the data type in question. Exponents become products, products become sums, etc. Conor McBride is fond of calling representable functors "Napierian" and using `Log` instead of `Rep`.

Since every adjunction gives us a representable functor (the right adjoint), and every representable functor is the right adjoint in some adjunction, and all of this stuff is the same up to isomorphism, we could rephrase everything we just wrote in terms of an adjunction from `Hask -> Hask`.

We get a couple of options (all of equivalent expressive power) for how to arrange things:

```haskell
Moore :: (f -| g) => g b -> f (a -> g (f ())) -> Moore a b
Moore :: (f -| g) => g b -> f (g (a -> f ())) -> Moore a b
Moore :: (f -| g) => f (g b) -> (a -> g (f ())) -> Moore a b
```

I think the last one of those is the most interesting. When `f -| g`, then `f·g` is a comonad and `g·f` is a monad. Here our Moore machine is pairing up some comonad with a function that gives a monadic action that is intimately related to that comonad.

Since all adjunctions from `Hask -> Hask` look like the representable cases we've already explored we haven't yet gained anything from this, but we could work with adjunctions that visit other categories than `Hask`.

## I'm My Own Grandpa

There is a nice adjunction from `Hask -> Hask^op` that we use in Haskell a great deal.

Using backwards arrows to denote arrows in `Hask^op`, any such adjunction would look like a statement that `f a <- b` is isomorphic to `a -> g b`. `f` and `g` here are contravariant functors, and if we turn this around we get that `b -> f a` is isomorphic to `a -> g b`.

```haskell
(-> r) -| (-> r)
```

When composed in one direction we get a monad in `Hask`, when composed the other way around we get a comonad in `Hask^op`. This is why `Cont r` has no comonadic equivalent. It is its own inverse.

That is, `(b -> a -> r)` is isomorphic to `(a -> b -> r)`, and this is witnessed by `flip` in both directions.

## Less Like Moore

Now, we can mechanically massage that last definition of a Moore machine to go round trip through `Hask`op.

```haskell
Wat :: (b -> r -> r) -> a -> (() -> r) -> r
```

The `() ->` in there adds no value, so we can apply it to `()` to get more or less Rich Hickey's notion of a transducer.

```haskell
type Transducer a b = forall r. (b -> r -> r) -> (a -> r -> r)
```

Hickey's transducers derive from the type signature of `foldl`:

```haskell
>>> :t foldl
foldl :: (b -> a -> b) -> b -> [a] -> b
>>> :t foldl.foldl
foldl.foldl :: (b -> a -> b) -> b -> [[a]] -> b
...
```

and you can convert from that form to this form with a couple of judiciously placed `flip`s.

So in this sense a `Transducer` is a "generalized Moore machine". The generalization here is powerful enough to allow the transducer to emit multiple `b`s per `a`.
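For example (these particular transducers and the list driver are my own sketch), a transducer that emits each input twice, run through `foldr` by handing it the list's own cons:

```haskell
{-# LANGUAGE RankNTypes #-}

type Transducer a b = forall r. (b -> r -> r) -> (a -> r -> r)

-- Emit every input twice: one a in, two bs out.
dup :: Transducer a a
dup k a r = k a (k a r)

-- A plain map, phrased as a transducer.
mapping :: (a -> b) -> Transducer a b
mapping f k a r = k (f a) r

-- Run a transducer over a list by instantiating r to [b].
transduceList :: Transducer a b -> [a] -> [b]
transduceList t = foldr (t (:)) []
```

Because `r` is universally quantified, the transducer cannot inspect the accumulator; it can only thread it through the step function it was given, which is what makes these composable.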

## Representing Transducers

But now we have functions to and from some arbitrary `r` in our `Transducer` definition and we can replay this same motivating trick we used to exploit representations on that definition as well.

```haskell
type Transducer a b = forall f. Representable f => (b -> f (Rep f)) -> a -> f (Rep f)
```

which is equivalent to

```haskell
type Transducer a b = forall f. Representable f => (b -> Tab f) -> a -> Tab f
```

In fact we can apply such a transducer to a Moore machine to get one that turns each `a` into potentially several `b`s, and makes that whole chain of transitions in the state diagram at once.

```haskell
transduce :: Moore b c -> Transducer a b -> Moore a c
transduce (Moore k u s) t = Moore k (getD #. t (D #. u)) s
```

Finally, if you can work with all tabulations of a function, nothing stops you from working with any monoid by way of `Endo m`, so you might as well just go to:

```haskell
type Transducer a b = forall m. Monoid m => (b -> m) -> a -> m
```

but this is just what `lens` calls a `Fold`:

```haskell
type Fold a b = forall f. (Contravariant f, Applicative f) => (b -> f b) -> a -> f a
```

If you are (legally) both `Contravariant` and `Functor` then your argument must be phantom, and using `contramap` and `fmap` you can freely change it to anything you want, so this is isomorphic to the last definition.

## Open Thoughts

Similar changes can be applied to a coiterative comonad generated by a comonad, which looks very similar to the `Mealy` machine even if it has wildly different semantics. But given that coincidence, what does such a `Comonad` mean for a `Mealy` machine that has a `Monoid` on its input type? How would such a machine have to work? What does it do?

Just like we ultimately massaged the transducer into a form where it was obvious we could make the same machinery work for any `Monoid` once it supported tabulated endomorphisms, can we find a series of direct generalizations that take us from a `Moore` machine to one that uses an intermediate `Monoid`? Either like the `M` machine in `folds`, or using a `Monoid` action to update the state instead of accepting just any new state.

We built a tabulation of a deterministic not-necessarily-finite automaton. What about a non-deterministic automaton? For that we can make a set of representations using the trie we get for a representable functor `f`.

```haskell
newtype Set f = Set { getSet :: f Bool }

instance Representable f => Monoid (Set f) where
  mempty = Set $ tabulate (const False)
  mappend (Set as) (Set bs) = Set $ tabulate $ \i -> index as i || index bs i

singleton :: (Representable f, Eq (Rep f)) => Rep f -> Set f
singleton i = Set $ tabulate (i ==)

insert :: (Representable f, Eq (Rep f)) => Rep f -> Set f -> Set f
insert i (Set is) = Set $ tabulate $ \j -> index is j || (i == j)

contains :: Representable f => Set f -> Rep f -> Bool
contains (Set is) i = index is i
```

but it'd probably be better to use a real "Set" in practice for most applications. You need something like `Foldable f` as well as `Representable` in order to make the `Monoid` for

```haskell
newtype N f = N { getN :: f (Set f) }
```

This would start to limit the domain to at-most-countably-infinite automata.
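Here is a quick check of this `Set` over the simplest nontrivial trie, a two-slot `Pair` functor represented by `Bool` (the class, `Pair`, and the `union`/`emptySet` spellings of the monoid operations are my own renamings so the sketch stands alone):

```haskell
{-# LANGUAGE TypeFamilies #-}

class Functor f => Representable f where
  type Rep f
  tabulate :: (Rep f -> a) -> f a
  index    :: f a -> Rep f -> a

data Pair a = Pair a a
instance Functor Pair where fmap f (Pair x y) = Pair (f x) (f y)
instance Representable Pair where
  type Rep Pair = Bool
  tabulate f = Pair (f False) (f True)
  index (Pair x _) False = x
  index (Pair _ y) True  = y

-- A set of states, stored as a trie of membership flags.
newtype Set f = Set { getSet :: f Bool }

emptySet :: Representable f => Set f
emptySet = Set (tabulate (const False))

union :: Representable f => Set f -> Set f -> Set f
union (Set as) (Set bs) = Set (tabulate (\i -> index as i || index bs i))

singleton :: (Representable f, Eq (Rep f)) => Rep f -> Set f
singleton i = Set (tabulate (i ==))

contains :: Representable f => Set f -> Rep f -> Bool
contains (Set is) = index is
```

With `Pair` this is just a pair of membership bits, but the same code works for any representable trie over the state space.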

Given that, can we define a nice compiler that takes a regular expression, builds an NFA (picking the appropriate functor as it goes), and then converts it to a DFA?

The compressive parsing technique provided by `LZ78` above works for lots of monoids, not just this one.

For instance, we can modify the code in Efficient Parallel and Incremental Parsing of Practical Context-Free Languages to work with a `Monoid` rather than the notion of a sequence algebra they use there. (A sequence-algebra can be converted to `Monoid` by using a finger-tree to peel off one symbol of work on one side.)

This would let us parse context-free languages using this same machinery.

The main body of code here is available in this gist.

(Also, the code above probably needs a couple of tweaks. Notably, it should use a strict pair for the internal state.)

-Edward Kmett May 28, 2015