A whirlwind tour of conduits

As of March 2020, School of Haskell has been switched to read-only mode.

While talking with people on IRC, I've encountered enough confusion around conduits to realize that people may not know just how simple they are. For example, if you know how to use generators in a language like Python, then you know pretty much everything you need to know about conduits.

The basics

Let's take a look at them step-by-step, and I hope you'll see just how easy they are to use. We're also going to look at them without type signatures first, so that you get an idea of the usage patterns, and then we'll investigate the types and see what they mean.

Everything in conduit begins with the Source, which yields data as it is demanded. The dumbest possible form of source is an empty source:

empty = return ()

The next dumbest is a source that yields only a single value:

single = yield 1

In order to use any Source, I must ultimately connected it with a Sink. Sinks are nothing more than code which awaits values from a Source. Let's look at an example in Python, where these concepts are features of the language itself:

def my_generator():
    for i in range(1, 10):
        yield i

for j in my_generator():
    print j

Here we have a generator (aka Source): a function which simply yields values. This generator is being passed to for statement that consumes the values from it and binds them one by one to a variable j. It then prints each value after it is consumed.

The equivalent code using conduit employs a different syntax, but the general "shape" of the code is the same:

import Control.Monad
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Loops (whileJust_)
import Data.Conduit

myGenerator = forM_ [1..9] yield

main = myGenerator $$
           whileJust_ await $ \j -> 
               liftIO $ print j

I can make the code a little bit closer to Python's example (making the call to await implicit) if I use Data.Conduit.List:

import Control.Monad
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Loops (whileJust_)
import Data.Conduit
import qualified Data.Conduit.List as CL

myGenerator = forM_ [1..9] yield

main = myGenerator $$ 
           CL.mapM_ $ \j -> 
               liftIO $ print j

Just regular code

Neither Sources nor Sinks have to be special functions, however. They are just regular code written in the ConduitM monad transformer:

import Data.Conduit
import Control.Monad.IO.Class (liftIO)

main = do
    (do yield 10
        yield 20
        yield 30)
        $$
        (do liftIO . print =<< await
            liftIO . print =<< await
            liftIO . print =<< await
            liftIO . print =<< await)

Each time await is called, it returns a value that was yielded by the source wrapped in Just, or it returns Nothing to indicate the source has no more values to offer.

There, now you know the basics of the conduit library.

Conduits

Between sources and sinks, there is a third kind of conduit, which is actually called just Conduit. A Conduit sits between sources and sinks, and is able to call both yield and await, applying some kind of transformation or filter to the data coming from the source, before it reaches the sink. In order to use a Conduit, you must fuse it to either a source or a sink, creating a new source/sink which has the action of the Conduit bound to it. For example:

import Data.Conduit
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Loops (whileJust_)

main = do
    (do yield 10
        yield 20
        yield 30)
        $=
        (do whileJust_ await $ \x ->
                yield (x * 2))
        $$
        (do liftIO . print =<< await
            liftIO . print =<< await
            liftIO . print =<< await
            liftIO . print =<< await)

This example fuses a conduit that doubles the incoming values from the source to its left. We could equivalently have fused it with the sink to the right. In most cases it doesn't matter whether you fuse to sources or to sinks; it mainly comes into play when you are using such fusion to create building blocks that will be used later.

Use the types, Luke

Now that we have the functionality of conducts down, let's take a look at their types so that any errors you may encounter are less confusing.

A source has the type Source m Foo, where m is the base monad and Foo is the type of what you want to pass to yield.

A sink has the corresponding type Sink m Foo a, to indicate that await returns values of type Maybe Foo, while the monadic operation of the sink returns a value of type a.

A conduit between these two would have type Conduit Foo m Foo.

You're probably going to see the type ConduitM in your types errors too, since the above three are all synonyms for it. It's a more general type that these three specialized types. The correspondences are:

type Source m o    = ConduitM () o m ()
type Sink i m r    = ConduitM i Void m r
type Conduit i m o = ConduitM i o m ()

The Void you see in there is just enforcing the fact that sinks cannot call yield.

What's next?

Beyond this, most of the conduit library is a bunch of combinators to make them more convenient to use. In a lot of cases, you can reduce conduit code down to something which is just as brief and succinct as what you might write in languages with native support for such operations. It's a testiment to Haskell, rather, that it doesn't need to be a syntactic feature to be both useful and concise.

And what about pipes, and the other competing libraries in this space? In many ways they are each equivalent to what I've described above. If you want to use pipes, just write respond and request instead of yield and await, and you're pretty much good to go! The operators for binding and fusing are different too, but what they accomplish is likewise the same.

If you're interested in learning more about conduit and how to use it, check out the author's own tutorial.