Interactive code snippets not yet available for SoH 2.0, see our Status of of School of Haskell 2.0 blog post

# Sum Types

16 Feb 2013

``**Question**: What do all these languages have in common:    ``
• C
• C++
• C#
• Java
• Javascript
• Objective C
• Perl
• PHP
• Python
• Ruby

Answer: These languages all lack user-defined sum types

If you don't know what a sum type is then you are not alone, as many people who never stray from mainstream languages unknowingly miss out on this incredibly useful and fundamental language feature.

# Boolean Types

In Haskell, the simplest sum type is the `Bool` type:

``data Bool = False | True``

This type definition says that a `Bool` can be one of two values: either `False` or `True`. A "sum type" is any type that has multiple possible representations, and we use `|` to separate each representation. We can use these representations in our functions as if they were ordinary values:

``````isZero :: Int -> Bool
isZero 0 = True
isZero _ = False

toBeOrNotToBe :: Bool -> String
toBeOrNotToBe True  = "Be"
toBeOrNotToBe False = "Not to be"

main = putStrLn (toBeOrNotToBe True)``````

Notice how `Bool` is not a built in type. Rather, we define it within the Haskell language as an ordinary unprivileged type right here. This already departs significantly from mainstream languages, which must either:

• interpret boolean values as integers, or
• add language support for booleans

`Bool` is not an `Int`, which means that we cannot accidentally use a `Bool` where we expect an `Int` or vice versa:

``main = print (True + 1)  -- Type error!``

We gain type safety when the language lets us define our own sum types rather than faking it with existing types like integers or strings.

Also, `Bool` has only two representations: `True` and `False`. In contrast, if we define a `Bool` in terms of an integer, we typically have multiple representations for `True` (any non-zero number), and we must define and remember the mapping between integers and boolean values.

# Constructors

We can define our own types, so why not define a safe wrapper around division?

``````data DivisionResult = DivisionByZero | Success Double
deriving (Show)

safeDivide :: Double -> Double -> DivisionResult
safeDivide x y =
if y == 0
then DivisionByZero
else Success (x / y)

-- Try dividing by zero
main = print (safeDivide 4 2)``````

This time we add a small twist to our sum type! If the division succeeds, we return the successful value. If the division fails, we return no value.

Each alternative in a sum type is a constructor, and these constructors can hold values. In the above example, `DivisionByZero` and `Success` are the two alternative constructors:

• `DivisionByZero` is an empty constructor that holds no values
• `Success` is a non-empty constructor that holds one value of type `Double`

We only return a value if we succeed, so we guarantee that anybody that consumes the result of our function cannot access an incorrect or meaningless value, either accidentally or intentionally, if our function fails.

# Pattern Matching

We use sum types by "pattern matching" on them. For example, we might want to convert the result of safe division to a `String`:

``````data DivisionResult = DivisionByZero | Success Double
deriving (Show)

safeDivide :: Double -> Double -> DivisionResult
safeDivide x y =
if y == 0
then DivisionByZero
else Success (x / y)

-- show
divisionToString :: DivisionResult -> String
divisionToString r = case r of
DivisionByZero -> "Division failed"
Success x      -> show x

main = putStrLn \$ divisionToString \$ safeDivide 4 2
-- /show``````

The `case` syntax is the fundamental way to pattern match. You provide it a value to inspect, and you define a result for each constructor that you might receive. In the above example, we told `case` to inspect the `r` value, which had type `DivisionResult`. This means that `r` could either be a `DivisionByZero` or a `Success` constructor with a value bound inside.

``````divisionToString r = case r of
{-hi-}DivisionByZero{-/hi-} -> "Division failed"
{-hi-}Success x{-/hi-}      -> show x``````

`case` lets you match against non-empty constructors like `Success` and extract the value stored inside. This ensures that the only way we can access the value is if division succeeded. This protects against run-time failures where we try to access a non-existent result if division were to fail. This differs from other languages which will provide a result no matter what, but then also supply an additional flag signifying whether or not that result is valid.

``````wrongDivide :: Double -> Double -> (Bool, Double)
wrongDivide x y = (y == 0, x / y)

-- Oops, we we were lazy and didn't check the boolean flag:
main = let (_, r) = wrongDivide 2 0 in print r``````

Also, pattern matching forces us to handle all possible alternatives. We can only use a value of type `DivisionResult` if we handle both the success and failure scenarios. Contrast this with some programming languages which use sentinel values of the same type as the result to signal failure, which can easily be ignored by the programmer or are indistinguishable from legitimate results:

``````wrongDivide :: Double -> Double -> Double
wrongDivide x y =
if (y == 0)
then 0
else x / y

-- Does 0 mean it failed or computed 0?
main = print (wrongDivide 0 5)``````

More sophisticated mainstream languages will signal errors out of band using exceptions, but these are implemented as a language feature, rather than as an ordinary type. Notice a common pattern: mainstream languages require language extensions to implement features that ordinary sum types would have provided.

# Syntactic Sugar

Haskell offers syntactic sugar for pattern matching where you can instead write multiple function definitions, one for each constructor alternative:

``````data DivisionResult = DivisionByZero | Success Double
deriving (Show)

safeDivide :: Double -> Double -> DivisionResult
safeDivide x y =
if y == 0
then DivisionByZero
else Success (x / y)

-- show
-- Old version
divisionToString :: DivisionResult -> String
divisionToString r = case r of
DivisionByZero -> "Division failed"
Success x      -> show x

-- New version
divisionToString2 :: DivisionResult -> String
divisionToString2 DivisionByZero = "Division failed"
divisionToString2 (Success x)    = show x

main = putStrLn \$ divisionToString2 \$ safeDivide 4 2
-- /show``````

The compiler desugars new version into a form identical to the old version.

We can use this syntactic sugar to implement an elegant version of the `not` function:

``````import Prelude hiding (not)

-- show
not :: Bool -> Bool
not False = True
not True  = False

main = print (not True)
-- /show``````

# Either

Haskell defines the `Either` type, which is also a sum type:

``data Either a b = Left a | Right b``

The `Either` type is the canonical sum type from which we can build all other sum types. In fact, computer scientists often use `(+)` to refer to `Either`.

We call `Either` THE sum type because if the `a` type is inhabited by `M` possible values, and the `b` type is inhabited by `N` possible values, then `Either a b` is inhabited by `M + N` possible values.

Let's use a simple example to convince ourselves of this. If the `Bool` type is inhabited by `2` values, and the `()` type is inhabited by `1` value, then the `Either Bool ()` type must be inhabited by `2 + 1 = 3` values:

``````val1 :: Either Bool ()
val1 = Left False

val2 :: Either Bool ()
val2 = Left True

val3 :: Either Bool ()
val3 = Right ()``````

There are exactly three unique values that have type `Either Bool ()`: two values of type `Bool` wrapped within a `Left` constructor and one value of type `()` wrapped within a `Right` constructor, for a total of three values.

We can simulate any type with multiple constructors using `Either`. For example, we could simulate `Bool` using `Either () ()`:

``````type Bool = Either () ()

false :: Bool
false = Left ()

true :: Bool
true = Right ()``````

Note that the `()` value is inhabited by exactly one value: `()`, therefore `Either () ()` must have `1 + 1 = 2` possible values, just like `Bool`.

If we had a type with more than two constructors:

``data Trool = False | True | DontKnow``

... we can still simulate it using `Either`:

``````type Trool = (Either () (Either () ()))
-- i.e. Trool = 1 + (1 + 1) = 3

false :: Trool
false = Left ()

true :: Trool
true = Right (Left ())

dontKnow :: Trool
dontKnow = Right (Right ())``````

# Product Types

If we have sum types, then perhaps we also have product types, too. A product type is just a tuple, or a constructor with more than one argument:

``````-- A product of an Integer and  String
(4, "Hello") :: (Integer, String)

-- A data type that is a product of a Char, an Integer, and Bool
data Multiple = M Char Integer Bool``````

Exercise: Why do you think they are called product types?

Exercise: What do you think the "canonical" product is?

Almost every mainstream language has some sort of product type:

C:

``````// A product of a double and a double
struct point {
double x;
double y;
};``````

Python:

``````# float x
# float y
# A product of a float and a float
(x, y)``````

Java:

``````// The product of a double and a double
class Point {
double x;
double y;
}``````

In other words, mainstream languages are rich in product types, yet conspicuously deficient in sum types. Notice that product types lack the ability to:

• create new primitive data types that are not numbers, bools, or strings
• represent a choice between two alternatives

# Conclusion

Many language features that you would normally expect a language author to implement are easily written using sum types. In fact, languages that provide sum types tend to have smaller and more orthogonal feature sets since they provide users much greater flexibility to implement their own features.

Once you understand the usefulness of sum types, you will wonder why mainstream languages go to remarkable lengths in order to avoid implementing them. Sum types are arguably the most striking feature missing from most languages.