Unlifted Structures

An ArrayArray# is just an Array# with a modified invariant. It points directly to other unlifted ArrayArray#'s or ByteArray#'s.

While those live in #, they are garbage collected objects, so this all lives on the heap.

They were added to make some of the Data Parallel Haskell stuff fast when it has to deal with nested arrays.

I'm currently abusing them as a placeholder for a better thing.

Edit: In the reddit thread for this post I wound up explaining a bit about what all the fiddly #'s mean. If you are having trouble following along, you might want to start there.

The Problem

Consider the scenario where you naively write a classic doubly-linked list (DLL) in Haskell.

data DLL = DLL (IORef (Maybe DLL)) (IORef (Maybe DLL))

Chasing from one DLL to the next requires following 3 pointers on the heap!

DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> Maybe DLL ~> DLL

That is 3 levels of indirection!

We can trim one easily by simply unpacking the IORef with -funbox-strict-fields or UNPACK.

We can trim another by adding a Nil constructor to DLL and worsening our representation.

data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil

but now we're still stuck with one level of indirection

DLL ~> MutVar# RealWorld DLL ~> DLL

This means that every operation we perform on this structure will be about half of the speed of an implementation in most other languages assuming we're memory bound on loading things into cache!

Making Progress

I have been working on a number of data structures where the indirection of going from something in * out to an object in # which contains the real pointer to my target and coming back effectively doubles my runtime.

We need to go out to the MutVar# because we are allowed to put the MutVar# onto the mutable list when we dirty it. There is a well defined write-barrier there.

I could change out the representation to use

data DLL = DLL (MutableArray# RealWorld DLL) | Nil

I can just store two pointers in the MutableArray# every time, but this doesn't help much directly. It has reduced the amount of distinct addresses in memory from 3 per object to 2.

I still have to go out to the heap from my DLL and get to the array object and then chase it to the next DLL and chase that to the next array. I do get my two pointers together in memory though. I'm paying for a card marking table as well, which I don't particularly need with just two pointers, but we can shed that with the SmallMutableArray# machinery added back in 7.10, which is just the old array code as a new data type, which can speed things up a bit when you don't have very big arrays:

data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil

But what if I wanted my object itself to live in # and have two mutable fields and be able to share the same write barrier?

An ArrayArray# points directly to other unlifted array types. What if we have one # -> * wrapper on the outside to deal with the impedence mismatch between the imperative world and Haskell, and then just let the ArrayArray#'s hold other ArrayArray#s?

data DLL = DLL (MutableArrayArray# RealWorld)

now I need to make up a new Nil, which I can just make be a special MutableArrayArray# I allocate on program startup. I can even abuse pattern synonyms. Alternately I can exploit the internals further to make this cheaper.

Then I can use the readMutableArrayArray# and writeMutableArrayArray# calls to directly access the preceding and next entry in the linked list.

So now we have one DLL wrapper which just 'bootstraps me' into a strict world, and everything there lives in #.

next :: DLL -> IO DLL
next (DLL m) = IO $ \s -> case readMutableArrayArray# s of 
   (# s', n #) -> (# s', DLL n #)

It turns out GHC is quite happy to optimize all of that code to keep things unboxed. The DLL wrappers get removed pretty easily when they are known strict and you chain operations of this sort!

In each of these calls we leap from box to box like Maru the cat, and it is the compiler's job to make the boxes all go away.

With this I've made a strict little universe and shoved it out on the heap.

But as it stands, we don't have a portal back to the "real world".

Unboxed

Now I have one outermost indirection pointing to an array that points directly to other arrays.

I'm stuck paying for a card marking table per object, but I can fix that by duplicating the code for MutableArrayArray# and using a SmallMutableArray#. I can hack up primops that let me store a mixture of SmallMutableArray# fields and normal ones in the data structure. Operationally, I can just unsafeCoerce# the existing SmallMutableArray# primitives to change the kind of one of the arguments it takes!

This is almost ideal, but not quite. I often have fields that would be best left unboxed.

data DLL = DLL !Int !(IORef DLL) !(IORef DLL) | Nil

was able to unpack the Int, but we lost that. We can currently at best point one of the entries of the SmallMutableArray# at a boxed or add a MutableByteArray# for all of our misc. data and shove the Int in question in there.

If I were to implement a HAMT, like HashMap this way I need to store masks and administrivia as I walk down the tree. Having to go off to the side costs me almost the entire win from avoiding the first pointer chase!

The other day I posted about this to the ghc-devs@ mailing list, and Ryan Yates suggested that he may be able to help.

If we had a heap object we could construct that had n words with unsafe access and m pointers to other heap objects, one that could put itself on the mutable list when any of those pointers changed then I could shed this last factor of two in all circumstances.

Prototype

Over the last few days I've put together a small prototype implementation with a few non-trivial imperative data structures for things like Tarjan's link-cut trees, the list labeling problem and order-maintenance.

https://github.com/ekmett/structs

Notable bits:

Data.Struct.Internal.LinkCut provides an implementation of link-cut trees in this style.

Data.Struct.Internal provides the rather horrifying guts that make it go fast.

Once compiled with -O or -O2, if you look at the core, almost all the references to the LinkCut or Object data constructor get optimized away, and we're left with beautiful strict code directly mutating our underlying representation.

-Edward Kmett

August 27th, 2015

comments powered by Disqus