ArrayArray# is just an
Array# with a modified invariant. It points directly to other unlifted
While those live in
#, they are garbage collected objects, so this all lives on the heap.
They were added to make some of the Data Parallel Haskell stuff fast when it has to deal with nested arrays.
I'm currently abusing them as a placeholder for a better thing.
Edit: In the reddit thread for this post I wound up explaining a bit about what all the fiddly
#'s mean. If you are having trouble following along, you might want to start there.
Consider the scenario where you naively write a classic doubly-linked list (DLL) in Haskell.
data DLL = DLL (IORef (Maybe DLL)) (IORef (Maybe DLL))
Chasing from one DLL to the next requires following 3 pointers on the heap!
DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> Maybe DLL ~> DLL
That is 3 levels of indirection!
We can trim one easily by simply unpacking the IORef with
We can trim another by adding a
Nil constructor to
DLL and worsening our representation.
data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil
but now we're still stuck with one level of indirection
DLL ~> MutVar# RealWorld DLL ~> DLL
This means that every operation we perform on this structure will be about half of the speed of an implementation in most other languages assuming we're memory bound on loading things into cache!
I have been working on a number of data structures where the indirection of going from something in
* out to an object in
# which contains the real pointer to my target and coming back effectively doubles my runtime.
We need to go out to the
MutVar# because we are allowed to put the
MutVar# onto the mutable list when we dirty it. There is a well defined write-barrier there.
I could change out the representation to use
data DLL = DLL (MutableArray# RealWorld DLL) | Nil
I can just store two pointers in the
MutableArray# every time, but this doesn't help much directly. It has reduced the amount of distinct addresses in memory from 3 per object to 2.
I still have to go out to the heap from my
DLL and get to the array object and then chase it to the next
DLL and chase that to the next array. I do get my two pointers together in memory though. I'm paying for a card marking table as well, which I don't particularly need with just two pointers, but we can shed that with the
SmallMutableArray# machinery added back in 7.10, which is just the old array code as a new data type, which can speed things up a bit when you don't have very big arrays:
data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil
But what if I wanted my object itself to live in # and have two mutable fields and be able to share the same write barrier?
ArrayArray# points directly to other unlifted array types. What if we have one
# -> * wrapper on the outside to deal with the impedence mismatch between the imperative world and Haskell, and then just let the
ArrayArray#'s hold other
data DLL = DLL (MutableArrayArray# RealWorld)
now I need to make up a new
Nil, which I can just make be a special
MutableArrayArray# I allocate on program startup. I can even abuse pattern synonyms. Alternately I can exploit the internals further to make this cheaper.
Then I can use the
writeMutableArrayArray# calls to directly access the preceding and next entry in the linked list.
So now we have one
DLL wrapper which just 'bootstraps me' into a strict world, and everything there lives in
next :: DLL -> IO DLL next (DLL m) = IO $ \s -> case readMutableArrayArray# s of (# s', n #) -> (# s', DLL n #)
It turns out GHC is quite happy to optimize all of that code to keep things unboxed. The
DLL wrappers get removed pretty easily when they are known strict and you chain operations of this sort!
In each of these calls we leap from box to box like Maru the cat, and it is the compiler's job to make the boxes all go away.
With this I've made a strict little universe and shoved it out on the heap.
But as it stands, we don't have a portal back to the "real world".
Now I have one outermost indirection pointing to an array that points directly to other arrays.
I'm stuck paying for a card marking table per object, but I can fix that by duplicating the code for
MutableArrayArray# and using a
SmallMutableArray#. I can hack up primops that let me store a mixture of
SmallMutableArray# fields and normal ones in the data structure. Operationally, I can just
unsafeCoerce# the existing
SmallMutableArray# primitives to change the kind of one of the arguments it takes!
This is almost ideal, but not quite. I often have fields that would be best left unboxed.
data DLL = DLL !Int !(IORef DLL) !(IORef DLL) | Nil
was able to unpack the
Int, but we lost that. We can currently at best point one of the entries of the
SmallMutableArray# at a boxed or add a
MutableByteArray# for all of our misc. data and shove the
Int in question in there.
If I were to implement a HAMT, like
HashMap this way I need to store masks and administrivia as I walk down the tree. Having to go off to the side costs me almost the entire win from avoiding the first pointer chase!
The other day I posted about this to the
ghc-devs@ mailing list, and Ryan Yates suggested that he may be able to help.
If we had a heap object we could construct that had n words with unsafe access and m pointers to other heap objects, one that could put itself on the mutable list when any of those pointers changed then I could shed this last factor of two in all circumstances.
Over the last few days I've put together a small prototype implementation with a few non-trivial imperative data structures for things like Tarjan's link-cut trees, the list labeling problem and order-maintenance.
Data.Struct.Internal.LinkCut provides an implementation of link-cut trees in this style.
Data.Struct.Internal provides the rather horrifying guts that make it go fast.
Once compiled with -O or -O2, if you look at the core, almost all the references to the
Object data constructor get optimized away, and we're left with beautiful strict code directly mutating our underlying representation.
August 27th, 2015