Functional Sets, Part 3: Balancing

Elm 0.18 · Updated February 15, 2018 · 8 Minute Read · ∞ Permalink


Last time, we covered how rotation could keep our trees balanced. But how do we automatically apply our rotations?

Rotation Recap

When we rotate a tree, we move the subtrees and heads around so that the tree has a different root, but the same order. It looks like this:

A Tree Rotating Left and Right

A Tree Rotating Left and Right Wikipedia

Last time we created two functions, rotateLeft and rotateRight, to do this rotation for us.

AVL Trees

So, OK, we can rotate our trees. That’s a nice first step! But we don’t want to have to rotate left or right manually, we just want insert to take care of it for us. So how do we get from here to there?

Fortunately, tons of smart people have thought about this problem, and we have tons of options! Red-black trees, scapegoat trees, splay trees, treaps… but we’re going to use AVL trees (which take their name from their inventors: Adelson-Velsky and Landis.)

AVL trees balance by making sure the height of each subtree is about the same. In our example from before, the balanced tree had a height of 2, while the unbalanced tree had a height of 5.

A balanced tree of 1 through 5 has a height of 2.

A balanced tree of 1 through 5 has a height of 2.

Take note that we’re using the number of levels of nodes to define the height, not how many nodes there are (this is called the weight.) This means that:

When we insert, we’ll look at the new heights of the trees after insertion. If the difference between them is greater than 1 (so if the left is taller than the right, or the reverse), we’ll rotate!

We want to keep track of these heights without recomputing them every time, so we’ll change our implementation of Set to look like this:

type Set comparable
    = Tree Int comparable (Set comparable) (Set comparable)
    | Empty

The Int as the new first field of Tree will hold the tree’s height. We’ll need to refer to this field often, let’s write a quick function to get the height of any tree:

height : Set comparable -> Int
height set =
    case set of
        Empty ->
            0

        Tree height _ _ _ ->
            height

Now we change our constructor functions. empty can stay the same, but singleton will change to add a height:

singleton : comparable -> Set comparable
singleton item =
    Tree 1 item empty empty

…and we’ll need an entirely new constructor, tree:

tree : comparable -> Set comparable -> Set comparable -> Set comparable
tree head left right =
    Tree
        (max (height left) (height right) + 1)
        head
        left
        right

We’ll use this in insert in place of direct calls to Tree:

insert : comparable -> Set comparable -> Set comparable
insert item set =
    case set of
        Empty ->
            singleton item

        Tree _ head left right ->
            if item < head then
                tree head (insert item left) right
            else if item > head then
                tree head left (insert item right)
            else
                set

And finally, we need to change rotateLeft and rotateRight:

rotateLeft : Set comparable -> Set comparable
rotateLeft set =
    case set of
        Tree _ head lessThans (Tree _ subHead betweens greaterThans) ->
            tree subHead (tree head lessThans betweens) greaterThans

        _ ->
            set


rotateRight : Set comparable -> Set comparable
rotateRight set =
    case set of
        Tree _ head (Tree _ subHead lessThans betweens) greaterThans ->
            tree subHead lessThans (tree head betweens greaterThans)

        _ ->
            set

That’s all we need to do to set the stage! Now that we track of balance, we can rotate during insertion to keep the tree balanced. This takes less work that you might think!

Our Trigger: Height Difference

Remember that we measure the balance of a tree as the height of the right subtree minus the height of the left subtree. A tree is a valid AVL tree if the height is -1, 0, or 1. So, to make our trees valid AVL trees, we’ll rotate if the balance is outside that range.

We can get the height difference for a set with a new function:

diff : Set comparable -> Int
diff set =
    case set of
        Empty ->
            0

        Tree _ _ left right ->
            height right - height left

This means that if the tree leans to the right the value will be positive, and to the left the value will be negative.

The Simplest Case: 2 or More

To balance our tree, we’ll first need to handle if the height difference is greater than one.

Say we insert the letters A through C, in order.

A through C, leaning right

A through C, leaning right

This is not a valid AVL tree, since the balance is 2 (a height of two on the right minus 0 on the left.) Since the tree is leaning right we’ll rotate left, so that B is at the root.

A through C, balanced

A through C, balanced

Now the height of the left and the right subtrees is the same, so the difference is 0. This will make our tree valid again.

We’ll implement our initial balancing function like this:

balance : Set comparable -> Set comparable
balance set =
    case set of
        Empty ->
            set

        Tree _ head left right ->
            if diff set < -1 then
                rotateRight set
            else if diff set > 1 then
                rotateLeft set
            else
                set

An empty set is already balanced, so we return it immediately. Same for a set whose difference falls between the valid range of -1 and 1. Otherwise, we rotate!

We’ll also change insert to call balance after inserting:

insert : comparable -> Set comparable -> Set comparable
insert item set =
    case set of
        Empty ->
            singleton item

        Tree _ head left right ->
            if item < head then
                tree head (insert item left) right |> balance
            else if item > head then
                tree head left (insert item right) |> balance
            else
                set

Notice that we’re only balancing the tree after insertion. This will make sure that each level of our tree is balanced from the bottom up, and only if any levels need it.

Left- and Right-Leaning Sets

This seems great so far, but there’s one more tree shape we need to consider. What if we inserted A, C, then B?

A through C, unbalanced but with a "between" value (B)

A through C, unbalanced but with a "between" value (B)

The difference would be 2, but if we rotated right it would swap to -2!

In this case, we need to rotate the tree at C right, then the tree at A left. The right rotation at C will create our unbalanced A-B-C set from above, then the left rotation at A will move B to the root, balancing the tree. If the tree were leaning the other way, we’d just reverse the operation.

The code looks like this:

balance : Set comparable -> Set comparable
balance set =
    case set of
        Empty ->
            set

        Tree _ head left right ->
            if diff set == -2 && diff left == 1 then
                -- left leaning tree with right-leaning left subtree.
                tree head (rotateLeft left) right |> rotateRight
            else if diff set < -1 then
                rotateRight set
            else if diff set == 2 && diff right == -1 then
                -- right leaning tree with left-leaning right subtree.
                tree head left (rotateRight right) |> rotateLeft
            else if diff set > 1 then
                rotateLeft set
            else
                set

Whew, that function has quite a few conditionals! Generally, we’re examining the difference in each tree, and the difference in the relevant subtrees. If we find that the difference is 2 (or -2) and the difference in the subtree is -1 (or 1) we’ve hit our problem. We send the subtree to rotateLeft or rotateRight before rotating the whole tree in question.

It looks sort of messy, but these 9 lines of code will maintain a valid AVL tree (and fast insert/lookup operations) in every situation. We’re done with balancing and insert!

Make Your Own Trees, Again

Now you can try this out for yourself. This is the same visualization from last time, but with balancing turned on. Insert values to see how the balancing code reacts and keeps everything tuned up.

Things to play around with: