Functional Sets, Part 3: Balancing
Last time, we covered how rotation could keep our trees balanced. But how do we automatically apply our rotations?
Rotation Recap
When we rotate a tree, we move the subtrees and heads around so that the tree has a different root, but the same order. It looks like this:
Last time we created two functions, rotateLeft
and rotateRight
, to do this rotation for us.
AVL Trees
So, OK, we can rotate our trees.
That’s a nice first step!
But we don’t want to have to rotate left or right manually, we just want insert
to take care of it for us.
So how do we get from here to there?
Fortunately, tons of smart people have thought about this problem, and we have tons of options! Red-black trees, scapegoat trees, splay trees, treaps… but we’re going to use AVL trees (which take their name from their inventors: Adelson-Velsky and Landis.)
AVL trees balance by making sure the height of each subtree is about the same. In our example from before, the balanced tree had a height of 2, while the unbalanced tree had a height of 5.
Take note that we’re using the number of levels of nodes to define the height, not how many nodes there are (this is called the weight.) This means that:
- An empty set has a height of 0.
- A singleton set has a height of 1.
- Anything else is the height of it’s tallest subtree, plus 1.
When we insert
, we’ll look at the new heights of the trees after insertion.
If the difference between them is greater than 1 (so if the left is taller than the right, or the reverse), we’ll rotate!
We want to keep track of these heights without recomputing them every time, so we’ll change our implementation of Set
to look like this:
type Set comparable
= Tree Int comparable (Set comparable) (Set comparable)
| Empty
The Int
as the new first field of Tree
will hold the tree’s height.
We’ll need to refer to this field often, let’s write a quick function to get the height of any tree:
height : Set comparable -> Int
height set =
case set of
Empty ->
0
Tree height _ _ _ ->
height
Now we change our constructor functions.
empty
can stay the same, but singleton will change to add a height:
singleton : comparable -> Set comparable
singleton item =
Tree 1 item empty empty
…and we’ll need an entirely new constructor, tree
:
tree : comparable -> Set comparable -> Set comparable -> Set comparable
tree head left right =
Tree
(max (height left) (height right) + 1)
head
left
right
We’ll use this in insert
in place of direct calls to Tree
:
insert : comparable -> Set comparable -> Set comparable
insert item set =
case set of
Empty ->
singleton item
Tree _ head left right ->
if item < head then
tree head (insert item left) right
else if item > head then
tree head left (insert item right)
else
set
And finally, we need to change rotateLeft
and rotateRight
:
rotateLeft : Set comparable -> Set comparable
rotateLeft set =
case set of
Tree _ head lessThans (Tree _ subHead betweens greaterThans) ->
tree subHead (tree head lessThans betweens) greaterThans
_ ->
set
rotateRight : Set comparable -> Set comparable
rotateRight set =
case set of
Tree _ head (Tree _ subHead lessThans betweens) greaterThans ->
tree subHead lessThans (tree head betweens greaterThans)
_ ->
set
That’s all we need to do to set the stage! Now that we track of balance, we can rotate during insertion to keep the tree balanced. This takes less work that you might think!
Our Trigger: Height Difference
Remember that we measure the balance of a tree as the height of the right subtree minus the height of the left subtree. A tree is a valid AVL tree if the height is -1, 0, or 1. So, to make our trees valid AVL trees, we’ll rotate if the balance is outside that range.
We can get the height difference for a set with a new function:
diff : Set comparable -> Int
diff set =
case set of
Empty ->
0
Tree _ _ left right ->
height right - height left
This means that if the tree leans to the right the value will be positive, and to the left the value will be negative.
The Simplest Case: 2 or More
To balance our tree, we’ll first need to handle if the height difference is greater than one.
Say we insert the letters A through C, in order.
This is not a valid AVL tree, since the balance is 2 (a height of two on the right minus 0 on the left.) Since the tree is leaning right we’ll rotate left, so that B is at the root.
Now the height of the left and the right subtrees is the same, so the difference is 0. This will make our tree valid again.
We’ll implement our initial balancing function like this:
balance : Set comparable -> Set comparable
balance set =
case set of
Empty ->
set
Tree _ head left right ->
if diff set < -1 then
rotateRight set
else if diff set > 1 then
rotateLeft set
else
set
An empty set is already balanced, so we return it immediately. Same for a set whose difference falls between the valid range of -1 and 1. Otherwise, we rotate!
We’ll also change insert
to call balance
after inserting:
insert : comparable -> Set comparable -> Set comparable
insert item set =
case set of
Empty ->
singleton item
Tree _ head left right ->
if item < head then
tree head (insert item left) right |> balance
else if item > head then
tree head left (insert item right) |> balance
else
set
Notice that we’re only balancing the tree after insertion. This will make sure that each level of our tree is balanced from the bottom up, and only if any levels need it.
Left- and Right-Leaning Sets
This seems great so far, but there’s one more tree shape we need to consider. What if we inserted A, C, then B?
The difference would be 2, but if we rotated right it would swap to -2!
In this case, we need to rotate the tree at C right, then the tree at A left. The right rotation at C will create our unbalanced A-B-C set from above, then the left rotation at A will move B to the root, balancing the tree. If the tree were leaning the other way, we’d just reverse the operation.
The code looks like this:
balance : Set comparable -> Set comparable
balance set =
case set of
Empty ->
set
Tree _ head left right ->
if diff set == -2 && diff left == 1 then
-- left leaning tree with right-leaning left subtree.
tree head (rotateLeft left) right |> rotateRight
else if diff set < -1 then
rotateRight set
else if diff set == 2 && diff right == -1 then
-- right leaning tree with left-leaning right subtree.
tree head left (rotateRight right) |> rotateLeft
else if diff set > 1 then
rotateLeft set
else
set
Whew, that function has quite a few conditionals!
Generally, we’re examining the difference in each tree, and the difference in the relevant subtrees.
If we find that the difference is 2 (or -2) and the difference in the subtree is -1 (or 1) we’ve hit our problem.
We send the subtree to rotateLeft
or rotateRight
before rotating the whole tree in question.
It looks sort of messy, but these 9 lines of code will maintain a valid AVL tree (and fast insert/lookup operations) in every situation.
We’re done with balancing and insert
!
Make Your Own Trees, Again
Now you can try this out for yourself. This is the same visualization from last time, but with balancing turned on. Insert values to see how the balancing code reacts and keeps everything tuned up.
Things to play around with:
- Insert three values in order.
- Insert three values so that you would get a leaning tree (like A C B.) Check out how the tree balances itself.
- Create some trees with the same values in different insertion order. Notice how the composition of the tree will change with different insertion orders.