Functional Sets, Part 1: Construction
Before we start for real, what’s a set?
Say we have the numbers 1 through 4.
In a list, that’s easy enough: [1, 2, 3, 4]
.
And, in a set, it’s still [1, 2, 3, 4]
.
The difference comes with duplicate elements: four ones in a list is [1, 1, 1, 1]
, but in a set that’s just [1]
.
Sets don’t respect order, either: [4, 3, 2, 1]
and [1, 2, 3, 4]
are the same set.
Instead of order, we talk about membership.
The set has four items: 1, 2, 3, and 4 (or, if you prefer: 4, 3, 2, and 1.)
You can use sets for tons of neat things! Relational databases use sets as the basis for operation, and TLA+ uses set logic to do formal verification of software. If you’d like to find out more about set theory later, Wikipedia has got your back.
What are our sets made of?
We’re going to implement our Sets with binary search trees. A binary search tree containing the numbers 1, 2, and 3 might look like this:
An element in this particular data structure has a head (the top node) and two sub-elements, or sub-trees. The left tree contains items less than the head, and the right tree contains items greater than the head. If there are no items less than or greater than the head, the trees below it are empty.
In the example above 2 is the head, with 1 on the left and 3 on the right. We don’t have any values other than those three, so the other subtrees are all empty.
This all sounds kiiiiind of complicated. Why are we using this particular data structure? Well, think about this: if a node can contain one value, and things greater and less than that value, then we have disallowed duplicate items. We didn’t have to write complicated code to do this, we just had to choose a structure that made impossible states (duplicate values) unrepresentable!
The Data Model
Enough theory! How does this look in actual code?
type Set comparable
= Tree comparable (Set comparable) (Set comparable)
| Empty
This is a type union with two tags: Tree
and Empty
.
We’ll use Empty
to represent an empty set, and Tree
to represent a non-empty set (with potentially non-empty left and right subtrees.)
Our Set
type takes a comparable
as part of the type.
comparable
refers to any type whose values we can compare, for example 1 < 2
or 'c' > 'a'
.
In Elm, this means we can use Int
, Float
, Time
, Char
, String
, and tuples or lists of those types.
This data structure can represent every tree imaginable. For example:
- An empty set:
Empty
- A set with just 1 in it:
Tree 1 Empty Empty
- Our example with 1, 2, and 3:
Tree 2 (Tree 1 Empty Empty) (Tree 3 Empty Empty)
But even with three values, creating these by hand gets a little hairy, so let’s make some basic constructor functions:
empty : Set comparable
empty =
Empty
singleton : comparable -> Set comparable
singleton item =
Tree item empty empty
That’s better. So now:
- An empty set:
empty
- A set with just 1 in it:
singleton 1
- A set with 1, 2, and 3:
Tree 2 (singleton 1) (singleton 3)
These functions make constructing simple sets easier, but what about more complex things? Even 1 through 3 is still too complex for everyday use! We don’t want to track whether to place things on the left or the right for every single set, right? Lucky for us, that’s where insertion comes in:
Inserting
We’ll want a function that will insert a new value into our set… so, being extremely creative we call it insert
.
Let’s dive into the code, and I’ll explain as we go:
insert : comparable -> Set comparable -> Set comparable
insert item set =
case set of
Empty ->
singleton item
Tree head left right ->
if item < head then
Tree head (insert item left) right
else if item > head then
Tree head left (insert item right)
else
set
First, we need to determine if the set is empty.
If we’ve got an empty list, we don’t need to worry about inserting on the left or the right; we can just create a new singleton
!
Otherwise, we need to compare the head with the new item:
- If the item we’re inserting is less than the head, we insert into the left set.
- If it’s greater, we insert into the right set.
- If the two values are equal, we do nothing. The item is already present, so don’t make any changes!
Then, after we’ve figured out where we insert, we’ll do it again for the next level until we get to either a level that doesn’t have a value (empty) or a value we’ve already seen.
Recursive code like this can be tricky to understand, so let’s visualize what’s happening.
Let’s say we want to insert 0
into a singleton set containing 1.
We’ll call insert 0 (singleton 1)
:
We compare to the singleton set in the pattern matching.
Now we’re comparing the item we’re inserting (0
) to the head of the current tree (1
):
We find that 0
is less than 1
, so insert the new item to the left.
This is the same as calling insert 0 Empty
, since the left node is empty.
This time around, the set in question is Empty
, so we’ll return a new singleton.
Our new set then bubbles up through the recursive calls.
It ends up replacing the left branch of the original set.
The original call returns with the modified set, and we’re finished.
List to Set
Now, that takes care of inserting a single value, but what if you have a list?
All it takes is List.foldl
!
fromList : List comparable -> Set comparable
fromList items =
List.foldl insert empty items
To put this in English: starting from the beginning, insert the items one at a time into a new set.
This is the functional equivalent of a for
loop with an accumulator.
Conversion function: complete!
So Far
To sum up: we’ve got:
- a recursively defined Set data structure
- constructors for both kinds of base values (empty sets and singleton sets)
- two ways to insert values into sets (
insert
andfromList
)
We’ve got a good foundation, but we have quite a way to go. Next time we’ll going to look at how to tell if a set contains a value. We’ll also cover how to measure how many values are in our set.