In this part we’ll cover so-called ‘trees’ in computer science.
Trees
The formal definition of a ‘tree’ is: “A hierarchical data structure built up from nodes”.
So unlike a real tree, which has its root at the bottom, computer scientists flipped it: the root node is at the top of the tree.
In any tree there’s always one root node. Each node can have so-called child nodes, and these children can have their own child nodes and so on.
Therefore, all nodes - except the root node - must have exactly one parent node.
Tree rules
I’ve already stated some of the ‘rules’ of a tree, but generally we have three rules that make a collection of nodes a tree.
- Each node must have exactly one parent node - except the root node, which has none.
- There must be exactly one root node.
- If the first and second rules are followed, no ‘cycles’ can appear.
Recursive Tree
Trees can also be defined recursively: “A tree consists of a root node together with its subtrees, which do not share any common nodes”.
This means that each child node (of the root node) is itself the root of a subtree. It also means that a tree can be empty (no nodes) - some applications need this.
Programming with trees
Now that we have some basic knowledge about trees - let’s try to represent them with code. A very natural representation would be a directory tree.
class Tree:
    name: str
    size: int
    children: list[Tree]
If we now want to compute the total size of a directory (folder) we would need to:
def total(tree: Tree) -> int:
    n: int = tree.size
    for c in tree.children:
        n += total(c)
    return n
And we can see that recursion and trees go hand in hand. Now let’s see how we can traverse the tree.
There are two main methods for traversing: pre-order traversal, where you process each node before visiting its children, and post-order traversal, where you process each node after visiting its children.
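As a small sketch, the two traversals differ only in when the node is processed. The directory Tree class from above is redefined here (with a plain constructor) so the snippet runs on its own:

```python
class Tree:
    def __init__(self, name, size, children=None):
        self.name = name
        self.size = size
        self.children = children or []

def preorder(t: Tree, out: list) -> None:
    out.append(t.name)          # process the node BEFORE its children
    for c in t.children:
        preorder(c, out)

def postorder(t: Tree, out: list) -> None:
    for c in t.children:
        postorder(c, out)
    out.append(t.name)          # process the node AFTER its children

root = Tree("home", 0, [Tree("docs", 10), Tree("src", 20, [Tree("main.py", 5)])])
pre, post = [], []
preorder(root, pre)
postorder(root, post)
print(pre)   # ['home', 'docs', 'src', 'main.py']
print(post)  # ['docs', 'main.py', 'src', 'home']
```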
Binary Trees
Often we use a special kind of tree called a ‘binary tree’ - a binary tree is a tree where each node has exactly two children (which may be empty).
As we said before, these children can be represented as a null-value if they’re ‘missing’. We call these children nodes the ‘left’ and ‘right’ nodes or subtrees.
Some terminology commonly used for trees: size, height, and level. Size is the number of nodes in the (sub)tree. Height is the number of levels in the tree. Level is the distance from a node to the root.
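As a sketch, size and height can be computed recursively (a minimal Tree class is redefined here so the snippet is self-contained):

```python
class Tree:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

def size(t: Tree) -> int:
    # number of nodes in the (sub)tree
    return 1 + sum(size(c) for c in t.children)

def height(t: Tree) -> int:
    # number of levels: a single node has height 1
    return 1 + max((height(c) for c in t.children), default=0)

root = Tree("a", [Tree("b", [Tree("d")]), Tree("c")])
print(size(root), height(root))  # 4 3
```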
Binary Search Trees
Binary search trees, or BSTs, are a kind of binary tree with some extra conditions:
- Each node has a key
- The node’s key is smaller than all the keys in its right subtree
- The node’s key is greater than all the keys in its left subtree
Or phrased differently: everything in a node’s left subtree is less than that node’s key, and everything in its right subtree is greater.
With these rules, searching becomes really efficient and natural. We just compare our key to each node’s key to decide whether to traverse to the left or right.
If we ever hit a ‘null’ node, we know that the key we’re searching for doesn’t exist in this tree.
So let us implement a map using BSTs!
class BSTMap[Key, Value]:
    class Node:
        key: Key
        value: Value
        left, right: Node

    root: Node

    def get(key: Key) -> Value:
        return get(key, root)

    def put(key: Key, val: Value):
        root = put(key, val, root)

    def get(key, node) -> Value:
        if node is None:
            return None
        if key < node.key:
            return get(key, node.left)
        if key > node.key:
            return get(key, node.right)
        return node.value

    def put(key, val, node):
        if node is None:
            return Node(key=key, value=val, left=None, right=None)
        elif key < node.key:
            node.left = put(key, val, node.left)
        elif key > node.key:
            node.right = put(key, val, node.right)
        else:
            node.value = val
        return node
Find max/min value in a BST
To find the max/min value in BST we just repeatedly go right/left until we hit a null node. Real simple!
Or in code:
def get_max(tree: BST):
    # assumes the tree is not empty
    c_node = tree.root
    while c_node.right is not None:
        c_node = c_node.right
    return c_node.value

def get_min(tree: BST):
    # assumes the tree is not empty
    c_node = tree.root
    while c_node.left is not None:
        c_node = c_node.left
    return c_node.value
Sorting a BST
Sometimes we might want to print/store a BST in ascending order - the way to do this of course is via recursion!
This is also known as in-order traversal - we print the left subtree, then the node’s key, then the right subtree - and we do this recursively.
So for example:
def print_sorted(node: Node):
    if node is None:
        return
    print_sorted(node.left)
    print(node.key)
    print_sorted(node.right)
We can also handle a more general case - we make an iterator for the keys in the tree by doing an in-order traversal and storing the keys in a dynamic array.
def keys() -> list[Key]:
    result = []
    inorder(root, result)
    return result

def inorder(node: Node, result: list[Key]):
    if node is None:
        return
    inorder(node.left, result)
    result.append(node.key)
    inorder(node.right, result)
...
for key in tree.keys():
    print(key)
Deletion
Deletion can be quite tricky in trees, and especially in BSTs - we can’t just remove any node; there are some rules.
Deleting a leaf
A so-called leaf is a node with no children (visually, the leaves are the bottom nodes). These are fine to remove without any corrections.
Since the removed node has no children of its own, nothing else in the tree is affected.
Deleting a node with one child
Now, in the case we want to delete a node with exactly one child, it’s still quite trivial. Since that node only has one child, we can just replace the deleted node with its child. It doesn’t affect the tree as a whole.
Deleting a node with two children
This is where it becomes tricky, but if we take a step back and think about it, we need to find a replacement node that upholds the requirements: everything to the left must be smaller and everything to the right must be greater. If we think about which candidates fulfill this requirement, it becomes clear that two nodes are good candidates.
The greatest (rightmost) key in the node’s left subtree, or the smallest (leftmost) key in the node’s right subtree.
The hardest case
The case above is quite simple if the candidate doesn’t have any children; the hardest case in deletion is when the candidate has one child. What we do then is: copy the candidate’s key into the node we want to delete, and then delete the candidate from its own subtree - which is easy, since the candidate has at most one child.
So, not that hard logically - but can be a pain to implement correctly.
So let’s now do exactly that :)!
def delete(node: Node, key: Key) -> Node:
    if node is None:
        return node
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    else:
        # one child (or none): replace the node with its child
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # two children: find the greatest key in the left subtree,
        # copy it into this node, then delete it from the left subtree
        temp = node.left
        while temp.right is not None:
            temp = temp.right
        node.key = temp.key
        node.left = delete(node.left, temp.key)
    return node
Complexity
The complexity of operations on a BST is, in the best and average case, $\mathcal{O}(\log n)$, which is great. In the worst case, however, these operations take $\mathcal{O}(n)$ - for example, when the items are inserted in ascending order, the tree degenerates into a linked list.
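To see the worst case concretely, here is a small standalone sketch (its own Node and insert helpers, not the BSTMap above) where ascending insertions produce a chain whose height equals the number of keys:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(node, key):
    # plain BST insertion, no balancing
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node

def height(node):
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

root = None
for k in range(1, 8):       # keys arrive in ascending order
    root = insert(root, k)
print(height(root))         # 7: the tree is a chain, height is linear in n
```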
BST Summary
We can use BSTs to implement a set or a map, and we additionally get ordering-based operations such as min, max, range etc. with good complexity.
Insert/Delete/Search take time proportional to the height of the tree, which in the best/average case is $\mathcal{O}(\log n)$; but if the tree is unbalanced it becomes linear.
So the main takeaway is that if a BST is balanced, it’s really powerful and efficient. So let’s take a look at how we can keep it balanced.
Invariants
The property that the keys in a node’s left subtree must be less than the node’s key, and the keys in its right subtree greater, is an example of a data structure invariant.
An invariant is a property that the designer wants to always hold. Usually a rule about what instances of the data structure are ‘valid’.
But these invariants can be expensive to maintain - if we say ‘this array is sorted’, we can’t just use append anymore; we need to insert each element at its correct position. So we need to find an invariant for BSTs which is easier to maintain.
AVL Trees
AVL trees have a less restrictive invariant than perfect balance - in an AVL tree we keep the tree almost balanced.
The invariant for an AVL tree is: “For each node, the heights of its left and right subtrees differ by at most 1”.
Tree Rotations
Now before we dive deeper into AVL trees, let’s take a look at rotations within trees. When we talk about a rotation, we mean restructuring a tree so that a different node becomes its root.
Right rotation: left child of root becomes root, old root becomes right child. Left rotation: right child of root becomes root, old root becomes left child.
Tree rotations preserve ordering and contents in BSTs. Note that we can also rotate a subtree.
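As a sketch with a minimal hypothetical Node class, a right rotation looks like this; note that the in-order traversal (and hence the BST ordering) is unchanged:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(root):
    # left child of root becomes root, old root becomes right child
    pivot = root.left
    root.left = pivot.right   # pivot's right subtree moves over
    pivot.right = root
    return pivot

def inorder(n):
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []

t = Node(3, Node(2, Node(1)))   # left-leaning chain 3-2-1
before = inorder(t)
t = rotate_right(t)
print(t.key, inorder(t) == before)  # 2 True
```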
AVL Insertion
To insert a new node into an AVL tree - we first start by doing a BST insertion - this might break the AVL tree (the invariant).
But to fix this we go upwards from the new node, finding and fixing nodes that break the invariant. Note that during BST insertion, each node’s height either stays the same or goes up by 1.
We have different cases which we’ll look at.
Case 1: Left-left Tree
In the Left-left case, we did an insertion in the left child of the left child of this node, which broke the invariant.
To fix this, we just make a right rotation!
Case 2: Right-right Tree
In the Right-right case, we did an insertion in the right child of the right child of this node, which broke the invariant.
To fix it we make a left rotation!
Case 3: Left-right Tree
In the Left-right case, the extra height is in the root’s left child and then in the right child of that. We can’t fix it with a single rotation - but luckily we still have a solution.
First we rotate the left subtree left. Once we have performed that left rotation, we have created a Left-left tree, which is fixed by one right rotation as we saw earlier.
Case 4: Right-left Tree
The same logic as case 3: the extra height is in the root’s right child and then in the left child of that one. To fix this we first make a right rotation in the right subtree, then a final left rotation to fix the tree.
AVL Insertion Summary
So we have these four cases and we know how to solve them - we just need to remember how to identify them: we look at which grandchild the extra unit of height is in.
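Putting the four cases together, here is a minimal AVL insertion sketch (the helper names are my own, not from the text): each node caches its height, and after a BST insertion we check the balance factor and apply the single or double rotation the case calls for.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1

def h(n):
    return n.height if n else 0

def update(n):
    n.height = 1 + max(h(n.left), h(n.right))

def balance(n):
    return h(n.left) - h(n.right)

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y); update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x); update(y)
    return y

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    update(node)
    if balance(node) > 1:                # extra height on the left
        if balance(node.left) < 0:       # Left-right: rotate left child left first
            node.left = rotate_left(node.left)
        node = rotate_right(node)        # then fix the Left-left shape
    elif balance(node) < -1:             # extra height on the right
        if balance(node.right) > 0:      # Right-left: rotate right child right first
            node.right = rotate_right(node.right)
        node = rotate_left(node)         # then fix the Right-right shape
    return node

root = None
for k in range(1, 8):                    # ascending order, worst case for a plain BST
    root = insert(root, k)
print(root.key, root.height)             # 4 3: the tree stays balanced
```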
AVL Deletion
When deleting a node, the same situation occurs! But instead we reduce the height of some subtree by 1, which can break the invariant.
So, therefore we can use the exact same balancing algorithm. So when implementing an AVL tree we only need to implement the balancing algorithm once!
In conclusion, AVL trees use rotations to keep the tree balanced. In the worst case the height of the tree is $1.44 \log_2 n$, but normally it is close to $\log_2 n$.
2-3 Trees
2-3 trees are another kind of height-balanced tree. A 2-3 tree is made from two kinds of nodes:
- 2-node: has one key x and two subtrees A and B. Just like in a BST, A < x < B.
- 3-node: has two keys x, y and three subtrees A, B, and C. As usual, A < x < B < y < C.
Searching in 2-3 Trees
Searching proceeds just as in a BST. In a 2-node, we compare the search key against the key in the node and descend into one of the two children.
In a 3-node, we compare the search key against both keys in the node and descend into one of the three children.
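As a sketch (with a hypothetical list-based Node23 class, not code from the text), search in a 2-3 tree scans the node's keys and descends into the matching child:

```python
class Node23:
    # a 2-node has one key, a 3-node has two; leaves have no children
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def search(node, key):
    # find the index of the first key >= the search key
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and key == node.keys[i]:
        return True
    if not node.children:    # reached a leaf: the key is not in the tree
        return False
    return search(node.children[i], key)

# a 3-node root [2, 4] with three leaf children
root = Node23([2, 4], [Node23([1]), Node23([3]), Node23([5])])
print(search(root, 3), search(root, 6))  # True False
```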
How 2-3 Trees balance
In the AVL case we saw that we have to rebalance via rotations after each insertion/deletion. A 2-3 tree, by contrast, never has to be rebalanced, because it is always balanced.
This is because if one child of a node is empty, all of its children must be empty (you can test this by drawing examples). This means the empty trees are all at the same level, which implies a balanced tree.
The resulting tree has a height between $\log_3 n$ and $\log_2 n$.
Insertion in 2-3 trees
When we want to insert a new key into a 2-3 tree we use a normal insertion into a leaf; however, this might temporarily break the tree by, for example, creating a ‘4-node’.
To fix this we use the following algorithm:
- If the node is a 4-node, split it into 2-nodes and push the middle key up to be absorbed by the parent.
- Move up to the parent and repeat.
We keep alternating these two steps, and stop once we no longer need to split.
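As a sketch of just the split step (using a hypothetical (keys, children) tuple representation, not the notes' own code), splitting a temporary 4-node yields two 2-nodes plus the middle key to hand up to the parent:

```python
def split_4node(keys, children):
    """Split a temporary 4-node (3 keys, 4 children) into two 2-nodes.

    Returns (middle_key, left_node, right_node); the middle key is
    what gets absorbed into the parent. Nodes are (keys, children) pairs.
    """
    assert len(keys) == 3 and len(children) == 4
    left = ([keys[0]], children[:2])    # smallest key keeps the two left subtrees
    right = ([keys[2]], children[2:])   # largest key keeps the two right subtrees
    return keys[1], left, right

# a leaf 4-node with keys 2, 4, 6 (children are all empty)
mid, left, right = split_4node([2, 4, 6], [None, None, None, None])
print(mid, left[0], right[0])  # 4 [2] [6]
```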
Summary 2-3 trees
So conceptually they are quite simple, and the complexity is great since the tree is always perfectly balanced - but the implementation is quite horrible.
To fix this we use something called ‘red-black BSTs’.
Red-black BSTs
A red-black BST implements a 2-3 tree using a BST. We can represent a 2-3 tree as a BST, since a 2-node becomes a regular BST node, while a 3-node becomes two BST nodes.
The name ‘red-black’ comes from that we color the BST nodes in order to distinguish 2-nodes from 3-nodes.
2-nodes become black nodes, and 3-nodes become a black node with a red child. So a red node is ‘part of a 3-node’.
So a red-black BST is really just a 2-3 tree stored as a BST, since it translates directly to the corresponding 2-3 tree.
Red-black Tree Invariant
The invariant that we must uphold for a red-black tree is:
- A red node may only occur as the left child of a black node
- Black balance: on any path from the root to a null, we pass through the same number of black nodes
- And the BST invariant must also hold.
Red-black BST Insertion
To make an insertion into a red-black tree, we first make a normal BST insertion, the newly added node is a red node.
This may break our invariant on red nodes, but this way we always keep the black balance and the BST invariant.
So we will only ever break the invariant by placing a red node in a forbidden place. There are 3 cases in which this can happen.
Case 1: Skew
In a skew case, we have placed the red node as the right child of a black node (a 2-node) and thereby broken the first rule. We fix it with an operation called skewing, which is a kind of rotation; we’ll look at the implementation later.
Case 2: Split
In a split case we have created a 4-node (a red node having a red child), so we need to split it into 2-nodes. When splitting, we may break the black balance, which means we need to continue fixing upwards recursively.
Case 3: Red Root Node
After a split, we might end up with a red node as the root of the whole tree. In a case like this, the fix is simple, we just make the root a black node instead!
Red-black Tree Implementation
class RBNode:
    def __init__(self, val):
        self.red = False
        self.parent = None
        self.val = val
        self.left = None
        self.right = None

class RBTree:
    def __init__(self):
        # sentinel nil node: always black, used instead of None for children
        self.nil = RBNode(0)
        self.nil.red = False
        self.nil.left = None
        self.nil.right = None
        self.root = self.nil

    def insert(self, val):
        new_node = RBNode(val)
        new_node.parent = None
        new_node.left = self.nil
        new_node.right = self.nil
        # new node must be red
        new_node.red = True

        # ordinary BST insertion
        parent = None
        current = self.root
        while current != self.nil:
            parent = current
            if new_node.val < current.val:
                current = current.left
            elif new_node.val > current.val:
                current = current.right
            else:
                return  # duplicate key, nothing to do

        new_node.parent = parent
        if parent is None:
            self.root = new_node
        elif new_node.val < parent.val:
            parent.left = new_node
        else:
            parent.right = new_node

        self.fix_insert(new_node)

    def rotate_left(self, x):
        y = x.right
        x.right = y.left
        if y.left != self.nil:
            y.left.parent = x
        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x == x.parent.left:
            x.parent.left = y
        else:
            x.parent.right = y
        y.left = x
        x.parent = y

    def rotate_right(self, x):
        y = x.left
        x.left = y.right
        if y.right != self.nil:
            y.right.parent = x
        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x == x.parent.right:
            x.parent.right = y
        else:
            x.parent.left = y
        y.right = x
        x.parent = y

    def fix_insert(self, new_node):
        # repair the red rules, moving upwards while the parent is red
        while new_node != self.root and new_node.parent.red:
            if new_node.parent == new_node.parent.parent.right:
                u = new_node.parent.parent.left  # uncle
                if u.red:
                    # red uncle: recolor and move the problem up two levels
                    u.red = False
                    new_node.parent.red = False
                    new_node.parent.parent.red = True
                    new_node = new_node.parent.parent
                else:
                    if new_node == new_node.parent.left:
                        new_node = new_node.parent
                        self.rotate_right(new_node)
                    new_node.parent.red = False
                    new_node.parent.parent.red = True
                    self.rotate_left(new_node.parent.parent)
            else:
                u = new_node.parent.parent.right  # uncle
                if u.red:
                    u.red = False
                    new_node.parent.red = False
                    new_node.parent.parent.red = True
                    new_node = new_node.parent.parent
                else:
                    if new_node == new_node.parent.right:
                        new_node = new_node.parent
                        self.rotate_left(new_node)
                    new_node.parent.red = False
                    new_node.parent.parent.red = True
                    self.rotate_right(new_node.parent.parent)
        self.root.red = False

    def exists(self, val):
        # ordinary BST search; return True if the value is found
        curr = self.root
        while curr != self.nil and val != curr.val:
            if val < curr.val:
                curr = curr.left
            else:
                curr = curr.right
        return curr != self.nil
Red-black Tree Summary
Red-black trees are 2-3 trees implemented using a BST - they have the invariant that they have to correspond to a perfectly balanced 2-3 tree.
The update strategy is: insert in a way that may break the rules on red nodes, then fix it.
They have logarithmic performance - in the worst case the height is $2 \log_2 n$.
This was the end of this part - it was a lot to process, it was a lot for me as well :). In the next part we’ll cover priority queues and how they can be implemented as a kind of tree :).