In this part we'll cover so-called "trees" in computer science.
Trees
The formal definition of a "tree" is: "A hierarchical data structure built up from nodes".
So unlike a real tree, which has its root at the bottom, computer scientists flipped it: the root node is at the top of the tree.
In any tree there's always one root node. Each node can have so-called child nodes, and these children can have their own child nodes, and so on.
Conversely, all nodes - except the root node - have exactly one parent node.
Tree rules
I've already stated some of the "rules" of a tree, but generally we have 3 rules that make a collection of nodes a tree.
- Each node must have exactly one parent node - except the root node.
- There must be exactly one root node.
- If the first and second rules are followed, then no "cycles" will appear.
Recursive Tree
Trees can also be defined recursively: "A tree consists of a root node together with its subtrees, which do not share any common nodes".
This means that each child node (of the root node) is itself the root of a subtree. This also means that a tree can be empty (have no nodes) - some applications need this.
Programming with trees
Now that we have some basic knowledge about trees - let's try to represent them with code. A very natural representation would be a directory tree.
class Tree:
    name: str
    size: int
    children: list[Tree]
If we now want to compute the total size of a directory (folder) we would need to:
def total(tree: Tree) -> int:
    n: int = tree.size
    for c in tree.children:
        n += total(c)
    return n
And we can see that recursion and trees go hand in hand. Now let's see how we can traverse the tree.
There are two main methods for traversing: pre-order traversal, meaning you process each node before visiting its children, and post-order traversal, meaning you process each node after visiting its children.
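Both traversal orders can be sketched with just a few lines. This is a minimal sketch, assuming the Tree class from above is given a small constructor (an assumption for illustration):

```python
class Tree:
    def __init__(self, name, size, children=None):
        self.name = name
        self.size = size
        self.children = children or []

def preorder(tree, visit):
    visit(tree)                 # process the node first...
    for c in tree.children:
        preorder(c, visit)      # ...then its children

def postorder(tree, visit):
    for c in tree.children:
        postorder(c, visit)     # process the children first...
    visit(tree)                 # ...then the node itself

root = Tree("home", 0, [Tree("docs", 10),
                        Tree("pics", 20, [Tree("cat.png", 5)])])
pre, post = [], []
preorder(root, lambda t: pre.append(t.name))
postorder(root, lambda t: post.append(t.name))
print(pre)   # ['home', 'docs', 'pics', 'cat.png']
print(post)  # ['docs', 'cat.png', 'pics', 'home']
```

Notice that `total` above is really a post-order traversal: a directory's size is only known after its children have been processed.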
Binary Trees
Often we can use a special kind of tree called a "binary tree" - a binary tree is a tree where each node has exactly two children, either of which may be "missing" (represented by a null value). We call these children the "left" and "right" nodes, or subtrees.
Some terminology commonly used for trees: size, height, and level. Size is the number of nodes in the (sub)tree; height is the number of levels in the tree; level is the distance from a node to the root.
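Size and height both have natural recursive definitions. A minimal sketch, again assuming a small constructor on the Tree class (the helper names are my own):

```python
class Tree:
    def __init__(self, name, size, children=None):
        self.name = name
        self.size = size
        self.children = children or []

def count_nodes(tree):
    # size = this node plus the sizes of all subtrees
    return 1 + sum(count_nodes(c) for c in tree.children)

def height(tree):
    # height = 1 level for this node plus the tallest subtree
    return 1 + max((height(c) for c in tree.children), default=0)

t = Tree("a", 0, [Tree("b", 0), Tree("c", 0, [Tree("d", 0)])])
print(count_nodes(t))  # 4
print(height(t))       # 3
```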
Binary Search Trees
Binary search trees, or BSTs, are a kind of binary tree but with some extra conditions:
- Each node has a key
- The node's key is smaller than all the keys in its right subtree
- The node's key is greater than all the keys in its left subtree
Or phrased differently: everything in the left subtree is less than the root, and everything in the right subtree is greater than the root.
With these rules, searching becomes really, really efficient and natural. We just compare our key to each node's key to see whether we traverse to the right or to the left.
If we ever hit a "null" node, we know that the key we're searching for doesn't exist in this tree.
So let us implement a map using BSTs!
class BSTMap[Key, Value]:
    class Node:
        key: Key
        value: Value
        left, right: Node

    root: Node

    def get(key: Key) -> Value:
        return get(key, root)

    def put(key: Key, val: Value):
        root = put(key, val, root)
def get(key, node) -> Value:
    if node is None:
        return None
    if key < node.key:
        return get(key, node.left)
    elif key > node.key:
        return get(key, node.right)
    else:
        return node.value
def put(key, val, node):
    if node is None:
        return Node(key = key, value = val, left = None, right = None)
    elif key < node.key:
        node.left = put(key, val, node.left)
    elif key > node.key:
        node.right = put(key, val, node.right)
    else:
        node.value = val
    return node
Find max/min value in a BST
To find the max/min value in a BST we just repeatedly go right/left until there's no child in that direction. Real simple!
Or in code:
def get_max(tree: BST):
    # assumes the tree is non-empty
    c_node = tree.root
    while c_node.right is not None:
        c_node = c_node.right
    return c_node.value

def get_min(tree: BST):
    # assumes the tree is non-empty
    c_node = tree.root
    while c_node.left is not None:
        c_node = c_node.left
    return c_node.value
Sorting a BST
Sometimes we might want to print/store a BST in ascending order - the way to do this of course is via recursion!
This is also known as in-order traversal - we print the left subtree, then the root, then the right subtree - and we do this recursively.
So for example:
def print_sorted(node: Node):
    if node is None:
        return
    print_sorted(node.left)
    print(node.key)
    print_sorted(node.right)
We can also handle the more general case - making an iterator for the keys in the tree - by doing an in-order traversal and storing the keys in a dynamic array.
def keys() -> list[Key]:
    result = []
    inorder(root, result)
    return result

def inorder(node: Node, result: list[Key]):
    if node is None:
        return
    inorder(node.left, result)
    result.append(node.key)
    inorder(node.right, result)
...
for key in BST.keys():
print(key)
Deletion
Deletion can be quite tricky in trees, and especially in BSTs - we can't just remove a node; there are some rules.
Deleting a leaf
A so-called leaf is a node with no children (visually, the leaves are the bottom nodes). These are fine to remove without any corrections.
Since the removed node has no children, nothing else in the tree is affected.
Deleting a node with one child
Now, in the case we want to delete a node with exactly one child, it's still quite simple: since that node only has one child, we can just replace the deleted node with its child. This doesn't affect the rest of the tree.
Deleting a node with two children
This is where it becomes tricky, but if we take a step back and think about it, we need to find a replacement node that upholds the BST requirement: everything to the left should be smaller and everything to the right must be greater. If we think about which candidates fulfill this requirement, it becomes clear that two nodes are good candidates.
The greatest (rightmost) key in the node's left subtree, or the smallest (leftmost) key in the node's right subtree.
The hardest case
The case above is quite simple if the chosen candidate doesn't have any children; the hardest case in deletion is when the candidate has one child. What we do then is: first delete the candidate node by replacing it with its child, and then replace the node we actually wanted to delete with the (now removed) candidate.
So, not that hard logically - but can be a pain to implement correctly.
So let's now do exactly that :)!
def delete(node: Node, key: Key) -> Node:
    if node is None:
        return node
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    else:
        # Node with at most one child: replace it with that child
        if node.left is None:
            return node.right
        elif node.right is None:
            return node.left
        # Two children: copy the smallest key in the right subtree
        # here, then delete that key from the right subtree
        temp = node.right
        while temp.left is not None:
            temp = temp.left
        node.key = temp.key
        node.right = delete(node.right, temp.key)
    return node
Complexity
The complexity of operations on a BST in the best and average case is $\mathcal{O}(\log n)$, which is great. In the worst case, however, these operations take $\mathcal{O}(n)$ - for example when the items are inserted in ascending order, so the tree degenerates into a linked list.
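We can see this degeneration with a small experiment (a self-contained sketch using a naive insert, not the map code above): inserting keys in ascending order means every new key goes to the right, so the height equals the number of nodes.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(node, key):
    # plain BST insertion, no balancing
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node

def height(node):
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

root = None
for k in range(100):        # keys arrive already sorted
    root = insert(root, k)
print(height(root))         # 100 -- every node only has a right child
```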
BST Summary
We can use BSTs to implement a set or a map that also supports ordering-based operations such as min, max, and range queries with good complexity.
Insert/delete/search take time proportional to the height of the tree, which in the best/average case is $\mathcal{O}(\log n)$, but if the tree is unbalanced it becomes linear.
So the main takeaway is that, if a BST is balanced, it's really powerful and efficient. So let's take a look at how we can keep it balanced.
Invariants
The property that the keys in the left subtree should be less than the node's key, and the keys in the right subtree greater, is an example of a data structure invariant.
An invariant is a property that the designer wants to always hold - usually a rule about which instances of the data structure are "valid".
But these invariants can be expensive to maintain - if we say "this array is sorted" we can't just use append anymore; we need to insert each element at the correct position. So we want an invariant for BSTs that is easier to maintain.
AVL Trees
AVL trees are BSTs with an extra, less restrictive balance invariant - in an AVL tree we keep the tree almost balanced.
The invariant for an AVL tree is: "For each node, the heights of its left and right subtrees differ by at most 1".
Tree Rotations
Now before we dive deeper into AVL trees - let's take a look at rotations within trees. A rotation restructures a (sub)tree so that a different node becomes its root.
Right rotation: the left child of the root becomes the root, and the old root becomes its right child. Left rotation: the right child of the root becomes the root, and the old root becomes its left child.
Tree rotations preserve ordering and contents in BSTs. Note that we can also rotate a subtree.
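The two rotations can be sketched on plain BST nodes (a hypothetical minimal Node class; each function returns the new root of the rotated subtree). An in-order check confirms the ordering is preserved:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rotate_right(root):
    # left child becomes root, old root becomes its right child
    pivot = root.left
    root.left = pivot.right    # pivot's right subtree stays between them
    pivot.right = root
    return pivot

def rotate_left(root):
    # right child becomes root, old root becomes its left child
    pivot = root.right
    root.right = pivot.left
    pivot.left = root
    return pivot

def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)

t = Node(4, Node(2, Node(1), Node(3)), Node(5))
t = rotate_right(t)
print(t.key)       # 2 -- the old left child is now the root
print(inorder(t))  # [1, 2, 3, 4, 5] -- ordering is preserved
```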
AVL Insertion
To insert a new node into an AVL tree - we first start by doing a BST insertion - this might break the AVL invariant.
But to fix this we go upwards from the new node, finding and fixing nodes that break the invariant. Note that during a BST insertion, each node's height either stays the same or goes up by 1.
We have four different cases, which we'll look at now.
Case 1: Left-left Tree
In the Left-left case, we did an insertion in the left child of the left child of this node, which broke the invariant.
To fix this, we just make a right rotation.
Case 2: Right-right Tree
In the Right-right case, we did an insertion in the right child of the right child of this node, which broke the tree.
To fix it, we make a left rotation!
Case 3: Left-right Tree
In the Left-right case, the extra height is in the root's left child, and then in the right child of that. We can't fix it with a single rotation - but luckily we still have a solution.
First we're going to rotate the left subtree left. When we have performed the left rotation, we've created a Left-left tree, which is fixed by one right rotation as we saw earlier.
Case 4: Right-left Tree
The same logic as case 3: the extra height is in the root's right child, and then in the left child of that one. To fix this we first make a right rotation in the right subtree - then a final left rotation to fix the tree.
AVL Insertion Summary
So we have these four cases, and we know how to solve them - we just need to remember how to identify them: look at which grandchild subtree the extra unit of height is in.
AVL Deletion
When deleting a node, the same situation occurs - but instead we reduce the height of some subtree by 1, breaking the invariant.
Therefore we can use the exact same balancing algorithm - when implementing an AVL tree we only need to implement it once!
In conclusion, AVL trees use rotations to keep the tree balanced. In the worst case the height of the tree is $1.44 \log_2(n)$, but normally it is close to $\log_2(n)$.
2-3 Trees
2-3 trees are another kind of height-balanced tree. A 2-3 tree is made from two kinds of nodes:
- 2-node: has one key x and two subtrees A and B. Just like in a BST, A < x < B.
- 3-node: has two keys x, y and three subtrees A, B, and C. As usual, A < x < B < y < C.
Searching in 2-3 Trees
Searching proceeds just as in a BST. In a 2-node, we compare the search key against the key in the node and descend into one of the two children. In a 3-node, we compare the search key against both keys in the node and descend into one of the three children.
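The search can be sketched with one node type that holds a sorted list of one or two keys (a hypothetical representation chosen for illustration; a real implementation would distinguish 2-nodes and 3-nodes):

```python
class Node23:
    def __init__(self, keys, children=None):
        self.keys = keys              # 1 key (2-node) or 2 keys (3-node)
        self.children = children or []  # empty for a leaf

def search(node, key):
    if node is None:
        return False
    for i, k in enumerate(node.keys):
        if key == k:
            return True
        if key < k:
            # descend into the child to the left of this key
            return search(node.children[i], key) if node.children else False
    # key is greater than all keys: descend into the last child
    return search(node.children[-1], key) if node.children else False

# a 3-node root (3, 7) with children (1,2), (5), (8,9)
root = Node23([3, 7], [Node23([1, 2]), Node23([5]), Node23([8, 9])])
print(search(root, 5))   # True
print(search(root, 6))   # False
```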
How 2-3 Trees balance
In the AVL case we saw that we have to rebalance via rotations after each insertion/deletion. A 2-3 tree, in contrast, never needs rebalancing, since it's always balanced.
This is because if one child is empty, all children must be empty (you can test this by drawing examples). This means the empty trees are all at the same level, which implies a balanced tree.
The resulting tree will have a height between $\log_3(n)$ and $\log_2(n)$.
Insertion in 2-3 trees
When we want to insert a new key into a 2-3 tree we do a normal insertion at the bottom - however, this might break the tree by, for example, creating a "4-node".
To fix this we use the following algorithm:
- If the node is a 4-node, split it into 2-nodes and push the middle key up.
- Absorb the pushed-up key into the parent, then move up to the parent.
We keep alternating these two steps, and stop once we no longer need to split.
Summary 2-3 trees
So conceptually 2-3 trees are quite simple, and the complexity is great since the tree is always perfectly balanced - but the implementation is quite horrible.
To fix this we use something called a "red-black BST".
Red-black BSTs
A red-black BST implements a 2-3 tree using a BST. We can view a 2-3 tree as a BST since a 2-node becomes a regular BST node, while a 3-node becomes two BST nodes.
The name "red-black" comes from coloring the BST nodes in order to distinguish 2-nodes from 3-nodes.
2-nodes become black, and 3-nodes become a black node with a red child. So red means "part of a 3-node".
So a red-black BST is really just a 2-3 tree stored in a BST, since we can always translate it back to the corresponding 2-3 tree.
Red-black Tree Invariant
The invariant that we must uphold for a red-black tree is:
- A red node may only occur as the left child of a black node
- Black balance: on any path from the root to a null, we pass through the same number of black nodes
- And the BST invariant must also hold.
Red-black BST Insertion
To make an insertion into a red-black tree, we first make a normal BST insertion where the newly added node is colored red. This may break the invariant on red nodes, but it always keeps the black balance and the BST property. So we only ever break the invariant by placing a red node in a forbidden place. There are 3 cases in which this can happen.
Case 1: Skew
In a skew case, we have placed the new node to the right of a 2-node (a black node), and thereby broken the first rule. We fix it with an operation called skewing, which is a kind of rotation; we'll look at the implementation later.
Case 2: Split
In a split case we have created a 4-node (a red node having a red child), so we need to split it into 2-nodes. Splitting may break the black balance, which means we need to continue fixing upwards recursively.
Case 3: Red Root Node
After a split, we might end up with a red node as the root of the whole tree. In a case like this, the fix is simple: we just make the root black instead!
Red-black Tree Implementation
Note that the implementation below is of the classic red-black tree variant, where a red node may be either child - not the left-leaning variant described above.
class RBNode:
    def __init__(self, val):
        self.red = False
        self.parent = None
        self.val = val
        self.left = None
        self.right = None
class RBTree:
    def __init__(self):
        # Sentinel "nil" node used instead of None, always black
        self.nil = RBNode(0)
        self.nil.red = False
        self.nil.left = None
        self.nil.right = None
        self.root = self.nil
    def insert(self, val):
        new_node = RBNode(val)
        new_node.parent = None
        new_node.left = self.nil
        new_node.right = self.nil
        # New node must be red
        new_node.red = True

        # Ordinary BST insertion
        parent = None
        current = self.root
        while current != self.nil:
            parent = current
            if new_node.val < current.val:
                current = current.left
            elif new_node.val > current.val:
                current = current.right
            else:
                return  # duplicate key, nothing to do
        new_node.parent = parent
        if parent is None:
            self.root = new_node
        elif new_node.val < parent.val:
            parent.left = new_node
        else:
            parent.right = new_node
        self.fix_insert(new_node)
    def rotate_left(self, x):
        y = x.right
        x.right = y.left
        if y.left != self.nil:
            y.left.parent = x
        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x == x.parent.left:
            x.parent.left = y
        else:
            x.parent.right = y
        y.left = x
        x.parent = y
    def rotate_right(self, x):
        y = x.left
        x.left = y.right
        if y.right != self.nil:
            y.right.parent = x
        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x == x.parent.right:
            x.parent.right = y
        else:
            x.parent.left = y
        y.right = x
        x.parent = y
    def fix_insert(self, new_node):
        # Walk upwards while two red nodes are adjacent
        while new_node != self.root and new_node.parent.red:
            if new_node.parent == new_node.parent.parent.right:
                u = new_node.parent.parent.left  # uncle
                if u.red:
                    # Red uncle: recolor and continue from the grandparent
                    u.red = False
                    new_node.parent.red = False
                    new_node.parent.parent.red = True
                    new_node = new_node.parent.parent
                else:
                    if new_node == new_node.parent.left:
                        # Inner case: needs an extra rotation first
                        new_node = new_node.parent
                        self.rotate_right(new_node)
                    new_node.parent.red = False
                    new_node.parent.parent.red = True
                    self.rotate_left(new_node.parent.parent)
            else:
                u = new_node.parent.parent.right  # uncle
                if u.red:
                    u.red = False
                    new_node.parent.red = False
                    new_node.parent.parent.red = True
                    new_node = new_node.parent.parent
                else:
                    if new_node == new_node.parent.right:
                        new_node = new_node.parent
                        self.rotate_left(new_node)
                    new_node.parent.red = False
                    new_node.parent.parent.red = True
                    self.rotate_right(new_node.parent.parent)
        self.root.red = False
    def exists(self, val):
        curr = self.root
        while curr != self.nil and val != curr.val:
            if val < curr.val:
                curr = curr.left
            else:
                curr = curr.right
        return curr != self.nil
Red-black Tree Summary
Red-black trees are 2-3 trees implemented using a BST - they have the invariant that they must correspond to a perfectly balanced 2-3 tree.
The update strategy is: insert in a way that may break the rules on red nodes, then fix it.
They have logarithmic performance - in the worst case the height is $2 \log_2(n)$.
This was the end of this part - it was a lot to process (it was a lot for me as well :) ). In the next part we'll cover priority queues and how they can be implemented as a kind of tree :).