Dynamic arrays
Finally we have arrived at the DS of DSA. In (almost) all programming languages the most common built in data structure are arrays. We have seen them, worked with them - nothing new. We’ve probably even worked with dynamic arrays, they are an essential data structure - sometimes we don’t know how much data we’ll need.
But how are dynamic arrays actually implemented into a language? The ideas are actually quite simple - we use a fixed length array and make a larger, fixed length array again, when we have run out of space.
Different approaches
Naive Approach
A very naive and brute force implementation would be, if we append a value to an array which doesn’t have space, we copy the old array and make the index one larger so, we can fit the new value. The problem with this approach is that it’s really slow. If I want to append 10 items to a list of size 1, we need to copy each time we append.
The complexity of this becomes $\mathcal{O}(n^2)$, which again, is bad.
A Better, but Naive Approach
A better approach would be that each we need to resize the array - we make some extra space for future appends. Let’s say each time we need to resize the array we add 10 more indexes, for future use.
This is actually 10 times faster than our first approach! But it’s still $\mathcal{O}(n^2)$ so…
Multiplicative Approach
So, instead of resizing by a constant - why not just double the array size each time? With this approach:
To reach an array of size $2^n$ we would need to do $1 + 2 + 4 + \dots + 2^{n-1}$ total element copies This sums up to $2^n - 1$ which makes this approach $\mathcal{O}(n)$!
Worst-case Complexities
One might go through an example about the worst-case complexity of these two approaches. If we do that we find that both have a worst-case complexity of $\mathcal{O}(n)$. If we ever need to resize, we would need to copy $n$ elements from the old array to the new one.
We now suppose we call the add
function $n$ times instead. What’s the worst-case complexity now? For the constant approach it’s quite obvious that it becomes
$\mathcal{O}(n^2)$, for each call, in the worst-case, we would need to copy n items, then n adds in total.
For our multiplicative approach - this instead becomes $\mathcal{O}(n)$. But this is still ‘too slow’ - we would like a $\mathcal{O}(1)$ operation.
But don’t worry - here we have something called amortized complexity which plays a huge role.
Amortized Complexity
If add
had a $\mathcal{O}(1)$ worst-case complexity, $n$ calls would take $\mathcal{O}(n)$ time right?
But as we stated, our approach took $\mathcal{O}(n)$ for n calls - so we can say that add
has a $\mathcal{O}(1)$ amortized complexity!
Basically what amortized complexity is, we take an average cost of the operation over n calls.
Now that we’ve defined how the most important feature of dynamic arrays should work - let’s implement one!
Implementation
A python implementation of a dynamic array:
class dynamic_array:
def __init__(self):
self.array = [None]
self.size = 0
def get(self, i):
if 0 <= i < self.size:
return "Error, index out of bounds"
return self.array[i]
def set(self, i, val):
if 0 <= i < self.size:
return "Error, index out of bounds"
self.array[i] = val
def append(self, val):
if self.size == len(self.array):
self.resize()
self.array[self.size] = val
self.size += 1
def resize(self):
new_array = [None] * 2
for i in range(len(self.array)):
new_array[i] = self.array[i]
self.array = new_array
Now this isn’t a perfect implementation - but it works okay.
Now that we understand really how dynamic arrays work - we can actually start implementing some more interesting data structures
Stacks and Queues
Stacks and Queues are quite popular and powerful data structures. But just to refresh let’s go over how both data structures work.
Stacks
A Stack is a so-called LIFO data structure. LIFO stands for ‘Last In First Out’. One can visualize it as a literal stack of items (therefore the name). If we have a stack of things, we might take items from the top and keep throwing away the items til we reach the bottom. This is exactly how the stack data structure works!
Important Functions for Stacks
These operations that I was talking about have names - if we remove the top item, we call it pop()
- and if we place an item on the top of the stack
it’s called push()
. We have some helper functions that are usually needed for an implementation as well - the peek()
function gives us the item at the top of the stack.
The size()
returns how large the stack currently is.
Queues
A queue is a so-called FIFO data structure, ‘First In First Out’. Just as the stack data structure, we can visualize this as a literal queue. If someone joins the queue you are put in last - and the first person to join queue will be the first to leave the queue as well.
How to implement Stacks and Queues
Now that we understood the basics of these data structures - let’s see how one can implement them. But before we dive right in let’s see the different approaches to this. Both stacks and queues can be implemented with Dynamic arrays as we covered - but just as good with linked lists. We’ll start with how we can implement it using a linked list - but let’s refresh our memory of what a linked list is.
Linked Lists
A (singly) linked list consist of ‘Node’ objects in a sequence - these node objects can contain one or multiple values - each node is ‘pointing’ to the next node in the sequence.
A singly linked list points to the first node and the last node in the list points to nothing (Nothing can for example be Null
)
Since it would be inefficient to calculate the size each time we call size()
, since we would need to traverse the whole list, we instead keep a dynamic size variable that we can access.
Here’s how the boilerplate classes for a Linked List might look:
class Node:
def __init__(self, val, next = None):
self.val = val
self.next = next
class LinkedList:
def __init__(self, val = None):
self.head = None
self.size = 0
With our new knowledge about linked lists - we can return to stacks!
Stacks as linked lists
An implementation with a linked list for a stack is perfect - we only care about the top element in stack, therefore, when we push()
,
we first make the new nodes next point to the current head. Then we redirect head to our new node and increase size. When we want to pop()
-
we first store the value of our current head node temporarily, to return the value popped. Then we redirect head to the next
of the node we want to pop.
In many languages we do not need to manually remove the node from the list - since it will be handled by garbage collection. But in a language like C, we would need to free()
that node from memory.
So let’s implement a stack using a linked list.
Implementation of a stack using a linked list
class Node:
def __init__(self, val):
self.val = val
self.next = None
class Stack:
def __init__(self):
self.head = None
self.size = 0
def is_empty(self):
if self.head == None:
return True
return False
def push(self, val):
if self.head == None:
self.head = Node(val)
return
new = Node(val)
new.next = self.head
self.head = new
self.size += 1
def pop(self):
if self.head == None:
return None
pop = self.head.val
self.head = self.head.next
self.size -= 1
return pop
Now let’s implement a stack using a dynamic array.
Implementation of a stack using a dynamic array
Before we start the implementation, let’s consider how we should implement it before. One thing that might not be obvious might be that, using the first element as the top - will work - but be super slow.
If we pop the first element - we would need to push forward the rest of the stack by 1 step. Which would take $\mathcal{O}(n)$. We want $\mathcal{O}(1)$!
So instead let’s use the last element as the top!
class Stack:
def __init__(self):
self.stack = [None]
self.size = 0
def is_empty(self):
return self.size == 0
def push(self, val):
if self.size == len(self.stack):
self.resize(2 * self.size)
self.stack[self.size] = val
self.size += 1
def pop(self):
self.size -= 1
pop = self.stack[self.size]
self.stack[self.size] = None
if self.size == (len(self.stack) // 4):
self.resize(len(self.stack) // 2)
return pop
def resize(self, factor):
new_array = [None] * factor
for i in range(self.size):
new_array[i] = self.stack[i]
self.stack = new_array
Implementation of a queue using linked lists
Now it’s now to implement queues! Using a linked list seems as a very natural approach since we have a sequence of pointers forward. However, in a queue we always need to the first and the last element. Therefore, we need another pointer to the end of the list as well.
class Node:
def __init__(self, val):
self.val = val
self.next = None
class Queue:
def __init__(self):
self.last = None
self.first = None
self.size = 0
def is_empty(self):
if self.first == None:
return True
return False
def enqueue(self, val):
new_node = Node(val)
if self.is_empty():
self.first = new_node
self.last = new_node
else:
self.last.next = new_node
self.last = new_node
self.size += 1
def dequeue(self):
if self.is_empty():
return None
exit = self.first
self.first = self.first.next
self.size -= 1
return exit.val
Quite simple and elegant!
Implementation of a queue using a (dynamic) array
One thing we have to go through first is that - our implementation will be a so-called circular array. Our pointers to the head and tail can (and will cross).
class Queue:
def __init__(self):
self.queue = [None]
self.head = 0
self.tail = 1
def is_empty(self):
return self.head == (self.tail + 1) % len(self.queue)
def enqueue(self, val):
if (self.tail + 1) % len(self.queue) == self.head:
self.resize(2 * len(self.queue))
self.queue[self.tail] = val
self.tail = (self.tail + 1) % len(self.queue)
def dequeue(self):
if self.is_empty():
raise Exception("Queue is empty")
result = self.queue[self.head]
self.head = (self.head + 1) % len(self.queue)
return result
def resize(self, new_size):
old_queue = self.queue
self.queue = [None] * new_size
if self.head < self.tail:
for i in range(self.tail - self.head):
self.queue[i] = old_queue[(self.head + i) % len(old_queue)]
self.head = 0
self.tail = self.tail - self.head
else:
for i in range(len(old_queue) - 1):
_index = (self.head + i) % len(old_queue)
self.queue[i] = old_queue[_index]
self.head = 0
self.tail = self.tail - self.head - 1
That’s it for this part - in the next part we’ll cover something called ‘Abstract Data Types’ or for short ADTs. We’ll define what a ADT is and what’s the difference between them and ordinary data structures.