In the last part we covered the basics and some principles for concurrent programs. In this part we’ll cover how we define concurrent problems, the outcome we want and some solutions.
Races
As we saw in the last part - a different trace leads to a different outcome - this means that concurrent programs are non-deterministic.
Why concurrent programs are non-deterministic is due to the scheduler’s decisions.
When we have a problem with different possible outcomes, we need to label what is a faulty run and a successful run.
A race condition is a situation where the correctness of a concurrent program depends on the specific execution. If we try to define this a bit easier, since concurrent programs can are non-deterministic, this means we have to “race” the program.
There are different types of race-conditions. A data race is one example.
A data race is a when two (or more) threads access a shared memory location and where at least one access is a write. The threads use no synchronization mechanism to protect said data.
A simple example is the counter program, in the case where the program spat out 1, instead of 2, it was due to this data race.
One thing to note is that: Not every race-condition is a data race and not every data is a race-condition either.
The first one is quite intuitive, the race isn’t always about data, it can be an access call to files/network. The latter is not clear from the start, but there is a simple explanation. A data race might not always affect the result, therefore it might not be a race condition.
Synchronization problems
Now that we’ve seen how we can outline the problem, lets where the problem stems from and how we can identify where in code.
We call the part of a program where threads access the shared resource a critical section. It’s quite obvious that no more than 1 thread should be in this critical section at a time. We call this property the mutual exclusion property. Therefore, we can call this concurrent program problem for mutual exclusion problem.
Let’s define some more technical terms to this mutual exclusion property as well. A deadlock is when the program/group of threads can’t proceed further.
For example, if we have 10 threads, t1, t2, t3, ... , t10
. If t10
has to wait for t9
and t9
for t8
and so on. If t1
happens to be stuck, we achieve a so-called deadlock.
A good example is the dining philosophers problem. Many times deadlocks are created from ‘circular dependency’ problems.
Starvation is when we have threads trying to enter the critical section, but never succeeds. So we have threads that are “starved” from entering the critical section.
A good solution to the mutual exclusion property solves all these invariants. Note, freedom from starvation implies freedom from deadlock as well!
To solve starvation, we need to become fair, fairness is a situation where all threads that request to enter the critical section eventually gets it.
Alright, now we have clearly defined the outline of the problem, the problem and each ‘sub-problem’. Let’s look at how we can achieve a good solution.
Locks
Locks are one of (if not the simplest) synchronization mechanism that can ensure a good solution to the mutual exclusion property.
A lock (interface) can be described as:
interface Lock {
// Acquire lock
void lock();
// Release lock
void unlock();
}
It’s a quite simple to see how we can use this:
int cnt;
lock.lock();
cnt = counter;
counter = cnt + 1;
lock.unlock()
This ensures exactly one lock gets the lock and, after it’s finished, another lock will acquire it and so on. In this simple program, it will become sequential even, but that’s besides the point.
The idea of locks still ensures a solution for the mutual exclusion problem.
Semaphores
Semaphores are a further generalization of locks. Instead of the only possible states being locked
and unlocked
(one can see this as 0 and 1). We have a counting mechanism:
interface Semaphore {
// current value of counter
int count();
// increment counter
void up();
// decrement counter
void down();
}
When we initialize a semaphore, it is set to a non-negative value, C
, which is the capacity.
When we call sem.up()
, we increment C
by one.
A call to sem.down()
, first wait until C
is positive, and then uninterruptedly decrements the value of C
by one.
Let’s look at the counter program with semaphores now:
int cnt;
sem.down();
cnt = counter;
counter = cnt + 1;
sem.up();
As we can see, it’s really similar. But how do we ‘wait until C
is positive’?
We need to understand something called invariant first.
An invariant is a property that always holds between method calls. The invariant should hold initially, at the beginning of a method call, and at the end of it.
A good example is, a bank account should never become negative, when you create an account it should hold true. Also when you make an transaction.
A semaphore, with capacity C
satifies the invariant:
invariant{
count() >= 0;
count() == C + # of calls to up - # of calls to down;
}
This makes sense, we shouldn’t ever go past 0, and the number of calls to the up and down methods, should correspond to the current count number.
A special semaphore case is when we have capacity 1, (0 and 1), this is called a binary semaphore and are essentially the same as locks.
The only difference is that, in a lock, the thread with the lock is the only one that can release the lock as well. But in a binary semaphore, a thread may decrement the semaphore down to 0 from 1, but another thread can, legally, increment the semaphore back up to 1 again.
This is called transferring of permissions, binary semaphores thus supports this.
Barriers
The last thing we’ll cover in this part is a so-called barrier. A barrier is a form of synchronization where there is a entry-point (barrier) in a program’s execution that all threads in a group have to reach before any of them is allowed to continue.
A simple and good example, let’s say we use a kind of array as our synchronization mechanism:
In thread t0
’s run()
:
// code before barrier
// t0 done
done[t0].up();
// wait for t1
done[t1].down();
// code after barrier
And for thread t1
’s run()
:
// code before barrier
// t1 done
done[t1].up();
// wait for t0
done[t0].down();
// code after barrier
We can see that this ensures they wait for each other at this point at the program.
Summary
This concludes this part - races, locks and semaphores are an important part in solving concurrent problems that can occur.