CSC5101 – Advanced Programming of Multicore Architectures

PAAM – Exam 2022-2023

Duration: 2 hours, any document allowed.

The barrier (11 points, easy/medium)

The goal of this exercise is to implement a barrier. A barrier is an object that allows N threads to synchronize. Technically, a barrier mainly provides a barrier_wait method. When a thread calls barrier_wait, if the number of threads having reached the barrier is strictly smaller than N, the thread waits. When the N^th thread calls barrier_wait, all the threads are unblocked.

A barrier is represented by a struct barrier. In the exercise, you must implement the following functions:

void barrier_init(struct barrier* b, int n): initializes the barrier to n, where n is the number of threads that participates,
void barrier_wait(struct barrier* b): joins the barrier.

2 points (easy)

Write a program that:

Takes n as argument, where n is the number of threads to create,
Starts n threads. Each thread must wait the n threads on a barrier before terminating.
Terminates the program once the n threads are terminate.

3,5 points (easy)

First, we want to implement the barrier using POSIX locks and condition variables. Give the definition of struct barrier and the code of barrier_init and barrier_wait. A thread must sleep when it waits for the other threads.

2 points (easy)

We now want to implement barrier using only atomic operations. Give the new definition of struct barrier and of the code of barrier_init and barrier_wait. A thread must sleep when it waits for the other threads.

2 points (medium)

In our previous implementation, a thread uselessly uses the processor when it waits the other threads. We want to improve this behavior by using a futex so that a thread sleeps while it waits. Give the new definition of struct barrier and the code of barrier_init and barrier_wait. To use a futex, we suppose that you can use these functions:

futex_wait(int* ptr, int value) : the thread sleeps if and only if *ptr is equal to value. This function is executed atomically.
futex_wake(int* ptr, int n) : wake up n threads waiting on the ptr address. This function is executed atomically.

1,5 points (easy)

We finally want to implement a barrier by using a transactional memory. Give the new definition of struct barrier and the code of barrier_init and barrier_wait.

Questions de cours (2 points, easy)

1 points

Why the sfence (store fence) instruction of a pentium is enough to implement the abstract instruction pfence?

1 points

Why the sfence (store fence) instruction of a pentium is enough to implement the abstract instruction psync?

Flat combining (7 points, difficult/very difficult)

In this exercise, we have the goal of adapting the MCS lock algorithm in order to transform it into a so-called flat combining algorithm. The following code recalls how the MCS algorithm is implemented. Compared to the course, the code has been slightly adapted. First, the condition isFree has been inverted and is now is_busy. Second, instead of having a function to take a lock and a function to release the lock, the code contains only a single function named execute_in_cs. This function takes as argument a pointer to a job function (type job_t). The execute_in_cs function takes the lock, executes the job and releases the lock.

mcs.c

With the flat combining algorithm, when a thread acquires the lock, we say that it becomes a combiner. In this case, instead of executing only its own job, it executes also the next jobs in the queue. At a high level, the algorithm becomes:

Try to take the lock,
If the lock is taken, waits until a combiner has executed its job,
Otherwise, the thread becomes a combiner. It executes its own job and then traverses the queue in order to execute the jobs of the waiters.

4 points (difficult)

Implement the flat combining algorithm. Since implementing the flat combining algorithm requires modifying the struct node structure, you also have to give the new struct node structure that you are using. Implementing the flat combining algorithm requires few modifications in the MCS algorithm. At a high level, the main difference is that a thread must execute the jobs of the waiting threads before leaving execute_in_cs.

1 point (difficult)

Why do you think the flat combining algorithm generally performs better than the MCS algorithm?

2 points (very difficult)

With the previous algorithm, a thread may never progress if new nodes are continuously adding their jobs. Modify your algorithm so that a combiner never executes more than 256 jobs.