7.6. Multi-Threaded Updates

The difficulty in updating multi-threaded programs lies in safely coordinating the timing of the update. When one thread updates the datatypes, another thread that has not yet reached an update point may execute code that relies on the old representation of a datatype. We adapted for dynamic updates an algorithm, used in heterogeneous checkpointing of multi-threaded applications, that blocks all threads. The idea is to force all but one thread to block when the application must update. The one thread that is not blocked acts as the coordinator of the update. It polls the status of the remaining threads until it can tell for certain that all of them are blocked, as defined below.

When a thread reaches an update point and the application must update, it raises a flag indicating that it is willing to cooperate on the update and then attempts to acquire a coordination lock. The first thread to acquire the coordination lock becomes the coordinator of the update. The coordinator knows that a thread is blocked if its cooperation flag is raised, but this does not cover all threads: a thread might instead be blocked waiting on an application lock owned by a thread that is already willing to cooperate and is itself blocked on the coordination lock. To detect such cases, the system keeps track of the blocking status of every thread. Calls to pthread_mutex_lock() and pthread_mutex_unlock() are replaced with wrapper calls that record this status. When a thread attempts to acquire a lock, it adds the lock to a WANT list. When the lock is acquired, it is moved from the WANT list to a HAVE list. When the thread releases the lock, the lock is removed from the HAVE list.
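The lock bookkeeping described above can be sketched as follows. This is a minimal illustration, not the system's actual wrappers: the names thread_state, tracked_mutex_lock(), tracked_mutex_unlock(), and MAX_LOCKS are hypothetical, the WANT list is reduced to a single pointer on the assumption that a thread waits on at most one mutex at a time, and the per-thread record is passed explicitly rather than looked up through thread-local storage.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_LOCKS 16            /* hypothetical bound on locks held per thread */

/* Hypothetical per-thread record that the coordinator can inspect. */
typedef struct {
    pthread_mutex_t  guard;             /* protects the fields below */
    pthread_mutex_t *want;              /* WANT: lock being waited for, or NULL */
    pthread_mutex_t *have[MAX_LOCKS];   /* HAVE: locks currently held */
    size_t           have_count;
} thread_state;

void thread_state_init(thread_state *ts)
{
    pthread_mutex_init(&ts->guard, NULL);
    ts->want = NULL;
    ts->have_count = 0;
}

/* Wrapper for pthread_mutex_lock(): record WANT, block, then move to HAVE. */
int tracked_mutex_lock(thread_state *ts, pthread_mutex_t *m)
{
    pthread_mutex_lock(&ts->guard);
    ts->want = m;                       /* visible while the thread may be blocked */
    pthread_mutex_unlock(&ts->guard);

    int rc = pthread_mutex_lock(m);     /* may block here */

    pthread_mutex_lock(&ts->guard);
    ts->want = NULL;
    if (rc == 0) {
        assert(ts->have_count < MAX_LOCKS);
        ts->have[ts->have_count++] = m;
    }
    pthread_mutex_unlock(&ts->guard);
    return rc;
}

/* Wrapper for pthread_mutex_unlock(): drop the lock from HAVE, then release. */
int tracked_mutex_unlock(thread_state *ts, pthread_mutex_t *m)
{
    pthread_mutex_lock(&ts->guard);
    for (size_t i = 0; i < ts->have_count; i++) {
        if (ts->have[i] == m) {
            /* Order within HAVE is not significant, so swap in the last entry. */
            ts->have[i] = ts->have[--ts->have_count];
            break;
        }
    }
    pthread_mutex_unlock(&ts->guard);
    return pthread_mutex_unlock(m);
}
```

The guard mutex is there so that another thread can read a consistent WANT/HAVE state while the owning thread is inside a wrapper; it is locked with the raw pthread calls so that the bookkeeping does not track itself.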

The coordinator determines that a thread is really blocked if:

The coordinator keeps checking the status of the other threads until it determines that all of them are really blocked. It then initiates the actual update: the stack of each thread is fully unrolled; all datatypes are transformed; the stacks are reconstructed; and the threads are released to resume executing the updated version.
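One way to picture the coordinator's check is as a fixed-point computation over a snapshot of the thread states. The sketch below is an illustration under stated assumptions rather than the system's actual test: it treats a thread as blocked if its cooperation flag is raised, or if it is waiting on a lock held by a thread already known to be blocked, and it applies the second case transitively, which is an assumption. The names thread_snapshot, all_really_blocked(), MAX_THREADS, and NO_LOCK are hypothetical, and locks and threads are represented by small integer indices for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 8           /* hypothetical bound on application threads */
#define NO_LOCK     (-1)

/* Hypothetical snapshot of one thread, as seen by the coordinator. */
typedef struct {
    bool cooperating;   /* cooperation flag raised at an update point */
    int  want;          /* index of the lock on the WANT list, or NO_LOCK */
} thread_snapshot;

/*
 * owner[l] is the index of the thread whose HAVE list contains lock l,
 * or -1 if the lock is free. Returns true when every thread other than
 * the coordinator is judged blocked under the assumptions stated above.
 */
bool all_really_blocked(const thread_snapshot *t, size_t n,
                        const int *owner, size_t nlocks,
                        size_t coordinator)
{
    bool blocked[MAX_THREADS] = { false };
    assert(n <= MAX_THREADS);

    /* Case 1: threads that raised their cooperation flag. */
    for (size_t i = 0; i < n; i++)
        blocked[i] = t[i].cooperating;

    /* Case 2: threads waiting on a lock held by a blocked thread.
     * Repeat until no new thread is marked (assumed transitive). */
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (blocked[i] || t[i].want == NO_LOCK)
                continue;
            if ((size_t)t[i].want >= nlocks)
                continue;
            int o = owner[t[i].want];
            if (o >= 0 && (size_t)o < n && blocked[o]) {
                blocked[i] = true;
                changed = true;
            }
        }
    }

    for (size_t i = 0; i < n; i++)
        if (i != coordinator && !blocked[i])
            return false;
    return true;
}
```

In a polling loop, the coordinator would rebuild the snapshot on each iteration and proceed to the update only once the function returns true. The sketch assumes the snapshot is taken consistently; how that is achieved is outside its scope.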