4003-532/4005-736 Parallel Computing II
Module 4 Lecture Notes
Lock-Free Concurrent Programming
Prof. Alan Kaminsky
Rochester Institute of Technology -- Department of Computer Science
Goal, Solutions, Problems
- The Goal: Multiple thread safe updating of a shared data structure
- Solution 1: Locking
- Protect all updating code with a critical region
- Semaphores, monitors, locks, Java synchronized keyword
- Problem 1A: Locking is a heavyweight operation
- Reduces performance, even if multiple threads are not contending for the lock
- Problematic in a high performance program
- Problem 1B: Lock implementations are typically optimized for the no-contention case
- They do not perform as well if multiple threads are constantly contending for the lock
- Problematic in an SMP parallel program
- Problem 1C: Convoying
- Suppose a thread acquires a lock, then is swapped out
- Another thread tries to acquire the lock, can't, and becomes blocked
- And another thread, and another, . . .
- You end up with a whole bunch of threads blocked behind the first thread -- a "convoy"
- These threads stay blocked until the first thread is swapped back in and releases the lock
- Further performance reduction
- Problem 1D: Can cause priority inversion in a real-time system
- Suppose a low-priority thread acquires a lock
- Suppose a high-priority thread preempts the low-priority thread
- Suppose the high-priority thread also attempts to acquire the lock
- Now the high-priority thread is prevented from running because of the low-priority thread: priority inversion
- This may cause the high-priority thread to miss its deadline -- disaster!
- Solution 2: Lock-free concurrent programming
- The trick is to avoid locks while still ensuring multiple thread safe updating
Atomic Compare-and-Set
Lock-Free Updating Using Atomic Compare-and-Set
- General pattern for using CAS
for (;;)
{
oldValue = X.get();
newValue = Whatever; // May be computed based on oldValue
if (X.compareAndSet (oldValue, newValue)) break;
}
- Lock-free increment: Atomically set X to X+1
public static void increment
(AtomicInteger X)
{
for (;;)
{
oldValue = X.get();
newValue = oldValue + 1;
if (X.compareAndSet (oldValue, newValue)) break;
}
}
- Lock-free minimum: Atomically set X to min (X, value)
public static void minimum
(AtomicInteger X,
int value)
{
for (;;)
{
oldValue = X.get();
newValue = Math.min (oldValue, value);
if (X.compareAndSet (oldValue, newValue)) break;
}
}
- Use of CAS in the maximum parsimony phylogenetic tree SMP parallel program
The ABA Problem
- The ABA Problem
- Suppose a thread is executing the above general pattern
- It does oldValue = X.get(); and reads the value A
- Then a context switch happens
- Another thread sets the value of X to B
- Still another thread sets the value of X back to A again
- Then the original thread resumes
- The X.compareAndSet() operation will succeed
- But this may not be correct, if the new value was computed assuming the value of X did not change between X.get() and X.compareAndSet()
- The ABA Problem cannot happen if X's value changes monotonically
- X is always incremented, never decremented
- X is always decremented, never incremented
- X is always set to the minimum of itself and a given value
- X is always set to the maximum of itself and a given value
- Etc.
- If the ABA Problem can happen, one solution is:
- Associate a modification counter with X
- Use double-compare-and-set to change X's value and increment the counter as a single atomic operation
- Can still get the ABA Problem when the modification counter rolls over -- but practically, the chances of a problem are remote
- Atomic double-compare-and-set (DCAS) operation on two variables X and Y:
doubleCompareAndSet (X, xOldValue, xNewValue, Y, yOldValue, yNewValue)
If X == xOldValue and Y == yOldValue:
X = xNewValue
Y = yNewValue
Return true
Else:
Return false
- In Java: AtomicStampedReference.compareAndSet()
Lock-Free Concurrent Queue
-
M. Michael and M. Scott.
Simple, fast, and practical non-blocking and blocking concurrent queue algorithms.
In Proceedings of the 15th Annual ACM Symposium on Principles of Distributed Computing (PODC'96),
pages 267-275, 1996.
- Implemented in class java.util.concurrent.ConcurrentLinkedQueue
- Slightly simplified excerpts from the ConcurrentLinkedQueue source code:
// Hidden linked list node class.
private static class Node<E>
{
private volatile E item;
private volatile Node<E> next;
// Create a new node with the given item and a null next node.
Node (E x);
// Create a new node with the given item and the given next node.
Node (E x, Node<E> n);
// Returns this node's item.
E getItem ();
// Atomic compare-and-set of this node's item.
boolean casItem (E cmp, E val);
// Atomically set this node's item.
void setItem (E val);
// Returns this node's next node.
Node<E> getNext ();
// Atomic compare-and-set of this node's next node.
boolean casNext (Node<E> cmp, Node<E> val);
// Atomically set this node's next node.
void setNext (Node<E> val);
}
// Pointer to header node, initialized to a dummy node. The first
// actual node is at head.getNext ().
private transient volatile Node<E> head = new Node<E> (null, null);
// Pointer to last node on list.
private transient volatile Node<E> tail = head;
// Atomic compare-and-set of this queue's head pointer.
private boolean casHead (Node<E> cmp, Node<E> val);
// Atomic compare-and-set of this queue's tail pointer.
private boolean casTail (Node<E> cmp, Node<E> val);
// Add the given item to the end of this queue.
public boolean offer (E o)
{
if (o == null) throw new NullPointerException();
Node<E> n = new Node<E> (o, null);
for (;;)
{
Node<E> t = tail;
Node<E> s = t.getNext();
if (t == tail)
{
if (s == null)
{
if (t.casNext (s, n))
{
casTail (t, n);
return true;
}
}
else
{
casTail (t, s);
}
}
}
}
// Remove and return the first item from this queue.
public E poll()
{
for (;;)
{
Node<E> h = head;
Node<E> t = tail;
Node<E> first = h.getNext();
if (h == head)
{
if (h == t)
{
if (first == null)
return null;
else
casTail (t, first);
}
else if (casHead (h, first))
{
E item = first.getItem();
first.setItem (null);
return item;
}
}
}
}
- What about the ABA Problem?
- In this lock-free concurrent queue algorithm, the ABA Problem can happen if you re-use nodes
- But this Java implementation never re-uses nodes, it always allocates fresh ones
- So modification counters are not needed
- (Prof. Doug Lea, SUNY Oswego, private communication)
Concurrent Hash Map
- Implemented in class java.util.concurrent.ConcurrentHashMap
- Does use locking, but is designed to reduce thread contention for locks
- The map is divided among N segments
- Typically, N is (an estimate of) the number of threads that will be updating the map
- The most significant bits of the key's hash code determine which segment to access
- Within each segment, there is an array of hash buckets as in a typical hash map implementation
- Update operations are programmed so that the segment is always in a consistent state, even in the middle of an update
- This means that query operations do not need locking
- Update operations are protected with a lock
- When threads query the map:
- There is no locking, hence good performance
- When threads update the map:
- Assume threads update the map with keys having "random" hash codes
- Then most of the time threads will not access the same segment
- Thus, most locking will encounter the no-contention case, for which locks are typically optimized to give good performance
|
Parallel Computing II
|
|
•
|
|
4003-532-70/4005-736-70
|
|
•
|
|
Spring Quarter 2007
|
|
Course Page
|
|
Alan Kaminsky
|
|
•
|
|
Department of Computer Science
|
|
•
|
|
Rochester Institute of Technology
|
|
•
|
|
4486 +
2220 =
6706
|
|
Home Page
|
Copyright © 2007 Alan Kaminsky.
All rights reserved.
Last updated 14-May-2007.
Please send comments to ark@cs.rit.edu.
|