Alan Kaminsky Department of Computer Science Rochester Institute of Technology 4486 + 2220 = 6706
Home Page
Parallel Computing II 4003-532-70/4005-736-70 Spring Quarter 2007
Course Page

4003-532/4005-736 Parallel Computing II
Module 4 Lecture Notes
Lock-Free Concurrent Programming

Prof. Alan Kaminsky
Rochester Institute of Technology -- Department of Computer Science


Goal, Solutions, Problems

  • The Goal: Multiple thread safe updating of a shared data structure
     
  • Solution 1: Locking
    • Protect all updating code with a critical region
    • Semaphores, monitors, locks, Java synchronized keyword
       
  • Problem 1A: Locking is a heavyweight operation
    • Reduces performance, even if multiple threads are not contending for the lock
    • Problematic in a high performance program
       
  • Problem 1B: Lock implementations are typically optimized for the no-contention case
    • They do not perform as well if multiple threads are constantly contending for the lock
    • Problematic in an SMP parallel program
       
  • Problem 1C: Convoying
    • Suppose a thread acquires a lock, then is swapped out
    • Another thread tries to acquire the lock, can't, and becomes blocked
    • And another thread, and another, . . .
    • You end up with a whole bunch of threads blocked behind the first thread -- a "convoy"
    • These threads stay blocked until the first thread is swapped back in and releases the lock
    • Further performance reduction
       
  • Problem 1D: Can cause priority inversion in a real-time system
    • Suppose a low-priority thread acquires a lock
    • Suppose a high-priority thread preempts the low-priority thread
    • Suppose the high-priority thread also attempts to acquire the lock
    • Now the high-priority thread is prevented from running because of the low-priority thread: priority inversion
    • This may cause the high-priority thread to miss its deadline -- disaster!
       
  • Solution 2: Lock-free concurrent programming
    • The trick is to avoid locks while still ensuring multiple thread safe updating


Atomic Compare-and-Set

  • Atomic compare-and-set (CAS) operation on a variable X:
        X.compareAndSet (oldValue, newValue)
            If X == oldValue:
                X = newValue
                Return true
            Else:
                Return false
    
  • Also called compare-and-swap
     
  • CAS is atomic
    • A thread context switch is guaranteed not to happen in the middle
       
  • Modern CPU chips support CAS in hardware
    • Actually, the IBM System/370 CPU (introduced in 1970) was the first to have this instruction
    • Intel 486: CMPXCHG instruction (operates on a 32-bit quantity)
    • Intel Pentium: CMPXCHG8B instruction (operates on a 64-bit quantity)
    • Sun SPARC: CASA, CASXA instructions
       
  • Java support for CAS
    • Package java.util.concurrent.atomic
      • Classes AtomicInteger, AtomicLong, AtomicBoolean, AtomicReference
      • Classes AtomicIntegerArray, AtomicLongArray, AtomicReferenceArray
      • Classes AtomicIntegerFieldUpdater, AtomicLongFieldUpdater, AtomicReferenceFieldUpdater
      • Classes AtomicMarkableReference, AtomicStampedReference


Lock-Free Updating Using Atomic Compare-and-Set

  • General pattern for using CAS
        for (;;)
            {
            oldValue = X.get();
            newValue = Whatever; // May be computed based on oldValue
            if (X.compareAndSet (oldValue, newValue)) break;
            }
    
  • Lock-free increment: Atomically set X to X+1
        public static void increment
        	(AtomicInteger X)
    	{
            for (;;)
                {
                oldValue = X.get();
                newValue = oldValue + 1;
                if (X.compareAndSet (oldValue, newValue)) break;
                }
            }
    
  • Lock-free minimum: Atomically set X to min (X, value)
        public static void minimum
        	(AtomicInteger X,
    	 int value)
    	{
            for (;;)
                {
                oldValue = X.get();
                newValue = Math.min (oldValue, value);
                if (X.compareAndSet (oldValue, newValue)) break;
                }
            }
    
  • Use of CAS in the maximum parsimony phylogenetic tree SMP parallel program


The ABA Problem

  • The ABA Problem
    • Suppose a thread is executing the above general pattern
    • It does oldValue = X.get(); and reads the value A
    • Then a context switch happens
    • Another thread sets the value of X to B
    • Still another thread sets the value of X back to A again
    • Then the original thread resumes
    • The X.compareAndSet() operation will succeed
    • But this may not be correct, if the new value was computed assuming the value of X did not change between X.get() and X.compareAndSet()
       
  • The ABA Problem cannot happen if X's value changes monotonically
    • X is always incremented, never decremented
    • X is always decremented, never incremented
    • X is always set to the minimum of itself and a given value
    • X is always set to the maximum of itself and a given value
    • Etc.
       
  • If the ABA Problem can happen, one solution is:
    • Associate a modification counter with X
    • Use double-compare-and-set to change X's value and increment the counter as a single atomic operation
    • Can still get the ABA Problem when the modification counter rolls over -- but practically, the chances of a problem are remote
       
  • Atomic double-compare-and-set (DCAS) operation on two variables X and Y:
        doubleCompareAndSet (X, xOldValue, xNewValue, Y, yOldValue, yNewValue)
            If X == xOldValue and Y == yOldValue:
                X = xNewValue
                Y = yNewValue
                Return true
            Else:
                Return false
    
  • In Java: AtomicStampedReference.compareAndSet()


Lock-Free Concurrent Queue

  • M. Michael and M. Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In Proceedings of the 15th Annual ACM Symposium on Principles of Distributed Computing (PODC'96), pages 267-275, 1996.
     
  • Implemented in class java.util.concurrent.ConcurrentLinkedQueue
     
  • Slightly simplified excerpts from the ConcurrentLinkedQueue source code:
        // Hidden linked list node class.
        private static class Node<E>
            {
            private volatile E item;
            private volatile Node<E> next;
            
            // Create a new node with the given item and a null next node.
            Node (E x);
            
            // Create a new node with the given item and the given next node.
            Node (E x, Node<E> n);
            
            // Returns this node's item.
            E getItem ();
            
            // Atomic compare-and-set of this node's item.
            boolean casItem (E cmp, E val);
            
            // Atomically set this node's item.
            void setItem (E val);
            
            // Returns this node's next node.
            Node<E> getNext ();
            
            // Atomic compare-and-set of this node's next node.
            boolean casNext (Node<E> cmp, Node<E> val);
            
            // Atomically set this node's next node.
            void setNext (Node<E> val);
            }
    
        // Pointer to header node, initialized to a dummy node.  The first
        // actual node is at head.getNext ().
        private transient volatile Node<E> head = new Node<E> (null, null);
    
        // Pointer to last node on list.
        private transient volatile Node<E> tail = head;
    
        // Atomic compare-and-set of this queue's head pointer.
        private boolean casHead (Node<E> cmp, Node<E> val);
    
        // Atomic compare-and-set of this queue's tail pointer.
        private boolean casTail (Node<E> cmp, Node<E> val);
    
        // Add the given item to the end of this queue.
        public boolean offer (E o)
            {
            if (o == null) throw new NullPointerException();
            Node<E> n = new Node<E> (o, null);
            for (;;)
                {
                Node<E> t = tail;
                Node<E> s = t.getNext();
                if (t == tail)
                    {
                    if (s == null)
                        {
                        if (t.casNext (s, n))
                            {
                            casTail (t, n);
                            return true;
                            }
                        }
                    else
                        {
                        casTail (t, s);
                        }
                    }
                }
            }
    
        // Remove and return the first item from this queue.
        public E poll()
            {
            for (;;)
                {
                Node<E> h = head;
                Node<E> t = tail;
                Node<E> first = h.getNext();
                if (h == head)
                    {
                    if (h == t)
                        {
                        if (first == null)
                            return null;
                        else
                            casTail (t, first);
                        }
                    else if (casHead (h, first))
                        {
                        E item = first.getItem();
                        first.setItem (null);
                        return item;
                        }
                    }
                }
            }
    
  • What about the ABA Problem?
    • In this lock-free concurrent queue algorithm, the ABA Problem can happen if you re-use nodes
    • But this Java implementation never re-uses nodes, it always allocates fresh ones
    • So modification counters are not needed
    • (Prof. Doug Lea, SUNY Oswego, private communication)


Concurrent Hash Map

  • Implemented in class java.util.concurrent.ConcurrentHashMap
     
  • Does use locking, but is designed to reduce thread contention for locks
     
  • The map is divided among N segments
    • Typically, N is (an estimate of) the number of threads that will be updating the map
    • The most significant bits of the key's hash code determine which segment to access
       
  • Within each segment, there is an array of hash buckets as in a typical hash map implementation
    • Update operations are programmed so that the segment is always in a consistent state, even in the middle of an update
    • This means that query operations do not need locking
    • Update operations are protected with a lock
       
  • When threads query the map:
    • There is no locking, hence good performance
       
  • When threads update the map:
    • Assume threads update the map with keys having "random" hash codes
    • Then most of the time threads will not access the same segment
    • Thus, most locking will encounter the no-contention case, for which locks are typically optimized to give good performance

Parallel Computing II 4003-532-70/4005-736-70 Spring Quarter 2007
Course Page
Alan Kaminsky Department of Computer Science Rochester Institute of Technology 4486 + 2220 = 6706
Home Page
Copyright © 2007 Alan Kaminsky. All rights reserved. Last updated 14-May-2007. Please send comments to ark­@­cs.rit.edu.