Alan Kaminsky -- Department of Computer Science -- Rochester Institute of Technology
Parallel Computing II -- 4003-532-70/4005-736-70 -- Spring Quarter 2007

4003-532/4005-736 Parallel Computing II
Module 2 Lecture Notes -- Hybrid SMP Cluster Programming
Massively Parallel Problems

Prof. Alan Kaminsky
Rochester Institute of Technology -- Department of Computer Science


Problem: Cryptanalysis Using Exhaustive Key Search

  • Advanced Encryption Standard (AES)
    • Plaintext block (128 bits) goes in
    • Encrypted ciphertext block (128 bits) comes out
    • Uses a 256-bit secret key
       
  • Known plaintext attack
    • Plaintext block is known
    • Corresponding ciphertext block is known
    • Key is not known
    • Problem: Discover the key
    • Once you have the key, you can decrypt everything (that was encrypted with that key)
       
  • Exhaustive key search
    • For every key from 0 to 2^256 - 1, encrypt the given plaintext with that key, and see if you get the given ciphertext
       
  • Partial exhaustive key search
    • 256-N bits of the key are known, N bits of the key are unknown
    • For every combination of unknown key bits from 0 to 2^N - 1, encrypt the given plaintext with that key, and see if you get the given ciphertext
       
  • For further information, see Parallel Computing I Module 2 Lecture Notes
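One trial of the known plaintext attack can be expressed with Java's standard javax.crypto API. This is a minimal sketch, not the course's actual code; the class and method names are illustrative.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;

public class AesTrial
    {
    // Encrypt one 16-byte plaintext block with a 256-bit candidate key and
    // report whether the result matches the known ciphertext block.
    public static boolean tryKey
        (byte[] key, byte[] plaintext, byte[] ciphertext)
        throws Exception
        {
        Cipher aes = Cipher.getInstance ("AES/ECB/NoPadding");
        aes.init (Cipher.ENCRYPT_MODE, new SecretKeySpec (key, "AES"));
        return Arrays.equals (aes.doFinal (plaintext), ciphertext);
        }
    }
```

Exhaustive key search simply calls this trial once for each candidate key until it returns true.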


Massively Parallel Problems

  • Massively parallel problem = Problem where there are no dependencies between the pieces
     
  • On an SMP computer, this means that during the computation, no thread needs to use data created by another thread
    • Reduces thread coordination overhead
    • Thus, potential for good parallel performance
       
  • On a cluster, this means that during the computation, no process needs to send messages to any other process
    • Reduces communication overhead
    • Thus, potential for good parallel performance


Sequential Solution
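The course's sequential program is not reproduced in these notes. The sketch below shows one way to code the partial exhaustive key search, assuming the N unknown bits are the low-order bits of the 256-bit key; class and method names are illustrative.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;

public class FindKeySeq
    {
    // Search the low n unknown bits of a 256-bit AES key. partialKey holds
    // the 256-n known bits, with the n unknown low-order bits set to 0.
    // Returns the counter value that reproduces the ciphertext, or -1.
    public static long search
        (byte[] partialKey, int n, byte[] plaintext, byte[] ciphertext)
        throws Exception
        {
        Cipher aes = Cipher.getInstance ("AES/ECB/NoPadding");
        byte[] key = partialKey.clone();
        for (long counter = 0; counter < (1L << n); ++ counter)
            {
            // OR the counter into the low-order (big-endian) key bytes.
            for (int i = 0; i < 8; ++ i)
                key[31-i] = (byte) (partialKey[31-i] | (counter >>> (8*i)));
            aes.init (Cipher.ENCRYPT_MODE, new SecretKeySpec (key, "AES"));
            if (Arrays.equals (aes.doFinal (plaintext), ciphertext))
                return counter;
            }
        return -1;
        }
    }
```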


Hybrid SMP Cluster Solution
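Again a sketch rather than the course's program: the SMP level is expressed with plain Java threads, and the cluster level is reduced to hypothetical rank/size parameters standing in for the message passing middleware used in the course. Each of the size processes takes a contiguous slice of the 2^N key space and splits it further among its threads; because the problem is massively parallel, no thread or process communicates with any other during the search.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

public class FindKeyHybrid
    {
    // Search this process's slice of the 2^n key space with Kt threads.
    // rank and size identify this process within the cluster; here they are
    // plain parameters rather than values from a message passing layer.
    public static long search
        (byte[] partialKey, int n, byte[] plaintext, byte[] ciphertext,
         int rank, int size, int threads)
        throws Exception
        {
        long total = 1L << n;
        long lo = rank * total / size;       // this process's subrange
        long hi = (rank + 1) * total / size;
        AtomicLong found = new AtomicLong (-1);
        Thread[] team = new Thread [threads];
        for (int t = 0; t < threads; ++ t)
            {
            final long tlo = lo + t * (hi - lo) / threads;
            final long thi = lo + (t + 1) * (hi - lo) / threads;
            team[t] = new Thread (() ->
                {
                try
                    {
                    Cipher aes = Cipher.getInstance ("AES/ECB/NoPadding");
                    byte[] key = partialKey.clone();
                    for (long ctr = tlo; ctr < thi && found.get() < 0; ++ ctr)
                        {
                        for (int i = 0; i < 8; ++ i)
                            key[31-i] =
                                (byte) (partialKey[31-i] | (ctr >>> (8*i)));
                        aes.init (Cipher.ENCRYPT_MODE,
                            new SecretKeySpec (key, "AES"));
                        if (Arrays.equals (aes.doFinal (plaintext), ciphertext))
                            found.set (ctr);
                        }
                    }
                catch (Exception e) { throw new RuntimeException (e); }
                });
            team[t].start();
            }
        for (Thread t : team) t.join();
        return found.get();   // -1 if the key is not in this process's slice
        }
    }
```

In the real program, each process would report its result back to one designated process; with this decomposition that is the only message needed.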


Parallel Program Performance Metrics

  • N = Problem size
    • Must be directly proportional to the amount of computation
       
  • K = Number of processors
     
  • T(N,K) = Running time
    • Depends on problem size and number of processors
    • You need to carefully define what is included in the running time -- computation only? I/O also?
    • It's best to do several (e.g., seven) running time measurements for the same N and K, then take the median of the results
       
  • S(N,K) = Speed in units of computations/second
    • S(N,K) = 1 / T(N,K)
    • Depends on problem size and number of processors
       
  • Speedup(N,K) = Speedup -- N stays constant, K increases
    • Speedup(N,K) = S(N,K) / S(N,sequential) = T(N,sequential) / T(N,K)
    • How much faster can we solve the same problem by going to more processors?
    • Ideal speedup = K
    • Calculated with respect to the running time for a sequential version of the program, not a parallel version with K=1
       
  • Eff(N,K) = Efficiency -- N stays constant, K increases
    • Eff(N,K) = Speedup(N,K) / IdealSpeedup = Speedup(N,K) / K
    • Ideal efficiency = 1
       
  • Plots of running time, speedup, efficiency versus K
    • See Building Parallel Programs, Chapter 7
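The speedup and efficiency formulas can be applied directly to measured running times. For instance, plugging in the 32M-key measurements from the next section (T(N,seq) = 55573 msec, T(N,16) = 3700 msec) reproduces that table's Spdup and Eff columns to within rounding:

```java
public class Metrics
    {
    // Speedup(N,K) = T(N,sequential) / T(N,K)
    public static double speedup (double tSeq, double tPar)
        { return tSeq / tPar; }

    // Eff(N,K) = Speedup(N,K) / K
    public static double efficiency (double tSeq, double tPar, int k)
        { return speedup (tSeq, tPar) / k; }

    public static void main (String[] args)
        {
        // 32M keys on the Tardis cluster, K = 16 (values from the table).
        System.out.printf ("Speedup = %.3f%n", speedup (55573, 3700));
        System.out.printf ("Eff     = %.3f%n", efficiency (55573, 3700, 16));
        }
    }
```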


Hybrid Cluster SMP Version Performance

  • Running time measurements on the Tardis cluster
     
  • Problem size N = 32M, 64M, 128M, 256M keys searched (25, 26, 27, 28 missing key bits)
     
  • Number of parallel processes Kp = 1, 2, 3, 4
     
  • Number of parallel threads per process Kt = 4
     
  • Number of effective processors K = 4, 8, 12, 16
     
  • Running time T (msec) = median of seven program runs


  N    K       T   Spdup    Eff         N    K       T   Spdup    Eff
---  ---  ------  ------  -----      ----  ---  ------  ------  -----
32M  seq   55573                     128M  seq  223269
32M    4   14083   3.946  0.987      128M    4   55564   4.018  1.005
32M    8    7156   7.766  0.971      128M    8   27947   7.989  0.999
32M   12    4813  11.546  0.962      128M   12   18971  11.769  0.981
32M   16    3700  15.021  0.939      128M   16   14489  15.410  0.963

64M  seq  111175                     256M  seq  442091
64M    4   27898   3.985  0.996      256M    4  110749   3.992  0.998
64M    8   14038   7.920  0.990      256M    8   55576   7.955  0.994
64M   12    9433  11.786  0.982      256M   12   37368  11.831  0.986
64M   16    7146  15.558  0.972      256M   16   27910  15.840  0.990

Copyright © 2007 Alan Kaminsky. All rights reserved. Last updated 20-Mar-2007. Please send comments to ark@cs.rit.edu.