Alan Kaminsky Department of Computer Science Rochester Institute of Technology 4486 + 2220 = 6706
Home Page
Parallel Computing II 4003-532-70/4005-736-70 Spring Quarter 2007
Course Page

4003-532/4005-736 Parallel Computing II
Module 2 Lecture Notes -- Hybrid SMP Cluster Programming
Parallel Datastore Querying

Prof. Alan Kaminsky
Rochester Institute of Technology -- Department of Computer Science


Datastore Querying

  • Datastore = A collection of pre-computed data that can be queried
    • A more general and abstract concept than "database"
       
  • Examples of datastores
    • Flat file, plain text
    • Flat file, binary (e.g. java.io.DataOutput, java.io.ObjectOutput)
    • Database
    • Data computed "on the fly"
       
  • Query = A question to be answered using data in the datastore
    • Computing the answer may require very little computation
    • Computing the answer may require lots of computation
       
  • Parallel datastore querying
    • To get a speedup or sizeup when . . .
    • Size of datastore is large, or . . .
    • Number of queries is large, or . . .
    • Computation needed for each query is large, or . . .
    • All of the above!
       
  • Strategy 1
    • Size of datastore is small (can fit in main memory)
    • Replicate the datastore in all processes
    • Partition the queries among the processes
       
  • Strategy 2
    • Size of datastore is large (can't fit in main memory)
    • Partition the datastore among the processes
    • Replicate queries to all processes
    • Combine (reduce) partial query results from each process


Real-world examples of parallel datastore querying


Prime Finding Using Parallel Datastore Querying

  • Problem: Find all primes up to 2N, N <= 32
     
  • Use Strategy 1
    • Datastore = List of all odd primes up to 216 (there are only 6,541 of them)
    • Datastore replicated in each process, computed on the fly
    • Query = A block of 216 numbers -- 0..65535, 65536..131071, etc. up to 2N
    • Query computation = Attempt to find a factor of each odd number in the block, using trial division by the primes in the datastore
    • If there are no factors, the number is prime
    • Store query results (list of prime numbers) in an output file
       
  • Sequential program
  • Hybrid SMP cluster parallel program
    • Package edu.rit.hyb.prime
    • Master-worker pattern for process parallelism
      • Master sends a query block to worker
      • Worker sends list of primes in that block to master
      • Master writes list of primes to output file
      • Master-worker pattern gives load balancing
    • Parallel loop pattern for thread parallelism in each worker
      • Numbers in query block tested for primality in parallel
      • Guided schedule for load balancing
         
  • Example: Generate primes up to 2N for N = 26
    java -Dpj.np=$KP -Dpj.nt=$KT edu.rit.hyb.prime.Prime32Hyb primes.dat $N
    
  • Running times for the above example on the Tardis hybrid SMP cluster parallel computer
    • K = Kp * Kt = Calculation parallelism
                                Due to Kp
     N  Kp  Kt   K  T (msec)   Spdup  Effi.
    ---------------------------------------
    26         seq    255081
    26   1   1   1    252264   1.011  1.011
    26   2   1   2    125973   2.025  1.012
    26   3   1   3     83283   3.063  1.021
    26   4   1   4     63116   4.041  1.010
    26   5   1   5     50489   5.016  1.003
    26   6   1   6     42217   6.042  1.007
    26   7   1   7     36118   7.062  1.009
    26   8   1   8     31577   8.078  1.010
    26   9   1   9     28208   9.043  1.005
    26  10   1  10     25375  10.052  1.005
    
                               Due to Kp+Kt   Due to Kt
     N  Kp  Kt   K  T (msec)   Spdup  Effi.  Spdup  Effi.
    -----------------------------------------------------
    26   1   4   4     62952   4.052  1.013  4.007  1.002
    26   2   4   8     31650   8.059  1.007  3.980  0.995
    26   3   4  12     21502  11.863  0.989  3.873  0.968
    26   4   4  16     16157  15.788  0.987  3.906  0.977
    26   5   4  20     13168  19.371  0.969  3.834  0.959
    26   6   4  24     10956  23.282  0.970  3.853  0.963
    26   7   4  28      9480  26.907  0.961  3.810  0.952
    26   8   4  32      8335  30.604  0.956  3.788  0.947
    26   9   4  36      7320  34.847  0.968  3.854  0.963
    26  10   4  40      6777  37.639  0.941  3.744  0.936
    

     
  • Example: Generate primes up to 2N for N = 32
    • 32 minutes on all 40 CPUs of the tardis cluster
    • A whole day (21.4 hours) if done sequentially
    java -Dpj.np=10 -Dpj.nt=4 edu.rit.hyb.prime.Prime32Hyb primes.dat 32
    Job 76, dr04, dr08, dr05, dr06, dr07, dr01, dr02, dr09, dr00, dr03
    1921092 msec 8
    1921094 msec 9
    1921145 msec 5
    1921387 msec 6
    1921400 msec 2
    1921481 msec 4
    1921500 msec 7
    1921516 msec 3
    1921508 msec 1
    203280221 primes
    1921536 msec 0
    

Parallel Computing II 4003-532-70/4005-736-70 Spring Quarter 2007
Course Page
Alan Kaminsky Department of Computer Science Rochester Institute of Technology 4486 + 2220 = 6706
Home Page
Copyright © 2007 Alan Kaminsky. All rights reserved. Last updated 09-Apr-2007. Please send comments to ark­@­cs.rit.edu.