
Parallel Java
A Unified API for Shared Memory and
Cluster Parallel Programming in 100% Java
Lecture Notes

Prof. Alan Kaminsky
Department of Computer Science
B. Thomas Golisano College of Computing and Information Sciences
Rochester Institute of Technology
Rochester, NY, USA

Presented at the 9th International Workshop on Java and Components for Parallelism, Distribution and Concurrency
International Parallel and Distributed Processing Symposium (IPDPS 2007)
Long Beach, CA, USA
March 26, 2007


Overview

  • Parallel computer architectures
  • Motivations for Parallel Java
  • Programming with Parallel Java -- two examples
  • Parallel Java performance
  • Parallel Java architecture
  • Status and future plans

Parallel Computer Architectures

  • Shared memory multiprocessor (SMP) parallel computers
    • Even consumer desktop machines are starting to look like this
       
     
  • Cluster parallel computers
    • Soon you won't even be able to build these any more
       
     
  • Hybrid SMP cluster parallel computers
    • Soon all clusters will look like this
       
     
  • The RIT CS Department's parallel computers, used for teaching parallel computing courses:
     
  • SMP parallel computers
    • paradise.cs.rit.edu -- Four Sun UltraSPARC-IV dual-core CPUs, eight processors, 1.35 GHz clock, 16 GB main memory
    • parasite.cs.rit.edu -- Four Sun UltraSPARC-IV dual-core CPUs, eight processors, 1.35 GHz clock, 16 GB main memory
    • paradox.cs.rit.edu -- Four Sun UltraSPARC-II CPUs, four processors, 450 MHz clock, 4 GB main memory
    • paragon.cs.rit.edu -- Four Sun UltraSPARC-II CPUs, four processors, 450 MHz clock, 4 GB main memory
       
  • Cluster parallel computer
    • Frontend computer -- paranoia.cs.rit.edu -- UltraSPARC-II CPU, 296 MHz clock, 192 MB main memory
    • 32 backend computers -- thug01 through thug32 -- each an UltraSPARC-IIi CPU, 440 MHz clock, 256 MB main memory
    • 100-Mbps switched Ethernet backend interconnection network
    • Aggregate 14 GHz clock, 8 GB main memory
       
  • Hybrid SMP cluster parallel computer
    • Frontend computer -- tardis.cs.rit.edu -- UltraSPARC-IIe CPU, 650 MHz clock, 512 MB main memory
    • 10 backend computers -- dr00 through dr09 -- each with two AMD Opteron 2218 dual-core CPUs, four processors, 2.6 GHz clock, 8 GB main memory
    • 1-Gbps switched Ethernet backend interconnection network
    • Aggregate 104 GHz clock speed, 80 GB main memory


Motivations for Parallel Java

  • Parallel computing is moving into nontraditional domains
    • Graphics, animation, data mining, informatics, . . .
    • Applications tend to be written in newer languages like Java, not Fortran or C
       
  • Java is becoming the language of choice for learning programming
    • ACM Java Task Force resources for teaching introductory programming in Java -- http://jtf.acm.org/
       
  • Soon everyone's desktop or laptop PC will be an SMP parallel computer
    • That is, will have a multicore CPU chip
    • Every computing student will have to learn how to write parallel programs, not just computational scientists
       
  • Standard parallel programming libraries have not caught up to the multicore revolution
    • OpenMP addresses only SMP parallel programming
    • MPI addresses only cluster parallel programming
    • Programming a hybrid SMP cluster computer requires mastering two large and intricate APIs and their interactions
       
  • Hence: Parallel Java
    • A single unified API . . .
    • For thread-based SMP parallel programming, with features inspired by OpenMP . . .
    • And for message-passing cluster parallel programming, with features inspired by MPI . . .
    • And for hybrid SMP cluster parallel programming . . .
    • In 100% Java
       
  • For instructors and students: Teach and learn parallel programming in the language of choice, Java
     
  • For developers: Master just one API whose SMP and cluster programming features are designed to work well together
     
  • For me: Explore just how much you can do with Java for high performance computing
    • Both functionality-wise and performance-wise


Programming with Parallel Java -- Example 1

  • Example: Cryptanalysis using exhaustive key search
     

     
  • Advanced Encryption Standard (AES)
    • Plaintext block (128 bits) goes in
    • Encrypted ciphertext block (128 bits) comes out
    • Uses a 256-bit secret key
       
  • Known plaintext attack
    • Plaintext block is known
    • Corresponding ciphertext block is known
    • Key is not known
    • Problem: Discover the key
    • Once you have the key, you can decrypt everything (that was encrypted with that key)
       
  • Exhaustive key search
    • For every key from 0 to 2^256 - 1, encrypt the given plaintext with that key, and see if you get the given ciphertext
       
  • Partial exhaustive key search
    • 256 - N bits of the key are known, N bits of the key are unknown
    • For every combination of the unknown key bits from 0 to 2^N - 1, encrypt the given plaintext with that key, and see if you get the given ciphertext (a code sketch of this search loop appears after this list)
       
  • Sequential program -- Class FindKeySeq
     
  • SMP parallel program -- Class FindKeySmp3
     
  • Cluster parallel program -- Class FindKeyClu
     
  • Hybrid SMP cluster parallel program -- Class FindKeyHyb
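
  • To give a flavor of the SMP version, below is a minimal sketch (not the actual FindKeySmp3 source) of the parallel key search loop, written with Parallel Java's ParallelTeam, ParallelRegion, and IntegerForLoop classes from package edu.rit.pj; the class name and the trialEncryptionMatches() helper are placeholders for the AES-256 trial encryption, which lies outside Parallel Java itself
     
        import edu.rit.pj.IntegerForLoop;
        import edu.rit.pj.ParallelRegion;
        import edu.rit.pj.ParallelTeam;
        
        // Sketch only -- not the actual FindKeySmp3 source.
        public class FindKeySmpSketch {
        
            // Counter value that reproduced the known ciphertext, or -1 if none.
            static volatile int foundCounter = -1;
        
            public static void main(String[] args) throws Exception {
                final int n = Integer.parseInt(args[0]);   // number of unknown key bits
                final int maxCounter = (1 << n) - 1;       // 2^n - 1 (n small enough for int here)
        
                // By default the team has one thread per available processor; execute()
                // divides the index range 0 .. maxCounter among the threads.
                new ParallelTeam().execute(new ParallelRegion() {
                    public void run() throws Exception {
                        execute(0, maxCounter, new IntegerForLoop() {
                            public void run(int first, int last) {
                                for (int counter = first; counter <= last; ++counter) {
                                    // Hypothetical helper: fill the unknown key bits with the
                                    // counter, encrypt the known plaintext, compare ciphertexts.
                                    if (trialEncryptionMatches(counter)) {
                                        foundCounter = counter;
                                    }
                                }
                            }
                        });
                    }
                });
        
                System.out.println("Matching key counter = " + foundCounter);
            }
        
            // Placeholder for the AES-256 trial encryption (not part of Parallel Java).
            static boolean trialEncryptionMatches(int counter) {
                return false;
            }
        }
     
    • The cluster and hybrid versions keep the same loop body but also divide the counter range among the backend processes via message passing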


Parallel Java Performance -- Example 1

  • Running time measurements on the tardis.cs.rit.edu machine
    • 10 backend processors, each with two dual-core CPU chips
    • N = Problem size = number of keys searched = 32M, 64M, 128M, 256M
    • K = Number of processors
    • T = Running time (msec); the Spdup and Eff columns in the tables are derived from T and K as shown after this list
       
  • SMP parallel program -- Class FindKeySmp3 -- 1 process, 1 to 4 threads

       
  • Cluster parallel program -- Class FindKeyClu -- 1 to 4 processes, each with 1 thread

       
  • Hybrid SMP cluster parallel program -- Class FindKeyHyb -- 1 to 4 processes, each with 4 threads


Programming with Parallel Java -- Example 2

  • Floyd's Algorithm
    • Calculates the length of the shortest path from each node to every other node in a network of N nodes, given the distance from each node to its adjacent nodes
    • On input, D is a distance matrix
      • An NxN matrix where D[i,j] is the distance from node i to adjacent node j
      • If node j is not adjacent to node i, then D[i,j] = infinity
      • (The distance matrix need not be symmetric)
    • On output, D[i,j] has been replaced by the length of the shortest path from node i to node j
      • If there is no path from node i to node j, then D[i,j] = infinity
    • Algorithm:
          for i = 0 to N-1
              for r = 0 to N-1
                  for c = 0 to N-1
                      D[r,c] = min (D[r,c], D[r,i] + D[i,c])
      
    • Running time = O(N^3) (a Java sketch of the sequential and row-sliced SMP versions appears after this list)
       
  • Example input distance matrix (N = 10)
    0.000 0.213 ∞     ∞     ∞     ∞     ∞     0.248 0.240 ∞     
    0.213 0.000 ∞     ∞     ∞     ∞     ∞     ∞     ∞     ∞     
    ∞     ∞     0.000 ∞     ∞     ∞     ∞     ∞     0.103 0.137 
    ∞     ∞     ∞     0.000 0.212 ∞     ∞     0.188 ∞     ∞     
    ∞     ∞     ∞     0.212 0.000 0.077 ∞     ∞     ∞     ∞     
    ∞     ∞     ∞     ∞     0.077 0.000 ∞     ∞     ∞     ∞     
    ∞     ∞     ∞     ∞     ∞     ∞     0.000 ∞     ∞     ∞     
    0.248 ∞     ∞     0.188 ∞     ∞     ∞     0.000 ∞     ∞     
    0.240 ∞     0.103 ∞     ∞     ∞     ∞     ∞     0.000 0.043 
    ∞     ∞     0.137 ∞     ∞     ∞     ∞     ∞     0.043 0.000 
    
  • Example output distance matrix
    0.000 0.213 0.344 0.436 0.648 0.726 ∞     0.248 0.240 0.284 
    0.213 0.000 0.557 0.649 0.861 0.939 ∞     0.461 0.453 0.497 
    0.344 0.557 0.000 0.780 0.992 1.069 ∞     0.592 0.103 0.137 
    0.436 0.649 0.780 0.000 0.212 0.289 ∞     0.188 0.677 0.720 
    0.648 0.861 0.992 0.212 0.000 0.077 ∞     0.400 0.888 0.932 
    0.726 0.939 1.069 0.289 0.077 0.000 ∞     0.477 0.966 1.009 
    ∞     ∞     ∞     ∞     ∞     ∞     0.000 ∞     ∞     ∞     
    0.248 0.461 0.592 0.188 0.400 0.477 ∞     0.000 0.488 0.532 
    0.240 0.453 0.103 0.677 0.888 0.966 ∞     0.488 0.000 0.043 
    0.284 0.497 0.137 0.720 0.932 1.009 ∞     0.532 0.043 0.000 
    
  • Sequential program -- Class FloydSeq
     
  • SMP parallel program -- Class FloydSmpRow
     
  • Cluster parallel program -- Class FloydClu
     
  • Hybrid SMP cluster parallel program -- Class FloydHyb
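
  • Below is a minimal Java sketch (illustrative only, not the actual FloydSeq or FloydSmpRow source): the sequential method follows the pseudocode above, and the row-sliced SMP method divides the r loop of each pass among the threads of a Parallel Java ParallelTeam; the class and method names are placeholders, and infinite distances can be stored as Double.POSITIVE_INFINITY
     
        import edu.rit.pj.IntegerForLoop;
        import edu.rit.pj.ParallelRegion;
        import edu.rit.pj.ParallelTeam;
        
        // Sketch only -- illustrative of FloydSeq / FloydSmpRow, not their actual source.
        public class FloydSketch {
        
            // Sequential version: follows the pseudocode above.
            static void floydSeq(double[][] d) {
                int n = d.length;
                for (int i = 0; i < n; ++i)
                    for (int r = 0; r < n; ++r)
                        for (int c = 0; c < n; ++c)
                            d[r][c] = Math.min(d[r][c], d[r][i] + d[i][c]);
            }
        
            // Row-sliced SMP version: for each intermediate node i, the rows are
            // divided among the threads of a ParallelTeam.
            static void floydSmpRow(final double[][] d) throws Exception {
                final int n = d.length;
                ParallelTeam team = new ParallelTeam();
                for (int ii = 0; ii < n; ++ii) {
                    final int i = ii;
                    team.execute(new ParallelRegion() {
                        public void run() throws Exception {
                            execute(0, n - 1, new IntegerForLoop() {
                                public void run(int first, int last) {
                                    for (int r = first; r <= last; ++r)
                                        for (int c = 0; c < n; ++c)
                                            d[r][c] = Math.min(d[r][c], d[r][i] + d[i][c]);
                                }
                            });
                        }
                    });
                }
            }
        }
     
    • The row slicing is safe because during pass i neither row i nor column i changes (since D[i,i] = 0), so the row slices are independent; the real FloydSmpRow may organize the passes differently (for example, a single parallel region with a barrier after each pass)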


Parallel Java Performance -- Example 2

  • Running time measurements on the tardis.cs.rit.edu machine
    • 10 backend processors, each with two dual-core CPU chips
    • N = Problem size = number of network nodes = 2000, 2500, 3200, 4000
    • K = Number of processors
    • T = Running time (msec) for computation only (not file I/O)
       
  • SMP parallel program -- Class FloydSmpRow -- 1 process, 1 to 4 threads

       
  • Cluster parallel program -- Class FloydClu -- 1 to 4 processes, each with 1 thread

       
  • Hybrid SMP cluster parallel program -- Class FloydHyb -- 1 to 4 processes, each with 4 threads


Parallel Java Architecture

  • Parallel Java on a cluster
     

     
  • Job Scheduler Daemon runs on the frontend processor
    • Web interface for monitoring cluster status
       
  • SSH Daemon runs on each backend processor (not part of Parallel Java per se)
     
  • User runs a Java program with Parallel Java calls on the frontend processor
     
  • Parallel Java middleware, using SSH, runs a job backend process on each backend processor
     
  • Processes communicate using TCP sockets plus Parallel Java's own message passing layer (a startup sketch of a cluster program appears below)
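
  • Below is a minimal sketch of the startup pattern such a cluster program follows (illustrative only; the work partitioning shown is a placeholder, and the real FindKeyClu and FloydClu sources are more elaborate): the program calls Comm.init(), each process then learns its rank and the job size, and each works on its own slice of the problem
     
        import edu.rit.pj.Comm;
        
        // Sketch only -- shows the startup pattern of a Parallel Java cluster program.
        public class ClusterSketch {
            public static void main(String[] args) throws Exception {
                // Initialize the message passing layer; the middleware has started one
                // copy of this process on each backend assigned to the job.
                Comm.init(args);
                Comm world = Comm.world();
                int size = world.size();   // number of processes in the job
                int rank = world.rank();   // this process's index, 0 .. size-1
        
                // Hypothetical partitioning: divide a range of work items by rank so
                // that each process computes only its own slice.
                int totalItems = 1000000;
                int chunk = (totalItems + size - 1) / size;
                int lb = rank * chunk;
                int ub = Math.min(totalItems - 1, lb + chunk - 1);
        
                for (int item = lb; item <= ub; ++item) {
                    // ... compute on this item ...
                }
        
                // Results would normally be combined using Comm's point-to-point or
                // collective communication operations (omitted in this sketch).
                System.out.println("Process " + rank + " of " + size +
                    " handled items " + lb + " .. " + ub);
            }
        }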


Status

  • Shared memory parallel programming features largely complete
     
  • Message passing parallel programming features partially complete
    • Some of the more esoteric collective communication operations are not implemented
       
  • I use Parallel Java in my Parallel Computing I and Parallel Computing II classes
    • Taught using Parallel Java for two years now
       
  • I started writing a parallel programming textbook using Parallel Java
    • Building Parallel Programs: SMPs, Clusters, and Java


Future Plans

  • Continue work on the Parallel Java Library
    • Continue improving performance of the parallel loop classes
    • Continue improving performance of the message passing classes
    • Implement remaining collective communication operations
    • Add parallel file I/O as in MPI-2.0
       
  • Continue teaching the Parallel Computing I and II classes with Parallel Java
     
  • Finish writing the textbook
    • To be published by Thomson Course Technology
    • Tentative publication date January 2009
       
  • Accumulate more performance measurements on standard parallel benchmarks
     
  • Compare performance of Parallel Java programs with Fortran/C/OpenMP/MPI programs
     
  • Work on solving scientific computing problems with Parallel Java programs
    • Computational medicine: MRI spin relaxometry -- done with mpiJava, need to redo with Parallel Java
    • Computational biology: Maximum parsimony phylogenetic tree construction


For Further Information

  • Comments and questions: ark@cs.rit.edu

Running Time Data -- Example 1

  • Running time measurements on the tardis.cs.rit.edu machine
    • 10 backend processors, each with two dual-core CPU chips
    • N = Problem size = number of keys searched
    • K = Number of processors
    • T = Running time (msec)
       
  • SMP parallel program -- Class FindKeySmp3 -- 1 process, 1 to 4 threads
     
    • Raw running time data (msec)
         N    K   Run 1   Run 2   Run 3   Run 4   Run 5   Run 6   Run 7
      -----------------------------------------------------------------
       32M  seq   57554   55888   55898   55373   56628   55466   56340
       32M    1   55132   55108   55118   55109   55129   55129   55127
       32M    2   27855   27882   27886   27793   27847   27898   27886
       32M    3   20052   18514   19598   18516   18604   20037   19846
       32M    4   13990   13989   13997   14014   13878   14878   13912
      -----------------------------------------------------------------
       64M  seq  110613  113153  112272  112494  111997  112042  114074
       64M    1  110039  110060  110025  110059  110042  110043  110027
       64M    2   55582   55572   55771   55642   55620   54017   55401
       64M    3   37193   37079   36896   39464   36797   36855   36879
       64M    4   28687   27670   27641   27989   27892   29947   29670
      -----------------------------------------------------------------
      128M  seq  223830  221114  226464  228439  224009  223579  224229
      128M    1  219907  219938  219931  219912  219917  219913  220055
      128M    2  111108  111160  110978  111151  110973  110681  111162
      128M    3   73943   73636   73723   74341   78368   74586   78434
      128M    4   55627   58625   55239   56407   59402   55622   55232
      -----------------------------------------------------------------
      256M  seq  448199  447639  448544  447489  452227  442963  443627
      256M    1  439946  439657  439669  439669  439954  439776  439911
      256M    2  226232  236317  221227  221153  221355  221173  222203
      256M    3  148458  147751  147751  147918  147928  156770  147808
      256M    4  117842  117915  110956  117778  111206  111822  111150
      
    • Performance data based on median running time of 7 runs
        N    K       T  Spdup    Eff  |     N    K       T  Spdup    Eff
      --------------------------------+---------------------------------
      32M  seq   55898                |  128M  seq  224009
      32M    1   55127  1.014  1.014  |  128M    1  219917  1.019  1.019
      32M    2   27882  2.005  1.002  |  128M    2  111108  2.016  1.008
      32M    3   19598  2.852  0.951  |  128M    3   74341  3.013  1.004
      32M    4   13990  3.996  0.999  |  128M    4   55627  4.027  1.007
      --------------------------------+---------------------------------
      64M  seq  112272                |  256M  seq  447639
      64M    1  110042  1.020  1.020  |  256M    1  439776  1.018  1.018
      64M    2   55582  2.020  1.010  |  256M    2  221355  2.022  1.011
      64M    3   36896  3.043  1.014  |  256M    3  147918  3.026  1.009
      64M    4   27989  4.011  1.003  |  256M    4  111822  4.003  1.001
      
      
  • Cluster parallel program -- Class FindKeyClu -- 1 to 4 processes, each with 1 thread
     
    • Raw running time data (msec)
         N    K   Run 1   Run 2   Run 3   Run 4   Run 5   Run 6   Run 7
      -----------------------------------------------------------------
       32M  seq   56093   55433   55370   56598   55677   57116   57020
       32M    1   56063   55864   56534   56432   55866   56282   56238
       32M    2   28629   28224   28486   28301   28358   28385   28486
       32M    3   19273   19049   19527   19091   18901   19007   19182
       32M    4   14512   14258   14283   14267   14169   14307   14381
      -----------------------------------------------------------------
       64M  seq  111259  111406  113811  110616  112279  113315  113414
       64M    1  112703  111003  111514  111164  111070  111114  111162
       64M    2   56778   56018   56169   56606   57202   56040   55746
       64M    3   38011   37852   38100   37942   38543   37876   37868
       64M    4   28414   28288   28515   28278   28855   28879   28776
      -----------------------------------------------------------------
      128M  seq  223911  226603  221868  221101  228810  224519  223992
      128M    1  230116  225962  223623  221852  227974  224528  223135
      128M    2  111652  110914  112747  113791  111568  112976  114307
      128M    3   74977   76833   76514   75105   76936   76526   75448
      128M    4   56364   56716   57856   57604   56099   56268   56511
      -----------------------------------------------------------------
      256M  seq  446635  458383  448998  443373  447317  447569  452077
      256M    1  447636  447304  449702  449063  447011  449932  447347
      256M    2  223405  223451  225417  225791  226536  227557  224497
      256M    3  150542  152181  150320  154115  153313  153034  151609
      256M    4  112367  114819  112454  113711  114548  112209  112854
      
    • Performance data based on median running time of 7 runs
        N    K       T  Spdup    Eff  |     N    K       T  Spdup    Eff
      --------------------------------+---------------------------------
      32M  seq   56093                |  128M  seq  223992
      32M    1   56238  0.997  0.997  |  128M    1  224528  0.998  0.998
      32M    2   28385  1.976  0.988  |  128M    2  112747  1.987  0.993
      32M    3   19091  2.938  0.979  |  128M    3   76514  2.927  0.976
      32M    4   14283  3.927  0.982  |  128M    4   56511  3.964  0.991
      --------------------------------+---------------------------------
      64M  seq  112279                |  256M  seq  447569
      64M    1  111162  1.010  1.010  |  256M    1  447636  1.000  1.000
      64M    2   56169  1.999  0.999  |  256M    2  225417  1.986  0.993
      64M    3   37942  2.959  0.986  |  256M    3  152181  2.941  0.980
      64M    4   28515  3.938  0.984  |  256M    4  112854  3.966  0.991
      
      
  • Hybrid SMP cluster parallel program -- Class FindKeyHyb -- 1 to 4 processes, each with 4 threads
     
    • Raw running time data (msec)
         N    K   Run 1   Run 2   Run 3   Run 4   Run 5   Run 6   Run 7
      -----------------------------------------------------------------
       32M  seq   56310   56128   56463   56264   56297   56040   55573
       32M    4   14200   14100   14781   14922   14873   13997   14036
       32M    8    7539    7207    7333    7091    7484    7196    7129
       32M   12    5029    5031    5058    4861    4765    5069    5049
       32M   16    3855    3683    3790    3621    3727    4025    3678
      -----------------------------------------------------------------
       64M  seq  111439  112733  112808  112144  111382  112339  110704
       64M    4   27727   28071   27888   29518   27851   27803   28046
       64M    8   14840   14007   14619   14822   14068   14741   14826
       64M   12    9790   10128    9433    9842    9787    9899   10047
       64M   16    7478    7470    7476    7146    7516    7428    7502
      -----------------------------------------------------------------
      128M  seq  221286  224377  228848  222893  224856  225697  222933
      128M    4   55762   55189   55828   58887   55336   55734   55534
      128M    8   27851   29526   28043   28960   29603   29462   29371
      128M   12   19910   19650   19239   19691   18703   19874   19768
      128M   16   14720   15769   14002   15034   14833   14041   14849
      -----------------------------------------------------------------
      256M  seq  451873  450527  452347  449579  454331  442091  448688
      256M    4  110538  110776  117559  117530  110952  110463  111017
      256M    8   55601   58826   55502   55651   55464   58885   55661
      256M   12   37335   37196   37572   39284   39341   39518   39157
      256M   16   29658   29492   27955   29479   29177   27864   29530
      
    • Performance data based on median running time of 7 runs
        N    K       T   Spdup    Eff  |     N    K       T   Spdup    Eff
      ---------------------------------+----------------------------------
      32M  seq   56264                 |  128M  seq  224377
      32M    4   14200   3.962  0.991  |  128M    4   55734   4.026  1.006
      32M    8    7207   7.807  0.976  |  128M    8   29371   7.639  0.955
      32M   12    5031  11.183  0.932  |  128M   12   19691  11.395  0.950
      32M   16    3727  15.096  0.944  |  128M   16   14833  15.127  0.945
      ---------------------------------+----------------------------------
      64M  seq  112144                 |  256M  seq  450527
      64M    4   27888   4.021  1.005  |  256M    4  110952   4.061  1.015
      64M    8   14741   7.608  0.951  |  256M    8   55651   8.096  1.012
      64M   12    9842  11.394  0.950  |  256M   12   39157  11.506  0.959
      64M   16    7476  15.001  0.938  |  256M   16   29479  15.283  0.955
      


Running Time Data -- Example 2

  • Running time measurements on the tardis.cs.rit.edu machine
    • 10 backend processors, each with two dual-core CPU chips
    • N = Problem size = number of network nodes
    • K = Number of processors
    • T = Running time (msec) for computation only (not file I/O)
       
  • SMP parallel program -- Class FloydSmpRow -- 1 process, 1 to 4 threads
     
    • Raw running time data (msec)
         N    K   Run 1   Run 2   Run 3   Run 4   Run 5   Run 6   Run 7
      -----------------------------------------------------------------
      2000  seq   97113   99085   98948   93671   94408   97140   96513
      2000    1   67057   67670   64367   67563   67373   67526   65576
      2000    2   34662   35782   36223   36414   36784   34158   37070
      2000    3   24532   26125   27204   24710   23798   26082   25952
      2000    4   21260   20669   21822   21593   21700   23701   20447
      -----------------------------------------------------------------
      2500  seq  186063  217191  217615  191590  189348  190165  180123
      2500    1  131027  129526  126603  132528  127296  129003  133174
      2500    2   69734   67110   69780   67836   68175   70753   68435
      2500    3   47995   47582   48740   47389   53106   52267   49738
      2500    4   47511   38973   38829   43564   41002   37614   38511
      -----------------------------------------------------------------
      3200  seq  394730  598482  379841  382651  382114  395578  392587
      3200    1  274454  281809  274050  281935  272110  273204  269594
      3200    2  146226  151452  146754  139883  150754  150534  147635
      3200    3  109583   99601  103254  108057  104460  109318  109124
      3200    4   96488   90686   80469   92404   92076   99219   97737
      -----------------------------------------------------------------
      4000  seq  731069  732866  730051  759480  745603  792375  767126
      4000    1  532354  533930  546589  533816  537893  531977  531226
      4000    2  272714  277347  290291  287493  298268  299514  283927
      4000    3  212327  207476  200459  208415  202730  190989  203084
      4000    4  164552  165455  152348  178772  169537  165960  189851
      
    • Performance data based on median running time of 7 runs
         N    K       T  Spdup    Eff  |     N    K       T  Spdup    Eff
      ---------------------------------+---------------------------------
      2000  seq   97113                |  3200  seq  392587
      2000    1   67373  1.441  1.441  |  3200    1  274050  1.433  1.433
      2000    2   36223  2.681  1.340  |  3200    2  147635  2.659  1.330
      2000    3   25952  3.742  1.247  |  3200    3  108057  3.633  1.211
      2000    4   21593  4.497  1.124  |  3200    4   92404  4.249  1.062
      ---------------------------------+---------------------------------
      2500  seq  190165                |  4000  seq  745603
      2500    1  129526  1.468  1.468  |  4000    1  533816  1.397  1.397
      2500    2   68435  2.779  1.389  |  4000    2  287493  2.593  1.297
      2500    3   48740  3.902  1.301  |  4000    3  203084  3.671  1.224
      2500    4   38973  4.879  1.220  |  4000    4  165960  4.493  1.123
      
      
  • Cluster parallel program -- Class FloydClu -- 1 to 4 processes, each with 1 thread
     
    • Raw running time data (msec)
         N    K   Run 1   Run 2   Run 3   Run 4   Run 5   Run 6   Run 7
      -----------------------------------------------------------------
      2000  seq   95829   93626   97073  119715   93656   93963   96778
      2000    1   76081   76612   76153   76761   76043   76131  120483
      2000    2   38891   38608   38782   39367   39265   39194   38972
      2000    3   26409   26409   26377   25965   26470   26426   26445
      2000    4   20069   22724   19723   20111   19732   20167   20060
      -----------------------------------------------------------------
      2500  seq  188945  278582  181498  192093  186893  271585  192097
      2500    1  148169  148301  148775  147408  148472  148149  146089
      2500    2   75189   73653   74360   75181   74290   74998   74843
      2500    3   50756   50044   50605   50554   50481   50308   50350
      2500    4   38169   38160   38141   38128   38270   38034   38119
      -----------------------------------------------------------------
      3200  seq  379428  395100  378885  379657  395567  391628  384953
      3200    1  307701  303635  303056  305669  307553  305448  307532
      3200    2  153887  155461  154936  154351  154607  247483  154256
      3200    3  103526  104033  101934  103090  102651  103705  101656
      3200    4   77603   78591   79202   77175   77797   78921   78390
      -----------------------------------------------------------------
      4000  seq  730375  767507  756481  905333  753473  760107  770087
      4000    1  583924  579399  586376  941518  584101  584242  588733
      4000    2  300765  298561  299811  296489  293777  297021  292867
      4000    3  198491  199893  197299  199329  198110  201843  195711
      4000    4  149874  149214  150304  150631  150831  150422  151087
      
    • Performance data based on median running time of 7 runs
         N    K       T  Spdup    Eff  |     N    K       T  Spdup    Eff
      ---------------------------------+---------------------------------
      2000  seq   95829                |  3200  seq  384953
       2000    1   76153  1.258  1.258  |  3200    1  305669  1.259  1.259
       2000    2   38972  2.459  1.229  |  3200    2  154607  2.490  1.245
       2000    3   26409  3.629  1.210  |  3200    3  103090  3.734  1.245
       2000    4   20069  4.775  1.194  |  3200    4   78390  4.911  1.228
       ---------------------------------+---------------------------------
       2500  seq  192093                |  4000  seq  760107
       2500    1  148169  1.296  1.296  |  4000    1  584242  1.301  1.301
       2500    2   74843  2.567  1.283  |  4000    2  297021  2.559  1.280
       2500    3   50481  3.805  1.268  |  4000    3  198491  3.829  1.276
       2500    4   38141  5.036  1.259  |  4000    4  150422  5.053  1.263
      
      
  • Hybrid SMP cluster parallel program -- Class FloydHyb -- 1 to 4 processes, each with 4 threads
     
    • Raw running time data (msec)
         N    K   Run 1   Run 2   Run 3   Run 4   Run 5   Run 6   Run 7
      -----------------------------------------------------------------
      2000  seq   95376   97130   96306   96309   96578   95076   98759
      2000    4   23503   21650   20597   22375   24933   21825   20539
      2000    8   12369   12867   11351   12885   13802   12850   12954
      2000   12    9874    9600   10517   10592    9814    9984   10092
      2000   16    7817    7780    7754    7947    7775    7656    7888
      -----------------------------------------------------------------
      2500  seq  278491  188900  190663  189832  192424  189394  188200
      2500    4   42337   43052   41499   46714   36733   46795   44571
      2500    8   26495   24024   24359   24170   23485   25513   26596
      2500   12   18453   18362   18426   18293   18639   18341   18433
      2500   16   14597   14538   14780   14839   14396   14758   14661
      -----------------------------------------------------------------
      3200  seq  382824  394215  381224  382087  393131  394416  380780
      3200    4   80895   95277   89185   96307   87662   93239   91036
      3200    8   44654   56366   49481   49506   50378   49746   46708
      3200   12   37511   37625   37959   31491   37458   38274   39091
      3200   16   30133   29728   27584   29385   28692   30015   29608
      -----------------------------------------------------------------
      4000  seq  751262  768429  741521  760933  761249  733081  767620
      4000    4  175165  189191  192225  182727  173654  171494  165932
      4000    8  102282  111787   96570   93923   89797  111533  106312
      4000   12   70304   73596   65165   75121   64560   67690   76289
      4000   16   54833   56919   55561   54969   55687   56324   56447
      
    • Performance data based on median running time of 7 runs
         N    K       T   Spdup    Eff  |     N    K       T   Spdup    Eff
      ----------------------------------+----------------------------------
      2000  seq   96309                 |  3200  seq  382824
      2000    4   21825   4.413  1.103  |  3200    4   91036   4.205  1.051
      2000    8   12867   7.485  0.936  |  3200    8   49506   7.733  0.967
      2000   12    9984   9.646  0.804  |  3200   12   37625  10.175  0.848
      2000   16    7780  12.379  0.774  |  3200   16   29608  12.930  0.808
      ----------------------------------+----------------------------------
      2500  seq  189832                 |  4000  seq  760933
      2500    4   43052   4.409  1.102  |  4000    4  175165   4.344  1.086
      2500    8   24359   7.793  0.974  |  4000    8  102282   7.440  0.930
      2500   12   18426  10.302  0.859  |  4000   12   70304  10.823  0.902
      2500   16   14661  12.948  0.809  |  4000   16   55687  13.664  0.854
      

Copyright © 2007 Alan Kaminsky. All rights reserved. Last updated 24-Mar-2007. Please send comments to ark@cs.rit.edu.