Parallel Java
A Unified API for Shared Memory and
Cluster Parallel Programming in 100% Java
Lecture Notes
Prof. Alan Kaminsky
Department of Computer Science
B. Thomas Golisano College of Computing and Information Sciences
Rochester Institute of Technology
Rochester, NY, USA
Presented at the 9th International Workshop on Java and Components for Parallelism, Distribution and Concurrency
International Parallel and Distributed Processing Symposium (IPDPS 2007)
Long Beach, CA, USA
March 26, 2007
Overview
Parallel Computer Architectures
- Shared memory multiprocessor (SMP) parallel computers
- Even consumer desktop machines are starting to look like this
- Cluster parallel computers
- Soon you won't even be able to build these any more, since every new node will have a multicore CPU
- Hybrid SMP cluster parallel computers
- Soon all clusters will look like this
- The RIT CS Department's parallel computers, used for teaching parallel computing courses:
- SMP parallel computers
- paradise.cs.rit.edu -- Four Sun UltraSPARC-IV dual-core CPUs, eight processors, 1.35 GHz clock, 16 GB main memory
- parasite.cs.rit.edu -- Four Sun UltraSPARC-IV dual-core CPUs, eight processors, 1.35 GHz clock, 16 GB main memory
- paradox.cs.rit.edu -- Four Sun UltraSPARC-II CPUs, four processors, 450 MHz clock, 4 GB main memory
- paragon.cs.rit.edu -- Four Sun UltraSPARC-II CPUs, four processors, 450 MHz clock, 4 GB main memory
- Cluster parallel computer
- Frontend computer -- paranoia.cs.rit.edu -- UltraSPARC-II CPU, 296 MHz clock, 192 MB main memory
- 32 backend computers -- thug01 through thug32 -- each an UltraSPARC-IIi CPU, 440 MHz clock, 256 MB main memory
- 100-Mbps switched Ethernet backend interconnection network
- Aggregate 14 GHz clock, 8 GB main memory
- Hybrid SMP cluster parallel computer
- Frontend computer -- tardis.cs.rit.edu -- UltraSPARC-IIe CPU, 650 MHz clock, 512 MB main memory
- 10 backend computers -- dr00 through dr09 -- each with two AMD Opteron 2218 dual-core CPUs, four processors, 2.6 GHz clock, 8 GB main memory
- 1-Gbps switched Ethernet backend interconnection network
- Aggregate 104 GHz clock speed, 80 GB main memory
Motivations for Parallel Java
- Parallel computing is moving into nontraditional domains
- Graphics, animation, data mining, informatics, . . .
- Applications tend to be written in newer languages like Java, not Fortran or C
- Java is becoming the language of choice for learning programming
- ACM Java Task Force resources for teaching introductory programming in Java -- http://jtf.acm.org/
- Soon everyone's desktop or laptop PC will be an SMP parallel computer
- That is, will have a multicore CPU chip
- Every computing student will have to learn how to write parallel programs, not just computational scientists
- Standard parallel programming libraries have not caught up to the multicore revolution
- OpenMP addresses only SMP parallel programming
- MPI addresses only cluster parallel programming
- Programming a hybrid SMP cluster computer requires mastering two large and intricate APIs and their interactions
- Hence: Parallel Java
- A single unified API . . .
- For thread-based SMP parallel programming, with features inspired by OpenMP . . .
- And for message-passing cluster parallel programming, with features inspired by MPI . . .
- And for hybrid SMP cluster parallel programming . . .
- In 100% Java
- For instructors and students: Teach and learn parallel programming in the language of choice, Java
- For developers: Master just one API whose SMP and cluster programming features are designed to work well together
- For me: Explore just how much you can do with Java for high performance computing
- Both functionality-wise and performance-wise
Programming with Parallel Java -- Example 1
- Example: Cryptanalysis using exhaustive key search
- Advanced Encryption Standard (AES)
- Plaintext block (128 bits) goes in
- Encrypted ciphertext block (128 bits) comes out
- Uses a 256-bit secret key
- Known plaintext attack
- Plaintext block is known
- Corresponding ciphertext block is known
- Key is not known
- Problem: Discover the key
- Once you have the key, you can decrypt everything (that was encrypted with that key)
- Exhaustive key search
- For every key from 0 to 2^256 - 1, encrypt the given plaintext with that key, and see if you get the given ciphertext
- Partial exhaustive key search
- 256-N bits of the key are known, N bits of the key are unknown
- For every combination of unknown key bits from 0 to 2^N - 1, encrypt the given plaintext with that key, and see if you get the given ciphertext
- Sequential program -- Class FindKeySeq
- SMP parallel program -- Class FindKeySmp3
- Cluster parallel program -- Class FindKeyClu
- Hybrid SMP cluster parallel program -- Class FindKeyHyb
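The partial exhaustive key search described above can be sketched in plain Java. This is a minimal illustration, not the FindKeySeq or FindKeySmp3 source: the class and method names are hypothetical, the unknown bits are assumed to be the lowest N bits of the key, AES is taken from the standard javax.crypto package, and ordinary java.lang.Thread objects stand in for the Parallel Java thread team.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative partial exhaustive key search; all names here are hypothetical. */
final class KeySearchSketch {

    /** Encrypts one 16-byte block with a 32-byte (256-bit) AES key, ECB mode, no padding. */
    static byte[] encrypt(byte[] key, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Copies partialKey and overwrites its n lowest bits with the n lowest bits of guess. */
    static byte[] fillLowBits(byte[] partialKey, long guess, int n) {
        byte[] key = partialKey.clone();
        for (int bit = 0; bit < n; bit++) {
            int index = key.length - 1 - bit / 8;      // the last byte holds the lowest bits
            int mask = 1 << (bit % 8);
            if (((guess >>> bit) & 1L) != 0) {
                key[index] |= mask;
            } else {
                key[index] &= ~mask;
            }
        }
        return key;
    }

    /** Sequential search over guesses in [lo, hi); returns the matching guess or -1. */
    static long searchRange(byte[] partialKey, int n, long lo, long hi,
                            byte[] plaintext, byte[] ciphertext) {
        for (long guess = lo; guess < hi; guess++) {
            byte[] candidate = fillLowBits(partialKey, guess, n);
            if (Arrays.equals(encrypt(candidate, plaintext), ciphertext)) {
                return guess;
            }
        }
        return -1L;
    }

    /** Splits the 2^n guesses into near-equal slices, one per thread (plain Java threads). */
    static long searchParallel(byte[] partialKey, int n, int threads,
                               byte[] plaintext, byte[] ciphertext) {
        long total = 1L << n;                          // assumes n is small (well under 63)
        AtomicLong found = new AtomicLong(-1L);
        Thread[] team = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            long lo = total * t / threads;
            long hi = total * (t + 1) / threads;
            team[t] = new Thread(() -> {
                long hit = searchRange(partialKey, n, lo, hi, plaintext, ciphertext);
                if (hit >= 0) {
                    found.set(hit);
                }
            });
            team[t].start();
        }
        try {
            for (Thread worker : team) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return found.get();
    }

    public static void main(String[] args) {
        byte[] secretKey = new byte[32];
        for (int i = 0; i < secretKey.length; i++) {
            secretKey[i] = (byte) (7 * i + 1);         // made-up key for the demonstration
        }
        byte[] plaintext = "known plaintext!".getBytes(StandardCharsets.US_ASCII); // 16 bytes
        byte[] ciphertext = encrypt(secretKey, plaintext);

        int n = 12;                                    // pretend the low 12 key bits are unknown
        byte[] partialKey = fillLowBits(secretKey, 0L, n);
        long guess = searchParallel(partialKey, n, 4, plaintext, ciphertext);
        byte[] recovered = fillLowBits(partialKey, guess, n);
        System.out.println("key recovered: " + Arrays.equals(recovered, secretKey));
    }
}
```

Each thread searches its own contiguous slice of the key space and shares nothing but the final result, which is why this problem parallelizes so cleanly. A cluster version would presumably hand one slice to each process in the same way.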
Parallel Java Performance -- Example 1
- Running time measurements on the tardis.cs.rit.edu machine
- 10 backend processors, each with two dual-core CPU chips
- N = Problem size = number of keys searched = 32M, 64M, 128M, 256M
- K = Number of processors
- T = Running time (msec)
- SMP parallel program -- Class FindKeySmp3 -- 1 process, 1 to 4 threads
- Cluster parallel program -- Class FindKeyClu -- 1 to 4 processes, each with 1 thread
- Hybrid SMP cluster parallel program -- Class FindKeyHyb -- 1 to 4 processes, each with 4 threads
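The running time data tables later in these notes report a speedup (Spdup) and an efficiency (Eff) for each N and K, computed from the median of 7 runs. The tabulated values are consistent with speedup = T_seq / T_par and efficiency = speedup / K. The sketch below reproduces one table entry (FindKeySmp3, N = 32M, K = 4) from its raw data; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

/** Illustrative speedup and efficiency calculation; names are hypothetical. */
final class SpeedupSketch {

    /** Median of an odd number of running times (msec). */
    static long median(long[] runs) {
        long[] sorted = runs.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** Speedup = sequential running time divided by parallel running time. */
    static double speedup(long tSeq, long tPar) {
        return (double) tSeq / tPar;
    }

    /** Efficiency = speedup divided by the number of processors K. */
    static double efficiency(long tSeq, long tPar, int k) {
        return speedup(tSeq, tPar) / k;
    }

    public static void main(String[] args) {
        // Raw data copied from the FindKeySmp3 table, N = 32M: "seq" row and K = 4 row.
        long[] seqRuns = {57554, 55888, 55898, 55373, 56628, 55466, 56340};
        long[] k4Runs = {13990, 13989, 13997, 14014, 13878, 14878, 13912};

        long tSeq = median(seqRuns);
        long tPar = median(k4Runs);
        System.out.printf(Locale.ROOT, "T_seq=%d T_par=%d Spdup=%.3f Eff=%.3f%n",
                tSeq, tPar, speedup(tSeq, tPar), efficiency(tSeq, tPar, 4));
        // prints: T_seq=55898 T_par=13990 Spdup=3.996 Eff=0.999
    }
}
```

Because the speedup is measured against the separate sequential program rather than against the one-thread parallel run, an efficiency slightly above 1 is possible, as several table rows show.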
Programming with Parallel Java -- Example 2
- Floyd's Algorithm
- Calculates the length of the shortest path from each node to every other node in a network of N nodes, given the distance from each node to its adjacent nodes
- On input, D is a distance matrix
- An NxN matrix where D[i,j] is the distance from node i to adjacent node j
- If node j is not adjacent to node i, then D[i,j] = infinity
- (The distance matrix need not be symmetric)
- On output, D[i,j] has been replaced by the length of the shortest path from node i to node j
- If there is no path from node i to node j, then D[i,j] = infinity
- Algorithm:
    for i = 0 to N-1
        for r = 0 to N-1
            for c = 0 to N-1
                D[r,c] = min (D[r,c], D[r,i] + D[i,c])
- Running time = O(N^3)
- Example input distance matrix (N = 10)
    0.000  0.213      ∞      ∞      ∞      ∞      ∞  0.248  0.240      ∞
    0.213  0.000      ∞      ∞      ∞      ∞      ∞      ∞      ∞      ∞
        ∞      ∞  0.000      ∞      ∞      ∞      ∞      ∞  0.103  0.137
        ∞      ∞      ∞  0.000  0.212      ∞      ∞  0.188      ∞      ∞
        ∞      ∞      ∞  0.212  0.000  0.077      ∞      ∞      ∞      ∞
        ∞      ∞      ∞      ∞  0.077  0.000      ∞      ∞      ∞      ∞
        ∞      ∞      ∞      ∞      ∞      ∞  0.000      ∞      ∞      ∞
    0.248      ∞      ∞  0.188      ∞      ∞      ∞  0.000      ∞      ∞
    0.240      ∞  0.103      ∞      ∞      ∞      ∞      ∞  0.000  0.043
        ∞      ∞  0.137      ∞      ∞      ∞      ∞      ∞  0.043  0.000
- Example output distance matrix
    0.000  0.213  0.344  0.436  0.648  0.726      ∞  0.248  0.240  0.284
    0.213  0.000  0.557  0.649  0.861  0.939      ∞  0.461  0.453  0.497
    0.344  0.557  0.000  0.780  0.992  1.069      ∞  0.592  0.103  0.137
    0.436  0.649  0.780  0.000  0.212  0.289      ∞  0.188  0.677  0.720
    0.648  0.861  0.992  0.212  0.000  0.077      ∞  0.400  0.888  0.932
    0.726  0.939  1.069  0.289  0.077  0.000      ∞  0.477  0.966  1.009
        ∞      ∞      ∞      ∞      ∞      ∞  0.000      ∞      ∞      ∞
    0.248  0.461  0.592  0.188  0.400  0.477      ∞  0.000  0.488  0.532
    0.240  0.453  0.103  0.677  0.888  0.966      ∞  0.488  0.000  0.043
    0.284  0.497  0.137  0.720  0.932  1.009      ∞  0.532  0.043  0.000
- Sequential program -- Class FloydSeq
- SMP parallel program -- Class FloydSmpRow
- Cluster parallel program -- Class FloydClu
- Hybrid SMP cluster parallel program -- Class FloydHyb
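Floyd's Algorithm as stated above translates almost line for line into Java. The sketch below is not the FloydSeq or FloydSmpRow source; the class and method names are hypothetical. It shows a sequential version and a row-parallel version in which, for each intermediate node i, the rows are updated concurrently. A java.util.stream parallel stream stands in for the Parallel Java loop classes, and the code assumes D[i,i] = 0 and no negative distances, so that row i is never written during iteration i. The 4-node input matrix is invented for illustration.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Illustrative Floyd's Algorithm; names are hypothetical. */
final class FloydSketch {
    static final double INF = Double.POSITIVE_INFINITY;

    /** Replaces each d[r][c] with the shortest-path length from r to c. */
    static void floydSequential(double[][] d) {
        int n = d.length;
        for (int i = 0; i < n; i++) {
            for (int r = 0; r < n; r++) {
                for (int c = 0; c < n; c++) {
                    d[r][c] = Math.min(d[r][c], d[r][i] + d[i][c]);
                }
            }
        }
    }

    /** Same result, but the rows of each outer iteration are processed in parallel. */
    static void floydRowParallel(double[][] d) {
        int n = d.length;
        for (int i = 0; i < n; i++) {
            final int via = i;
            final double[] viaRow = d[via];            // read-only during this iteration
            IntStream.range(0, n).parallel().forEach(r -> {
                double[] row = d[r];
                double toVia = row[via];               // unchanged this iteration since d[via][via] == 0
                for (int c = 0; c < n; c++) {
                    double candidate = toVia + viaRow[c];
                    if (candidate < row[c]) {
                        row[c] = candidate;            // each task writes only its own row
                    }
                }
            });
            // forEach returns only after every row is done, acting as a barrier between iterations
        }
    }

    public static void main(String[] args) {
        double[][] d = {
            {0, 3, INF, 7},
            {8, 0, 2, INF},
            {5, INF, 0, 1},
            {2, INF, INF, 0},
        };
        floydRowParallel(d);
        for (double[] row : d) {
            System.out.println(Arrays.toString(row));
        }
        // [0.0, 3.0, 5.0, 6.0]
        // [5.0, 0.0, 2.0, 3.0]
        // [3.0, 6.0, 0.0, 1.0]
        // [2.0, 5.0, 7.0, 0.0]
    }
}
```

The outer loop over i has to stay sequential because iteration i + 1 reads the results of iteration i. Only the row loop is parallelized, which matches the row-wise decomposition that the class name FloydSmpRow suggests.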
Parallel Java Performance -- Example 2
- Running time measurements on the tardis.cs.rit.edu machine
- 10 backend processors, each with two dual-core CPU chips
- N = Problem size = number of network nodes = 2000, 2500, 3200, 4000
- K = Number of processors
- T = Running time (msec) for computation only (not file I/O)
- SMP parallel program -- Class FloydSmpRow -- 1 process, 1 to 4 threads
- Cluster parallel program -- Class FloydClu -- 1 to 4 processes, each with 1 thread
- Hybrid SMP cluster parallel program -- Class FloydHyb -- 1 to 4 processes, each with 4 threads
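In the hybrid configurations, K is the number of processes times the number of threads per process (4, 8, 12, or 16). One plausible way to divide the work, sketched below, is to cut the N matrix rows into one slice per process and then cut each process's slice again into one sub-slice per thread. This is an assumption for illustration only; the notes do not state how FloydHyb partitions its rows, and the helper name slice is hypothetical.

```java
import java.util.Arrays;

/** Illustrative two-level (process, then thread) row partitioning; names are hypothetical. */
final class HybridPartitionSketch {

    /**
     * Cuts [lo, hi) into `parts` contiguous near-equal slices and
     * returns {start, end} of slice number `index` (end is exclusive).
     */
    static int[] slice(int lo, int hi, int parts, int index) {
        long length = (long) hi - lo;
        int start = lo + (int) (length * index / parts);
        int end = lo + (int) (length * (index + 1) / parts);
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int n = 2000;          // number of rows, as in the smallest measured problem size
        int processes = 4;     // assumed: one process per backend computer
        int threads = 4;       // assumed: one thread per processor in that computer

        for (int p = 0; p < processes; p++) {
            int[] mine = slice(0, n, processes, p);
            for (int t = 0; t < threads; t++) {
                int[] sub = slice(mine[0], mine[1], threads, t);
                System.out.println("process " + p + " thread " + t
                        + " rows " + Arrays.toString(sub));
            }
        }
    }
}
```

With N = 2000, 4 processes, and 4 threads, every thread ends up with 125 rows. The integer arithmetic in slice also covers sizes that do not divide evenly, leaving no gaps or overlaps between adjacent slices.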
Parallel Java Architecture
- Parallel Java on a cluster
- Job Scheduler Daemon runs on the frontend processor
- Web interface for monitoring cluster status
- SSH Daemon runs on each backend processor (not part of Parallel Java per se)
- User runs a Java program with Parallel Java calls on the frontend processor
- Parallel Java middleware, using SSH, runs a job backend process on each backend processor
- Processes communicate using TCP sockets plus Parallel Java's own message passing layer
Status
- Shared memory parallel programming features largely complete
- Message passing parallel programming features partially complete
- Some of the more esoteric collective communication operations are not implemented
- I use Parallel Java in my Parallel Computing I and Parallel Computing II classes
- Taught using Parallel Java for two years now
- I started writing a parallel programming textbook using Parallel Java
- Building Parallel Programs: SMPs, Clusters, and Java
Future Plans
- Continue work on the Parallel Java Library
- Continue improving performance of the parallel loop classes
- Continue improving performance of the message passing classes
- Implement remaining collective communication operations
- Add parallel file I/O as in MPI-2.0
- Continue teaching the Parallel Computing I and II classes with Parallel Java
- Finish writing the textbook
- To be published by Thomson Course Technology
- Tentative publication date January 2009
- Accumulate more performance measurements on standard parallel benchmarks
- Compare performance of Parallel Java programs with Fortran/C/OpenMP/MPI programs
- Work on solving scientific computing problems with Parallel Java programs
- Computational medicine: MRI spin relaxometry -- done with mpiJava, need to redo with Parallel Java
- Computational biology: Maximum parsimony phylogenetic tree construction
For Further Information
Running Time Data -- Example 1
- Running time measurements on the tardis.cs.rit.edu machine
- 10 backend processors, each with two dual-core CPU chips
- N = Problem size = number of keys searched
- K = Number of processors
- T = Running time (msec)
- SMP parallel program -- Class FindKeySmp3 -- 1 process, 1 to 4 threads
- Raw running time data (msec)
N K Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7
-----------------------------------------------------------------
32M seq 57554 55888 55898 55373 56628 55466 56340
32M 1 55132 55108 55118 55109 55129 55129 55127
32M 2 27855 27882 27886 27793 27847 27898 27886
32M 3 20052 18514 19598 18516 18604 20037 19846
32M 4 13990 13989 13997 14014 13878 14878 13912
-----------------------------------------------------------------
64M seq 110613 113153 112272 112494 111997 112042 114074
64M 1 110039 110060 110025 110059 110042 110043 110027
64M 2 55582 55572 55771 55642 55620 54017 55401
64M 3 37193 37079 36896 39464 36797 36855 36879
64M 4 28687 27670 27641 27989 27892 29947 29670
-----------------------------------------------------------------
128M seq 223830 221114 226464 228439 224009 223579 224229
128M 1 219907 219938 219931 219912 219917 219913 220055
128M 2 111108 111160 110978 111151 110973 110681 111162
128M 3 73943 73636 73723 74341 78368 74586 78434
128M 4 55627 58625 55239 56407 59402 55622 55232
-----------------------------------------------------------------
256M seq 448199 447639 448544 447489 452227 442963 443627
256M 1 439946 439657 439669 439669 439954 439776 439911
256M 2 226232 236317 221227 221153 221355 221173 222203
256M 3 148458 147751 147751 147918 147928 156770 147808
256M 4 117842 117915 110956 117778 111206 111822 111150
- Performance data based on median running time of 7 runs
N K T Spdup Eff | N K T Spdup Eff
--------------------------------+---------------------------------
32M seq 55898 | 128M seq 224009
32M 1 55127 1.014 1.014 | 128M 1 219917 1.019 1.019
32M 2 27882 2.005 1.002 | 128M 2 111108 2.016 1.008
32M 3 19598 2.852 0.951 | 128M 3 74341 3.013 1.004
32M 4 13990 3.996 0.999 | 128M 4 55627 4.027 1.007
--------------------------------+---------------------------------
64M seq 112272 | 256M seq 447639
64M 1 110042 1.020 1.020 | 256M 1 439776 1.018 1.018
64M 2 55582 2.020 1.010 | 256M 2 221355 2.022 1.011
64M 3 36896 3.043 1.014 | 256M 3 147918 3.026 1.009
64M 4 27989 4.011 1.003 | 256M 4 111822 4.003 1.001
- Cluster parallel program -- Class FindKeyClu -- 1 to 4 processes, each with 1 thread
- Raw running time data (msec)
N K Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7
-----------------------------------------------------------------
32M seq 56093 55433 55370 56598 55677 57116 57020
32M 1 56063 55864 56534 56432 55866 56282 56238
32M 2 28629 28224 28486 28301 28358 28385 28486
32M 3 19273 19049 19527 19091 18901 19007 19182
32M 4 14512 14258 14283 14267 14169 14307 14381
-----------------------------------------------------------------
64M seq 111259 111406 113811 110616 112279 113315 113414
64M 1 112703 111003 111514 111164 111070 111114 111162
64M 2 56778 56018 56169 56606 57202 56040 55746
64M 3 38011 37852 38100 37942 38543 37876 37868
64M 4 28414 28288 28515 28278 28855 28879 28776
-----------------------------------------------------------------
128M seq 223911 226603 221868 221101 228810 224519 223992
128M 1 230116 225962 223623 221852 227974 224528 223135
128M 2 111652 110914 112747 113791 111568 112976 114307
128M 3 74977 76833 76514 75105 76936 76526 75448
128M 4 56364 56716 57856 57604 56099 56268 56511
-----------------------------------------------------------------
256M seq 446635 458383 448998 443373 447317 447569 452077
256M 1 447636 447304 449702 449063 447011 449932 447347
256M 2 223405 223451 225417 225791 226536 227557 224497
256M 3 150542 152181 150320 154115 153313 153034 151609
256M 4 112367 114819 112454 113711 114548 112209 112854
- Performance data based on median running time of 7 runs
N K T Spdup Eff | N K T Spdup Eff
--------------------------------+---------------------------------
32M seq 56093 | 128M seq 223992
32M 1 56238 0.997 0.997 | 128M 1 224528 0.998 0.998
32M 2 28385 1.976 0.988 | 128M 2 112747 1.987 0.993
32M 3 19091 2.938 0.979 | 128M 3 76514 2.927 0.976
32M 4 14283 3.927 0.982 | 128M 4 56511 3.964 0.991
--------------------------------+---------------------------------
64M seq 112279 | 256M seq 447569
64M 1 111162 1.010 1.010 | 256M 1 447636 1.000 1.000
64M 2 56169 1.999 0.999 | 256M 2 225417 1.986 0.993
64M 3 37942 2.959 0.986 | 256M 3 152181 2.941 0.980
64M 4 28515 3.938 0.984 | 256M 4 112854 3.966 0.991
- Hybrid SMP cluster parallel program -- Class FindKeyHyb -- 1 to 4 processes, each with 4 threads
- Raw running time data (msec)
N K Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7
-----------------------------------------------------------------
32M seq 56310 56128 56463 56264 56297 56040 55573
32M 4 14200 14100 14781 14922 14873 13997 14036
32M 8 7539 7207 7333 7091 7484 7196 7129
32M 12 5029 5031 5058 4861 4765 5069 5049
32M 16 3855 3683 3790 3621 3727 4025 3678
-----------------------------------------------------------------
64M seq 111439 112733 112808 112144 111382 112339 110704
64M 4 27727 28071 27888 29518 27851 27803 28046
64M 8 14840 14007 14619 14822 14068 14741 14826
64M 12 9790 10128 9433 9842 9787 9899 10047
64M 16 7478 7470 7476 7146 7516 7428 7502
-----------------------------------------------------------------
128M seq 221286 224377 228848 222893 224856 225697 222933
128M 4 55762 55189 55828 58887 55336 55734 55534
128M 8 27851 29526 28043 28960 29603 29462 29371
128M 12 19910 19650 19239 19691 18703 19874 19768
128M 16 14720 15769 14002 15034 14833 14041 14849
-----------------------------------------------------------------
256M seq 451873 450527 452347 449579 454331 442091 448688
256M 4 110538 110776 117559 117530 110952 110463 111017
256M 8 55601 58826 55502 55651 55464 58885 55661
256M 12 37335 37196 37572 39284 39341 39518 39157
256M 16 29658 29492 27955 29479 29177 27864 29530
- Performance data based on median running time of 7 runs
N K T Spdup Eff | N K T Spdup Eff
---------------------------------+----------------------------------
32M seq 56264 | 128M seq 224377
32M 4 14200 3.962 0.991 | 128M 4 55734 4.026 1.006
32M 8 7207 7.807 0.976 | 128M 8 29371 7.639 0.955
32M 12 5031 11.183 0.932 | 128M 12 19691 11.395 0.950
32M 16 3727 15.096 0.944 | 128M 16 14833 15.127 0.945
---------------------------------+----------------------------------
64M seq 112144 | 256M seq 450527
64M 4 27888 4.021 1.005 | 256M 4 110952 4.061 1.015
64M 8 14741 7.608 0.951 | 256M 8 55651 8.096 1.012
64M 12 9842 11.394 0.950 | 256M 12 39157 11.506 0.959
64M 16 7476 15.001 0.938 | 256M 16 29479 15.283 0.955
Running Time Data -- Example 2
- Running time measurements on the tardis.cs.rit.edu machine
- 10 backend processors, each with two dual-core CPU chips
- N = Problem size = number of network nodes
- K = Number of processors
- T = Running time (msec) for computation only (not file I/O)
- SMP parallel program -- Class FloydSmpRow -- 1 process, 1 to 4 threads
- Raw running time data (msec)
N K Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7
-----------------------------------------------------------------
2000 seq 97113 99085 98948 93671 94408 97140 96513
2000 1 67057 67670 64367 67563 67373 67526 65576
2000 2 34662 35782 36223 36414 36784 34158 37070
2000 3 24532 26125 27204 24710 23798 26082 25952
2000 4 21260 20669 21822 21593 21700 23701 20447
-----------------------------------------------------------------
2500 seq 186063 217191 217615 191590 189348 190165 180123
2500 1 131027 129526 126603 132528 127296 129003 133174
2500 2 69734 67110 69780 67836 68175 70753 68435
2500 3 47995 47582 48740 47389 53106 52267 49738
2500 4 47511 38973 38829 43564 41002 37614 38511
-----------------------------------------------------------------
3200 seq 394730 598482 379841 382651 382114 395578 392587
3200 1 274454 281809 274050 281935 272110 273204 269594
3200 2 146226 151452 146754 139883 150754 150534 147635
3200 3 109583 99601 103254 108057 104460 109318 109124
3200 4 96488 90686 80469 92404 92076 99219 97737
-----------------------------------------------------------------
4000 seq 731069 732866 730051 759480 745603 792375 767126
4000 1 532354 533930 546589 533816 537893 531977 531226
4000 2 272714 277347 290291 287493 298268 299514 283927
4000 3 212327 207476 200459 208415 202730 190989 203084
4000 4 164552 165455 152348 178772 169537 165960 189851
- Performance data based on median running time of 7 runs
N K T Spdup Eff | N K T Spdup Eff
---------------------------------+---------------------------------
2000 seq 97113 | 3200 seq 392587
2000 1 67373 1.441 1.441 | 3200 1 274050 1.433 1.433
2000 2 36223 2.681 1.340 | 3200 2 147635 2.659 1.330
2000 3 25952 3.742 1.247 | 3200 3 108057 3.633 1.211
2000 4 21593 4.497 1.124 | 3200 4 92404 4.249 1.062
---------------------------------+---------------------------------
2500 seq 190165 | 4000 seq 745603
2500 1 129526 1.468 1.468 | 4000 1 533816 1.397 1.397
2500 2 68435 2.779 1.389 | 4000 2 287493 2.593 1.297
2500 3 48740 3.902 1.301 | 4000 3 203084 3.671 1.224
2500 4 38973 4.879 1.220 | 4000 4 165960 4.493 1.123
- Cluster parallel program -- Class FloydClu -- 1 to 4 processes, each with 1 thread
- Raw running time data (msec)
N K Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7
-----------------------------------------------------------------
2000 seq 95829 93626 97073 119715 93656 93963 96778
2000 1 76081 76612 76153 76761 76043 76131 120483
2000 2 38891 38608 38782 39367 39265 39194 38972
2000 3 26409 26409 26377 25965 26470 26426 26445
2000 4 20069 22724 19723 20111 19732 20167 20060
-----------------------------------------------------------------
2500 seq 188945 278582 181498 192093 186893 271585 192097
2500 1 148169 148301 148775 147408 148472 148149 146089
2500 2 75189 73653 74360 75181 74290 74998 74843
2500 3 50756 50044 50605 50554 50481 50308 50350
2500 4 38169 38160 38141 38128 38270 38034 38119
-----------------------------------------------------------------
3200 seq 379428 395100 378885 379657 395567 391628 384953
3200 1 307701 303635 303056 305669 307553 305448 307532
3200 2 153887 155461 154936 154351 154607 247483 154256
3200 3 103526 104033 101934 103090 102651 103705 101656
3200 4 77603 78591 79202 77175 77797 78921 78390
-----------------------------------------------------------------
4000 seq 730375 767507 756481 905333 753473 760107 770087
4000 1 583924 579399 586376 941518 584101 584242 588733
4000 2 300765 298561 299811 296489 293777 297021 292867
4000 3 198491 199893 197299 199329 198110 201843 195711
4000 4 149874 149214 150304 150631 150831 150422 151087
- Performance data based on median running time of 7 runs
N K T Spdup Eff | N K T Spdup Eff
---------------------------------+---------------------------------
2000 seq 95829 | 3200 seq 384953
2000 1 76153 1.258 1.258 | 3200 1 305669 1.259 1.259
2000 2 38972 2.459 1.229 | 3200 2 154607 2.490 1.245
2000 3 26409 3.629 1.210 | 3200 3 103090 3.734 1.245
2000 4 20069 4.775 1.194 | 3200 4 78390 4.911 1.228
---------------------------------+---------------------------------
2500 seq 192093 | 4000 seq 760107
2500 1 148169 1.296 1.296 | 4000 1 584242 1.301 1.301
2500 2 74843 2.567 1.283 | 4000 2 297021 2.559 1.280
2500 3 50481 3.805 1.268 | 4000 3 198491 3.829 1.276
2500 4 38141 5.036 1.259 | 4000 4 150422 5.053 1.263
- Hybrid SMP cluster parallel program -- Class FloydHyb -- 1 to 4 processes, each with 4 threads
- Raw running time data (msec)
N K Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7
-----------------------------------------------------------------
2000 seq 95376 97130 96306 96309 96578 95076 98759
2000 4 23503 21650 20597 22375 24933 21825 20539
2000 8 12369 12867 11351 12885 13802 12850 12954
2000 12 9874 9600 10517 10592 9814 9984 10092
2000 16 7817 7780 7754 7947 7775 7656 7888
-----------------------------------------------------------------
2500 seq 278491 188900 190663 189832 192424 189394 188200
2500 4 42337 43052 41499 46714 36733 46795 44571
2500 8 26495 24024 24359 24170 23485 25513 26596
2500 12 18453 18362 18426 18293 18639 18341 18433
2500 16 14597 14538 14780 14839 14396 14758 14661
-----------------------------------------------------------------
3200 seq 382824 394215 381224 382087 393131 394416 380780
3200 4 80895 95277 89185 96307 87662 93239 91036
3200 8 44654 56366 49481 49506 50378 49746 46708
3200 12 37511 37625 37959 31491 37458 38274 39091
3200 16 30133 29728 27584 29385 28692 30015 29608
-----------------------------------------------------------------
4000 seq 751262 768429 741521 760933 761249 733081 767620
4000 4 175165 189191 192225 182727 173654 171494 165932
4000 8 102282 111787 96570 93923 89797 111533 106312
4000 12 70304 73596 65165 75121 64560 67690 76289
4000 16 54833 56919 55561 54969 55687 56324 56447
- Performance data based on median running time of 7 runs
N K T Spdup Eff | N K T Spdup Eff
----------------------------------+----------------------------------
2000 seq 96309 | 3200 seq 382824
2000 4 21825 4.413 1.103 | 3200 4 91036 4.205 1.051
2000 8 12867 7.485 0.936 | 3200 8 49506 7.733 0.967
2000 12 9984 9.646 0.804 | 3200 12 37625 10.175 0.848
2000 16 7780 12.379 0.774 | 3200 16 29608 12.930 0.808
----------------------------------+----------------------------------
2500 seq 189832 | 4000 seq 760933
2500 4 43052 4.409 1.102 | 4000 4 175165 4.344 1.086
2500 8 24359 7.793 0.974 | 4000 8 102282 7.440 0.930
2500 12 18426 10.302 0.859 | 4000 12 70304 10.823 0.902
2500 16 14661 12.948 0.809 | 4000 16 55687 13.664 0.854
Copyright © 2007 Alan Kaminsky.
All rights reserved.
Last updated 24-Mar-2007.
Please send comments to ark@cs.rit.edu.