Parallel Java
An API for Developing Parallel Programs in 100% Java
Lecture Notes
Prof. Alan Kaminsky
Department of Computer Science
Rochester Institute of Technology
Presented to the RIT Research Computing Group
November 9, 2006
Overview
Modern Parallel Computing
- Parallel computer architectures:
- Shared memory multiprocessor (SMP) parallel computers
- Even consumer desktop machines are starting to look like this
- Cluster parallel computers
- Soon you won't even be able to build these any more
- Hybrid SMP cluster parallel computers
- Soon all clusters will look like this
- The RIT CS Department's parallel computers:
- SMP parallel computers
- paradise.cs.rit.edu -- Four Sun UltraSPARC-IV dual-core CPUs, eight processors, 1.35 GHz clock, 16 GB main memory
- parasite.cs.rit.edu -- Four Sun UltraSPARC-IV dual-core CPUs, eight processors, 1.35 GHz clock, 16 GB main memory
- paradox.cs.rit.edu -- Four Sun UltraSPARC-II CPUs, four processors, 450 MHz clock, 4 GB main memory
- paragon.cs.rit.edu -- Four Sun UltraSPARC-II CPUs, four processors, 450 MHz clock, 4 GB main memory
- Cluster parallel computer
- Frontend computer -- paranoia.cs.rit.edu -- UltraSPARC-II CPU, 296 MHz clock, 192 MB main memory
- 32 backend computers -- thug01 through thug32 -- each an UltraSPARC-IIi CPU, 440 MHz clock, 256 MB main memory
- 100-Mbps switched Ethernet backend interconnection network
- Aggregate 14 GHz clock, 8 GB main memory
- Hybrid SMP cluster parallel computer
- Under construction
- 10 backend computers, each a 4-CPU SMP machine -- 40 CPUs
- 1-Gbps switched Ethernet backend interconnection network
- Standard middleware for SMP parallel programming: OpenMP
- Standard middleware for cluster parallel programming: Message Passing Interface (MPI)
SMP Parallel Programming in Java with OpenMP
- Monte Carlo technique for computing an approximate value of pi
- The area of the unit square is 1
- The area of the circle quadrant is pi/4
- Generate a large number of points at random in the unit square
- Count how many of them fall within the circle quadrant, i.e. distance from origin <= 1
- The fraction of the points within the circle quadrant gives an approximation for pi/4
- 4 times this fraction gives an approximation for pi
- Sequential program
- Package edu.rit.openmp.monte
- Java version of OpenMP: JOMP
- JOMP parallel program
- Package edu.rit.openmp.monte
- Process for using OpenMP: Precompiler
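The Monte Carlo technique listed above can be sketched as a short sequential Java program. This is a minimal illustration, not the program from the edu.rit.openmp.monte package; the class name MonteCarloPiSeq, the method name estimatePi, and the default point count and seed are all assumptions made for this sketch.

```java
import java.util.Random;

// Hypothetical class; not the lecture's own sequential program.
class MonteCarloPiSeq {
    // Estimates pi from n random points in the unit square.
    // The seed makes a run repeatable.
    static double estimatePi(long n, long seed) {
        Random prng = new Random(seed);
        long count = 0;
        for (long i = 0; i < n; ++i) {
            double x = prng.nextDouble();
            double y = prng.nextDouble();
            // The point is inside the circle quadrant when its distance
            // from the origin is <= 1; comparing the squared distance
            // avoids a square root.
            if (x * x + y * y <= 1.0) ++count;
        }
        // The fraction of points inside approximates pi/4.
        return 4.0 * count / n;
    }

    public static void main(String[] args) {
        // Default values are arbitrary choices for this sketch.
        long n = args.length > 0 ? Long.parseLong(args[0]) : 10_000_000L;
        long seed = args.length > 1 ? Long.parseLong(args[1]) : 142857L;
        System.out.println("pi ~= " + estimatePi(n, seed));
    }
}
```

The error of the estimate shrinks only in proportion to 1/sqrt(n), so each additional correct digit costs roughly 100 times as many points, which is what makes the problem a convenient workload for parallel programs.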
Criticisms
- OpenMP and MPI were designed for use with Fortran and C, not Java
- OpenMP and MPI are not object oriented
- Existing forays into parallel programming middleware in Java leave much to be desired
- There are a couple of Java versions of MPI, but each one is just a thin veneer on top of the non-object-oriented MPI API and does not smell like a Java API
- JOMP mimics OpenMP, but the precompiler directive approach feels unnatural to Java programmers
- JOMP is alpha-quality software: it is buggy and no longer maintained, and its source code has not been released
- JOMP and mpiJava do not play well together
- MPI and threads do not play well together
- Parallel programs using MPI, when run on a hybrid SMP cluster, do not take full advantage of the SMP machines' capabilities
- You want to use threading within each SMP machine and message passing between SMP machines
- But MPI programs use message passing for everything, even between processors of the same SMP machine, which incurs a performance penalty
- I am not aware of any middleware standard that encompasses SMP, cluster, and hybrid SMP cluster parallel programming
Parallel Java (PJ)
SMP Parallel Programming with PJ
- Monte Carlo calculation of pi -- sequential program
- Monte Carlo calculation of pi -- PJ parallel program
- Demonstrations
- Running time measurements on the parasite.cs.rit.edu machine (30-Sep-2005)
- Speedup calculations
- Tests done by Luke McOmber in the Spring 2005 quarter show that Java/PJ programs' performance equals or exceeds equivalent C/OpenMP programs' performance
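The shared-memory version of the calculation can be illustrated without any middleware. The sketch below deliberately uses only standard java.util.concurrent classes, not the PJ API, whose program listing is not reproduced in these notes. The class name MonteCarloPiSmp, the method signature, and the per-thread seeding scheme are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class; uses plain java.util.concurrent, NOT the PJ API.
class MonteCarloPiSmp {
    // Estimates pi from n random points, with the work split across k threads.
    static double estimatePi(long n, long seed, int k) {
        ExecutorService pool = Executors.newFixedThreadPool(k);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < k; ++t) {
                // Spread the n points as evenly as possible over the threads.
                final long myN = n / k + (t < n % k ? 1 : 0);
                // Simplistic per-thread seeding (an assumption of this sketch);
                // a production program would want properly independent streams.
                final long mySeed = seed + t;
                Callable<Long> task = () -> {
                    // Each thread has its own generator and its own counter,
                    // so no synchronization is needed inside the loop.
                    Random prng = new Random(mySeed);
                    long count = 0;
                    for (long i = 0; i < myN; ++i) {
                        double x = prng.nextDouble();
                        double y = prng.nextDouble();
                        if (x * x + y * y <= 1.0) ++count;
                    }
                    return count;
                };
                parts.add(pool.submit(task));
            }
            // Reduction step: sum the per-thread counts.
            long total = 0;
            for (Future<Long> part : parts) total += part.get();
            return 4.0 * total / n;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int k = Runtime.getRuntime().availableProcessors();
        System.out.println("pi ~= " + estimatePi(10_000_000L, 142857L, k));
    }
}
```

Giving each thread a private generator and counter, then combining the counts once at the end, keeps the threads from contending for shared state. For a fixed seed and thread count the result is repeatable, but it changes when k changes because the points are drawn from different streams.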
Cluster Parallel Programming with PJ
- Monte Carlo calculation of pi -- sequential program
- Monte Carlo calculation of pi -- PJ parallel program
- Demonstrations
- Running time measurements on the paranoia.cs.rit.edu cluster (09-Jun-2006)
- Sizeup calculations
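For reference, speedup and sizeup are commonly defined as below. These are the usual textbook definitions, assumed here rather than confirmed by these notes; the symbols T (running time), N (problem size), and K (number of processors) are notation introduced for this sketch.

```latex
\mathrm{Speedup}(N,K) = \frac{T_{\mathrm{seq}}(N,1)}{T_{\mathrm{par}}(N,K)},
\qquad
\mathrm{Sizeup}(T,K) = \frac{N_{\mathrm{par}}(T,K)}{N_{\mathrm{seq}}(T,1)}
```

Speedup holds the problem size fixed and compares running times. Sizeup holds the running time fixed and compares the problem sizes that can be solved in that time. In both cases the ideal value on K processors is K.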
Status
- Shared memory parallel programming API complete
- Message passing parallel programming API nearly complete
- Some of the more esoteric collective communication operations are not implemented
- Major redesign of the cluster parallel programming capabilities recently completed
- Message passing classes redesigned to improve performance
- Cluster middleware redesigned to make it easier for the user to run cluster parallel programs
- Web interface for viewing cluster status and job queue
- I use PJ in my Parallel Computing I class
- I presented a "work in progress" poster on PJ at the SIGCSE 2006 Conference
- Poster (PDF, 188,290 bytes, 36" x 24")
- I started writing a parallel programming textbook using Java and PJ
Future Plans
- Continue work on the PJ API
- Continue improving performance of the message passing classes
- Implement remaining collective communication operations
- Expand the web frontend to include job submission and job control
- Continue teaching the Parallel Computing I class with PJ
- Finish writing the textbook and publish it
- Accumulate more performance measurements
- Work on solving scientific computing problems with PJ programs
- Computational medicine: MRI spin relaxometry -- done with mpiJava, need to redo with PJ
- Computational biology: Maximum parsimony phylogenetic tree construction
Copyright © 2006 Alan Kaminsky.
All rights reserved.
Last updated 07-Nov-2006.
Please send comments to ark@cs.rit.edu.