4003-440-02 Operating Systems I
Module 11. Parallel and Distributed Systems -- Lecture Notes
Prof. Alan Kaminsky -- Winter Quarter 2012
Rochester Institute of Technology -- Department of Computer Science
Little and Big

                Little Data                                Big Data
  Little CPU    Word processor, Spreadsheet                Google PageRank, Facebook ad targeting
  Big CPU       Cryptographic attacks, NP-hard problems    Climate modeling, Nuclear explosion modeling
An Example Problem
- A toxic spill contaminates the soil
[Figure: the initial spill; the pollutant starts diffusing; after a longer time]
- Partial differential equation: ∂u/∂t = D (∂²u/∂x² + ∂²u/∂y²)
- Computation
- Given diffusion coefficients D(x,y) . . .
- And given initial pollutant concentration u(x,y) . . .
- Compute pollutant concentration as a function of time . . .
- On an N×N spatial grid (N might be several thousand) . . .
- For many, many small time steps (thousands or millions) . . .
- And make a movie to visualize the results
- Class ToxicSpillSeq (download); a minimal sketch of the kind of update loop such a solver performs appears below
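The core of the computation is an explicit finite-difference update applied once per small time step. The following is a minimal sketch of that update, not the actual ToxicSpillSeq code; constant D, uniform grid spacing h, time step dt, and zero boundary cells are all simplifying assumptions (the real problem uses position-dependent D(x,y)).

    // One explicit finite-difference time step for du/dt = D (d2u/dx2 + d2u/dy2).
    // u holds the current N x N concentration grid, uNext receives the new one.
    // Assumes constant D and fixed (zero) boundary cells -- both simplifications.
    class DiffusionStep {
        static void timeStep(double[][] u, double[][] uNext, int N,
                             double D, double dt, double h) {
            double r = D * dt / (h * h);
            for (int i = 1; i < N - 1; ++i) {
                for (int j = 1; j < N - 1; ++j) {
                    uNext[i][j] = u[i][j] + r *
                        (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1] - 4.0*u[i][j]);
                }
            }
        }
    }

An outer loop calls this once per time step and swaps u and uNext; for stability the explicit scheme requires r = D⋅dt/h² ≤ 1/4.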
SMP Parallel Programming
Amdahl's Law
- There is a limit on the amount of speedup you can get with strong scaling
- Sequential portion and parallelizable portion of a program
- Consider a parallel program with sequential running time T(N,1) and sequential fraction F, run on varying numbers of processors K
- Amdahl's Law
- Running time T(N,K) = F⋅T(N,1) + (1/K)⋅(1 − F)⋅T(N,1)
- Speedup(N,K) = 1/(F + (1 − F)/K)
- Efficiency(N,K) = 1/(K⋅F + 1 − F)
where
- N = Problem size
- K = Number of processors
- F = Sequential fraction
- T(N,1) = Sequential program running time on one processor
- T(N,K) = Parallel program running time on K processors
- Limits on speedup
- As K goes to infinity:
- Speedup goes to 1/F
- Efficiency goes to 0
- Example: with F = 0.1, the speedup can never exceed 10, no matter how many processors are added (see the sketch after this list)
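The following minimal sketch simply tabulates the formulas above; the sequential fraction F = 0.1 and the powers-of-two range of K are arbitrary illustrative choices.

    // Tabulates Amdahl's Law speedup and efficiency for sequential
    // fraction F and processor counts K = 1, 2, 4, ..., 64.
    public class Amdahl {
        public static void main(String[] args) {
            double F = 0.1;  // assumed sequential fraction (illustrative)
            System.out.println("K     Speedup   Efficiency");
            for (int K = 1; K <= 64; K *= 2) {
                double speedup = 1.0 / (F + (1.0 - F) / K);
                double efficiency = speedup / K;
                System.out.printf("%-5d %-9.3f %.3f%n", K, speedup, efficiency);
            }
        }
    }

With F = 0.1 the printed speedup climbs toward, but never reaches, the 1/F = 10 limit.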
Cluster Parallel Programming
Hybrid SMP Cluster Parallel Programming
GPU Parallel Programming
The Top 500 Supercomputers
- Processing speed
- 1 flops = 1 floating point operation per second
- 1 gigaflops = 10⁹ flops
- 1 teraflops = 10¹² flops
- 1 petaflops = 10¹⁵ flops
- 1 exaflops = 10¹⁸ flops
- Memory capacity
- 1 gigabyte = 10⁹ bytes
- 1 terabyte = 10¹² bytes
- 1 petabyte = 10¹⁵ bytes
- 1 exabyte = 10¹⁸ bytes
- Top500 List: http://www.top500.org/
- Measures processing speed on the LINPACK linear algebra benchmark
- A CPU-intensive benchmark
- Number One supercomputer: Titan (November 2012)
- Location: Oak Ridge National Laboratory, USA
- Manufacturer: Cray
- Model: XK7
- CPU: Opteron 6274 16C
- Clock speed: 2.2 GHz
- Cores: 560,640
- Processing speed: 17.5 petaflops
- RAM: 0.71 petabytes
- Accelerators: 261,632 NVIDIA K20x
- Number Two supercomputer: Sequoia (November 2012)
- Location: Lawrence Livermore National Laboratory, USA
- Manufacturer: IBM
- Model: BlueGene/Q
- CPU: Power BQC 16C
- Clock speed: 1.6 GHz
- Cores: 1,572,864
- Processing speed: 16.3 petaflops
- RAM: 1.57 petabytes
- Accelerators: None
- Number Three supercomputer: K Computer (November 2012)
- Location: RIKEN Advanced Institute for Computational Science, Japan
- Manufacturer: Fujitsu
- CPU: SPARC64 VIIIfx
- Clock speed: 2.0 GHz
- Cores: 705,024
- Processing speed: 10.5 petaflops
- RAM: 1.41 petabytes
- Accelerators: None
- Graph500 List: http://www.graph500.org/
- Measures processing speed on selected graph algorithms
- A data-intensive benchmark
- Green500 List: http://www.green500.org/
- Uses the Top500 List data
- Ranks computers based on energy efficiency -- performance per watt
Distributed Object Systems
- A normal method call
- Calling object → Method (arguments) → Target object
- Calling object ← Return value ← Target object
- or -
- Calling object ← Thrown exception ← Target object
- A remote method call
- Calling object → Method (arguments) → Client-side proxy object
- Client-side proxy object → Network message → Server-side proxy object
- Server-side proxy object → Method (arguments) → Target object
- Server-side proxy object ← Return value ← Target object
- Client-side proxy object ← Network message ← Server-side proxy object
- Calling object ← Return value ← Client-side proxy object
- or -
- Server-side proxy object ← Thrown exception ← Target object
- Client-side proxy object ← Network message ← Server-side proxy object
- Calling object ← Thrown exception ← Client-side proxy object
- Another possibility
- Calling object → Method (arguments) → Client-side proxy object
- Client-side proxy object → Network message → Server-side proxy object
- Client-side proxy object ← Network message ← Server-side proxy object
- Calling object ← RemoteException ← Client-side proxy object
- Possible reasons:
- The client → server message was lost
- The server → client message was lost
- The server crashed in the middle of processing the message
- Still another possibility
- Calling object → Method (arguments) → Client-side proxy object
- Calling object ← RemoteException ← Client-side proxy object
- Possible reasons:
- The server is down, so the client can't set up a connection
- Terminology
- Calling object
- Target object
- Remote object
- A target object that can be called from other processes
- Remote interface
- An interface implemented by a remote object containing the methods that can be called from other processes
- Non-remote interface
- An interface implemented by a remote object containing the methods that cannot be called from other processes
- Proxy object
- Client-side proxy object; stub
- Implements the same remote interface(s) as the remote object
- Does not implement the remote object's non-remote interfaces
- Server-side proxy object; skeleton
- Remote reference
- A reference to a client-side proxy object for a remote object
- Remote exception
- An exception thrown by the RMI system itself, not by the target object
- Marshalling
- Converting a method call's arguments (or its return value) into a network message
- Unmarshalling
- Converting a network message back into arguments (or a return value)
- Distributed object middleware; protocols
- Common Object Request Broker Architecture (CORBA); IIOP/TCP/IP
- Java Remote Method Invocation (RMI); JRMP/TCP/IP (a minimal RMI sketch follows this list)
- Web services; SOAP/HTTP/TCP/IP, JSON
- Web Services Example
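To make the proxy machinery concrete, here is a minimal Java RMI sketch; the Echo interface, its method, and the registry name "echo" are invented for illustration.

    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.registry.LocateRegistry;
    import java.rmi.registry.Registry;
    import java.rmi.server.UnicastRemoteObject;

    // Remote interface: the methods callable from other processes.
    // Every remote method must declare RemoteException.
    interface Echo extends Remote {
        String echo(String msg) throws RemoteException;
    }

    // Remote object (the target object), implementing the remote interface.
    class EchoServer implements Echo {
        public String echo(String msg) { return "Echo: " + msg; }
        public static void main(String[] args) throws Exception {
            EchoServer server = new EchoServer();
            // Exporting creates the server-side machinery and a stub
            // (client-side proxy) that marshals calls into network messages.
            Echo stub = (Echo) UnicastRemoteObject.exportObject(server, 0);
            Registry reg = LocateRegistry.createRegistry(1099);
            reg.rebind("echo", stub);
        }
    }

    // Calling object: obtains a remote reference and calls through it.
    class EchoClient {
        public static void main(String[] args) throws Exception {
            Registry reg = LocateRegistry.getRegistry("localhost", 1099);
            Echo echo = (Echo) reg.lookup("echo");
            System.out.println(echo.echo("hello")); // may throw RemoteException
        }
    }

Note that the calling object sees only the Echo remote interface, never EchoServer's non-remote methods, and every call can fail with a RemoteException even though the target object itself never throws one.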
Distributed Hash Tables
- Normal (non-distributed) hash table
- Data item = (key, value)
- All data items stored in a table
- Data item's location in table computed from hash of key
- Operations: Add data item, remove data item, query key
- Distributed hash table (DHT)
- Data item = (key, value)
- Table of data items spread out across multiple nodes
- Data item's location (node) computed from hash of key
- Operations: Add data item, remove data item, query key -- requires routing from originating node to destination node
- Additional operations: Add node, remove node
- Chord
- I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transactions on Networking, 11(1):17-32, February 2003.
- Chord is just a DHT algorithm, not a complete file sharing application
- Keys and hashing
- Each node has a key which is the hash of the node's identifier (e.g. IP address)
- Each data item (file) has a key which is the hash of the file's identifier
- Any hash function can be used
- The Chord paper uses SHA-1 (for hashing, not for security)
- The Chord ring
- The N different hash values are conceptually arranged in a circle; all key arithmetic is modulo N
- For SHA-1, which yields a 160-bit hash value, N = 2¹⁶⁰
- Key storage
- Each node "lives" at the location on the ring equal to the node's key (hash)
- Each data item is stored in the node whose key is the smallest value greater than or equal to the data item key, modulo N (a minimal sketch of this successor rule follows the figure)
[Figure from Stoica et al., op. cit.]
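A minimal sketch of that successor rule, assuming the node keys are held as BigIntegers in a sorted set; the class and method names are invented for illustration.

    import java.math.BigInteger;
    import java.util.TreeSet;

    class ChordRing {
        // The node responsible for a data item is the one with the smallest
        // key >= the item key, wrapping around the ring if no such key exists.
        static BigInteger responsibleNode(TreeSet<BigInteger> nodeKeys,
                                          BigInteger itemKey) {
            BigInteger node = nodeKeys.ceiling(itemKey); // smallest key >= itemKey
            return (node != null) ? node : nodeKeys.first(); // wrap modulo N
        }
    }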
- Key lookup
- Each node has a finger table giving the nodes that store keys K+1, K+2, K+4, K+8, . . ., where K is the node's own key
- To find a key, find the node in the finger table that is closest to the desired key without being greater than the desired key, and forward the query to that node
- With O(log n) messages, where n is the number of nodes, the query reaches the node holding the desired key (see the routing sketch after the figure)
[Figure from Stoica et al., op. cit.]
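A sketch of one routing hop under the simplified rule above; measuring "closest without being greater" as the smallest clockwise distance to the target handles the modulo-2¹⁶⁰ wraparound. The array-of-keys view of the finger table is an assumption for illustration.

    import java.math.BigInteger;

    class ChordLookup {
        static final BigInteger RING = BigInteger.ONE.shiftLeft(160); // N = 2^160

        // Clockwise distance from a to b around the ring.
        static BigInteger dist(BigInteger a, BigInteger b) {
            return b.subtract(a).mod(RING);
        }

        // fingers = keys of the nodes in this node's finger table.
        // Returns the finger closest to the target without passing it --
        // the node to which the query should be forwarded next.
        static BigInteger nextHop(BigInteger myKey, BigInteger[] fingers,
                                  BigInteger target) {
            BigInteger best = myKey;
            for (BigInteger f : fingers) {
                if (dist(f, target).compareTo(dist(best, target)) < 0) {
                    best = f; // f is closer to the target, still not past it
                }
            }
            return best; // repeating this hop reaches the target in O(log n) steps
        }
    }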
- Other DHT algorithms:
- CAN (Content Addressable Network)
- S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM 2001), August 2001, pages 161-172.
- Kademlia
- P. Maymounkov and D. Mazières. Kademlia: A peer-to-peer information system based on the XOR metric. First International Workshop on Peer-to-Peer Systems (IPTPS 2002), LNCS 2429/2002, March 2002, pages 53-65.
BitTorrent
- BitTorrent Protocol Specification. http://www.bittorrent.org/beps/bep_0003.html
- Index of BitTorrent Enhancement Proposals. http://www.bittorrent.org/beps/bep_0000.html
- The origin peer, who wants to make a file available:
- Splits the file into pieces
- Computes the SHA-1 hash of each piece (see the hashing sketch after this list)
- Sets up a seed file (.torrent file)
- Name of the file
- Length and hash of each piece
- URL of the tracker
- Publishes the seed file somewhere, e.g. a web site
- The tracker is a central server
- Keeps track of which peers have which pieces of the file
- Each swarm peer, wanting to download the file:
- Gets the seed file
- Contacts the tracker
- Tells the tracker to add itself to the swarm
- Gets other peers in the swarm from the tracker
- Downloads pieces of the file from peers
- Notifies the tracker which pieces it has downloaded
- Uploads those pieces to other peers
- Trackerless BitTorrent
- Torrent information is stored in a distributed hash table (DHT) instead of a central tracker
- The Kademlia DHT is used
- The DHT key is the torrent's infohash (the SHA-1 hash of the .torrent file's info dictionary)
- BitTorrent DHT Protocol. http://www.bittorrent.org/beps/bep_0005.html
- In theory, as time goes on, the peers download more and more from each other and less and less from the origin peer
- Pros
- Eliminates the need for a big server with enormous bandwidth to deliver the entire file to every user
- Faster downloads due to multiple simultaneous connections with different peers
- Cons
- For the system to work well, peers must upload as well as download file pieces
- Leecher peers that only download and never upload can degrade the system
- To combat leechers, detect peers that don't upload enough and refuse to upload to them (tit for tat)
- But this unfairly penalizes users with limited upload bandwidth (like home users)
- There is little to no security against malicious peers
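A minimal sketch of the origin peer's piece-hashing step; the 256 KiB piece length is an illustrative choice, and a real client would also bencode the file name, piece length, hashes, and tracker URL into the .torrent file.

    import java.io.FileInputStream;
    import java.security.MessageDigest;
    import java.util.ArrayList;
    import java.util.List;

    // Splits a file into fixed-size pieces and computes each piece's SHA-1
    // hash, as the origin peer does when building a .torrent file.
    class PieceHasher {
        static List<byte[]> hashPieces(String fileName) throws Exception {
            final int PIECE_LEN = 256 * 1024; // illustrative piece length
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            List<byte[]> hashes = new ArrayList<byte[]>();
            try (FileInputStream in = new FileInputStream(fileName)) {
                byte[] piece = new byte[PIECE_LEN];
                while (true) {
                    int n = 0; // fill the piece buffer completely if we can
                    while (n < PIECE_LEN) {
                        int r = in.read(piece, n, PIECE_LEN - n);
                        if (r < 0) break; // end of file
                        n += r;
                    }
                    if (n == 0) break; // no final partial piece
                    sha1.reset();
                    sha1.update(piece, 0, n);
                    hashes.add(sha1.digest()); // 20 bytes per piece
                    if (n < PIECE_LEN) break; // last (short) piece done
                }
            }
            return hashes;
        }
    }

Each downloader recomputes the same hashes to verify pieces as they arrive.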
Map-Reduce Systems
Copyright © 2013 Alan Kaminsky. All rights reserved.
Last updated 13-Feb-2013. Please send comments to ark@cs.rit.edu.