Alan Kaminsky Department of Computer Science Rochester Institute of Technology 4486 + 2220 = 6706
Home Page
Operating Systems I 4003-440-02 Winter Quarter 2012
Course Page

4003-440-02 Operating Systems I
Module 10. File Systems and Disks -- Lecture Notes

Prof. Alan Kaminsky -- Winter Quarter 2012
Rochester Institute of Technology -- Department of Computer Science


The Linux Ext2 File System

  • The disk is allocated in units of fixed-size disk blocks
    • Disk block size on my Linux system = 4 KB (4,096 bytes); 1 KB and 2 KB are also used
       
  • Disk block number -- disk blocks are numbered starting at 0
     
  • The disk blocks are grouped into disk block groups
    • One disk block group = 32,768 disk blocks
       
  • Each disk block group is laid out as follows:
    • Block bitmap (one block) -- Indicates whether each block in the block area is free or in use
    • Inode bitmap (one block) -- Indicates whether each inode in the inode table is free or in use
    • Inode table (512 blocks) -- Each in-use inode contains metadata about one file or directory
    • Block area (32,254 blocks) -- Each in-use block contains a portion of one file's or directory's contents
       
  • The very first disk block group has a different layout and contains the superblock
    • The superblock (disk block number 0) contains metadata about the whole file system
       
  • An inode is a data structure containing metadata about one file or directory
    • Type -- regular file, directory, symbolic link, device special file, . . .
    • Owner
    • Group membership
    • Access permissions
    • Number of links (directory entries that refer to the inode)
    • Creation, access, and modification times
    • Size in bytes
    • Pointers (disk block numbers) to the disk blocks containing the file's contents
       
  • Inode number -- inodes are numbered starting at 0
    • Each disk block group has a chunk of 16,384 inodes in its inode table -- some free, some in use according to the inode bitmap
    • Thus, inode numbers 0 through 16,383 are in disk block group 0; inode numbers 16,384 through 32,767 are in disk block group 1; and so on
       
  • Disk block numbers in a file's inode
    • The inode contains 15 pointers (disk block numbers), four bytes each
    • The first 12 are direct pointers to the first 12 blocks of the file
    • The 13th is a single indirect pointer; it points to a disk block which contains pointers to the next 1,024 blocks of the file
    • The 14th is a double indirect pointer; it points to a disk block which contains 1,024 single indirect pointers; each of these points to a disk block which contains 1,024 direct pointers to blocks of the file
    • The 15th is a triple indirect pointer; it points to a disk block which contains 1,024 double indirect pointers; each of these points to a disk block which contains 1,024 single indirect pointers; each of these points to a disk block which contains 1,024 direct pointers to blocks of the file
    • Thus, the earliest bytes of a file can be accessed most quickly
    • This scheme works well if most files are short
       
  • Directory -- a special kind of file containing the names of other files and directories
    • Each directory is represented by an inode with type = "directory"
    • The root directory of the whole filesystem is inode 2
       
  • Directory entries
    • Each directory's contents consists of a sequence of directory entries
    • Directory entries are allocated dynamically in chunks of four bytes
    • Each directory entry corresponds to one file
    • Each directory entry consists of:
      • The inode number of the file (4-byte field)
      • The size of the directory entry in bytes (2-byte field) (used for allocation and deallocation)
      • The size of the file name in bytes (2-byte field)
      • The file name itself
      • (Possibly, unused extra bytes)
    • Normally, user programs cannot read or write a directory's contents directly; system calls are provided to manipulate the directory entries
       
  • Hierarchical directories
    • An entry in a directory can refer to another directory
    • Path names
       
  • Links
    • More than one directory entry can refer to the same inode
    • Hard link -- refers to a specific inode number
    • Soft link or symbolic link -- refers to a certain pathname, regardless of which inode number it is, and regardless of whether such a file exists
       
  • An interesting command (if you're the superuser)
    • debugfs /dev/hda2


Disk Organization

  • Disk hardware
    • Spindle
    • Platters, surfaces
    • Arm assembly, arms, read/write heads
    • Disk controller (in disk drive)
    • Device controller (in computer)
       
  • A disk has three-dimensional structure, so locating a piece of data requires three coordinates -- disk geometry
    • Cylinder -- in/out along the radius of the platters
    • Surface -- up/down among the platters
    • Sector -- clockwise/counterclockwise along the circumference of the platters
    • Track -- the intersection of a cylinder with a surface
    • Head -- the same as a surface, one head for each surface
       
  • Geometry of a 1.44-MB 3.5-inch floppy disk
    • 80 cylinders
    • 2 surfaces (double-sided)
    • 18 sectors per track
    • 512 bytes per sector
    • Capacity = 80 x 2 x 18 x 512 bytes = 1,474,560 bytes = 1,440 KB
       
  • Dissection of a 3.5-inch floppy disk and drive
     
  • Geometry of my laptop's 60-GB hard disk
    • 7,296 cylinders
    • 255 surfaces
    • 63 sectors per track
    • 512 bytes per sector
    • Capacity = 7,296 x 255 x 63 x 512 bytes = 60,011,642,880 bytes
       
  • The above is the disk's logical geometry as presented by the disk controller to the computer
    • A convenient fiction, necessary due to BIOS limitations
       
  • The disk's true physical geometry is more complicated and is hidden by the disk controller
    • Tens of thousands of cylinders
    • A few platters (surfaces) -- typically 1 to 5, maybe a dozen for a high-end server disk
    • Many hundreds of sectors per track -- perhaps 400 to 800
    • More sectors per track on the outer cylinders, where the tracks are physically longer -- zoned bit recording


Disk Data Access

  • Accessing a sector's worth of data requires three chunks of time:
     
  • Seek time -- Time to move the disk arm from its present cylinder to the desired cylinder
    • Average seek time -- from one randomly chosen cylinder to another -- typically about 8 to 10 msec
    • Track-to-track seek time -- from one cylinder (track) to an adjacent one -- typically about 1 msec
    • Full stroke seek time -- from innermost cylinder to outermost -- typically about 15 to 20 msec
       
  • Rotational latency time -- Time until the platter rotates the desired sector under the read/write head
    • Depends on the spindle speed -- 4,200 to 15,000 rotations per minute (rpm), typically 7,200 rpm
    • 7,200 rotations per minute = 120 rotations per second = 8.33 msec per rotation
    • Average rotational latency = one-half rotation = 4.17 msec
       
  • Data transfer time -- Time for the desired sector to rotate past the read/write head
    • Depends on the spindle speed
    • Also depends on the number of sectors per track, for this particular cylinder
       


Disk Head Scheduling

  • Goal: Reduce/minimize disk access time
    • First, reduce/minimize seek time
    • Then, reduce/minimize rotational latency time
       
  • Techniques for reducing seek time
     
  • Queue of sectors (cylinders) to be accessed
     
  • Disk head scheduling algorithms
    • First Come First Served, or First In First Out (FIFO)
    • Shortest Seek Time First (SSTF) -- can experience starvation
    • SCAN -- elevator algorithm
    • Circular SCAN (C-SCAN)
    • LOOK
    • Circular LOOK (C-LOOK)
       
  • Techniques for reducing rotational latency
    • Allocate files in chunks of contiguous sectors; read multiple sectors in one disk request
    • Start reading sectors in the middle of a request; don't wait for the first sector to rotate under the disk head
       
  • Older disk controllers can only service one request at a time
    • The operating system must perform the disk head scheduling algorithm
       
  • Newer disk controllers can handle multiple outstanding requests
    • Now the disk controller can keep its own queue of disk requests and perform the disk head scheduling algorithm
    • This is called Native Command Queuing (NCQ)
    • The disk controller can usually do a better job of disk head scheduling than the operating system can


Disk Head Scheduling Algorithm Simulations

  • Simulator structure
    • Disk model object
    • Disk request objects
    • Requestor objects (generators of disk requests)
    • Scheduler object
    • Simulation object
       
  • Common classes and abstract base classes for pluggable modules
  • Implementation class for a HD floppy disk
  • Implementation class for a source of disk requests (a process)
  • Implementation class for the FIFO disk head scheduling algorithm
  • Implementation class for the SSTF disk head scheduling algorithm
  • Simulator inputs
    • Disk head scheduling algorithm class name
    • Pseudorandom number generator seed
    • Number of requestors
    • Number of requests generated by each requestor
       
  • Simulator outputs
    • Total simulation time
    • Disk request service time statistics
      • Minimum
      • 10th percentile
      • 50th percentile (median)
      • 90th percentile
      • Maximum
    • Plot of disk request service time distribution
    • Plot of disk head position versus time
       
  • FIFO simulation
    $ java edu.rit.os1.disk.DiskSimulator edu.rit.os1.disk.FifoScheduler \
      142857 10 100 > fifo.txt
    Simulation time = 206.300000000 sec
    Service time distribution (sec):
    Minimum = 0.188888889
    10 %ile = 1.722222222
    50 %ile = 2.066666667
    90 %ile = 2.377777778
    Maximum = 2.777777778
    
    Service time distribution
    Disk head position versus time
     
  • SSTF simulation
    $ java edu.rit.os1.disk.DiskSimulator edu.rit.os1.disk.SstfScheduler \
      142857 10 100 > sstf.txt
    Simulation time = 141.500000000 sec
    Service time distribution (sec):
    Minimum = 0.011111111
    10 %ile = 0.188888889
    50 %ile = 1.011111111
    90 %ile = 3.011111111
    Maximum = 7.688888889
    
    Service time distribution
    Disk head position versus time
     
  • SCAN simulation
    $ java edu.rit.os1.disk.DiskSimulator ScanScheduler 142857 10 100 > scan.txt
    Simulation time = 144.300000000 sec
    Service time distribution (sec):
    Minimum = 0.000000000
    10 %ile = 0.133333333
    50 %ile = 1.155555556
    90 %ile = 2.677777778
    Maximum = 4.900000000
    
    Service time distribution
    Disk head position versus time
     
  • C-SCAN simulation
    $ java edu.rit.os1.disk.DiskSimulator CScanScheduler 142857 10 100 > cscan.txt
    Simulation time = 149.700000000 sec
    Service time distribution (sec):
    Minimum = 0.011111111
    10 %ile = 0.244444444
    50 %ile = 1.300000000
    90 %ile = 2.577777778
    Maximum = 3.600000000
    
    Service time distribution
    Disk head position versus time
     
  • LOOK simulation
    $ java edu.rit.os1.disk.DiskSimulator LookScheduler 142857 10 100 > look.txt
    Simulation time = 141.700000000 sec
    Service time distribution (sec):
    Minimum = 0.011111111
    10 %ile = 0.255555556
    50 %ile = 1.222222222
    90 %ile = 2.688888889
    Maximum = 4.900000000
    
    Service time distribution
    Disk head position versus time
     
  • C-LOOK simulation
    $ java edu.rit.os1.disk.DiskSimulator CLookScheduler 142857 10 100 > clook.txt
    Simulation time = 144.900000000 sec
    Service time distribution (sec):
    Minimum = 0.011111111
    10 %ile = 0.255555556
    50 %ile = 1.288888889
    90 %ile = 2.533333333
    Maximum = 7.622222222
    
    Service time distribution
    Disk head position versus time
     
  • (Note: Classes ScanScheduler, CScanScheduler, LookScheduler, and CLookScheduler are not part of the Computer Science Course Library)
     
  • Summary -- for a HD floppy disk drive:
                       FIFO    SSTF    SCAN  C-SCAN    LOOK  C-LOOK
                     ------  ------  ------  ------  ------  ------
    Simulation time  206.30  141.50  144.30  149.70  141.70  144.90
    Service time:
    Minimum            0.19    0.01    0.00    0.01    0.01    0.01
    10 %ile            1.72    0.19    0.13    0.24    0.26    0.26
    50 %ile            2.07    1.01    1.15    1.30    1.22    1.29
    90 %ile            2.38    3.01    2.67    2.58    2.69    2.53
    Maximum            2.78    7.69    4.90    3.60    4.90    7.62
    

Operating Systems I 4003-440-02 Winter Quarter 2012
Course Page
Alan Kaminsky Department of Computer Science Rochester Institute of Technology 4486 + 2220 = 6706
Home Page
Copyright © 2008 Alan Kaminsky. All rights reserved. Last updated 28-Oct-2008. Please send comments to ark­@­cs.rit.edu.