Alan Kaminsky Department of Computer Science Rochester Institute of Technology 4486 + 2220 = 6706
Home Page

Introductory CUDA Programming Seminar
Lecture Notes

Prof. Alan Kaminsky
Rochester Institute of Technology -- Department of Computer Science


Prerequisites and Outcomes

Prerequisites -- The student must be familiar with:

Outcomes -- The student will:


Agenda


The Problem of Speed


Acronyms


clairaut.cs.rit.edu


CUDA Programming Model -- Threading


CUDA Programming Model -- Memory Hierarchy


Stream Programming


Introductory Example -- Vector Outer Product


Programming Exercise 1

  1. Log into the guest account on clairaut.
    $ ssh guest@clairaut.cs.rit.edu
    

  2. Important: Create your own subdirectory in which to do your work. Name it using your initials plus four random digits; e.g. mine might be called "ark0114".
    $ mkdir ark0114
    

  3. Important: Change into your own subdirectory.
    $ cd ark0114
    

  4. Make copies of the example source files and makefile. (Note the period at the end of the command; in Unix, this refers to the current directory.)
    $ cp ../examples/* .
    

  5. Compile the example programs.
    $ make
    

  6. Try running the vector outer product programs with various arguments.
    $ ./OuterProductCPU 142857 1024
    $ ./OuterProductGPU 142857 1024
    $ ./OuterProductCPU 142857 2048
    $ ./OuterProductGPU 142857 2048
    

  7. Modify the example programs so that each element of the outer product matrix is calculated using the formula below. Note: sqrtf(x) returns the square root of float variable x. Change the names of the modified programs; add the modified programs to the Makefile.

    C_{ij} = (A_i^2 + B_j^2)^{1/2}, \quad 0 \le i \le N-1, \quad 0 \le j \le N-1

  8. Compile and run the modified programs with various arguments. Are they getting the right answers? How do the running times of the CPU and GPU versions compare?


Parallel Reduction


Compute Capabilities


Reduction Example -- Vector Angle


Programming Exercise 2

  1. If you haven't done Steps 1–5 of Programming Exercise 1, do them now.

  2. Try running the vector angle programs with various arguments.
    $ ./VectorAngleCPU 142857 1000000
    $ ./VectorAngleGPU 142857 1000000
    $ ./VectorAngleCPU 142857 10000000
    $ ./VectorAngleGPU 142857 10000000
    

  3. Modify the original vector outer product programs to calculate the norm of the matrix C, where C is the outer product of the vectors A and B. The norm of the matrix, |C|, is calculated with the formula below. Note: sqrtf(x) returns the square root of float variable x. Change the names of the modified programs; add the modified programs to the Makefile.

    |C| = \left( \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} C_{ij}^2 \right)^{1/2}

  4. Compile and run the modified programs with various arguments. Are they getting the right answers? How do the running times of the CPU and GPU versions compare?


Another Reduction Example -- Statistical Test on a Block Cipher


Programming Exercise 3

  1. If you haven't done Steps 1–5 of Programming Exercise 1, do them now.

  2. Try running the statistical test program with various arguments.
    $ ./PresentStatTest 0000000000000000 0123456789abcdef 0123 10 10
    $ ./PresentStatTest 0000000000000000 0123456789abcdef 0123 20 20
    

  3. Does PRESENT-80 still look random with different plaintexts and different keys?

  4. Modify the PRESENT-80 statistical test program to introduce a fault in the cipher algorithm by inverting one or more bits in the ciphertext. Does the statistical test program detect the fault? Why or why not?

  5. Modify the PRESENT-80 statistical test program to introduce a fault in the cipher algorithm by setting one or more ciphertext bits to 0 or 1 ("stuck-at-zero" or "stuck-at-one" fault). Does the statistical test program detect the fault? Why or why not?

  6. (Advanced) Modify the PRESENT-80 statistical test program to count, for each bit position in the ciphertext, the number of 1s that occur in that bit position in the series of ciphertexts. Do a chi-square test on the data for each bit position. Does each bit position of PRESENT-80 look random?
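
The fault models in Steps 4 and 5 and the per-bit count in Step 6 can be sketched in plain C as follows. This is an illustration only: the function names are hypothetical, the code operates on an array of 64-bit ciphertext blocks (the PRESENT block size) rather than on the actual PresentStatTest source, and the statistic shown is the one-degree-of-freedom chi-square for "ones versus zeros" with an expected count of n/2 each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fault models; each bit set in mask selects a faulty ciphertext bit.
 * Hypothetical helpers, not part of the seminar's example code. */
uint64_t fault_invert(uint64_t ct, uint64_t mask)        { return ct ^ mask; }
uint64_t fault_stuck_at_zero(uint64_t ct, uint64_t mask) { return ct & ~mask; }
uint64_t fault_stuck_at_one(uint64_t ct, uint64_t mask)  { return ct | mask; }

/* Per-bit chi-square over n ciphertexts. For bit position k, with
 * observed counts ones and zeros = n - ones, and expected n/2 each:
 *   chi2 = (ones - n/2)^2/(n/2) + (zeros - n/2)^2/(n/2)
 *        = (ones - zeros)^2 / n
 * Results are written to chi2[0..63]; bit 0 is the least significant. */
void bit_chi_square(const uint64_t *ct, size_t n, double chi2[64])
{
    assert(ct != NULL && chi2 != NULL && n > 0);
    size_t ones[64] = {0};
    for (size_t k = 0; k < n; ++k) {
        for (int bit = 0; bit < 64; ++bit) {
            ones[bit] += (size_t)((ct[k] >> bit) & 1u);
        }
    }
    for (int bit = 0; bit < 64; ++bit) {
        double d = 2.0 * (double)ones[bit] - (double)n; /* ones - zeros */
        chi2[bit] = d * d / (double)n;
    }
}
```

With one degree of freedom, a statistic above about 3.84 corresponds to rejection at the 5% significance level. Because 64 positions are tested at once, a few may exceed that value by chance even for random-looking data.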


Parallel Crypto Research


For Further Information

Copyright © 2012 Alan Kaminsky. All rights reserved. Last updated 09-May-2012. Please send comments to ark­@­cs.rit.edu.