Problem Solving Lab #1

Using Regular Expressions and Perl

This lab is meant to help you learn the topics discussed in lecture. For attendance, each of you will be asked to submit a file using try. You may work in pairs (and some of you will probably have to, since there aren't enough machines otherwise). Because this lab is for your learning and people learn at different rates, you may not finish all of it. You are not required to do anything more if you don't finish (although you may want to try some of the remaining exercises to increase your understanding of the material).


Part I - Regular Expressions

Background: You'll use regular expressions to look for words from a dictionary. This dictionary is available at:

~jdb/public_html/plc/perlLab/dictionary.txt

The standard command for searching using regular expressions is grep (and no, it is not "GNU rep"). The name comes from the ed editor command g/re/p, "globally search for a regular expression and print" (although you will hear other expansions depending on who you ask). egrep is the extended version of grep. You traditionally use egrep in a form like the following:

% egrep 'regexp' file

It is important to put the regular expression in single quotes so that the shell does not interpret special characters, like braces, parens, and stars. NOTE: if you are not using csh, then commands may be slightly different from what I show here.

Here are the basic mechanisms for building a grep-style regular expression. You should be able to find more in the man page.
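
As a minimal sketch of those mechanisms, the following builds a tiny made-up word list (sample.txt is a hypothetical file, not the lab dictionary) and tries the most common operators on it. It is written for sh/bash rather than csh, and it uses grep -E, which accepts the same extended syntax as egrep.

```shell
# Build a tiny, made-up word list (one word per line), like the dictionary.
workdir=$(mktemp -d)
words="$workdir/sample.txt"
printf '%s\n' cat cot coat cart dog Dog bird > "$words"

# ^ and $ anchor the pattern to the start and end of the line.
# .  matches any single character.
grep -E '^c.t$' "$words"          # prints cat, cot

# [ ] matches one character from a set; [^ ] matches one NOT in the set.
grep -E '^[dD]og$' "$words"       # prints dog, Dog

# ? makes the previous item optional (* is zero or more, + is one or more).
grep -E '^coa?t$' "$words"        # prints cot, coat

# | separates alternatives; ( ) groups; {n} repeats exactly n times.
grep -E '^(bird|.{4})$' "$words"  # prints coat, cart, bird

# -c counts matching lines instead of printing them.
three_letter=$(grep -E -c '^.{3}$' "$words")
echo "$three_letter"
```

The anchors matter for a one-word-per-line dictionary: without ^ and $, a pattern matches any word that merely contains it.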

What can you do with the output from egrep? You can put the results in a new file.

% egrep 'regexp' file > results-file

You can look at the results by piping them through less or more.

% egrep 'regexp' file | less

You can simply count the results by piping them through wc, the word-count program.

% egrep 'regexp' file | wc -l

You can even send the results through another invocation of egrep (which is a nice way to get the values accepted by both regular expressions; that is, the intersection of two languages).

% egrep 'regexp1' file | egrep 'regexp2' | ...
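
The commands above can be tried end to end on a small example. This is only a sketch: the word list and file names are made up, the syntax is sh/bash rather than csh, and grep -E stands in for egrep (the two accept the same patterns).

```shell
# A made-up five-word list standing in for the dictionary.
workdir=$(mktemp -d)
words="$workdir/sample.txt"
printf '%s\n' apple angle eagle ample bagel > "$words"

# Save the matches in a new file: words ending in "le".
grep -E 'le$' "$words" > "$workdir/results-file"

# Count the matches; tr strips the padding some versions of wc add.
count_le=$(grep -E 'le$' "$words" | wc -l | tr -d ' ')

# Intersection: words that end in "le" AND contain "gl".
both=$(grep -E 'le$' "$words" | grep -E 'gl')
both_count=$(printf '%s\n' "$both" | wc -l | tr -d ' ')

echo "$count_le"
echo "$both"
```

Chaining two simple patterns this way is often easier to get right than writing one pattern that encodes both conditions.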

Questions: Write a regular expression for each of the following and determine how many words in the text file match the regular expression.

  1. Words starting with a
  2. Words that start with a or A
  3. Words with exactly four letters
  4. Words with exactly four letters that begin with a
  5. Words with four or more letters
  6. Words that contain a capital letter
  7. Words that start with a non-capital letter and include a capital letter
  8. Words with more than one capital
  9. Words that neither begin nor end with a
  10. Words that begin and end with a vowel
  11. Words that neither begin nor end with a vowel
  12. Words that contain the vowels in order
  13. Words that contain your initials in order
  14. Words that contain only the letters of your first name


Part II - The Basics of an Emacs Development Environment for Perl

Background: Emacs is a handy tool for a lot of purposes. It has a Perl mode, but the default isn't too pretty. You can change the default behavior by adding code to the .emacs file in your home directory. Note that after you change this file you need to restart Emacs to see the changes. Try adding the following Emacs Lisp code to your .emacs file:

;; Use cperl-mode instead of the default perl-mode
(add-to-list 'auto-mode-alist '("\\.\\([pP][Llm]\\|al\\)\\'" . cperl-mode))
(add-to-list 'interpreter-mode-alist '("perl" . cperl-mode))
(add-to-list 'interpreter-mode-alist '("perl5" . cperl-mode))
(add-to-list 'interpreter-mode-alist '("miniperl" . cperl-mode))

This code will load a more comprehensive Perl mode (it even includes access to the perl debugger where you can step through code line-by-line!). If you don't like the defaults, you can change them:

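;; Run our customizations whenever cperl-mode starts
(add-hook 'cperl-mode-hook 'n-cperl-mode-hook t)
(defun n-cperl-mode-hook ()
  ;; Indent blocks by 4 columns
  (setq cperl-indent-level 4)
  ;; Do not add extra indentation to continued statements
  (setq cperl-continued-statement-offset 0)
  ;; Put the opening brace of if/while/etc. on its own line
  (setq cperl-extra-newline-before-brace t)
  ;; Highlight array and hash variables with a wheat background
  (set-face-background 'cperl-array-face "wheat")
  (set-face-background 'cperl-hash-face "wheat"))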

If you want to automatically have font coloring turned on, you can add the following to your .emacs file:

(cond ((fboundp 'global-font-lock-mode)
       ;; Customize face attributes
       (setq font-lock-face-attributes
             ;; Symbol-for-Face Foreground Background Bold Italic Underline
             '((font-lock-comment-face       "DarkGreen")
               (font-lock-string-face        "Sienna")
               (font-lock-keyword-face       "RoyalBlue")
               (font-lock-function-name-face "Blue")
               (font-lock-variable-name-face "Black")
               (font-lock-type-face          "Black")
               (font-lock-reference-face     "Purple")
               ))
       ;; Load the font-lock package.
       (require 'font-lock)
       ;; Maximum colors
       (setq font-lock-maximum-decoration t)
       ;; Turn on font-lock in all modes that support it
       (global-font-lock-mode t)))


Part III - Working with Perl scripts

Background: I handed out some documents on basic Perl constructs that were very similar to an on-line tutorial. Do the exercises on the following tutorial pages:

  1. (optional) If you don't know how to run a perl program as a script try this: Basic Perl
    Make sure that you "use strict;" as mentioned in the handout.
  2. File Handling
    Make sure you've read about arrays and files, then copy the cat program and try the exercise at the bottom of the page. Follow the exercises through control structures and conditionals.
  3. Regular expressions
    Do the exercise at the bottom. It will build on your cat program.
  4. (optional - more advanced) Substitution and translation
    Do the exercise on the page.
  5. (optional - more advanced) Split
    Do the exercise on the page.


Part IV - CGI scripts

Background: Many CGI scripts are simply Perl scripts saved with a .cgi extension instead of .pl.

  1. Start the CGI script tutorial at the University of Leeds. You won't finish it, but it's a start toward writing CGI scripts.


Submission

If you have not yet registered with try, your submission will not work. To register, type:

% try jdb-grd register /dev/null

Each of you should submit (from your own account) an empty file called here.txt using try. To create the file, type:

% touch here.txt

You will be told the command to use for submitting the file in your lab.

Each of you should submit (from your own account) a file called reflections.txt after the lab is over. The goal of the file is for you to reflect on what you have learned in the lab, how far you got in the lab, what you had problems with, what I could do better next time, what you would have liked to do, and to give me feedback about the lab. In order to submit this file, type:

% try jdb-grd project11-1 reflections.txt

This file is due this Thursday at midnight. The late deadline for this file (no penalty) is 11:59pm Friday. Files submitted after the late deadline will not be accepted.