Notes
Outline
Week 5
Algorithm Analysis
Searching
Sorting
Reading: Main p.19-28, 548-560, 590-604, 780-781
The goal of this course is to learn how to create GOOD PROGRAMS
Good programs are those which
run correctly
run efficiently
are easy to debug
are easy to maintain or to modify
What does good mean?
Time analysis of algorithms
(AKA run-time analysis, time complexity, program complexity, order of magnitude)
Suppose you want to compare two algorithms, A and B, to find which is better in terms of running time. You implement both, run them, and discover that A took 2 min and B took 1 min 45 sec.
Does this prove that B is better than A?
What if you standardized a computer, OS, test data, etc.?
Instead of measuring time, computer scientists count the number of steps needed to complete an algorithm. A step is a simple operation such as an assignment, a comparison, or a simple arithmetic operation.
Ex. Problem: Calculate the sum of all the integers from 1 to n.
Alg. #1: int sum = 0; for (int count = 1; count <= n; count++) sum += count;
Alg. #2: int sum = ((n + 1) * n) / 2;
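A hedged side-by-side sketch in Java (the method names are illustrative): the loop in Alg. #1 performs roughly n additions, while Alg. #2 uses the closed form and takes the same few steps for any n.

// Alg. #1: about 2n + 1 simple steps (one assignment, then n
// comparisons, n additions, and n increments) – O(n).
public static int sumLoop(int n) {
    int sum = 0;
    for (int count = 1; count <= n; count++) sum += count;
    return sum;
}

// Alg. #2: a fixed handful of steps regardless of n – O(1).
public static int sumFormula(int n) {
    return ((n + 1) * n) / 2;
}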
What does running time depend upon?
Input
Compiler
Machine
Algorithm
The running time of most interesting algorithms depends on the input size, n
Comparison of algorithms
Algorithms are analyzed by considering n as unbounded and expressing the number of steps as a function of n. As n becomes large we can ignore a program's one-time housekeeping work and consider only its constantly repeated tasks.
Algorithms are classified by the type of function associated with them, using "Big Oh" notation. Big Oh gives a very general idea of the type of algorithm.
Big O mathematical definition
f(n) is the actual function associated with the algorithm's execution.
g(n) is the simplest function such that f(n) is O(g(n)); that is, there exist constants c > 0 and n0 with
f(n) <= c * g(n) for all n > n0.
Growth Rate
Constant       O(1)
Logarithmic    O(log n)
Linear         O(n)
n log n        O(n log n)
Quadratic      O(n^2)
Cubic          O(n^3)
Exponential    O(2^n)
Comparison
Program analysis
Another reason why programmers like big O is that it can often be read directly from the control structures in a program
E.g., no loops or recursion – O(1)
A single counting loop, or recursion that shrinks the problem by a fixed amount – O(n)
A double nested FOR loop – O(n^2)
Recursion that divides the problem by a fixed constant – O(log n)
(Each of these patterns is sketched in Java below.)
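A hedged sketch of the four patterns in one illustrative method, assuming a is nonempty and the loop bodies stand for constant-time work:

public static void growthPatterns(int[] a, int n) {
    int first = a[0];                    // O(1): no loops or recursion

    for (int i = 0; i < n; i++) {        // O(n): a single counting loop
        // constant work
    }

    for (int i = 0; i < n; i++) {        // O(n^2): a double nested loop
        for (int j = 0; j < n; j++) {
            // constant work
        }
    }

    for (int i = n; i > 1; i /= 2) {     // O(log n): i is divided by a
        // constant work                 // fixed constant each pass
    }
}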
Relative growth rate of functions
Table of Relative Growth Rates
Hard problems
If a polynomial-time algorithm exists for a problem, the problem is generally considered to be 'well solved' (polynomial functions are functions whose powers are constants, e.g. n^5 + n^3). A polynomial solution means some method has been found that is better than just 'blind guessing' (technically known as exhaustive search).
Algorithms that 'just guess' typically have an exponential growth rate in n (e.g. 2^n). As n grows, such problems quickly become too large to solve. An example of such an algorithm would be to sort a list by randomly rearranging its elements and then testing whether the list is in order. If we had no insight into how to sort a list, this would be our best approach.
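A minimal Java sketch of this shuffle-and-test sort (often nicknamed bogosort; the method names are illustrative). Its expected number of steps grows astronomically with n:

public static void bogoSort(int[] a) {
    java.util.Random rand = new java.util.Random();
    while (!isSorted(a)) {
        // Fisher-Yates shuffle: a random rearrangement of a
        for (int i = a.length - 1; i > 0; i--) {
            int j = rand.nextInt(i + 1);
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }
}

public static boolean isSorted(int[] a) {
    for (int i = 1; i < a.length; i++)
        if (a[i - 1] > a[i]) return false;
    return true;
}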
Research in computer science has revealed several categories of ‘hard’ problem:
Undecidable problems (i.e. cannot be solved)
Intractable (provably require exponential time)
‘Nondeterministically’ polynomial (NP)
NP-Complete and NP-Hard Problems
The third category of problem (NP problems) can be solved in polynomial time, but only by assuming an unbounded number of processors. This means no polynomial-time algorithm has yet been developed for these problems that works on a single processor - but no one has proved that such an algorithm is impossible.
NP problems are recognised by the fact that they can all be reduced to a common base problem, called the satisfiability problem. NP problems are studied because a large number of practical problems have been shown to be NP-complete or NP-hard (NP-complete problems have a yes/no answer, whereas NP-hard problems may have a more complex output).
The attempt to make computers 'intelligent' has hit the NP barrier. Most activities that we do naturally (walking, talking, recognising objects, people and handwriting, etc.), when specified to a computer, become NP-complete search problems. Therefore much work has gone into developing algorithms that, while still exponential in theory, perform well in practice. The human brain is an excellent example of a computer that can solve NP problems efficiently.
Problem
Suppose we have a method whose worst running time, worstTime(n), is a given function of n. Determine the effect of tripling n on the estimate of worst time. That is, estimate worstTime(3n) in terms of worstTime(n) if
1)  worstTime(n) is linear in n
2)  worstTime(n) is quadratic in n
3)  worstTime(n) is constant
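A hedged sketch of the reasoning, writing the hidden constant of proportionality as c:
1) Linear: worstTime(n) ~ c*n, so worstTime(3n) ~ c*(3n) = 3*worstTime(n). Tripling n triples the time.
2) Quadratic: worstTime(n) ~ c*n^2, so worstTime(3n) ~ c*(3n)^2 = 9*worstTime(n). Tripling n multiplies the time by 9.
3) Constant: worstTime(3n) ~ worstTime(n). No change.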
Linear Search
Numbers can be in any order.
Works for Linked Lists.
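A minimal Java sketch of linear search (the method name is illustrative). Because it inspects elements one at a time, it needs no ordering and works equally well on linked lists:

// Returns the index of target in a, or -1 if absent.
// Worst case: n comparisons – O(n).
public static int linearSearch(int[] a, int target) {
    for (int i = 0; i < a.length; i++) {
        if (a[i] == target) return i;
    }
    return -1;
}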
Binary Search
Need
List to be sorted.
To be able to do random accesses.
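A minimal Java sketch of iterative binary search (names illustrative), assuming a sorted array with random access:

// Returns the index of target in sorted array a, or -1 if absent.
// Each comparison halves the remaining range – O(log n).
public static int binarySearch(int[] a, int target) {
    int lo = 0, hi = a.length - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  // written this way to avoid overflow
        if (a[mid] == target) return mid;
        else if (a[mid] < target) lo = mid + 1;  // discard the left half
        else hi = mid - 1;                       // discard the right half
    }
    return -1;
}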
Other Uses of Binary Search
To find where a function is zero (see the bisection sketch after this list).
Compute functions.
Tree data structures.
Data processing.
Debugging code.
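A hedged sketch of the first use above: bisection is binary search on a continuous interval, assuming f(lo) and f(hi) have opposite signs (the method name and tolerance are illustrative):

import java.util.function.DoubleUnaryOperator;

public static double findZero(DoubleUnaryOperator f, double lo, double hi, double tol) {
    while (hi - lo > tol) {
        double mid = (lo + hi) / 2;
        if (f.applyAsDouble(lo) * f.applyAsDouble(mid) <= 0)
            hi = mid;   // the sign change lies in [lo, mid]
        else
            lo = mid;   // the sign change lies in [mid, hi]
    }
    return (lo + hi) / 2;
}

// E.g. findZero(x -> x * x - 2, 0, 2, 1e-9) approximates sqrt(2).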
Sorting
The Problem.
To rearrange a set of items into order.
Applications
Aiding searches.
Finding duplicates.
Finding matching entries in different files.
Elementary Sorting
No best method of sorting.
First ones people think of.
Generalize to better ones.
Useful for a small number of items.
Useful for “almost sorted” items.
Overview
Sorting
Bubble Sort
Selection Sort
Insertion Sort
Bubble Sort
Swap
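A minimal Java sketch of bubble sort built on the swap operation above (names illustrative):

// Repeatedly swap adjacent out-of-order pairs; each pass bubbles the
// largest remaining element to the end. Worst case O(n^2) comparisons.
public static void bubbleSort(int[] a) {
    for (int pass = a.length - 1; pass > 0; pass--) {
        boolean swapped = false;
        for (int i = 0; i < pass; i++) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;  // swap
                swapped = true;
            }
        }
        if (!swapped) break;  // no swaps means already sorted: stop early
    }
}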
Selection Sort
Find Maximum
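A minimal Java sketch of selection sort built on the find-maximum step above (names illustrative):

// Find the largest element in the unsorted prefix and swap it into its
// final position at the end. O(n^2) comparisons but only O(n) swaps,
// which is why it suits large records (see Applications below).
public static void selectionSort(int[] a) {
    for (int last = a.length - 1; last > 0; last--) {
        int maxIndex = 0;
        for (int i = 1; i <= last; i++) {
            if (a[i] > a[maxIndex]) maxIndex = i;
        }
        int tmp = a[maxIndex]; a[maxIndex] = a[last]; a[last] = tmp;
    }
}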
Insertion Sort
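A minimal Java sketch of insertion sort (names illustrative):

// Grow a sorted prefix by inserting each new element into place.
// Worst case O(n^2), but O(n) when the array is almost sorted
// (see Applications below).
public static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];  // shift larger elements one slot right
            j--;
        }
        a[j + 1] = key;       // drop key into the gap
    }
}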
Comparison
Applications
Insertion Sort
Linear for almost sorted arrays.
Linear in arrays where items are a constant distance from their final position.
Selection Sort
Useful for large records.