CSCI-541: Programming Skills C++
Lab 3 - Places


Due Date: Thursday May 2 @ 11:59pm

The Problem

The goal of this assignment is to efficiently find the closest pair of places from a database of over 3 million geographical locations that are published freely by MaxMind. You will develop a program that can perform the following time measured operations:

Input Specification

The database is an ASCII text file that contains one place per line with comma separated values. The first line is a header that indicates the name for each of the fields. Every line after the first is a place. The places are organized in the file alphabetically by country code.

Each line contains 7 comma separated values:

  1. Country: The country code (a 2 character string)
  2. City: A city name (a string that may contain spaces and is usually all lower case)
  3. AccentCity: The city name again (a case sensitive string)
  4. Region: The cities region (a string)
  5. Population: A positive integer (if known), otherwise the field is empty
  6. Latitude: A floating point number for the geographical latitude
  7. Longitude: A floating point number for the geographical longitude

Sample input

Here are a couple of sample entries for Rochester, NYC, London and Tokyo.

	us,rochester,Rochester,NY,211354,43.1547222,-77.6158333
	us,new york,New York,NY,8107916,40.7141667,-74.0063889
	gb,london,London,H9,7421228,51.514125,-.093689
	jp,tokyo,Tokyo,40,31480498,35.685,139.751389

In some instances, multiple places can share identical geographic locations.

	us,ray,Ray,IN,,41.7597222,-84.8719444
	us,ray,Ray,MI,,41.7597222,-84.8719444
Notice with the previous example that the population is not specified.

Output Specification:

The program runs in a loop that accepts the following commands:

	closest: find the closest place to one place
	closestAll: find the closest pair from all places
	distance: find the distance between two places
	find: find a place
	help: display this help message
	read: read a new collection of places
	quit: exit the program

Sample Output:

	$ places
	> help
	closest: find the closest place to one place
	closestAll: find the closest pair from all places
	distance: find the distance between two places
	find: find a place
	help: display this help message
	read: read a new collection of places
	quit: exit the program
	*** elapsed time: 0 seconds.
	> read ny.txt
	Removed 1 duplicate places.
	There are 5833 unique places.
	*** elapsed time: 0 seconds.
	> find
	Enter Place...
	Country: us
	City: new york
	Region: NY
	Country: us, City: new york, Region: NY, Latitude: 40.7142, Longitude: -74.0064
	*** elapsed time: 0 seconds.
	> distance
	Enter Place...
	Country: us
	City: rochester
	Region: NY
	Enter Place...
	Country: us
	City: buffalo
	Region: NY
	Country: us, City: rochester, Region: NY, Latitude: 43.1547, Longitude: -77.6158
	Country: us, City: buffalo, Region: NY, Latitude: 42.8864, Longitude: -78.8786
	66.4655 miles away.
	*** elapsed time: 0 seconds.
	> closest
	Enter Place...
	Country: us
	City: hunter
	Region: NY
	Country: us, City: hunter, Region: NY, Latitude: 42.2136, Longitude: -74.2192
	Country: us, City: south jewett, Region: NY, Latitude: 42.2233, Longitude: -74.2564
	2.02078 miles apart.
	*** elapsed time: 0 seconds.
	> closestAll 
	Country: us, City: bronxdale, Region: NY, Latitude: 40.8506, Longitude: -73.8669
	Country: us, City: bronx, Region: NY, Latitude: 40.85, Longitude: -73.8667
	0.0410635 miles apart.
	*** elapsed time: 0 seconds.
	> find
	Enter Place...
	Country: xxxxxx
	City: xxxxxxxxx
	Region: xxxxxxxx
	Not found!
	*** elapsed time: 0 seconds.
	> quit

Explanation

To compute the approximate distance in miles between two points specified by longitude and latitude, you can use the Haversine Formula. This page shows the formula and also allows you to enter locations and calculate distances.

Finding the closest place from a specified point can be accomplished in linear time by comparing and finding the mimimum distance between the specified point and all others.

Likewise, finding the closest pair of points could be done in brute force manner (quadratic time) by comparing each point to every other point and taking the minimum. This will prove to be far too inefficient with any significantly sized data set. Instead, you are required to solve this problem in O(nlogn) time by using a recursive divide and conquer method.

Professor Butler's notes on the algorithm can be found here.

Code Requirements

Your solution must:

You are free to develop on your own systems, but please leave time to make sure your code compiles and runs on our CS Ubuntu machines.

You are developing this entire solution from scratch. Your main program must be called places. It should be run with no other command line arguments. If other arguments are present, a usage message should be displayed to standard error and the program should exit.

To assist you with testing, three tests files are supplied to you in this zip file

These test files can also be found in the course account pub directory under (so you can test on the CS machines without needing a local copy that could blow up your quota):
	/home/course/csci541cpp/pub/labs/places/data/

This is a sample run of the solution program using the large data set. Take note of the time it takes to read in the data set, as well as find the closest pair of points.

	$ ./places
	> read /home/course/csci541cpp/pub/labs/places/data/world.txt
	Removed 930491 duplicate places.
	There are 2243467 unique places.
	*** elapsed time: 14 seconds.
	> closestAll
	Country: zr, City: malembeka, Region: 05, Latitude: -10.73, Longitude: 26.5714
	Country: cd, City: malembeka, Region: 05, Latitude: -10.73, Longitude: 26.5714
	6.79237e-06 miles away.
	*** elapsed time: 49 seconds.

Your solution should be able to find the closest pair of all points in world.txt in under 2 minutes on the CS machines to receive full credit.

To do time measurement for each command, use chrono::system_clock. Documentation can be found here.

Grading

The grade breakdown for this assignment is:

The major part of this assignment is implementing the closest pair of points algorithm in O(nlogn) time. The other commands serve as a way for you to "build" towards implementing this algorithm.

You must try to incorporate the new features of C++11 into your program and it must incorporate an object oriented design. For example, when reading input you must make use of C++'s I/O Stream library.

Submission

Submit your source code and Git log to try using the command:

    try grd-541cpp lab3-1 places.cpp {other .cpp and .h files} log.txt