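Column guide: YYYY = year (characters 16-19); TTTTT = signed air temperature in tenths of a degree Celsius (characters 88-92); Q = quality code (character 93)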
0029029070999991901010106004+64333+023450FM-12+000599999V0202701N015919999999N0000001N9-00781+99999102001ADDGF108991999999999999999999
0029029070999991901010113004+64333+023450FM-12+000599999V0202901N008219999999N0000001N9-00721+99999102001ADDGF104991999999999999999999
0029029070999991901010120004+64333+023450FM-12+000599999V0209991C000019999999N0000001N9-00941+99999102001ADDGF108991999999999999999999
0029029070999991901010206004+64333+023450FM-12+000599999V0201801N008219999999N0000001N9-00611+99999101831ADDGF108991999999999999999999
0029029070999991901010213004+64333+023450FM-12+000599999V0201801N009819999999N0000001N9-00561+99999101761ADDGF108991999999999999999999
0029029070999991901010220004+64333+023450FM-12+000599999V0201801N009819999999N0000001N9-00281+99999101751ADDGF108991999999999999999999
0029029070999991901010306004+64333+023450FM-12+000599999V0202001N009819999999N0000001N9-00671+99999101701ADDGF106991999999999999999999
0029029070999991901010313004+64333+023450FM-12+000599999V0202301N011819999999N0000001N9-00331+99999101741ADDGF108991999999999999999999
0029029070999991901010320004+64333+023450FM-12+000599999V0202301N011819999999N0000001N9-00281+99999101741ADDGF108991999999999999999999
0029029070999991901010406004+64333+023450FM-12+000599999V0209991C000019999999N0000001N9-00331+99999102311ADDGF108991999999999999999999
Input data
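The fixed-width records above can be decoded by character position. The sketch below assumes the layout used in White's example: year in characters 16-19, signed air temperature in characters 88-92, a one-character quality code after it, and 9999 standing for a missing reading. The names `parse_record` and `is_valid` are illustrative, not part of any Hadoop API.

```python
MISSING = 9999  # assumed sentinel for an absent temperature reading


def parse_record(line):
    """Return (year, temperature, quality) from one fixed-width record."""
    year = line[15:19]
    temperature = int(line[87:92])  # int() accepts a leading '+' or '-'
    quality = line[92]
    return year, temperature, quality


def is_valid(temperature, quality):
    """Keep only readings that are present and carry an accepted quality code."""
    return temperature != MISSING and quality in "01459"


# First record of the input data shown above.
record = (
    "0029029070999991901010106004+64333+023450FM-12+000599999V0202701N0159"
    "19999999N0000001N9-00781+99999102001ADDGF108991999999999999999999"
)
year, temperature, quality = parse_record(record)
print(year, temperature, quality)  # 1901 -78 1
```

The temperature is stored in tenths of a degree, so -78 is -7.8 degrees Celsius.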
Figure 2-1. MapReduce logical data flow
White, op. cit.
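The logical data flow in Figure 2-1 (map, then sort and group by key, then reduce) can be imitated in a single process. This is only a sketch of the idea, not how Hadoop runs a job; the (year, temperature) pairs are made-up illustrative values, and the function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    # Emit one (year, temperature) pair per input record.
    for year, temperature in records:
        yield year, temperature


def shuffle(pairs):
    # Sort by key, then collect all values that share a key.
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_phase(grouped):
    # Keep the maximum temperature seen for each year.
    for key, values in grouped:
        yield key, max(values)


# Illustrative input, not taken from the weather files.
records = [("1950", 0), ("1950", 22), ("1950", -11), ("1949", 111), ("1949", 78)]
result = list(reduce_phase(shuffle(map_phase(records))))
print(result)  # [('1949', 111), ('1950', 22)]
```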
Figure 2-2. MapReduce data flow with a single reduce task
White, op. cit.
Figure 2-3. MapReduce data flow with multiple reduce tasks
White, op. cit.
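With several reduce tasks, as in Figure 2-3, every map output pair must be routed so that all values for one key reach the same reducer. A common scheme is to hash the key and take the result modulo the number of reducers. The sketch below assumes that scheme and uses `zlib.crc32` as a stand-in hash; the names `partition` and `run_with_reducers` are hypothetical, and the input pairs are illustrative.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    # Deterministic hash of the key, modulo the number of reduce tasks.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_with_reducers(pairs, num_reducers):
    # One bucket per reduce task; each bucket maps key -> list of values.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    # Each reduce task produces its own sorted output, like one part file.
    return [sorted((k, max(vs)) for k, vs in bucket.items()) for bucket in buckets]


# Illustrative input, not taken from the weather files.
pairs = [("1949", 111), ("1950", 0), ("1950", 22), ("1949", 78), ("1951", -5)]
parts = run_with_reducers(pairs, 2)
merged = dict(kv for part in parts for kv in part)
print(merged)
```

Each key lands in exactly one partition, so merging the per-reducer outputs gives the same answer as a single reduce task would.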
Figure 2-4. MapReduce data flow with no reduce tasks
White, op. cit.
$ hadoop MaxTemperature '*.txt' output
$ ls -l output
total 4
-rwxrwxrwx 1 ark ark 18 2011-10-23 14:27 part-00000
-rwxrwxrwx 1 ark ark  0 2011-10-23 14:27 _SUCCESS
$ cat output/part-00000
1901 317
1902 244
$ hadoop MaxTemperatureWithCombiner '*.txt' output2
$ ls -l output2
total 4
-rwxrwxrwx 1 ark ark 18 2011-10-23 14:29 part-00000
-rwxrwxrwx 1 ark ark  0 2011-10-23 14:29 _SUCCESS
$ cat output2/part-00000
1901 317
1902 244
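The two runs above produce identical output because taking a maximum gives the same result whether it is applied once to all values or first per map task and then to the partial results. That does not hold for every function. The sketch below, using illustrative values and a hypothetical helper `combine_then_reduce`, compares max with mean.

```python
def combine_then_reduce(map_outputs, fn):
    # Apply fn to each map task's values (combiner), then to the partial results (reducer).
    return fn([fn(values) for values in map_outputs])


def mean(values):
    return sum(values) / len(values)


# Illustrative temperatures for one year, split across two map tasks.
map_outputs = [[0, 20, 10], [25, 15]]
all_values = [v for values in map_outputs for v in values]

print(combine_then_reduce(map_outputs, max), max(all_values))    # 25 25
print(combine_then_reduce(map_outputs, mean), mean(all_values))  # 15.0 14.0
```

Max is safe to use as a combiner; a mean of means is not the overall mean, so a mean would need a different combiner, for example one that carries sums and counts.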
| | MapReduce | RDBMS |
| Data size | Petabytes | Gigabytes |
| Disk access | Streaming (batch) | Random (interactive or batch) |
| Structure | Unstructured or structured | Highly structured |
| Updates | Write once, read many times | Write and read many times |

| | MapReduce | Parallel computing |
| Type of problems | Mainly data-intensive | Mainly CPU-intensive |
| Thread/process coupling | Minimal | Minimal to maximal |
| Programming patterns | Just one | Anything |
| Programming effort | Small | Medium to large |
| Fault tolerance | Automatic | Manual |