Problem in the way reducer and combiner operate(Merging)

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Problem in the way reducer and combiner operate(Merging)

nikjosh

2009-10-19 17:54:16,221 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2009-10-19 17:54:17,632 INFO org.apache.hadoop.mapred.ReduceTask:
ShuffleRamManager: MemoryLimit=78643200, MaxSingleShuffleLimit=19660800
2009-10-19 17:54:17,761 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200910191750_0001_r_000000_0 Thread started: Thread for merging
on-disk files
2009-10-19 17:54:17,762 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200910191750_0001_r_000000_0 Thread waiting: Thread for merging
on-disk files
2009-10-19 17:54:17,763 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200910191750_0001_r_000000_0 Thread started: Thread for merging in
memory files
2009-10-19 17:54:17,782 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200910191750_0001_r_000000_0 Need another 2 map output(s) where 0 is
already in progress
2009-10-19 17:54:17,793 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200910191750_0001_r_000000_0: Got 0 new map-outputs & number of
known map outputs is 0
2009-10-19 17:54:17,793 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200910191750_0001_r_000000_0 Scheduled 0 of 0 known outputs (0 slow
hosts and 0 dup hosts)
2009-10-19 17:54:22,816 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200910191750_0001_r_000000_0: Got 2 new map-outputs & number of
known map outputs is 2
2009-10-19 17:54:22,818 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200910191750_0001_r_000000_0 Scheduled 1 of 2 known outputs (0 slow
hosts and 1 dup hosts)
2009-10-19 17:54:23,041 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
44 bytes (44 raw bytes) into RAM from attempt_200910191750_0001_m_000000_0
2009-10-19 17:54:23,043 INFO org.apache.hadoop.mapred.ReduceTask: Read 44
bytes from map-output for attempt_200910191750_0001_m_000000_0
2009-10-19 17:54:23,047 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_200910191750_0001_m_000000_0 -> (4, 8) from localhost
2009-10-19 17:54:24,808 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200910191750_0001_r_000000_0 Scheduled 1 of 1 known outputs (0 slow
hosts and 0 dup hosts)
2009-10-19 17:54:24,824 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
44 bytes (44 raw bytes) into RAM from attempt_200910191750_0001_m_000001_0
2009-10-19 17:54:24,825 INFO org.apache.hadoop.mapred.ReduceTask: Read 44
bytes from map-output for attempt_200910191750_0001_m_000001_0
2009-10-19 17:54:24,825 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
from attempt_200910191750_0001_m_000001_0 -> (4, 8) from localhost
2009-10-19 17:54:25,800 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram
manager
2009-10-19 17:54:25,817 INFO org.apache.hadoop.mapred.ReduceTask:
Interleaved on-disk merge complete: 0 files left.
2009-10-19 17:54:25,973 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 2 segments...
2009-10-19 17:54:25,984 INFO org.apache.hadoop.mapred.Merger: Merging 2
sorted segments
2009-10-19 17:54:25,985 INFO org.apache.hadoop.mapred.Merger: Down to the
last merge-pass, with 2 segments left of total size: 88 bytes
2009-10-19 17:54:25,991 FATAL mma.CombA: Running Identity Combiner...
2009-10-19 17:54:25,992 FATAL mma.CombA: Identity Key is1 Combiner processed
#: 2
2009-10-19 17:54:25,992 FATAL mma.CombA: Running Identity Combiner...
2009-10-19 17:54:25,992 FATAL mma.CombA: Identity Key is4 Combiner processed
#: 1
2009-10-19 17:54:25,993 FATAL mma.CombA: Running Identity Combiner...
2009-10-19 17:54:25,993 FATAL mma.CombA: Identity Key is3 Combiner processed
#: 1
2009-10-19 17:54:25,993 FATAL mma.CombA: Running Identity Combiner...
2009-10-19 17:54:25,993 FATAL mma.CombA: Identity Key is4 Combiner processed
#: 1
2009-10-19 17:54:25,993 FATAL mma.CombA: Running Identity Combiner...
2009-10-19 17:54:25,993 FATAL mma.CombA: Identity Key is3 Combiner processed
#: 1
2009-10-19 17:54:25,996 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200910191750_0001_r_000000_0 Merge of the 2 files in-memory
complete. Local file is
/var/lib/hadoop/cache/hadoop/mapred/local/taskTracker/jobcache/job_200910191750_0001/attempt_200910191750_0001_r_000000_0/output/map_0.out
of size 86
2009-10-19 17:54:25,996 INFO org.apache.hadoop.mapred.ReduceTask: In-memory
merge complete: 0 files left.
2009-10-19 17:54:25,998 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
final on-disk merge with 1 files
2009-10-19 17:54:25,999 INFO org.apache.hadoop.mapred.Merger: Merging 1
sorted segments
2009-10-19 17:54:26,012 INFO org.apache.hadoop.mapred.Merger: Down to the
last merge-pass, with 1 segments left of total size: 86 bytes
2009-10-19 17:54:26,081 FATAL mma.RedA: THE KEY WAS 'PAVG'
2009-10-19 17:54:26,081 FATAL mma.RedA: AVG ENCTRD:0.0
2009-10-19 17:54:26,082 FATAL mma.RedA: AVG ENCTRD:3.0
2009-10-19 17:54:26,083 FATAL mma.RedA: Executed with key:1 Reducer with2.0
2009-10-19 17:54:26,084 FATAL mma.RedA: THE KEY WAS 'MIN'
2009-10-19 17:54:26,084 FATAL mma.RedA: MIN ENCTRD:3.0
2009-10-19 17:54:26,084 FATAL mma.RedA: Executed with key:4 Reducer with1.0
2009-10-19 17:54:26,084 FATAL mma.RedA: THE KEY WAS 'MAX'
2009-10-19 17:54:26,084 FATAL mma.RedA: MAX ENCTRD:3.0
2009-10-19 17:54:26,084 FATAL mma.RedA: Executed with key:3 Reducer with1.0
2009-10-19 17:54:26,084 FATAL mma.RedA: THE KEY WAS 'MIN'
2009-10-19 17:54:26,084 FATAL mma.RedA: MIN ENCTRD:1.0
2009-10-19 17:54:26,084 FATAL mma.RedA: Executed with key:4 Reducer with1.0
2009-10-19 17:54:26,085 FATAL mma.RedA: THE KEY WAS 'MAX'
2009-10-19 17:54:26,086 FATAL mma.RedA: MAX ENCTRD:2.0
2009-10-19 17:54:26,087 FATAL mma.RedA: Executed with key:3 Reducer with1.0
2009-10-19 17:54:26,215 INFO org.apache.hadoop.mapred.TaskRunner: Task
'attempt_200910191750_0001_r_000000_0' done.

The above is the log of my program, which simply does following:
1.Map outputs a kv pair, the key is set to value UNPROCESSED (the key is
intwritable)
2.Combiner takes kv pair with value UNPROCESSED and from the list for this
key, finds out min,max,avg and outputs
   three kv pairs, one for min with key MIN, one for MAX,one for AVG.

3.Reducer should then get the list of values for key MIN, list for MAX, and
list for AVG.

Problem:

As seen from log above during merging process, the output isnt sorted as per
the key, thus my reducer seemingly gets two lists for MIN and two list for
MAX instead of getting 1 key and a list for MAX, and same for min.


I wonder what the problem is, I concur that the problem is in merging
step....

Does anyone know whats wrong in above ,from the log?%-|
--
View this message in context: http://www.nabble.com/Problem-in-the-way-reducer-and-combiner-operate%28Merging%29-tp25982924p25982924.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.