XmlInputFormat and xml Files

Hi All

I was using StreamInputFormat to process XML files, but because of user
comments about StreamXmlRecordReader, I am now planning to use
XmlInputFormat.java instead.

This is the command I have been using to run the jobs so far:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2+320.jar \
-D mapred.child.java.opts=-Xmx1024m \
-inputformat StreamInputFormat \
-inputreader "StreamXmlRecordReader,begin=<abc>,end=</abc>" \
-input  /user/root/file1 \
-jobconf mapred.map.tasks=1 \
-jobconf mapred.reduce.tasks=0 \
-output output1 \
-mapper /home/mapper.groovy \
-reducer org.apache.hadoop.mapred.lib.IdentityReducer
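
To make clear what I expect from the reader: as I understand it, the begin/end
settings above mean that each chunk of text from <abc> through the next </abc>
reaches the mapper as one record. Below is a rough sketch of that behaviour in
plain Java. It is not the Hadoop code and uses no Hadoop classes; the class
name RecordSplitter and the method split are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: mimics begin/end record matching
// on an in-memory string.
public class RecordSplitter {

    // Returns every substring that starts at `begin` and runs through the
    // next occurrence of `end` (both markers included).
    public static List<String> split(String text, String begin, String end) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = text.indexOf(begin, pos);
            if (start < 0) {
                break; // no further begin marker
            }
            int stop = text.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // begin marker without a matching end: drop it
            }
            stop += end.length();
            records.add(text.substring(start, stop));
            pos = stop; // continue searching after this record
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<root><abc>one</abc>junk<abc>two</abc></root>";
        for (String record : split(xml, "<abc>", "</abc>")) {
            System.out.println(record);
        }
    }
}
```

I would like XmlInputFormat to hand my mapper the same kind of records.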

Now my question is how to use XmlInputFormat to parse the XML files.
Do I need to place this Java class somewhere in Hadoop to make it
accessible to the streaming command, or is something else required?

Does anybody have a working example of using XmlInputFormat?


Shuja-ur-Rehman Baig
MS CS - School of Science and Engineering
Lahore University of Management Sciences (LUMS)
Sector U, DHA, Lahore, 54792, Pakistan
Cell: +92 3214207445