Measuring Hadoop Execution time automatically



Or Raz

I am writing a MapReduce job using native Hadoop and I want to measure its execution time. I collect the time using Date (I know that the UI also shows the execution time). To get a reliable measurement I run the job 3 times (using one hadoop jar invocation and a for loop that calls the job), and I am getting very strange results.

The first run seems to take much less time than the others, although every run produces the same output. (I know that the placement of the containers might explain differences in execution time, but I am not sure why the first run is always the fastest.) Here is an example of the code I am using:

public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    conf.set(...);
    // ...
    for (int i = 0; i < 3; i++) {
        // getInstance is a static factory, so it is called on the Job class
        Job job = Job.getInstance(conf, "OR-MR");
        // ...
        if (!job.waitForCompletion(true)) {
            System.exit(1);
        }
    } // for
    return 0; // report success to the caller
} // run
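
For reference, here is a minimal, self-contained sketch of how the per-run timing could be collected inside such a loop. It is only an illustration under stated assumptions: RunTimer and timeRuns are hypothetical names (not part of Hadoop), the task passed in stands for "build a fresh Job and return job.waitForCompletion(true)", and System.nanoTime() is swapped in for Date because it is monotonic. It measures client-side wall-clock time, so it includes job submission and scheduling delays.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Hadoop): times repeated runs of a job-like task. */
final class RunTimer {
    private RunTimer() {}

    /**
     * Runs the task the given number of times and returns the wall-clock
     * time of each run in milliseconds. The task returns true on success,
     * mirroring the boolean result of job.waitForCompletion(true).
     */
    static long[] timeRuns(int runs, Callable<Boolean> task) {
        long[] elapsedMs = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // monotonic clock, unaffected by system time changes
            boolean ok;
            try {
                ok = task.call();
            } catch (Exception e) {
                throw new IllegalStateException("run " + i + " threw an exception", e);
            }
            long end = System.nanoTime();
            if (!ok) {
                throw new IllegalStateException("run " + i + " failed");
            }
            elapsedMs[i] = TimeUnit.NANOSECONDS.toMillis(end - start);
        }
        return elapsedMs;
    }
}
```

Inside run(), the task would be a lambda that creates a new Job from conf, configures it, and returns job.waitForCompletion(true); printing the returned array then shows one time per run, which makes it easy to see whether the first run really differs from the later ones.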

By the way, if I run the job 3 times separately (one hadoop command per run), then I get the fast execution time.

Is there any other way to measure the execution time without running the job manually three times?