Including Additional Jars

17 messages

Including Additional Jars

shujamughal
Hi All

I have created a MapReduce job, and to run it on the cluster I have bundled
all the jars (Hadoop, HBase, etc.) into a single jar, which increases the
overall file size. During development I have to copy this complete file again
and again, which is very time consuming. Is there a way to copy only the
program jar, without copying the lib files every time? I am using NetBeans to
develop the program.

Kindly let me know how to solve this issue.

Thanks

--
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>

Re: Including Additional Jars

Mark Kerzner-3
Shuja,

here is what I do in NB environment

#!/bin/sh
# Unpack the NetBeans project jar, then repackage everything in dist/
# into a single jar for Hadoop, reusing the original manifest.
cd ../dist
jar -xf Chapter1.jar
jar -cmf META-INF/MANIFEST.MF ../Chapter3-for-Hadoop.jar *
cd ../bin
echo "Repackaged for Hadoop"

and it does the job. I run it only when I want to build this jar.

Mark


Re: Including Additional Jars

Mark Kerzner-3
That was for my book (chapter 1 attached; you may find other things in it useful), but you would substitute your own project name.

Mark




HBase schema design

Miguel Costa

Hi,

I need some help with a schema design on HBase.

I have 5 dimensions (Time, Site, Referrer, Keyword, Country).

My row key is Site+Time.

Now I want to answer questions like: what is the top Referrer by Keyword for a site over a period of time?

Basically I want to cross all the dimensions that I have. And what if I have 30 dimensions?

What is the best schema design?

Please let me know if this isn't the right mailing list.

Thank you for your time.

Miguel



Re: HBase schema design

Ted Dunning-2
The hbase list would be more appropriate.

See http://hbase.apache.org/mail-lists.html

There is an active IRC channel, but your question fits the mailing list
better, so pop on over and I will give you some comments.

In the meantime, take a look at OpenTSDB, which is doing something very much
like what you want to do.
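The OpenTSDB-style idea can be sketched in plain Java. The field layout below (a 4-byte site id followed by an 8-byte timestamp) is a hypothetical example of a composite row key, not OpenTSDB's actual format:

```java
// Hypothetical composite row key: [4-byte site id][8-byte epoch seconds],
// both big-endian, so one site's rows sort contiguously by time and a
// "site over a period of time" question becomes a single HBase scan.
class RowKeys {
    static byte[] rowKey(int siteId, long epochSeconds) {
        byte[] key = new byte[12];
        for (int i = 0; i < 4; i++)      // site id, most significant byte first
            key[i] = (byte) (siteId >>> (24 - 8 * i));
        for (int i = 0; i < 8; i++)      // timestamp, most significant byte first
            key[4 + i] = (byte) (epochSeconds >>> (56 - 8 * i));
        return key;
    }
}
```

Crossing many dimensions generally means either more composite-key tables (one per query pattern) or pre-aggregated secondary tables; with 30 dimensions you would pick the handful of query patterns you actually need rather than try to index every combination.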

On Mon, Apr 4, 2011 at 8:43 AM, Miguel Costa <[hidden email]>wrote:

> Hi,
>
>
>
> I need some help to a schema design on HBase.
>
>
>
> I have 5 dimensions (Time,Site,Referrer Keyword,Country).
>
> My row key is Site+Time.
>
>
>
> Now I want to answer some questions like what is the top Referrer by
> Keyword for a site on a Period of Time.
>
> Basically I want to cross all the dimensions that I have. And if I have 30
> dimensions?
>
>
>
> What is the best schema design.
>
>
>
> Please let me know  if this isn’t the right mailing list.
>
>
>
> Thank you for your time.
>
>
>
> Miguel
>
>
>
>
>
>
>
>
>
>
>
>

Re: Including Additional Jars

Mark Kerzner-3
In reply to this post by Mark Kerzner-3
Then it seems you want to do the opposite of what I have done in this
script. I AM combining all the jars in one jar, and you already have that.

Rather, you want to distribute only your app jar, and put the other ones in
the lib folder on the server.

I know that when you run a standard MR job, you only need to mention your
jar, and the other Hadoop jars already come from the lib. In other words,
you should be able to run it like this:

hadoop jar your-jar parameters

Since you are using the Cloudera distro, this runs the following

/usr/bin/hadoop-0.20

which in turn runs this script

#!/bin/sh
export HADOOP_HOME=/usr/lib/hadoop-0.20
exec /usr/lib/hadoop-0.20/bin/hadoop "$@"

Since HADOOP_HOME is set, it knows that the libraries are in here

/usr/lib/hadoop-0.20/lib/

therefore, I think that if you put your additional libraries in the same
folder, it should just pick them up.

Sincerely,
Mark


On Mon, Apr 4, 2011 at 11:31 AM, Shuja Rehman <[hidden email]> wrote:

> Hi,
> I do not understand. Can you explain it using my example?
>
> I have the following jars in the lib folder of dist created by NetBeans
> (dist/lib/):
>
> commons-logging-1.1.1.jar
> guava-r07.jar
> hadoop-0.20.2+737-core.jar
> hbase.jar
> hbase-0.89.20100924+28.jar
> log4j-1.2.15.jar
> mysql-connector-java-5.1.7-bin.jar
> UIDataTransporter.jar
> zookeeper.jar
>
> and the dist folder contains only
>
> MyProgram.jar
>
> At the moment I am combining all the jar files to produce a single file,
> but now I want to put the dist/lib/*.jar files on the server once, and
> only copy MyProgram.jar every time I change the code.
>
> So can you adapt your code to my example?
> Thanks

Re: Including Additional Jars

Allen Wittenauer
In reply to this post by shujamughal


        This was in the FAQ, but in a non-obvious place.  I've updated it to be more visible (hopefully):

http://wiki.apache.org/hadoop/FAQ#How_do_I_submit_extra_content_.28jars.2C_static_files.2C_etc.29_for_my_job_to_use_during_runtime.3F

Re: Including Additional Jars

Marco Didonna

Does the same apply to jars containing libraries? Suppose I need
lucene-core.jar to run my project: can I put this jar into my job jar and
have Hadoop "see" Lucene's classes, or should I use the distributed cache?

MD


Re: Including Additional Jars

Mark Kerzner-3
I think you can put them either in your jar or in distributed cache.

As Allen pointed out, my idea of putting them into the Hadoop lib directory was wrong.

Mark
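Putting dependencies "in your jar" has a specific supported shape: jars placed in a lib/ subdirectory inside the job jar are added to the task classpath when Hadoop unpacks it. A sketch with hypothetical file names:

```shell
# Build a job jar whose lib/ subdirectory carries the dependencies.
# Hadoop unpacks the job jar and puts lib/*.jar on the task classpath.
mkdir -p build/lib
cp dist/lib/lucene-core.jar build/lib/   # dependency jar(s)
cd build
jar -xf ../dist/MyProgram.jar            # your compiled classes
jar -cf ../MyProgram-job.jar .           # classes + lib/ in one job jar
```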


Re: Including Additional Jars

shujamughal
Well, I think putting them in the distributed cache is a good idea. Do you
have a working example of how to put extra jars in the distributed cache and
make them available to the job?
Thanks





Re: Including Additional Jars

James Seigel Tynt
James’ quick and dirty, get your job running guideline:

-libjars <-- for jars you want accessible by the mappers and reducers
classpath or bundled in the main jar <-- for jars you want accessible to the runner

Cheers
James.
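A sketch of James' two cases, with hypothetical paths. Note that -libjars is a generic Hadoop option, parsed only when the driver goes through ToolRunner/GenericOptionsParser, and it must appear before the application's own arguments:

```shell
# Runner-side jars: put them on the client classpath.
export HADOOP_CLASSPATH=/home/me/lib/mylib.jar

# Mapper/reducer-side jars: ship them with -libjars (comma-separated).
hadoop jar MyProgram.jar com.example.MyDriver \
  -libjars /home/me/lib/mylib.jar \
  param1 param2
```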





Re: Including Additional Jars

Bill Graham
Shuja, I haven't tried this, but from what I've read it seems you
could just add all your jars required by the Mapper and Reducer to
HDFS and then add them to the classpath in your run() method like
this:

DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);

I think that's all there is to it, but like I said, I haven't tried
it. Just be sure your run() method isn't in the same class as your
mapper/reducer if they import packages from any of the distributed
cache jars.
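A minimal sketch of such a driver, assuming the old org.apache.hadoop.mapred API of this era; the class name and the HDFS jar path are hypothetical, and the jar must already have been uploaded to HDFS:

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Driver-only class (mapper/reducer live elsewhere, per Bill's caveat).
public class MyDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), MyDriver.class);
        // Put the HDFS-resident jar on every task's classpath.
        DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), conf);
        // ... set mapper, reducer, input and output paths here ...
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyDriver(), args));
    }
}
```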



Re: Including Additional Jars

shujamughal
In reply to this post by James Seigel Tynt
Neither -libjars nor the distributed cache is working. Is there any other
solution?




Re: Including Additional Jars

Bill Graham
If you could share more specifics regarding just how it's not working
(i.e., job specifics, stack traces, how you're invoking it, etc), you
might get more assistance in troubleshooting.



Re: Including Additional Jars

shujamughal
I am using the following command:

hadoop jar myjar.jar -libjars /home/shuja/lib/mylib.jar param1 param2 param3

but the program still gives an error and does not find mylib.jar. Can you
confirm the syntax of the command?
Thanks







Re: Including Additional Jars

Bill Graham
You need to pass the mainClass after the jar:

http://hadoop.apache.org/common/docs/r0.21.0/commands_manual.html#jar
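With a hypothetical driver class name, the corrected invocation would look like this; the main class (or a Main-Class manifest entry in the jar) must come before -libjars and the application arguments:

```shell
hadoop jar myjar.jar com.mycompany.MyDriver \
  -libjars /home/shuja/lib/mylib.jar \
  param1 param2 param3
```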


RE: Including Additional Jars

Guy Doulberg
Or set the Main-Class attribute in the jar's manifest.
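For example, a META-INF/MANIFEST.MF entry along these lines (hypothetical class name) lets you drop the class argument from the command line:

```
Main-Class: com.mycompany.MyDriver
```

With that entry, `hadoop jar myjar.jar -libjars ... param1` works without naming the class explicitly.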


