New to Hadoop

luca paganotti
Hi all, I'm completely new to Hadoop and trying to learn it. I'm reading the book "Big Data Analytics With Hadoop 3", and I'm at the very beginning.
I can start and stop HDFS and YARN with the shell scripts (start-dfs.sh/stop-dfs.sh and start-yarn.sh/stop-yarn.sh).
The book takes as a reference 
Unfortunately this version is no longer available, so I downloaded 3.2.0.
While trying to set up YARN Timeline Service v2.0 correctly, I managed to install and start an HBase cluster as suggested, using HBase 1.2.10 downloaded from http://mirror.cogentco.com/pub/apache/hbase/1.2.10/
HBase is up and running.
The next step should be "Enabling co-processor", which involves these sub-steps:
  1. set up a co-processor location in HDFS
    1. hadoop fs -mkdir /hbase/coprocessor
    2. hadoop fs -put hadoop-yarn-server-timelineservice-hbase-3.0.0-alpha1-SNAPSHOT.jar /hbase/coprocessor/hadoop-yarn-server-timelineservice.jar
But this command fails. I can't locate the right jar, because in $HADOOP_HOME/share/hadoop/yarn/timelineservice I find different files:
hadoop-yarn-server-timelineservice-3.2.0.jar
hadoop-yarn-server-timelineservice-hbase-client-3.2.0.jar
hadoop-yarn-server-timelineservice-hbase-common-3.2.0.jar
hadoop-yarn-server-timelineservice-hbase-coprocessor-3.2.0.jar

Which one should I use?
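The 3.2.0 documentation mentioned in the next paragraph names the `hbase-coprocessor` jar, so the coprocessor step presumably needs `hadoop-yarn-server-timelineservice-hbase-coprocessor-3.2.0.jar`. Here is a minimal sketch that assumes that jar is the right one and that it sits in `$HADOOP_HOME/share/hadoop/yarn/timelineservice`. The helpers `find_coprocessor_jar` and `print_upload_cmds` are hypothetical. The script only prints the two commands so they can be reviewed before running them by hand:

```shell
#!/usr/bin/env bash
# Sketch: locate the timeline service coprocessor jar and print the HDFS
# upload commands. ASSUMPTIONS: the hbase-coprocessor jar is the one the
# coprocessor step needs; both helpers are hypothetical.

find_coprocessor_jar() {
  local dir="$1"
  local jars=("$dir"/hadoop-yarn-server-timelineservice-hbase-coprocessor-*.jar)
  # Without nullglob an unmatched glob stays literal, so check the file exists.
  if [[ ! -f "${jars[0]}" ]]; then
    echo "no coprocessor jar found in $dir" >&2
    return 1
  fi
  echo "${jars[0]}"
}

# Print (do not run) the two commands from the "Enabling co-processor" step.
print_upload_cmds() {
  local jar
  jar=$(find_coprocessor_jar "$1") || return 1
  echo "hadoop fs -mkdir /hbase/coprocessor"
  # Local jar first, then a space, then the HDFS destination path.
  echo "hadoop fs -put $jar /hbase/coprocessor/hadoop-yarn-server-timelineservice.jar"
}
```

Usage: `print_upload_cmds "$HADOOP_HOME/share/hadoop/yarn/timelineservice"`.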

The Apache Hadoop online documentation at https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html says something similar and mentions the hadoop-yarn-server-timelineservice-hbase-coprocessor-3.2.0-SNAPSHOT.jar file, but again I can't find it. Why the SNAPSHOT suffix? It isn't present in my Hadoop distribution.
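About the SNAPSHOT suffix: in Maven versioning, `-SNAPSHOT` marks a development build rather than a release. Documentation generated from such a build presumably carries the suffix, while the jars in the 3.2.0 release tarball do not. The sketch below maps a documented name to the expected release name. It assumes nothing else in the name differs, and `release_jar_name` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: map a jar name quoted from docs built from a development
# (-SNAPSHOT) version to the name expected in a release tarball.
# ASSUMPTION: apart from the suffix, the names are identical.
release_jar_name() {
  local doc_name="$1"
  # ${var/pattern/} removes the first occurrence of "-SNAPSHOT", if any.
  echo "${doc_name/-SNAPSHOT/}"
}
```

For example, `release_jar_name hadoop-yarn-server-timelineservice-hbase-coprocessor-3.2.0-SNAPSHOT.jar` prints `hadoop-yarn-server-timelineservice-hbase-coprocessor-3.2.0.jar`, which matches one of the files listed above.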

I also need to set the HADOOP_CLASSPATH environment variable correctly.
The book tells me to set it to the path of the lib folder in the HBase distribution together with the $HADOOP_HOME/share/hadoop/yarn/timelineservice folder.
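The book's instruction could be sketched as follows. This assumes `HBASE_HOME` points at the extracted HBase 1.2.10 folder (an assumed variable, not one named in the book excerpt) and uses a hypothetical helper, `build_timeline_classpath`. Each entry ends in `/*`, which Java treats as a classpath wildcard meaning every `.jar` in that folder. The value is kept quoted so the shell does not expand the `*` itself:

```shell
#!/usr/bin/env bash
# Sketch: join the HBase lib folder and the Hadoop timelineservice folder
# into one classpath string. ASSUMPTIONS: HBASE_HOME is the extracted HBase
# distribution; build_timeline_classpath is a hypothetical helper.
build_timeline_classpath() {
  local hbase_home="$1" hadoop_home="$2" existing="$3"
  # "dir/*" is a Java classpath wildcard; quoting prevents shell globbing.
  local cp="${hbase_home}/lib/*:${hadoop_home}/share/hadoop/yarn/timelineservice/*"
  # Keep whatever was already on HADOOP_CLASSPATH, if anything.
  if [[ -n "$existing" ]]; then
    cp="${existing}:${cp}"
  fi
  echo "$cp"
}
```

Usage: `export HADOOP_CLASSPATH="$(build_timeline_classpath "$HBASE_HOME" "$HADOOP_HOME" "$HADOOP_CLASSPATH")"`. Running `hadoop classpath` afterwards prints the classpath the launcher will use, so you can check whether both folders were picked up.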

I'm not sure what causes the error I get when I run this command:

hadoop org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator -create -skipExistingTable

I get:
$HADOOP_HOME/libexec/hadoop-functions.sh: line 2364: HADOOP_ORG.APACHE.HADOOP.YARN.SERVER.TIMELINESERVICE.STORAGE.TIMELINESCHEMACREATOR_USER: bad substitution
$HADOOP_HOME/libexec/hadoop-functions.sh: line 2459: HADOOP_ORG.APACHE.HADOOP.YARN.SERVER.TIMELINESERVICE.STORAGE.TIMELINESCHEMACREATOR_OPTS: bad substitution
Error: Could not find or load main class org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator

Is it because my classpath is incomplete, because I don't have the right jars, or both?
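The first two lines of that output may be unrelated to the classpath. Judging from the variable names in the messages, `hadoop-functions.sh` builds a per-command variable name of the form `HADOOP_<COMMAND>_USER` (and `..._OPTS`) by upper-casing the first argument. It then appears to read that name through bash indirect expansion (`${!name}`). When the first argument is a fully qualified class name, the generated name contains dots, which are not legal in a bash variable name, and that would produce a "bad substitution". The sketch below reproduces only the naming step, outside Hadoop. The helpers are hypothetical, and the real script may differ in detail. If this reading is right, those two lines are noise, and the error that matters is the last one, which says the main class could not be found or loaded.

```shell
#!/usr/bin/env bash
# Sketch of how a per-command variable name such as HADOOP_<COMMAND>_USER
# could be derived. ASSUMPTION: this mirrors the naming visible in the error
# messages; build_subcmd_var and is_valid_var_name are hypothetical.

build_subcmd_var() {
  local program="$1" command="$2" suffix="$3"
  # ${x^^} upper-cases the value (bash 4+).
  echo "${program^^}_${command^^}_${suffix}"
}

# A bash variable name: a letter or underscore, then letters, digits, underscores.
is_valid_var_name() {
  [[ "$1" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]
}

for cmd in fs org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator; do
  var=$(build_subcmd_var hadoop "$cmd" USER)
  if is_valid_var_name "$var"; then
    echo "ok:  $var"
  else
    # Presumably ${!var} on a name like this is what fails in the launcher.
    echo "bad: $var"
  fi
done
```

Running it prints an `ok:` line for `HADOOP_FS_USER` and a `bad:` line with exactly the name shown in the first error message.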

My HADOOP_HOME environment variable is set to the folder where I extracted all the Hadoop files.
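Several paths in this post are relative to `HADOOP_HOME`, so it is worth checking that it points at the top of the extracted tarball. That folder should contain `bin/hadoop`, `libexec/hadoop-functions.sh` and `share/hadoop/yarn/timelineservice`. Here is a minimal sketch; `check_hadoop_home` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: verify that a directory looks like an extracted Hadoop 3.x
# distribution. check_hadoop_home is a hypothetical helper.
check_hadoop_home() {
  local home="$1" missing=0
  local path
  for path in bin/hadoop libexec/hadoop-functions.sh share/hadoop/yarn/timelineservice; do
    if [[ ! -e "$home/$path" ]]; then
      echo "missing: $home/$path" >&2
      missing=1
    fi
  done
  # 0 if everything was found, 1 otherwise.
  return "$missing"
}
```

Usage: `check_hadoop_home "$HADOOP_HOME" && echo "HADOOP_HOME looks fine"`.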

I'm sorry for the very long text. I'm trying to be as clear as possible while writing in a language I don't know very well.

Thanks for any answers.

-- lp