streaming data from HDFS outside of hadoop

streaming data from HDFS outside of hadoop

Stephane Brossier
I am trying to stream data from HDFS to a workstation outside of the Hadoop
cluster. I have a small method that initializes the DistributedFileSystem
with the IP and port of the namenode, but it fails with the stack trace below.

Note that:
. I tried to telnet to that IP/port, and the connection works.
. The namenode is working; I can access it through my browser.
. Hadoop is up and running; I can run MR jobs.

Am I missing something in the code below? What could be wrong?

Thanks,

S.


---------
    private void initHadoop(String ip, int port) {
        Configuration mConf = new Configuration();

        URI mUri = URI.create("hdfs://" + ip + ":" + port);
        mHDFS = new DistributedFileSystem();

        try {
            mHDFS.initialize(mUri, mConf);
        } catch (IOException ioe) {
            ioe.printStackTrace();
            log.error("Failed to initialize Hadoop (Namenode) " + ioe.getMessage());
        }
        log.info("Initialized HDFS");
    }

-----------------

2633 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - Unix Login: stephane,staff,com.apple.sharepoint.group.1,_lpadmin,_appserveradm,com.apple.sharepoint.group.2,_appserverusr,admin
2662 [main] DEBUG org.apache.hadoop.ipc.Client  - The ping interval is60000ms.
2792 [main] DEBUG org.apache.hadoop.ipc.Client  - Connecting to /10.15.38.76:50070
2889 [main] DEBUG org.apache.hadoop.ipc.Client  - IPC Client (47) connection to /10.15.38.76:50070 from stephane sending #0
2891 [IPC Client (47) connection to /10.15.38.76:50070 from stephane] DEBUG org.apache.hadoop.ipc.Client  - IPC Client (47) connection to /10.15.38.76:50070 from stephane: starting, having connections 1
2906 [IPC Client (47) connection to /10.15.38.76:50070 from stephane] DEBUG org.apache.hadoop.ipc.Client  - closing ipc connection to /10.15.38.76:50070: null
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:493)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:438)
2908 [IPC Client (47) connection to /10.15.38.76:50070 from stephane] DEBUG org.apache.hadoop.ipc.Client  - IPC Client (47) connection to /10.15.38.76:50070 from stephane: closed
2908 [IPC Client (47) connection to /10.15.38.76:50070 from stephane] DEBUG org.apache.hadoop.ipc.Client  - IPC Client (47) connection to /10.15.38.76:50070 from stephane: stopped, remaining connections 0
java.io.IOException: Call to /10.15.38.76:50070 failed on local exception: null
        at org.apache.hadoop.ipc.Client.call(Client.java:699)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:74)
        at com.ning.viking.tools.VisitReader.initHadoop(VisitReader.java:44)
        at com.ning.viking.tools.VisitReader.<init>(VisitReader.java:33)
        at com.ning.viking.tools.VisitReader.main(VisitReader.java:127)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:493)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:438)

Re: streaming data from HDFS outside of hadoop

Aaron Kimball
You shouldn't directly instantiate and initialize FileSystem implementations;
there's a factory method you should use.

Do this instead:

private void initHadoop(String ip, int port) throws IOException {
    Configuration conf = new Configuration();
    String fsUri = "hdfs://" + ip + ":" + port;
    // Magic config string that indicates which FS to use.
    conf.set("fs.default.name", fsUri);
    mHDFS = FileSystem.get(conf);
}

Cheers,
- Aaron
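
A possible follow-up check, not part of the reply above: the trace shows the
IPC client connecting to port 50070, which is the default NameNode web UI
(HTTP) port in Hadoop releases of this period, so the EOFException may simply
mean the RPC client reached an HTTP server. A successful telnet or browser
session only shows that something is listening there. The sketch below is a
standalone diagnostic that uses only the JDK; the class name PortProbe and its
methods are hypothetical, and it assumes the server answers a bare HTTP/1.0
request with an ordinary status line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop: checks whether a host/port
// answers plain HTTP, which would suggest it is a web UI rather than an
// IPC/RPC endpoint.
public class PortProbe {

    // True if the given line looks like an HTTP status line, e.g. "HTTP/1.1 200 OK".
    public static boolean isHttpStatusLine(String line) {
        return line != null && line.startsWith("HTTP/");
    }

    // Sends a minimal HTTP request and reports whether the first line of the
    // reply is an HTTP status line. Any I/O failure or timeout yields false.
    public static boolean speaksHttp(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write("GET / HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            return isHttpStatusLine(in.readLine());
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are placeholders; pass the namenode host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50070;

        if (speaksHttp(host, port, 3000)) {
            System.out.println(host + ":" + port + " answers HTTP; it is probably not the RPC port");
        } else {
            System.out.println(host + ":" + port + " did not answer HTTP");
        }
    }
}
```

If the probe reports HTTP for the port being passed to initHadoop, the port
to use in the hdfs:// URI is more likely the one given in the cluster's own
fs.default.name setting.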

On Tue, Oct 20, 2009 at 5:55 PM, Stephane Brossier <[hidden email]> wrote:

> I am trying to stream data from HDFS on a workstation outside of hadoop.
> I have a small method to initialize the DistributedFileSystem and i pass the
> IP and port of the namenode, but that fails with the following stack.