HDFS file replication to slave nodes not working


HDFS file replication to slave nodes not working

Bhushan Pathak
Hello,

I have hadoop 2.7.3 running on a 3-node cluster [1 master, 2 slaves]. The hdfs-site.xml file has the following config -
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/mnt/hadoop_store/datanode</value>
    </property>
    <property>
        <name>dfs.datanode.name.dir</name>
        <value>file:/mnt/hadoop_store/namenode</value>
    </property>

I used the 'hdfs dfs -put' command to upload 3 CSV files to HDFS, which was successful.
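
The command was roughly as follows (reconstructed for reference; the file names match the listing below):

[hadoop@master hadoop-2.7.3]$ bin/hdfs dfs -put Final_Album_file.csv Final_Artist_file.csv Final_Tracks_file.csv /usr/hadoop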

My assumption is that the 3 csv files should be present on all 3 nodes, either under the datanode or the namenode directory. On the master, I can see the following files -

[hadoop@master hadoop-2.7.3]$ bin/hdfs dfs -ls /usr/hadoop
Found 3 items
-rw-r--r--   3 hadoop supergroup     124619 2017-06-14 14:34 /usr/hadoop/Final_Album_file.csv
-rw-r--r--   3 hadoop supergroup      68742 2017-06-14 14:34 /usr/hadoop/Final_Artist_file.csv
-rw-r--r--   3 hadoop supergroup    2766110 2017-06-14 14:34 /usr/hadoop/Final_Tracks_file.csv
[hadoop@master hadoop-2.7.3]$ ls /mnt/hadoop_store/namenode/
[hadoop@master hadoop-2.7.3]$ ls /mnt/hadoop_store/datanode/
current  in_use.lock
[hadoop@master hadoop-2.7.3]$ ls /mnt/hadoop_store/datanode/current/
edits_0000000000000000001-0000000000000000002  edits_0000000000000000027-0000000000000000028  edits_0000000000000000055-0000000000000000056
edits_0000000000000000003-0000000000000000004  edits_0000000000000000029-0000000000000000030  edits_0000000000000000057-0000000000000000058
edits_0000000000000000005-0000000000000000006  edits_0000000000000000031-0000000000000000032  edits_0000000000000000059-0000000000000000060
edits_0000000000000000007-0000000000000000008  edits_0000000000000000033-0000000000000000034  edits_0000000000000000061-0000000000000000064
edits_0000000000000000009-0000000000000000010  edits_0000000000000000035-0000000000000000036  edits_0000000000000000065-0000000000000000096
edits_0000000000000000011-0000000000000000012  edits_0000000000000000037-0000000000000000038  edits_inprogress_0000000000000000097
edits_0000000000000000013-0000000000000000014  edits_0000000000000000039-0000000000000000040  fsimage_0000000000000000064
edits_0000000000000000015-0000000000000000016  edits_0000000000000000041-0000000000000000042  fsimage_0000000000000000064.md5
edits_0000000000000000017-0000000000000000017  edits_0000000000000000043-0000000000000000044  fsimage_0000000000000000096
edits_0000000000000000018-0000000000000000019  edits_0000000000000000045-0000000000000000046  fsimage_0000000000000000096.md5
edits_0000000000000000020-0000000000000000020  edits_0000000000000000047-0000000000000000048  seen_txid
edits_0000000000000000021-0000000000000000022  edits_0000000000000000049-0000000000000000050  VERSION
edits_0000000000000000023-0000000000000000024  edits_0000000000000000051-0000000000000000052
edits_0000000000000000025-0000000000000000026  edits_0000000000000000053-0000000000000000054
[hadoop@master hadoop-2.7.3]$

While on the 2 slave nodes, there are only empty directories. Is my assumption correct that the 3 CSV files should be replicated to the slave nodes as well? If yes, why are they missing from the slave nodes? Additionally, are the files that I see in the datanode/current directory on the master the actual CSV files that I uploaded?


Thanks
Bhushan Pathak

RE: HDFS file replication to slave nodes not working

Brahma Reddy Battula

Please see my comments inline.

Regards
Brahma Reddy Battula

From: Bhushan Pathak [mailto:[hidden email]]
Sent: 14 June 2017 17:14
To: [hidden email]
Subject: HDFS file replication to slave nodes not working

> I have hadoop 2.7.3 running on a 3-node cluster [1 master, 2 slaves]. The hdfs-site.xml file has the following config -
>
>     <property>
>         <name>dfs.namenode.name.dir</name>
>         <value>file:/mnt/hadoop_store/datanode</value>
>     </property>
>     <property>
>         <name>dfs.datanode.name.dir</name>
>         <value>file:/mnt/hadoop_store/namenode</value>
>     </property>

==> The property should be "dfs.datanode.data.dir". Please have a look at the following page for all of the default configurations:

http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
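
For example, a corrected hdfs-site.xml could look like the following (a sketch that reuses the /mnt/hadoop_store directories from the original message; note that the namenode and datanode values also appear to be swapped there):

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/mnt/hadoop_store/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/mnt/hadoop_store/datanode</value>
    </property>

The NameNode keeps its metadata under "dfs.namenode.name.dir", while each DataNode stores block data under "dfs.datanode.data.dir".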

 

> I used the 'hdfs dfs -put' command to upload 3 CSV files to HDFS, which was successful.
>
> My assumption is that the 3 CSV files should be present on all 3 nodes, either under the datanode or the namenode directory. On the master, I can see the following files -
>
> [directory listings snipped; see the original message above]
>
> While on the 2 slave nodes, there are only empty directories. Is my assumption correct that the 3 CSV files should be replicated to the slave nodes as well? If yes, why are they missing from the slave nodes? Additionally, are the files that I see in the datanode/current directory on the master the actual CSV files that I uploaded?

Yes, the files will be replicated to 3 nodes (this is controlled by "dfs.replication", which is 3 by default).

The location you are checking is wrong because the property name is wrong; with the defaults, block data is stored under "/tmp/hadoop-${user.name}" (the default for "dfs.datanode.data.dir" is file://${hadoop.tmp.dir}/dfs/data).

The data under the "datanode/current" directory is NameNode metadata (edit logs and fsimage files), not the uploaded CSV files; because "dfs.namenode.name.dir" points at /mnt/hadoop_store/datanode, the NameNode wrote its metadata there.
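
To verify where the blocks of a file actually ended up, you can run fsck (a suggested check, not part of the original exchange):

[hadoop@master hadoop-2.7.3]$ bin/hdfs fsck /usr/hadoop/Final_Album_file.csv -files -blocks -locations

This prints each block of the file along with the DataNodes that hold its replicas.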

 

Please go through the following design documentation to know more about HDFS:

http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html


Re: HDFS file replication to slave nodes not working

Bhushan Pathak
Is there any way I can tell Hadoop to use the /mnt directory instead of the /tmp/hadoop-${user.name} directory to store the files?

Thanks
Bhushan Pathak

On Wed, Jun 14, 2017 at 3:06 PM, Brahma Reddy Battula <[hidden email]> wrote:

> [previous reply quoted in full; trimmed]


RE: HDFS file replication to slave nodes not working

Brahma Reddy Battula

 

Yes, you can configure "dfs.datanode.data.dir" to point at a directory under /mnt.

Reference for default configurations:

http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
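
As a sketch (the /mnt path follows the earlier messages), you can either set "dfs.datanode.data.dir" in hdfs-site.xml as shown earlier in the thread, or move all of the default storage locations at once by setting "hadoop.tmp.dir" in core-site.xml, since the HDFS defaults are derived from it:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/mnt/hadoop_store/tmp</value>
    </property>

Either way, restart the DataNodes afterwards; any existing blocks under /tmp are not migrated automatically.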

 

 

Regards

Brahma Reddy Battula

 

From: Bhushan Pathak [mailto:[hidden email]]
Sent: 14 June 2017 17:47
To: Brahma Reddy Battula
Cc: [hidden email]
Subject: Re: HDFS file replication to slave nodes not working

> [earlier messages quoted in full; trimmed]