Hadoop All data corrupted


pranav.puri

Hi

I have an Accumulo cluster set up on top of a Hadoop cluster (version 2.9.0). Since I had to make changes to the Accumulo config files, I had to shut down Hadoop multiple times.

After some of these changes, Hadoop started showing the following when the fsck command is run:

Total size:    986079103985 B (Total open files size: 372 B)
 Total dirs:    2011
 Total files:    15530
 Total symlinks:        0 (Files currently being written: 4)
 Total blocks (validated):    19663 (avg. block size 50148965 B) (Total open file blocks (not validated): 4)
  ********************************
  UNDER MIN REPL'D BLOCKS:    19663 (100.0 %)
  dfs.namenode.replication.min:    1
  CORRUPT FILES:    15310
  MISSING BLOCKS:    19663
  MISSING SIZE:        986079103985 B
  CORRUPT BLOCKS:     19663
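
For reference, a typical fsck invocation that produces a report like the one above (assuming the whole namespace is checked; substitute a narrower path if needed):

    hdfs fsck /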

All the changes were made in the Accumulo config files; no Hadoop config files were changed. What are the troubleshooting steps?

Regards
Pranav




Re: Hadoop All data corrupted

karthik p

Pranav,

Can you go through the following checklist before confirming the missing/corrupt blocks?


Missing block: a block is marked missing if none of that file's block replicas have been reported to the NameNode.

Corrupt block: a block is marked corrupt if all of that file's block replicas are corrupted, or none of them are reported to the NameNode.
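
Before walking through the checklist, fsck can enumerate exactly which files are affected (assuming the whole namespace is of interest; the second path below is just a placeholder):

    # List every file with missing or corrupt blocks
    hdfs fsck / -list-corruptfileblocks

    # Per-file block and DataNode location detail for a suspect path
    hdfs fsck /accumulo -files -blocks -locations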


1. Check whether all DataNodes in the cluster are running (example commands for several of these checks follow this list).

2. Check whether any DataNodes are reported dead.

3. Check for disk failures on multiple DataNodes.

4. Check whether disks are out of space on multiple DataNodes.

5. Check whether block reports are being rejected by the NameNode (this shows up as a warning/error in the NameNode log).

6. Check whether you changed any config groups.

7. Check whether the blocks physically exist on the local filesystem or were removed by users unknowingly. Ex: "find <dfs.datanode.data.dir> -type f -iname '<blkid>*'". Repeat the same step on all DataNodes.

8. Check whether too many blocks are hosted on a single DataNode.

9. Check whether block reports fail with "exceeding max RPC size" (default 64 MB). You can see this warning in the NameNode log: "Protocol message was too large. May be malicious."

10. Check whether a mount point was unmounted because of a filesystem failure.

11. Check whether blocks were written into the root volume because of a disk auto-unmount. Data might be hidden if you remount the filesystem on top of an existing DataNode directory.
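
Example commands for several of the checks above (a sketch only: the data directory, log path, and block ID are placeholders to substitute for your cluster):

    # Steps 1-2: live/dead DataNodes as reported by the NameNode
    hdfs dfsadmin -report | grep -E 'Live datanodes|Dead datanodes'

    # Steps 3-4: free space and recent I/O errors on each DataNode (run per node)
    df -h /data/hadoop/hdfs          # substitute your dfs.datanode.data.dir
    dmesg | grep -i 'i/o error'

    # Steps 5 and 9: scan the NameNode log for rejected or oversized block reports
    grep -iE 'block report|Protocol message was too large' \
        /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log

    # Step 7: confirm block files still exist on disk (run on each DataNode)
    find /data/hadoop/hdfs -type f -iname 'blk_1073741825*'

If step 9 turns out to be the culprit, the limit is controlled by ipc.maximum.data.length in core-site.xml (64 MB by default); raising it, for example to 128 MB, lets large block reports through.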


Thanks,

Karthik






