Files vs blocks

Files vs blocks

Sudhir Babu Pothineni

One of the Hadoop clusters I am working on has:

85,985,789 files and directories, 58,399,919 blocks = 144,385,717 total file system objects

Heap memory used: 132.0 GB of the 256 GB heap.

It feels odd to me that the ratio of files and directories to blocks is so high; it looks like a small-files problem. But the cluster is working fine. Am I worrying unnecessarily? We are using Hadoop 2.6.0.

Thanks
Sudhir

Re: Files vs blocks

Ramdas Singh
Hi Sudhir,

According to my calculations, the heap needed for that number of file system objects (144,385,717) comes out close to the 132 GB you are using. I think you are doing fine.

Thanks,

Ramdas

Re: Files vs blocks

Wei-Chiu Chuang
In reply to this post by Sudhir Babu Pothineni
I don't feel this is strictly a small-file issue (since I am not seeing the average file size), but it looks like you have far too many directories relative to files. I've seen that happen when Hive creates too many partitions, and it can render Hive queries inefficient.
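
If it helps, here is one way to pull those counters yourself and check the blocks-per-file ratio. This is just a rough sketch against the NameNode's JMX endpoint; "nn-host" is a placeholder, 50070 is the default 2.x web UI port, and it assumes the FSNamesystem bean exposes FilesTotal (files + directories) and BlocksTotal:

import json
import urllib.request

# NameNode JMX endpoint; "nn-host" is a placeholder for your NameNode address.
JMX_URL = ("http://nn-host:50070/jmx"
           "?qry=Hadoop:service=NameNode,name=FSNamesystem")

with urllib.request.urlopen(JMX_URL) as resp:
    bean = json.load(resp)["beans"][0]

files_and_dirs = bean["FilesTotal"]   # files + directories (inodes)
blocks = bean["BlocksTotal"]          # allocated blocks

print("files + dirs       :", files_and_dirs)
print("blocks             :", blocks)
print("total objects      :", files_and_dirs + blocks)
print("blocks per file/dir: %.2f" % (blocks / files_and_dirs))

For the average file size itself, the summary that hdfs fsck / prints at the end (Total size divided by Total files) is usually the quickest place to look.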

Re: Files vs blocks

Ramdas Singh
As a rule of thumb for sizing purposes, you should have about 1000 MB of heap memory for every one million blocks.
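
Applying that rule of thumb to the numbers above as a rough sanity check (and counting all file system objects against it, as in the earlier estimate):

144,385,717 objects ≈ 144.4 million x ~1000 MB per million ≈ 144 GB of heap

which is in the same ballpark as the 132 GB currently in use, and still well under the 256 GB configured.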

Thanks,

Ramdas


Re: Files vs blocks

Sudhir Babu Pothineni
Thanks Ramdas and Wei-Chiu. Memory is fine; my only worry is the ratio of files and directories to blocks, as Wei-Chiu mentioned. I will work on this; it is over-partitioned.
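
In case it is useful for hunting down the over-partitioned tables, here is a small, illustrative sketch that walks a set of table directories with hdfs dfs -count and flags the ones with many partition directories but a tiny average file size. The warehouse path and the two thresholds are placeholders, not anything from this thread:

import subprocess

WAREHOUSE = "/user/hive/warehouse"   # placeholder path
MIN_DIRS = 1000                      # "lots of partitions" threshold (illustrative)
MAX_AVG_FILE_MB = 16                 # "small files" threshold (illustrative)

# hdfs dfs -count prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME per path.
out = subprocess.run(
    ["hdfs", "dfs", "-count", WAREHOUSE + "/*"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    if not line.strip():
        continue
    dirs, files, size_bytes, path = line.split(None, 3)
    dirs, files, size_bytes = int(dirs), int(files), int(size_bytes)
    if files == 0:
        continue
    avg_mb = size_bytes / files / (1024 * 1024)
    if dirs >= MIN_DIRS and avg_mb <= MAX_AVG_FILE_MB:
        print(path, "-", dirs, "dirs,", files, "files, avg", round(avg_mb, 1), "MB/file")

From there, coarser partitioning or compacting the small files within each partition is the usual fix.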
