How to estimate hardware needed for a Hadoop Cluster


Amine Tengilimoglu
Hi all,

   I want to learn how I can estimate the hardware needed for a Hadoop cluster. Is there any standard or other guideline?

  For example, I have 10 TB of data and I will analyze it... My replication factor will be 2.

   How much RAM do I need for one node? How can I estimate it?
   How much disk do I need for one node? How can I estimate it?
   How many CPU cores do I need for one node?


Thanks in advance.

Re: How to estimate hardware needed for a Hadoop Cluster

Or Raz

That's a tricky question, and it depends mostly on how you plan to use Hadoop, and more specifically on your use case (for example, word count).
The answer should be split between storage (disk) and computation limits (disk, CPU, and memory).
1. Disk - if you are using the default file system (HDFS) with the default 128 MB block size, you would need 20 TB of space just to store the input, and there will be 2 * (10,000,000 / 128) = 156,250 block replicas (78,125 logical blocks times a replication factor of 2).
Afterward, it depends on the output size of your Map function (whose intermediate output is deleted at the end of the shuffle) and of your Reduce function (which probably won't be the bottleneck).
If you believe that your Map output won't be larger than the input (in the number and size of the tuples), then I think around 40 TB in total would be enough; a rough back-of-the-envelope sketch follows after this list.
2. Memory - if you want the computation to be as concurrent as possible, it depends on the amount of memory you specify for the containers (AM, Mappers, and Reducers) in the cluster configuration (yarn-site.xml), on the number of containers, and on the demands of the use case (maybe each mapper should get at least 2048 MB). Otherwise, some of the containers will have to wait for free memory (by default, task assignment depends only on memory); the second sketch below shows how to estimate how many containers fit on one node.
3. CPU - the same reasoning as for memory, but it can be irrelevant if it does not affect your containers' computation and assignment.
*When you configure your cluster, please also pay attention to the heap size (it has to fit inside the container memory).
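
To make the disk arithmetic above concrete, here is a minimal back-of-the-envelope sketch in plain Python. The 10 TB input, 128 MB block size, and replication factor of 2 come from the thread; the 2x headroom for intermediate Map/shuffle output is only an assumption, so adjust it to your job.

    # Rough HDFS storage estimate -- a sizing sketch, not an official tool.

    BLOCK_SIZE_MB = 128            # HDFS default block size
    INPUT_MB = 10_000_000          # 10 TB of input data (decimal units)
    REPLICATION = 2                # replication factor from the question
    INTERMEDIATE_FACTOR = 2        # assumed headroom for Map/shuffle output

    logical_blocks = -(-INPUT_MB // BLOCK_SIZE_MB)          # ceiling division
    block_replicas = logical_blocks * REPLICATION

    raw_input_storage_tb = INPUT_MB * REPLICATION / 1_000_000
    total_with_headroom_tb = raw_input_storage_tb * INTERMEDIATE_FACTOR

    print(f"logical blocks:     {logical_blocks:,}")            # 78,125
    print(f"block replicas:     {block_replicas:,}")            # 156,250
    print(f"raw input storage:  {raw_input_storage_tb:.0f} TB")     # 20 TB
    print(f"with ~2x headroom:  {total_with_headroom_tb:.0f} TB")   # 40 TB

Dividing the last number by the planned number of DataNodes gives a first cut of the disk needed per node.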
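
For the memory point, here is a similarly hedged sketch of per-node concurrency. yarn.nodemanager.resource.memory-mb and mapreduce.map.memory.mb are real YARN/MapReduce settings, but the concrete numbers below (64 GB nodes, 2048 MB containers, 20 nodes) are assumptions for illustration only.

    # Rough per-node container concurrency -- assumed numbers, adjust to your cluster.

    NODE_MEMORY_MB = 64 * 1024   # yarn.nodemanager.resource.memory-mb (assumed 64 GB node)
    CONTAINER_MB = 2048          # per-container request, e.g. mapreduce.map.memory.mb
    NODES = 20                   # assumed cluster size
    INPUT_SPLITS = 78_125        # one map task per 128 MB block of the 10 TB input

    containers_per_node = NODE_MEMORY_MB // CONTAINER_MB - 1   # keep one slot for the AM
    cluster_slots = containers_per_node * NODES
    map_waves = -(-INPUT_SPLITS // cluster_slots)              # ceiling division

    print(f"containers per node: {containers_per_node}")       # 31
    print(f"map waves needed:    {map_waves}")

If the number of waves is too high for your time budget, add nodes or give each node more memory.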

Good luck

