Does HDFS read blocks simultaneously in multi-threaded way?

Daegyu Han
Hi all,

Assume HDFS holds a 1 GB file, input.dat, with a block size of 128 MB.

Can a client read input.dat with multiple threads?

In other words, instead of reading the blocks sequentially, can it read
multiple blocks at the same time?

If not, would it be difficult to implement a multi-threaded block read?

Best Regards,
Daegyu

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Does HDFS read blocks simultaneously in multi-threaded way?

Arpit Agarwal-2
HDFS reads blocks sequentially. We can implement a multi-threaded block reader in theory.
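
To make the "in theory" part concrete, here is a minimal sketch of a multi-threaded reader. It is not HDFS code: a local file stands in for input.dat, the 128 MB block size is scaled down to 128 KB, and `block_ranges`, `read_range`, `parallel_read` and `make_sample_file` are hypothetical names chosen for illustration. It relies on `os.pread`, which is available on Unix-like systems. In an HDFS client the analogous building block would presumably be a positioned read on the input stream (Hadoop's `PositionedReadable` interface), but that part is an assumption and is not shown here.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 128 * 1024  # scaled-down stand-in for a 128 MB block


def block_ranges(file_size, block_size):
    """Return (offset, length) for each block-sized slice of the file."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def read_range(path, offset, length):
    """Positioned read of one slice, using a descriptor private to this call."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        done = 0
        while done < length:
            chunk = os.pread(fd, length - done, offset + done)
            if not chunk:  # unexpected end of file
                break
            chunks.append(chunk)
            done += len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def parallel_read(path, block_size=BLOCK_SIZE, threads=4):
    """Read all slices concurrently and reassemble them in file order."""
    ranges = block_ranges(os.path.getsize(path), block_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in the order of `ranges`, not completion order
        return b"".join(pool.map(lambda r: read_range(path, *r), ranges))


def make_sample_file(size):
    """Write `size` random bytes to a temporary file; return (path, data)."""
    data = os.urandom(size)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
    return f.name, data
```

A 1 MB sample file with 128 KB slices gives the same 8-slice layout as the 1 GB / 128 MB case in the question. Whether extra threads help depends on where the bottleneck is; on a single disk they may not.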


> On Jun 26, 2019, at 5:05 AM, Daegyu Han <[hidden email]> wrote:
>
> Hi all,
>
> Assuming HDFS has a 1GB file input.dat and a block size of 128MB.
>
> Can the user read multithreaded when reading the input.dat file?
>
> In other words, is not the block being read sequentially, but reading
> multiple blocks at the same time?
>
> If not, is it difficult to implement a multi-threaded block read?
>
> Best Regards,
> Daegyu



Re: Does HDFS read blocks simultaneously in multi-threaded way?

Daegyu Han
Thank you for your response.

Assuming HDFS blocks (blk1~blk8) for file input.dat are on the local data node, 
does the map task read these blocks sequentially when trying to read local blocks?


On Thu, Jun 27, 2019 at 02:45, Arpit Agarwal <[hidden email]> wrote:
> HDFS reads blocks sequentially. We can implement a multi-threaded block reader in theory.



Re: Does HDFS read blocks simultaneously in multi-threaded way?

Arpit Agarwal-2
Correct. The blocks will be read sequentially.


On Jun 26, 2019, at 10:51 AM, Daegyu Han <[hidden email]> wrote:

> Thank you for your response.
>
> Assuming HDFS blocks (blk1~blk8) for file input.dat are on the local data node,
> does the map task read these blocks sequentially when trying to read local blocks?




Re: Does HDFS read blocks simultaneously in multi-threaded way?

Jeff Hubbs
In reply to this post by Arpit Agarwal-2
I'm not sure I see the point of doing so, though.

With replication set to the default of three, your only-1-GB file will
get cut up into a mere 8 blocks, stored as 24 block replicas spread among
some number of your worker nodes. The "multithreaded" comes in when the
various worker nodes are reading these blocks at once. Your disk I/O is
only going to be so fast no matter how many threads on a machine are
trying to read from the disks; Hadoop gets you beyond that by
parallelizing that I/O across multiple entire machines.
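
For reference, the figure of 24 counts block replicas: 1 GB at a 128 MB block size is 8 blocks, and each block is stored 3 times. A small sketch of that arithmetic follows; `block_counts` is a hypothetical helper, not part of any Hadoop API, and it assumes binary units (1 GB = 1024 MB).

```python
MB = 1024 * 1024
GB = 1024 * MB


def block_counts(file_size, block_size, replication):
    """Return (logical blocks, stored block replicas) for one file."""
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks, blocks * replication


blocks, replicas = block_counts(1 * GB, 128 * MB, 3)
print(blocks, replicas)  # prints: 8 24
```

A file that is not an exact multiple of the block size gets one extra, partially filled block.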

That being said, if all you're trafficking in are 1-GB data files, why
are you even messing with Hadoop?

On 6/26/19 1:44 PM, Arpit Agarwal wrote:

> HDFS reads blocks sequentially. We can implement a multi-threaded block reader in theory.