NVMe Over fabric performance on HDFS

NVMe Over fabric performance on HDFS

Daegyu Han
Hi all,

I am using storage disaggregation by mounting NVMe SSDs on a storage node.

When we connect the compute node and the storage node with NVMe over
Fabrics (NVMeoF) and run tests, performance is much lower than with
local storage (DAS).

In general, we know that applications need to increase I/O parallelism
and I/O size to get good performance out of NVMeoF.

Specifically, how can I change HDFS settings to improve NVMeoF I/O
performance in HDFS?

Best regards,
Daegyu

Re: NVMe Over fabric performance on HDFS

Anu Engineer-3
Is your NVMe storage shared, with all datanodes sending I/O to the same set of disks? Is it possible for you to check the I/O queue length on the NVMe devices?
I would suggest that you first try to find out what is causing the perf issue; once we know roughly where the problem is -- the disks or HDFS -- we can see what can be done about it.
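
For reference, one way to observe this (a rough sketch, assuming the sysstat package is installed on the storage node) is to watch extended iostat output while the benchmark runs:

    iostat -x 1

and look at the nvme device's queue-length column (avgqu-sz or aqu-sz, depending on the sysstat version) and %util. A near-empty queue during the NVMeoF run would suggest that HDFS is not issuing enough parallel I/O, rather than the device being the bottleneck.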



Thanks
Anu


Re: NVMe Over fabric performance on HDFS

Daegyu Han
Hi Anu,

Each datanode has its own Samsung NVMe SSD, which sits on the storage
node. In other words, the compute nodes and the storage (NVMe SSDs)
are simply separated.

I know that the maximum bandwidth of my Samsung NVMe SSD is about 3 GB/s.

Experiments with TestDFSIO and the HDFS API show that a local NVMe SSD
reaches up to 2 GB/s, while the same SSD over NVMeoF delivers only
500-800 MB/s. Even IPoIB over InfiniBand gives about 1 GB/s.

In research papers that evaluate NVMeoF with fio or key-value store
applications, NVMeoF performance is similar to that of a local SSD.
They also say that to bring NVMeoF performance up to the local level,
the workload has to issue parallel I/O.
Why doesn't the NVMeoF I/O bandwidth in HDFS reach local performance?
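
For reference, the fio runs in those papers typically drive the device with deep queues and multiple jobs; a rough sketch (the target directory and sizes are assumptions, adjust to your setup) looks like:

    fio --name=seqread --directory=/mnt/nvmeof --rw=read --bs=1M \
        --ioengine=libaio --direct=1 --iodepth=32 --numjobs=4 \
        --size=4G --runtime=60 --time_based --group_reporting

That level of parallelism is likely much higher than what a single HDFS read or write stream generates on its own.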

Regards,
Daegyu

Re: NVMe Over fabric performance on HDFS

Wei-Chiu Chuang-2
There are a few Intel folks who have contributed NVMe-related features to HDFS. They are probably the best source for these questions.

Without access to the NVMe hardware, it is hard to tell. I learned that GCE offers instances with Intel Optane DC Persistent Memory attached; those can be used for tests if anyone is interested.

I personally have not received reports of unexpected performance issues with NVMe and HDFS. A lot of test tuning can lead to better performance; file size, for example, can have a great impact on a TestDFSIO run. You should also make sure you are saturating the local NVMe rather than the network bandwidth. Try setting the replication factor to 1? With the default replication factor you are, I would guess, saturating the network rather than the storage.
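
A minimal sketch of such a run (the jar path and sizes are assumptions; adjust to your distribution and data size):

    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
        TestDFSIO -D dfs.replication=1 -write -nrFiles 16 -fileSize 10GB

With a single replica, the pipeline replication traffic should no longer put the datanode network in the critical path, so the storage behaviour is easier to see.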

The Intel folks elected to implement DCPMM support as an HDFS cache rather than as a storage tier. There is probably some consideration behind that choice.

RE: NVMe Over fabric performance on HDFS

Radhakrishnan Potty, Rakeshr

Hi Daegyu,

 

It's interesting. IMHO, we could also explore the impact of latency: the latency of a remote NVMe target reached over the networking fabric versus the latency of an NVMe device on the local server's PCIe bus.

 

Could you please tell me the network utilization during your NVMeoF test? How about increasing the network bandwidth, say to >= 10 Gbps?
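
As a rough check (assuming sysstat is available on the nodes), you could watch per-interface throughput during the run:

    sar -n DEV 1

and compare the rxkB/s / txkB/s of the fabric interface against its line rate to see whether the network itself is the bottleneck.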

 

Yes, you can increase the parallelism in TestDFSIO. One idea is to play with the "-nrFiles" argument and run more mappers.

For example, we can write 500 GB of data to the cluster with "-nrFiles 100 -fileSize 5GB" or with "-nrFiles 5 -fileSize 100GB"; the degree of parallelism against HDFS is quite different in the two cases.
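
As a sketch (the jar path is an assumption; adjust to your distribution), the two runs would look like:

    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
        TestDFSIO -write -nrFiles 100 -fileSize 5GB

    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
        TestDFSIO -write -nrFiles 5 -fileSize 100GB

The first can run up to 100 mappers concurrently, each writing its own file, while the second has at most 5 writers, so the concurrency seen by the NVMeoF target differs substantially.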

 

Could you please elaborate on your cluster and your TestDFSIO benchmark setup?

              

What type of test are you running:
- sequential read/write
- random read/write

 

Have you connected the NVMe device in DAX mode? See https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/configuring-persistent-memory-for-use-in-device-dax-mode

 

Thanks,

Rakesh
