[jira] [Commented] (HADOOP-14535) Support for random access and seek of block blobs


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054670#comment-16054670 ]

Thomas commented on HADOOP-14535:
---------------------------------

Thanks for the feedback. I looked at ITestS3AInputStreamPerformance and will do something similar. I do not have an Azure account with which I can share a file publicly, but I can write a test that generates the source file for the test. I am currently working on a few other things, so I won't be able to jump on this immediately. Would you like to hold off on this change until the instrumentation and unit tests are complete, or would end-to-end test results be sufficient motivation to move forward with this task while I continue working on the other tasks?

By the way, this work was done to address https://issues.apache.org/jira/browse/HADOOP-14478, which has a dependency on a change in the Azure Storage SDK for Java. The ask was for the SDK to use InputStream.mark(readLimit) as a hint to disregard the default network read size and use readLimit instead. Since this is not the intended use of mark(), rather than pursue an unusual dependency between the two projects, I provided the implementation in this patch as the solution.

> Support for random access and seek of block blobs
> -------------------------------------------------
>
>                 Key: HADOOP-14535
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14535
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>            Reporter: Thomas
>            Assignee: Thomas
>         Attachments: 0001-Random-access-and-seek-imporvements-to-azure-file-system.patch, 0003-Random-access-and-seek-imporvements-to-azure-file-system.patch, 0004-Random-access-and-seek-imporvements-to-azure-file-system.patch
>
>
> This change adds a seekable stream for reading block blobs to the wasb:// file system.
> If seek() is not used or if only forward seek() is used, the behavior of read() is unchanged.
> That is, the stream is optimized for sequential reads by reading chunks (over the network) in
> the size specified by "fs.azure.read.request.size" (default is 4 megabytes).
> If reverse seek() is used, the behavior of read() changes in favor of reading the actual number
> of bytes requested in the call to read(), with some constraints.  If the size requested is smaller
> than 16 kilobytes and cannot be satisfied by the internal buffer, the network read will be 16
> kilobytes.  If the size requested is greater than 4 megabytes, it will be satisfied by sequential
> 4 megabyte reads over the network.
> This change improves the performance of FSInputStream.seek() by not closing and re-opening the
> stream, which for block blobs also involves a network operation to read the blob metadata. Now
> NativeAzureFsInputStream.seek() checks if the stream is seekable and moves the read position.
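The read-size policy quoted above can be modelled in a few lines of Java. This is a hypothetical sketch, not the code from the attached patches: `BlockBlobSeekSketch`, `nextReadSize()` and `fetch()` are illustrative names, an in-memory byte array stands in for the remote block blob, and each `fetch()` call stands in for one network read. It also assumes that the 4 MB cap on large random-access requests is the same value as fs.azure.read.request.size.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

/**
 * Hypothetical sketch of the read-size policy described in HADOOP-14535.
 * Not the patch: a byte[] stands in for the remote block blob and fetch()
 * stands in for one ranged network read.
 */
public class BlockBlobSeekSketch {
    // Minimum network read once the stream is in random-access mode (16 KB).
    static final int MIN_RANDOM_READ = 16 * 1024;

    private final byte[] blob;        // stand-in for the remote blob contents
    private final int requestSize;    // plays the role of fs.azure.read.request.size
    private long pos;                 // logical position of the next read()
    private boolean randomAccess;     // becomes true after the first reverse seek
    private byte[] buffer = new byte[0];
    private long bufferStart;         // blob offset of buffer[0]
    private int networkReadCount;     // number of simulated network reads

    public BlockBlobSeekSketch(byte[] blob, int requestSize) {
        this.blob = blob;
        this.requestSize = requestSize;
    }

    /** Moves the read position only; nothing is closed, re-opened or fetched. */
    public void seek(long target) throws IOException {
        if (target < 0 || target > blob.length) {
            throw new EOFException("Cannot seek to " + target);
        }
        if (target < pos) {
            randomAccess = true;      // a reverse seek switches the read-size policy
        }
        pos = target;
    }

    public long getPos() { return pos; }

    public int networkReads() { return networkReadCount; }

    public int read(byte[] dst, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (pos >= blob.length) return -1;   // end of blob
        int copied = 0;
        while (copied < len && pos < blob.length) {
            if (!inBuffer(pos)) {
                fetch(pos, nextReadSize(len - copied));
            }
            int bufOff = (int) (pos - bufferStart);
            int n = Math.min(len - copied, buffer.length - bufOff);
            System.arraycopy(buffer, bufOff, dst, off + copied, n);
            copied += n;
            pos += n;
        }
        return copied;
    }

    /** Size of the next network read for the bytes still owed to the caller. */
    private int nextReadSize(int remaining) {
        if (!randomAccess) {
            return requestSize;       // sequential mode: always full chunks
        }
        // Random-access mode: read what was asked, but at least 16 KB and at
        // most one request-size chunk; larger requests loop over several chunks.
        return Math.min(Math.max(remaining, MIN_RANDOM_READ), requestSize);
    }

    private boolean inBuffer(long p) {
        return p >= bufferStart && p < bufferStart + buffer.length;
    }

    private void fetch(long offset, int size) {
        int end = (int) Math.min(blob.length, offset + size);
        buffer = Arrays.copyOfRange(blob, (int) offset, end);
        bufferStart = offset;
        networkReadCount++;
    }
}
```

With the default 4 MB request size, this model turns a reverse seek followed by a 100-byte read into one 16 KB network read rather than a 4 MB one, while seek() itself never touches the network, which is the point of not closing and re-opening the stream.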



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

