-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-19613. [ABFS][ReadAheadV2] Refactor ReadBufferManager to isolate new code with the current working code #7801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
============================================================
|
* Enable or disable readahead V2 in AbfsInputStream. This will work independent of V1. | ||
* Value: {@value}. | ||
*/ | ||
public static final String FS_AZURE_ENABLE_READAHEAD_V2 = "fs.azure.enable.readahead.v2"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If both V1 and V2 are enabled, does V2 take precedence over V1? If so, would it be inaccurate to refer to them as independent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By independent I mean, users don't have to explicitly enable fs.azure.enable.readahead
for fs.azure.enable.readahead.v2
to work.
And yes if both are set, preference will be given to V2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add this precedence thing in the comments just for better visibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already Added in AbfsInputStream where this precedence is actually taken into account.
@VisibleForTesting | ||
public boolean isReadAheadEnabled() { | ||
return readAheadEnabled; | ||
return (readAheadEnabled || readAheadV2Enabled) && readBufferManager != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method name suggests we're only checking readAhead, but we're also checking readBufferManager here. Should we either move this check outside the method or update the method name to reflect its functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
method is to check if readahaead is enabled or not and for that we need readBufferManager to be not null.
This is added only as a precaution to avoid NPE
return lengthToCopy; | ||
} | ||
@VisibleForTesting | ||
void testResetReadBufferManager(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does the method name in production have test ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was there already in RBMV1. It makes sense to keep it this way because this should never be called from production flow and its critical to test some really important scenarios so we can't void it.
private static ReadBufferManagerV1 bufferManager; // singleton, initialized in static initialization block | ||
private static final ReentrantLock LOCK = new ReentrantLock(); | ||
|
||
static ReadBufferManagerV1 getBufferManager() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we are refactoring the code, we can add javadocs for all the methods as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Will take it
* | ||
* @param stream The {@link AbfsInputStream} for which to do the read-ahead | ||
* @param requestedOffset The offset in the file which shoukd be read | ||
* @param requestedLength The length to read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tracing context missing in params
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
public ReadBuffer getNextBlockToRead() throws InterruptedException { | ||
ReadBuffer buffer = null; | ||
synchronized (this) { | ||
//buffer = readAheadQueue.take(); // blocking method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
@@ -26,7 +26,7 @@ | |||
|
|||
import static org.apache.hadoop.fs.azurebfs.contracts.services.ReadBufferStatus.READ_FAILED; | |||
|
|||
class ReadBuffer { | |||
public class ReadBuffer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did we make it public? The new classes seem to be under the same package as this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted
* @param position the offset in the file to do a read for | ||
* @param length the length to read | ||
* @param buffer the buffer to read data into. Note that the buffer will be written into from offset 0. | ||
* @return the number of bytes read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can add the @throws IOException line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in base class
* Returns buffers that failed or passed from completed queue. | ||
* @param stream | ||
* @param requestedOffset | ||
* @return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing @return ReadBuffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in base class
@@ -51,7 +51,7 @@ public void run() { | |||
} catch (InterruptedException ex) { | |||
Thread.currentThread().interrupt(); | |||
} | |||
ReadBufferManager bufferManager = ReadBufferManager.getBufferManager(); | |||
ReadBufferManager bufferManager = ReadBufferManagerV1.getBufferManager(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this always require ReadBufferManagerV1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this will be change in next PR when ReadBufferManagerV2 will be introduced.
|
||
private ReadBufferManager getBufferManager() { | ||
return ReadBufferManagerV1.getBufferManager(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add EOL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
🎊 +1 overall
This message was automatically generated. |
This is the first PR in series of work done under Parent Jira: HADOOP-19596 to improve the performance of sequential reads in ABFS Driver.
Please refer to Parent JIRA for more details.
Description of PR
Jira: https://issues.apache.org/jira/browse/HADOOP-19613
Read Buffer Manager used today was introduced way back and has been stable for quite a while.
Read Buffer Manager to be introduced as part of HADOOP-19596 will introduce many changes incrementally over time. While the development goes on and we are able to fully stabilise the optimized version we need the current flow to be functional and undisturbed.
This work item is to isolate that from new code by refactoring ReadBufferManager class to have 2 different implementations with same public interfaces: ReadBufferManagerV1 and ReadBufferManagerV2.
This will also introduce new configs that can be used to toggle between new and old code.
How was this patch tested?
Existing tests were modified to work with the Refactored Classes.
More tests will be added with coming up PRs where new implementation will be introduced.
Test suite result added.