HADOOP-19472: [ABFS] Improve write workload performance for ABFS by efficient concurrency utilization #7669
base: trunk
============================================================
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
  LOG.error("Error freeing leases", e);
} finally {
  IOUtils.cleanupWithLogger(LOG, getClient());
  IOUtils.cleanupWithLogger(LOG, poolSizeManager, getClient());
For the non-dynamic pool, how are we closing the boundedThreadPool?
Would we need HadoopExecutors.shutdown(..) for it?
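To make the question concrete, here is a minimal sketch of what an explicit shutdown of a fixed pool could look like using only JDK calls. The class and the `shutdownQuietly` helper are hypothetical and not part of this PR; the timeout value is an assumption, and whether `HadoopExecutors.shutdown(..)` is the better fit is left to the author.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedPoolShutdownSketch {

  /**
   * Hypothetical helper: stop accepting new tasks, wait for running ones,
   * and force termination if the timeout elapses.
   * Returns true if the pool terminated.
   */
  static boolean shutdownQuietly(ExecutorService pool, long timeout, TimeUnit unit) {
    if (pool == null) {
      return true;
    }
    pool.shutdown(); // already-submitted tasks still run to completion
    try {
      if (!pool.awaitTermination(timeout, unit)) {
        pool.shutdownNow(); // interrupt tasks that are still running
        return pool.awaitTermination(timeout, unit);
      }
      return true;
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  public static void main(String[] args) {
    // Stand-in for the non-dynamic pool created in the manager.
    ExecutorService boundedThreadPool = Executors.newFixedThreadPool(2);
    boundedThreadPool.submit(() -> { });
    boolean terminated = shutdownQuietly(boundedThreadPool, 5, TimeUnit.SECONDS);
    System.out.println("terminated = " + terminated);
  }
}
```

Something along these lines could be called from the manager's close path, so that passing the manager to `IOUtils.cleanupWithLogger` also releases the fixed pool's threads.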
this.maxThreadPoolSize = Math.max(computedMaxPoolSize, initialPoolSize);

/* Initialize the bounded thread pool executor */
this.boundedThreadPool = Executors.newFixedThreadPool(initialPoolSize);
We're naming the threads in the non-dynamic pool and in the manager pool for the dynamic write pool. Should we also name the threads in the dynamic case?
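For illustration, one way to name pool threads with plain JDK APIs is a custom `ThreadFactory`. This is only a sketch: the class name, the `namedDaemonFactory` helper, the `abfs-dynamic-write` prefix, and the choice of daemon threads are all assumptions, not what the PR does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedPoolSketch {

  /** Hypothetical factory: threads are named "<prefix>-<n>", starting at 0. */
  static ThreadFactory namedDaemonFactory(String prefix) {
    AtomicInteger counter = new AtomicInteger(0);
    return runnable -> {
      Thread t = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
      t.setDaemon(true); // assumption: write threads should not block JVM exit
      return t;
    };
  }

  public static void main(String[] args) throws Exception {
    // The prefix is illustrative only.
    ExecutorService pool =
        Executors.newFixedThreadPool(2, namedDaemonFactory("abfs-dynamic-write"));
    Callable<String> task = () -> Thread.currentThread().getName();
    System.out.println(pool.submit(task).get());
    pool.shutdown();
  }
}
```

Named threads make thread dumps and logs easier to attribute to the write pool, which is the main reason to keep naming consistent across the dynamic and non-dynamic cases.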
com.sun.management.OperatingSystemMXBean sunOsBean
    = (com.sun.management.OperatingSystemMXBean) osBean;
double cpuLoad = sunOsBean.getSystemCpuLoad();
if (cpuLoad >= 0) {
If cpuLoad is -1.0, should we log it?
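A rough sketch of what logging the unavailable case might look like follows. `getSystemCpuLoad()` is documented to return a negative value when the recent CPU usage is not available. The class name, the `readSystemCpuLoad` helper, the log wording, and the use of `System.err` in place of the driver's logger are assumptions made to keep the example self-contained.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class CpuLoadSketch {

  /**
   * Hypothetical helper: returns the system CPU load in [0.0, 1.0],
   * or -1.0 when the JVM cannot report it.
   */
  static double readSystemCpuLoad() {
    OperatingSystemMXBean osBean = ManagementFactory.getOperatingSystemMXBean();
    if (osBean instanceof com.sun.management.OperatingSystemMXBean) {
      com.sun.management.OperatingSystemMXBean sunOsBean
          = (com.sun.management.OperatingSystemMXBean) osBean;
      double cpuLoad = sunOsBean.getSystemCpuLoad();
      if (cpuLoad >= 0) {
        return cpuLoad;
      }
      // A negative (or NaN) reading means "not available"; log it instead of
      // dropping it silently so that a skipped resize can be explained later.
      System.err.println("System CPU load unavailable (got " + cpuLoad
          + "); skipping pool size adjustment");
      return -1.0;
    }
    System.err.println("OperatingSystemMXBean does not expose system CPU load");
    return -1.0;
  }

  public static void main(String[] args) {
    System.out.println("cpuLoad = " + readSystemCpuLoad());
  }
}
```

If this path is hit on every sampling interval, a debug-level or rate-limited message would probably be preferable to avoid flooding the logs.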
FS_AZURE_ABFS_ENABLE_CHECKSUM_VALIDATION, DefaultValue = DEFAULT_ENABLE_ABFS_CHECKSUM_VALIDATION)
private boolean isChecksumValidationEnabled;

@BooleanConfigurationValidatorAnnotation(ConfigurationKey =
Nit: we can remove this and the related ones below (they were part of the previous PR).
============================================================
Enhance the performance of the ABFS driver for write-heavy workloads by improving concurrency within writes.
The proposed design introduces a centralized WriteThreadPoolSizeManager class that handles thread allocation for all write operations across the system, replacing the current CachedThreadPool in AzureBlobFileSystemStore. The initial thread pool size is set to 4 * number of available processors, and the manager then adjusts the pool size dynamically based on the system's current CPU utilization. Scaling the pool up and down in this way keeps resource usage and responsiveness in balance. The thread pool is shared by all output streams, which simplifies resource management and lets concurrency be used efficiently across write operations.
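The sizing rule described above can be sketched roughly as follows. Only the initial size of 4 * available processors comes from the description; the CPU thresholds, the step sizes, the direction of scaling (shrink under high CPU, grow under low CPU), and every name in this class are assumptions for illustration and do not reflect the actual WriteThreadPoolSizeManager implementation.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizeSketch {
  // Hypothetical thresholds; the PR may use different, configurable values.
  static final double HIGH_CPU = 0.75;
  static final double LOW_CPU = 0.50;

  /** Initial size as stated in the description: 4 * available processors. */
  static int initialPoolSize(int processors) {
    return 4 * processors;
  }

  /** Assumed policy: halve under high CPU load, grow by 25% under low load. */
  static int nextPoolSize(int current, double cpuLoad, int min, int max) {
    if (cpuLoad < 0) {
      return current; // load unavailable, leave the pool unchanged
    }
    if (cpuLoad > HIGH_CPU) {
      return Math.max(min, current / 2);
    }
    if (cpuLoad < LOW_CPU) {
      return Math.min(max, current + Math.max(1, current / 4));
    }
    return current;
  }

  /** Resizes a pool whose core and maximum sizes are kept equal. */
  static void resize(ThreadPoolExecutor pool, int newSize) {
    if (newSize > pool.getMaximumPoolSize()) {
      pool.setMaximumPoolSize(newSize); // raise the ceiling before the core size
      pool.setCorePoolSize(newSize);
    } else {
      pool.setCorePoolSize(newSize);    // lower the core size before the ceiling
      pool.setMaximumPoolSize(newSize);
    }
  }

  public static void main(String[] args) {
    int initial = initialPoolSize(Runtime.getRuntime().availableProcessors());
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        initial, initial, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    // Pretend the sampled CPU load is 90% and apply the assumed policy.
    resize(pool, nextPoolSize(initial, 0.90, 1, 2 * initial));
    System.out.println("pool size: " + initial + " -> " + pool.getCorePoolSize());
    pool.shutdown();
  }
}
```

The order of the two setter calls in `resize` matters because `ThreadPoolExecutor` rejects a maximum size below the core size, and newer JDKs also reject a core size above the maximum.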