HADOOP-18296. Memory fragmentation in ChecksumFileSystem readVectored() #7732
base: branch-3.4
Conversation
Pull Request Overview
This PR addresses memory fragmentation issues in the vectored IO implementation within ChecksumFileSystem by introducing configurable checksum verification, enhanced logging, and improved buffer management. Key changes include:
- Adding a configurable option (file.verify-checksum) and corresponding accessor in LocalFileSystem and ChecksumFileSystem.
- Introducing a new TrackingByteBufferPool for tracking buffer usage.
- Updating vectored IO logic and related documentation, including buffer slicing and new capability definitions.
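For orientation, the probe pattern these capability changes extend can be sketched in plain JDK code. The `CapabilityProbe` class below is a hypothetical stand-in, not the Hadoop implementation; only the capability strings and the "sliced buffers imply checksum verification" linkage are taken from this PR:

```java
import java.util.Locale;

/** Minimal sketch of a stream-capability probe, assuming the names used in this PR. */
class CapabilityProbe {

  // Capability names quoted from the PR; the constants themselves are illustrative.
  static final String VECTOREDIO = "in:readvectored";
  static final String VECTOREDIO_BUFFERS_SLICED = "fs.capability.vectoredio.sliced";

  private final boolean verifyChecksum;

  CapabilityProbe(boolean verifyChecksum) {
    this.verifyChecksum = verifyChecksum;
  }

  /** Mirrors the hasVectorIOCapability() pattern: lower-case the probe, then switch. */
  boolean hasCapability(String capability) {
    switch (capability.toLowerCase(Locale.ENGLISH)) {
    case VECTOREDIO:
      return true;
    case VECTOREDIO_BUFFERS_SLICED:
      // Buffers may only be sliced when checksums are actually verified.
      return verifyChecksum;
    default:
      return false;
    }
  }
}
```

Callers would probe the capability before a vectored read to decide whether returned buffers may be slices of a larger pooled buffer.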
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
File | Description |
---|---|
TestViewFileSystemDelegation.java | Added an @Override annotation to the getVerifyChecksum() method. |
fsdatainputstream.md | Updated and expanded documentation sections covering direct buffer reads and buffer slicing. |
Sizes.java | Introduced a new constant S_0 for representing 0 bytes. |
TrackingByteBufferPool.java | Added a new buffer pool implementation to track allocations and releases. |
VectoredReadUtils.java | Added a new hasVectorIOCapability() method that supports vectored IO capability lookup. |
StreamCapabilities.java | Added a new constant VECTOREDIO_BUFFERS_SLICED and updated interface documentation. |
RawLocalFileSystem.java | Updated capability handling and logging for vectored read operations. |
LocalFileSystemConfigKeys.java | Introduced the LOCAL_FS_VERIFY_CHECKSUM config key with documentation. |
LocalFileSystem.java | Integrated checksum verification setting during initialization with logging. |
ChecksumFileSystem.java | Added a getVerifyChecksum() method, updated vectored read logic and capability handling. |
Comments suppressed due to low confidence (4)
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md:692
- [nitpick] The sentence in the Buffer Slicing section appears to have an extraneous comma and could be rephrased for clarity. Consider revising it to clearly state the conditions under which buffer slicing is applied.
those with checksums,
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/Sizes.java:35
- [nitpick] The constant name 'S_0' may be unclear at first glance. Consider renaming it to 'ZERO_BYTES' for improved readability.
public static final int S_0 = 0;
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:1186
- Ensure that returning getVerifyChecksum() for the VECTOREDIO_BUFFERS_SLICED capability correctly reflects the intended semantics for when buffers are sliced. It may be beneficial to explicitly document this behavior or consider whether a separate flag is more appropriate.
case StreamCapabilities.VECTOREDIO_BUFFERS_SLICED:
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/VectoredReadUtils.java:488
- Consider expanding the hasVectorIOCapability() method to either support the 'fs.capability.vectoredio.sliced' capability or document that it intentionally only handles 'in:readvectored'.
switch (capability.toLowerCase(Locale.ENGLISH)) {
Force-pushed from a19a65f to 86d6bac.
💔 -1 overall
🎊 +1 overall
trying to put this together. No attempts to address, just
Wire up TrackingByteBufferPool to the vector read tests, with the test suite AbstractContractVectoredReadTest adding testBufferSlicing() to generate conditions which may trigger slicing. Only streams which declare that they may slice buffers are permitted to return buffers to the pool which aren't known about, or to close the pool with outstanding entries. So: no fix, just a public declaration of the behavior and a test to verify that no other streams are doing it.
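The leak-detection contract described in that commit can be sketched with plain JDK code. `LeakCheckingPool` below is a simplified stand-in for the PR's TrackingByteBufferPool, with invented method names; the key ideas it illustrates — identity-based tracking, rejecting unknown (e.g. sliced) buffers, and failing on close with outstanding entries — are the ones the test suite relies on:

```java
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/**
 * Simplified sketch of a leak-detecting buffer pool. Buffers are tracked by
 * reference identity, so returning a slice or duplicate instead of the
 * original buffer is detected as an error.
 */
class LeakCheckingPool implements AutoCloseable {

  // Identity keying: ByteBuffer.equals() compares contents, which would
  // wrongly merge distinct buffers that happen to hold the same bytes.
  private final Set<ByteBuffer> outstanding =
      Collections.newSetFromMap(new IdentityHashMap<>());

  ByteBuffer getBuffer(int size) {
    ByteBuffer buffer = ByteBuffer.allocate(size);
    outstanding.add(buffer);
    return buffer;
  }

  void putBuffer(ByteBuffer buffer) {
    if (!outstanding.remove(buffer)) {
      // A sliced/duplicated buffer was returned instead of the original.
      throw new IllegalStateException("buffer was not allocated by this pool");
    }
  }

  int outstandingCount() {
    return outstanding.size();
  }

  /** Closing while buffers are still outstanding indicates a leak. */
  @Override
  public void close() {
    if (!outstanding.isEmpty()) {
      throw new IllegalStateException(outstanding.size() + " leaked buffer(s)");
    }
  }
}
```

A stream that slices buffers would trip the `putBuffer` check, which is why only filesystems declaring the sliced-buffers capability are exempted in the tests.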
Force-pushed from 1096112 to 995fcc4.
 * <p>
 * {@value}.
 */
public static final String LOCAL_FS_VERIFY_CHECKSUM = "file.verify-checksum";
rename to fs.file.checksum.verify and declare in core-default.xml to make it more visible
 * The key maps by the object id of the buffer, and refers to either a common stack trace
 * or one dynamically created for each allocation.
 */
private final Map<Key, ByteBufferAllocationStacktraceException> allocated =
I think this can be replaced with `java.util.IdentityHashMap`, which will make the `Key` class redundant:

- private final Map<Key, ByteBufferAllocationStacktraceException> allocated =
+ private final Map<ByteBuffer, ByteBufferAllocationStacktraceException> allocated = new IdentityHashMap<>();
makes sense; I just lifted the parquet code
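The suggestion works because `IdentityHashMap` keys on reference identity (`==`), whereas `ByteBuffer.equals()` compares remaining contents, so two distinct buffers holding identical bytes would collide in a regular `HashMap`. A minimal, self-contained demonstration (class and method names invented for illustration):

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

/** Demonstrates why buffer tracking must key on identity, not equals(). */
class IdentityKeyingDemo {

  static int countTracked(Map<ByteBuffer, String> map) {
    // Two distinct allocations with identical (all-zero) contents.
    ByteBuffer a = ByteBuffer.allocate(8);
    ByteBuffer b = ByteBuffer.allocate(8);
    map.put(a, "first allocation");
    map.put(b, "second allocation");
    // HashMap merges the two (content equality); IdentityHashMap keeps both.
    return map.size();
  }
}
```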
🎊 +1 overall
@@ -400,7 +398,9 @@ private void initiateRead() {
    for(int i = 0; i < ranges.size(); ++i) {
      FileRange range = ranges.get(i);
      buffers[i] = allocateRelease.getBuffer(false, range.getLength());
      channel.read(buffers[i], range.getOffset(), i, this);
      final ByteBuffer buffer = buffers[i];
Any specific reason for this refactoring and making it final?
into the file formats themselves, or the underlying storage system.
Even when verification is enabled, files without associated checksum files
.$FILENAME.crc are never be verified.
When checksum verification is disabled, vector reads of date will always returne |
typo: returne
The local filesystem will not slice buffers if the checksum file
of `filename + ".crc"` is not found. This is not declared in the
filesystem `hasPathCapability(filename, "fs.capability.vectoredio.sliced")`
call, as no checks for the checksum file are made then.ddddddddddddddddd |
typo: then.ddddddddddddddddd
LGTM overall pending comments about typos. The expected message was emitted on AbstractContractVectoredReadTest.
The modified ITestS3AContractVectoredRead passed against Tokyo region without messages mentioning sliced/leaked buffers.
The failure of TestCommonConfigurationFields looks caused by the property added to core-default.xml.
 * Configuring this class to log at DEBUG will trigger their collection.
 * @see ByteBufferAllocationStacktraceException
 * <p>
* Adapted from Parquet class {@code org.apache.parquet.bytes.TrackingByteBufferAllocator}. |
I suggest mentioning which version of Parquet you adapted this from. In the future, that will help us understand if our copy needs to pick up additional bug fixes.
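The trade-off behind that DEBUG-only collection — capturing a stack trace per allocation is expensive, so a single stack-less sentinel is reused when tracing is off — can be sketched with plain JDK exceptions. `AllocationSite` below is an invented illustration of the pattern, not the PR's ByteBufferAllocationStacktraceException:

```java
/**
 * Sketch of cheap-by-default allocation-site tracking: one shared stack-less
 * exception when tracing is off, a fresh one (with a real stack trace) per
 * allocation when tracing is on.
 */
class AllocationSite extends Exception {

  // Shared sentinel; writableStackTrace=false keeps construction cheap.
  private static final AllocationSite DISABLED =
      new AllocationSite("tracing disabled", false);

  private AllocationSite(String message, boolean withStack) {
    super(message, null, false, withStack);
  }

  static AllocationSite capture(boolean traceEnabled) {
    return traceEnabled ? new AllocationSite("allocated here", true) : DISABLED;
  }
}
```

When a buffer leaks, the tracker can then report the recorded site (if any) instead of only the fact of the leak.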
reading data may not be detected during the read operation.
Use with care in production.)
Filesystem instances which spit buffersduring vector read operations |
Typo: "...which split buffers during..."
@@ -42,5 +42,6 @@ public class LocalFileSystemConfigKeys extends CommonConfigurationKeys {
  public static final String LOCAL_FS_CLIENT_WRITE_PACKET_SIZE_KEY =
      "file.client-write-packet-size";
  public static final int LOCAL_FS_CLIENT_WRITE_PACKET_SIZE_DEFAULT = 64*1024;
Unnecessary change?
      return true;
    default:
-     return false;
+     return hasVectorIOCapability(capability);
Does this actually change the logic? It seems like `hasVectorIOCapability` checks `StreamCapabilities.VECTOREDIO`, just like the old code did.
 * <p>
 * Adapted from Parquet class {@code org.apache.parquet.bytes.TrackingByteBufferAllocator}.
 */
public final class TrackingByteBufferPool implements ByteBufferPool, AutoCloseable {
Should we put visibility annotations on this?
Alternatively, if it's only ever intended for use in tests, should it go into the test path instead of main?
Description of PR

- Adds a `file.verify-checksum` option.
- Copies Parquet's `TrackingByteBufferAllocator`, modified for Hadoop APIs and named `TrackingByteBufferPool`; not yet used in tests.

How was this patch tested?

No tests yet.

For code changes:

- If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?