Wide Record reader fix. #288


Open · wants to merge 4 commits into develop

Conversation


@prince-cs prince-cs commented Jun 10, 2025

Currently, records from a single batch are read with parallel stream processing and the result is stored in its entirety, which can lead to out-of-memory (OOM) errors.

The fix ensures that batches are read through an iterator rather than storing the entire result. The changes were made accordingly in SalesforceWideRecordReader.java.

Validation was done by running a query more than 20k characters long and verifying that the pipeline passes.
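The iterator-based approach described above can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual code: `LazyBatchReader` and its batch source are hypothetical stand-ins for SalesforceWideRecordReader and Salesforce's queryMore() paging.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: rather than collecting every record from all batches
// into one list (risking OOM), drain one batch at a time through an Iterator.
public class LazyBatchReader<T> implements Iterator<T> {
  private final Iterator<List<T>> batchSource; // stands in for queryMore() paging
  private Iterator<T> currentBatch = Collections.emptyIterator();

  public LazyBatchReader(Iterator<List<T>> batchSource) {
    this.batchSource = batchSource;
  }

  @Override
  public boolean hasNext() {
    // Fetch the next batch only after the current one is fully consumed,
    // so at most one batch is resident in memory at a time.
    while (!currentBatch.hasNext() && batchSource.hasNext()) {
      currentBatch = batchSource.next().iterator();
    }
    return currentBatch.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return currentBatch.next();
  }
}
```

Each hasNext()/next() call pulls at most one batch from the source, so memory usage is bounded by the batch size rather than the full result set.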

JIRA: https://cdap.atlassian.net/browse/PLUGIN-1897

@prince-cs prince-cs requested a review from MrRahulSharma June 11, 2025 05:18
@MrRahulSharma
Contributor

Please add more details to the description: the issue, the RCA, the fix, how we validated it, etc.

@@ -47,7 +47,7 @@ public class SalesforceConstants {
   public static final int RANGE_FILTER_MIN_VALUE = 0;
   public static final int SOQL_MAX_LENGTH = 20000;
 
-  public static final int DEFAULT_CONNECTION_TIMEOUT_MS = 30000;
+  public static final int DEFAULT_CONNECTION_TIMEOUT_MS = 120_000;
Contributor


This change does not look related to this failure; can you confirm?


The read calls started failing with timeouts after reading some batches and did not recover even after retries. We observed this in our environment, and local testing reproduced the same behavior in both CDF and DTS, so increasing the timeout resolved the failures.

Contributor


We do have a user-provided value for connectTimeout; are we not using that for some flow?


Yes, the plugin does use the user-provided connectTimeout. However, it fails with the default connectTimeout, so the default was increased to handle that configuration.

Member

@itsankit-google itsankit-google left a comment


Please add the JIRA ID to the PR title.

Comment on lines +58 to +62
private PartnerConnection partnerConnection;
private SObjectDescriptor sObjectDescriptor;
private List<String> fieldsNames;
private String fields;
private String sObjectName;
Member


Why do all of these need to be added as instance variables?

Author


Since we are now fetching records batch by batch, this state must persist across read calls, so these are declared as instance variables.

LOG.debug("Number of partitions to be fetched for wide object: '{}'", partitions.size());

// Process partitions with batches sized to adhere to API limits and optimize memory usage.
Member


Can we hit the API limit earlier now, after this change?


No, the logic remains the same; the only change is that it reads 2K records at a time instead of the entire batch in one go.
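For context, the 2K figure here matches the maximum query batch size of the Salesforce SOAP API (pages of up to 2,000 records). The chunking pattern can be sketched as follows; `BatchSplitter` is a hypothetical helper for illustration, not the PR's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: walk a large record set in fixed-size chunks
// (Salesforce SOAP queries return at most 2,000 records per page), so
// only one chunk needs to be processed at a time.
public class BatchSplitter {
  public static <T> List<List<T>> split(List<T> records, int batchSize) {
    List<List<T>> chunks = new ArrayList<>();
    for (int start = 0; start < records.size(); start += batchSize) {
      int end = Math.min(start + batchSize, records.size());
      chunks.add(records.subList(start, end)); // a view, not a copy
    }
    return chunks;
  }
}
```

Because subList() returns views rather than copies, splitting adds no extra memory beyond the chunk index bookkeeping.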

4 participants