Wide Record reader fix. #288
base: develop
Conversation
Please add more details to the description: the issue, the RCA, the fix, and how it was validated.
@@ -47,7 +47,7 @@ public class SalesforceConstants {
   public static final int RANGE_FILTER_MIN_VALUE = 0;
   public static final int SOQL_MAX_LENGTH = 20000;

-  public static final int DEFAULT_CONNECTION_TIMEOUT_MS = 30000;
+  public static final int DEFAULT_CONNECTION_TIMEOUT_MS = 120_000;
This change does not look related to this failure; can you confirm?
The read calls started failing with timeouts after reading some batches and did not recover even after retries. During local testing we saw the same behavior in both CDF and DTS, and increasing the timeout overcame the failures. This was observed in our environment.
We do have a user-provided value for connectTimeout; are we not using that in some flow?
Yes, the plugin does use the user-provided connectTimeout. However, it still fails with the default connectTimeout, so the default was increased to handle default configurations.
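The fallback discussed above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the class name, method name, and the presence of a nullable user setting are assumptions; only the `120_000` ms default mirrors the constant changed in this PR.

```java
/**
 * Hypothetical sketch of resolving the connection timeout: use the
 * user-provided value when present, otherwise fall back to the default
 * raised in this PR. Names here are illustrative, not the plugin's API.
 */
public class TimeoutResolver {
  // Raised from 30_000 ms in this PR so default configs survive slow reads.
  static final int DEFAULT_CONNECTION_TIMEOUT_MS = 120_000;

  static int resolveConnectTimeout(Integer userProvidedMs) {
    // A null value stands in for "the user did not configure connectTimeout".
    return userProvidedMs != null ? userProvidedMs : DEFAULT_CONNECTION_TIMEOUT_MS;
  }

  public static void main(String[] args) {
    System.out.println(resolveConnectTimeout(null));   // default applies
    System.out.println(resolveConnectTimeout(45_000)); // user value wins
  }
}
```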
Please add the JIRA ID to the PR title.
private PartnerConnection partnerConnection;
private SObjectDescriptor sObjectDescriptor;
private List<String> fieldsNames;
private String fields;
private String sObjectName;
Why do all of these need to be instance variables?
Since records are now fetched batch by batch, this state must persist across reads, so these are declared as instance variables.
LOG.debug("Number of partitions to be fetched for wide object: '{}'", partitions.size());

// Process partitions with batches sized to adhere to API limits and optimize memory usage.
Can we hit the API limit earlier after this change?
No, the logic remains the same; the only change is that the reader now fetches a set of 2K records at a time instead of the entire batch in one go.
Currently, records from a single batch are read via parallel stream processing and the result is stored in its entirety, which can lead to an out-of-memory (OOM) error. The fix reads batches through an iterator instead of materializing the entire result; the changes were made accordingly in SalesforceWideRecordReader.java.
Validation: ran a pipeline with a query longer than 20K characters and confirmed it passes.
JIRA: https://cdap.atlassian.net/browse/PLUGIN-1897
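The iterator-based reading described in the fix can be sketched in plain Java. This is a hedged illustration of the pattern, not the plugin's actual implementation: `PagedRecordIterator` and its `fetchPage` function are hypothetical stand-ins for the Salesforce batch-result cursor (the real reader would page roughly 2K records at a time), but the memory behavior they demonstrate matches the PR's intent — pages are fetched lazily, so only one page is resident at a time instead of the whole batch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/**
 * Hypothetical sketch of iterator-based paging: instead of materializing
 * every record of a batch in one list (OOM risk for wide objects), records
 * are pulled one page at a time and the next page is fetched only when the
 * current one is exhausted.
 */
class PagedRecordIterator<T> implements Iterator<T> {
  private final IntFunction<List<T>> fetchPage; // pageIndex -> records; empty list means done
  private List<T> currentPage = new ArrayList<>();
  private int pageIndex = 0;
  private int cursor = 0;
  private boolean exhausted = false;

  PagedRecordIterator(IntFunction<List<T>> fetchPage) {
    this.fetchPage = fetchPage;
  }

  @Override
  public boolean hasNext() {
    // Fetch lazily: only when the current page is consumed.
    while (!exhausted && cursor >= currentPage.size()) {
      currentPage = fetchPage.apply(pageIndex++);
      cursor = 0;
      if (currentPage.isEmpty()) {
        exhausted = true;
      }
    }
    return !exhausted;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return currentPage.get(cursor++);
  }
}

public class PagedRecordIteratorDemo {
  public static void main(String[] args) {
    // Simulate 5 records served in pages of 2 (the real reader would use the
    // Salesforce query cursor with ~2K records per page).
    List<Integer> source = List.of(1, 2, 3, 4, 5);
    int pageSize = 2;
    Iterator<Integer> it = new PagedRecordIterator<Integer>(page -> {
      int from = page * pageSize;
      if (from >= source.size()) {
        return List.of();
      }
      return source.subList(from, Math.min(from + pageSize, source.size()));
    });
    int count = 0;
    while (it.hasNext()) {
      System.out.println(it.next());
      count++;
    }
    System.out.println("total=" + count);
  }
}
```

The key property for the OOM fix is that `fetchPage` is called at most once per page and never for pages the consumer does not reach, so peak memory is bounded by one page rather than the whole batch.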