For downloads less than 32MB, pep_api_data_obj_read_* is triggered more than once and contains wrong size info in the len attribute
#604
Comments
If I'm understanding the use case, we want the ability to accurately record the length of the overall data object (or the total amount of it read by _download on the client side) within the execution of one PEP. Of course, that expectation could never apply to data objects larger than the single-buffer transfer size. Does it make sense to aim for a client-side solution in this case? I wouldn't mind accepting alternate chunking schemes as a parameter in the get() and _download functions, but I don't know whether altering the scheme for all applications is called for. |
I don't know that it's the job of an individual read to have any idea how big the entire data object is... Can we explain the multiple read calls and the differing sizes shown in the original post? If we can explain them, then this might be just a documentation exercise, and not a bug? |
Right, the hint to me was the zero-size object read producing a len field of 8 * 1024**2, which suggests that we might be recording the attempted read size in this "len" field. |
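To make the hypothesis above concrete, here is a minimal sketch of a generic chunked-read loop in plain Python. It is not the python-irodsclient implementation; `copy_in_chunks` is a hypothetical helper, and `READ_BUFFER_SIZE` is only assumed to equal the 8 * 1024**2 value seen in the thread. It shows how a loop that reads until it gets an empty buffer always issues one more read than there is data, and how the requested length can differ from the number of bytes actually returned.

```python
import io

# Assumed request size, taken from the 8 * 1024**2 value quoted in the thread.
READ_BUFFER_SIZE = 8 * 1024 ** 2


def copy_in_chunks(src, dst, chunk_size=READ_BUFFER_SIZE):
    """Copy src to dst; return (requested, returned) sizes for each read call.

    Hypothetical helper for illustration only.
    """
    calls = []
    while True:
        buf = src.read(chunk_size)
        calls.append((chunk_size, len(buf)))
        if not buf:
            # An empty read is the only way this loop learns it hit end of data.
            break
        dst.write(buf)
    return calls


# Zero-size object: a single read that requests 8 MiB and returns nothing.
empty_calls = copy_in_chunks(io.BytesIO(b""), io.BytesIO())
print(empty_calls)  # [(8388608, 0)]

# 10-byte object with a 4-byte buffer: three data reads plus one empty read.
small_calls = copy_in_chunks(io.BytesIO(b"x" * 10), io.BytesIO(), chunk_size=4)
print(small_calls)  # [(4, 4), (4, 4), (4, 2), (4, 0)]
```

If the server-side PEP logs the requested size rather than the returned size, each entry's first element is what would show up, which would be consistent with a zero-size object reporting 8 * 1024**2.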
I notice a |
Hmm... I notice in pep_api_data_obj_read_post, the 4th parameter (rule_args[3]) is a BytesBuf. So ... |
I don't think it's the job of the read PEP to know about the entire object. |
^^^ There's an |
It wouldn't seem so, but maybe l1descInx value is a unique "pointer" to a data object for the server hosting the data object being downloaded. @alanking @korydraughn ? |
|
Hello @mstfdkmn - If you would like to track the total downloads from specific replicas, I think it could be done via the python rule engine plugin. The following rule illustrates the underlying calls necessary to get descriptor information, as well as how you might tally bytes read from a source replica. You would need to key on certain information derived from the l1descinx member of the OpenedDataObjInp argument to pep_api_data_obj_read_post, and use the bytesBuf parameter to determine the actual length of the byte buffer read in any one invocation.
|
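The tallying idea described in the comment above can be sketched in plain Python. This is not iRODS rule code: `ReadTally`, `record`, and `close` are hypothetical names, the integer key stands in for whatever is derived from l1descInx, and the byte counts stand in for the actual buffer length reported per read. Clearing the entry on close is an assumption, made in case a descriptor index can be reused for a later open.

```python
from collections import defaultdict


class ReadTally:
    """Accumulate bytes actually returned per open descriptor (hypothetical)."""

    def __init__(self):
        self._totals = defaultdict(int)

    def record(self, descriptor_index, returned_bytes):
        # Would be called once per read invocation, with the real buffer
        # length rather than the requested length.
        self._totals[descriptor_index] += returned_bytes

    def close(self, descriptor_index):
        # Return the final total and forget the key (assumption: the index
        # may be handed out again for a different object).
        return self._totals.pop(descriptor_index, 0)


tally = ReadTally()
for returned in (4, 4, 2, 0):  # e.g. the reads of a 10-byte object
    tally.record(3, returned)
total = tally.close(3)
print(total)  # 10
```

In an actual rule, the `record` step would correspond to pep_api_data_obj_read_post, and the key would need to be mapped to a replica using the descriptor information mentioned in the comment.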
Hello @d-w-moore, If the main goal is simply to know the total downloads, then you’re right, I can use a custom rule as you suggested. However, my primary motivation for reporting this issue was:
Thanks. |
Is there more to do on this issue, whether for the 2.1.1 milestone or otherwise? We know now that a rule can be used to help track bytes downloaded per invocation of the data object READ API but that (or, as @mstfdkmn mentions, the standard and consistent invocation order of PEPs) probably won't directly affect our client-side work. |
I think the current behavior has been described and explained. |
And should the bug label be removed? 'question' instead? |
I think that the definition for "standard and consistent" will be difficult to pin down as PEPs are server-side and merely react to API invocations from various clients maintained by various developers and even teams. Since we have understood and explained the behavior reported in this issue, I think we can consider it resolved.
Yes, agreed. |
Marking as such and closing. |
python-irodsclient 2.0.1
irods 4.3.2
I am observing unnecessary PEP invocations (and irrelevant size information in the len attribute) when an object whose size is under the parallel threshold (less than 32MB) is downloaded. I think that is because of the extra read (READ_BUFFER_SIZE) for small files here at
python-irodsclient/irods/data_object.py
Line 20 in bf46679
Called by
python-irodsclient/irods/manager/data_object_manager.py
Line 219 in bf46679
My observations:
pep_api_data_obj_read_* is triggered only one time, but the following message is captured. Please see the value for the len serialized object.
len varies in each invocation.
pep_api_data_obj_read_* is triggered 5 times and none of them has the correct size information.
I have a kind of fix that ensures pep_api_data_obj_read_* is called only once, accurately populating the len field in the PEP. My tests haven't revealed any issues, but I'd like to get your input before submitting a pull request to confirm there won't be unintended side effects. Or might there be another nice way to fix this?
in /irods/data_object.py:
L219 in /irods/manager/data_object_manager.py: