WIP: n-dimensional support #20
Conversation
A bit of a follow-up: it turns out my time is needed elsewhere on our project, since we need to get to pilot by December, so I unfortunately won't have time to implement n-d support for the frontend plugin. Of course, feel free to request changes on this PR, and ping me with any questions on my changes. I plan to write some unit tests for the server extension at some point, and I will push those after I get them in place. Cheers!
…e hyperslabSlices list instead of a slice object
…re_nan flag when serializing the payload
Thanks for the PR! It's great that we were both thinking along the same lines in terms of implementing nD support. There's a lot of cool stuff that I think this extension will be able to do, but it's going to take a lot of work, and I would be very happy to have you involved as a member of the jupyterlab-hdf5 community.
Overall I think that the thrust of this PR is good. My one big problem with it is that throwing all of the slice info into a single string in the API forces a complex parser in the backend.
Partly, I think the problem here is that we're trying to shove a list-of-slice into a uri query parameter. It's not a great fit. I've now come up with a simpler API; basically, the list-of-slice will instead be passed around as a JSON object within the body of a POST request. The new API also now has some interactive docs.
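The proposed POST-body API could look something like the following sketch. All field names here are illustrative assumptions, not the extension's actual schema:

```python
import json

# Hypothetical sketch of the POST-based API described above: the
# list-of-slice travels as structured JSON in the request body, so the
# server never has to parse a packed query string. Field names are
# illustrative only.
def make_data_request_body(fpath, uri, slice_triples):
    """Each slice is a [start, stop, step] triple; null/None stands in
    for an omitted bound, mirroring Python's slice(None) semantics."""
    return json.dumps({"fpath": fpath, "uri": uri, "select": slice_triples})

body = make_data_request_body(
    "data.h5", "/grp/dset",
    [[0, 10, None], [5, 10, None], [None, None, None]],
)

# On the server, real slice objects come back with zero string parsing:
decoded = json.loads(body)
slices = tuple(slice(*triple) for triple in decoded["select"])
```

Since JSON has native lists and nulls, the only "parsing" left is `slice(*triple)`, which removes the need for the custom parser discussed later in this thread.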
 try:
-    self.finish(json.dumps(self.manager.get(path, uri, row, col)))
+    self.finish(simplejson.dumps(self.manager.get(path, uri, select), ignore_nan=True))
I trust that the reviver approach is probably the best option for handling this on the front end, but is there a reason to avoid handling it (or an option to) on the server?
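For reference, simplejson's `ignore_nan=True` serializes NaN/Infinity as `null`, producing strictly valid JSON. A stdlib-only sketch of that same server-side behavior, assuming a flat list payload:

```python
import json
import math

# Stdlib-only imitation of simplejson's ignore_nan=True for a flat list:
# non-finite floats (NaN, +/-Inf) become None, which json serializes as
# null -- strictly valid JSON that needs no reviver on the frontend.
def dumps_ignore_nan(values):
    cleaned = [None if (isinstance(v, float) and not math.isfinite(v)) else v
               for v in values]
    return json.dumps(cleaned)

payload = dumps_ignore_nan([1.0, float("nan"), 2.5, float("inf")])
# payload == '[1.0, null, 2.5, null]'
```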
msg = "Bad Request: invalid selection parameter (can't convert to int) for dimension: " + str(dim)
raise HTTPError(400, reason=msg)

if start < 0 or start > extent:
Here's a good example of why you generally shouldn't write your own parsing function. If we're claiming to support Numpy-like syntax, then both negative indices and indices out of bounds should be treated as valid in slices.
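To illustrate the numpy-like semantics being referenced here: slices tolerate out-of-bounds and negative values (plain Python lists behave the same way), while scalar indices do not:

```python
# Slices never raise on out-of-bounds values, and negative indices count
# from the end; Python lists and numpy arrays agree on these semantics.
data = list(range(5))              # [0, 1, 2, 3, 4]

assert data[2:100] == [2, 3, 4]    # stop past the end is clamped, not an error
assert data[-3:] == [2, 3, 4]      # negative start counts from the end
assert data[10:20] == []           # fully out-of-bounds slice is just empty

# Scalar indexing, by contrast, does raise on out-of-bounds values:
try:
    data[100]
    raise AssertionError("expected IndexError")
except IndexError:
    pass
```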
@@ -55,3 +65,88 @@ def uriJoin(*parts):

def uriName(uri):
    return uri.split('/')[-1]


def _getHyperslabSlices(dsetshape, select):
One of my big design goals in how I put the frontend, API, and backend together was to avoid the need for any complex parsing function. They're never worth the trouble: there's almost always an edge case or three that you miss, and maintenance is a pain.
Instead, what I'd really like to do is make sure that the frontend is structured such that the slices it asks for don't require anything other than some simple splitting, if even that. I was even thinking that it might be good to split up the slice input box into n segments given an nD dataset (or it might be really annoying; I need to try that one out). The job of the API and backend is then to not lose the information that gets handed to them (by, e.g., concatenating all the info on all the slices into a single string).
If you do ever really want a parsing function like this, I'd strongly recommend against rolling your own. Instead, you should search for prior art, or even better, come up with something that works off of `ast.literal_eval` (which is safe, so long as you can tolerate the possibility of a user causing a segfault at will).
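As a rough illustration of the `ast.literal_eval` route (a hedged sketch, not the extension's actual parser): simple splitting handles the commas and colons, and `literal_eval` handles the numeric tokens, so no hand-rolled number parsing is needed:

```python
import ast

# Illustrative sketch only: str.split handles the structural characters,
# ast.literal_eval handles the numbers, so there is no hand-written
# number parsing to get wrong.
def parse_select(select):
    """'[0:10, 5:10, :, 7]' -> (slice(0, 10), slice(5, 10), slice(None), 7)"""
    inner = select.strip().lstrip('[').rstrip(']')
    out = []
    for token in inner.split(','):
        parts = token.strip().split(':')
        if len(parts) == 1:
            out.append(ast.literal_eval(parts[0]))      # plain integer index
        else:
            out.append(slice(*(ast.literal_eval(p) if p.strip() else None
                               for p in parts)))
    return tuple(out)
```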
Agreed. I think sending a list of slices in a POST body as you've proposed is a cleaner approach. The idea for the `select` query param came directly from the HDF Group's API, so it seemed like a good idea at the time. The complexity of the parser did get out of hand pretty quickly and, as you mentioned, is error-prone.
-// require the uri, row, and col query parameters
-const { fpath, uri, row, col } = parameters;
+// require the uri query parameter, select is optional
+const { fpath, uri, ...select } = parameters;
The `...` spread syntax doesn't really mean optional in TypeScript. Check out https://basarat.gitbooks.io/typescript/docs/spread-operator.html
Oops. I'm familiar with spread syntax -- not sure why I have it there!
if not select.startswith('['):
    msg = "Bad Request: selection query missing start bracket"
    raise HTTPError(400, reason=msg)
It's good practice/encapsulation for a function like this to raise semantic errors (in this case, probably a `ValueError`). It would then be the responsibility of, e.g., `HdfBaseManager` (the thing that's actually interacting directly with the HTTP requests) to catch said semantic errors and raise `HTTPError` as appropriate.
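A minimal sketch of that layering, with a stand-in exception class where real code would use `tornado.web.HTTPError`:

```python
# Sketch of the suggested layering: the parsing helper knows nothing
# about HTTP and raises a semantic ValueError; only the request-facing
# layer (HdfBaseManager in this codebase) translates it into an HTTP
# error. This HTTPError stub stands in for tornado.web.HTTPError.
class HTTPError(Exception):
    def __init__(self, status_code, reason=None):
        super().__init__(status_code, reason)
        self.status_code, self.reason = status_code, reason

def parse_selection(select):
    """Pure function: raises ValueError, never an HTTP-specific error."""
    if not select.startswith('['):
        raise ValueError("selection query missing start bracket")
    return select

def handle_request(select):
    """HTTP boundary: catches semantic errors, re-raises as HTTP errors."""
    try:
        return parse_selection(select)
    except ValueError as err:
        raise HTTPError(400, reason="Bad Request: %s" % err)
```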
Yes, that makes sense; good call!
Thanks for the very thoughtful feedback @telamonian! I'll look into this more and give a proper reply when I have a bit more time, cheers!
Let me know if I should make any fixes to the existing code added in this PR. I'm thinking not, since we will be ditching the parser and …
So I finally went and wrote a proper AST parser for arbitrary numpy-style index expressions: Quansight-Labs/ndindex#91. Turns out that parsing a formal grammar is actually pretty straightforward, and I was able to do most of what I needed just by copying the implementation of …
@JonjonHays So what I'm going to do is cherry-pick a couple of the commits from this PR (in particular the JSON fixes, nice work btw), and then rewrite a bunch of stuff to take full advantage of …
Nice @telamonian! Great to see this work out and continue moving along. Hope all is well -- cheers Max!
The new PR over at #41 has now implemented fully working n-dimensional support in both the frontend and the backend, so I'll close this PR in favor of that one.
This PR is in progress; however, it could potentially be merged if necessary/desired.
Changes in this PR
N-Dimensional Support for the Back-end Server Extension
- Replaces the `row` and `col` query parameters with a single `select` parameter
- The `select` param is a string and uses the same syntax and semantics as h5py and h5serv. E.g., `?select=[0:10, 5:10, :, 7]`.
- `select` determines an n-dimensional hyperslab selection of a dataset.
- `select` is always optional. If it is not specified for calls to the `data` endpoint, the entire dataset will be returned. If it is not specified for calls to the `contents` endpoint, the `data` property of the returned `dsetContentDict` will be `None`. It might make sense to change this last rule so there is an easy way to have the entire dataset included in the `dsetContentDict`, rather than having to construct the query string. Maybe `select=all`, or a boolean query param on the `contents` endpoint that indicates we want actual data included in the response.
- The `api.yaml` file has not been updated to reflect these changes yet. I imported the current file in Swagger's online editor, but I was getting a lot of compile errors. Are these errors expected? Is the Swagger online editor a good way to go about this? (This is my first experience working with an OpenAPI specification.)
Front-end Compatibility for the N-Dimensional Back-end Changes
- … (the `Slice` UI field)
Next Steps
N-Dimensional Support for the Front-end JupyterLab Extension
I haven't gone too deep into the front end just yet, so I'm mostly including this section as a placeholder. Please feel free to edit with a more comprehensive roadmap.
- … the `select` query string.
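To make the `select` semantics concrete (a sketch; the h5py call is shown only in a comment so the snippet stays dependency-free): the parsed form of `?select=[0:10, 5:10, :, 7]` is just the index tuple below, which h5py datasets accept exactly as numpy arrays do:

```python
# Parsed form of '?select=[0:10, 5:10, :, 7]': an index tuple with the
# same start:stop[:step] semantics as h5py and h5serv.
selection = (slice(0, 10), slice(5, 10), slice(None), 7)

# With h5py, reading the hyperslab would look like (not executed here):
#   with h5py.File(fpath, 'r') as f:
#       data = f[uri][selection]

assert (selection[0].start, selection[0].stop) == (0, 10)
assert selection[2] == slice(None)   # ':' selects the whole dimension
assert selection[3] == 7             # a bare int drops that dimension
```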