Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: n-dimensional support #20

Closed
wants to merge 6 commits into from

Conversation

JonjonHays
Copy link
Contributor

This PR is in-progress, however it could potentially be merged if necessary/desired.

Changes in this PR

N-Dimensional Support for the Back-end Server Extension

  • Replaced the row and col query parameters with a single select parameter
    • The select param is a string and uses the same syntax and semantics as h5py, and h5serv. E.g, ?select=[0:10, 5:10, :, 7]. select determines an n-dimensional hyperslab selection of a dataset.
  • select is always optional. If it is not specified for calls to the data endpoint, the entire dataset will be returned. If it is not specified for calls to the contents endpoint, the data property of the returned dsetContentDict will be None. It might make sense to change this last rule so there is an easy way to have the entire dataset included in the dsetContentDict, rather than having to construct the query string. Maybe select=all, or a boolean query param on the contents endpoint that indicates we want actual data included in the response.
  • The api.yaml file has not been updated to reflect these changes yet. I imported the current file in Swagger's online editor, but I was getting a lot of compiling errors. Are these errors expected? Is the Swagger online editor a good way to go about this? (this is my first experience working with an OpenApi specification)

Front-end Compatibility for the N-Dimensional Back-end Changes

  • Minimal changes to the front end to restore working state after making changes to the back-end
  • These changes don't actually implement n-dimensional support to the front end, but they do (as far as I know) restore existing functionality (i.e, displaying 2d data and 2d slicing via the Slice UI field)

Next Steps

N-Dimensional Support for the Front-end JupyterLab Extension

I haven't gone too deep into the front end just yet, so I'm mostly including this section as a placeholder. Please feel free to edit with a more comprehensive roadmap.

  • Currently the extension is requesting extraneous data on higher dimensions, and it’s causing some hefty lag. This will be resolved if we implement a default slice, as @telamonian suggested, that selects the full length of the first two dimensions, and the 0th index of all higher dimensions.
    • It might make sense to disallow slices that request extraneous data. E.g, [:, :, :, 0, 0, etc].
  • Tentative: Modify/extend the ISlice interface to handle the construction of the select query string.
  • ...

@JonjonHays
Copy link
Contributor Author

A bit of a followup: it turns out my time is needed elsewhere on our project since we need to get to pilot by December, so I unfortunately won't have time to implement n-d support for the frontend plugin. Of course, feel free to request changes on this PR, and ping me with any questions on my changes. I plan to write some unit tests for the server extension at some point, and I will push those after I get them in place. Cheers!

Copy link
Member

@telamonian telamonian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! It's great that we were both thinking along the same lines in terms of implementing nD support. There's a lot of cool stuff that I think this extension will be able to do, but it's going to take a lot of work, and I would be very happy to have you involved as a member of the jupyterlab-hdf5 community.

Overall I think that the thrust of this PR is good. My one big problem with it is how throwing all of the slice info into a single string in the API makes it so that a complex parser is needed in the backend.

Partly, I think the problem here is that we're trying to shove a list-of-slice into a uri query parameter. It's not a great fit. I've now come up with a simpler API; basically, the list-of-slice will instead be passed around as a JSON object within the body of a POST request. The new API also now has some interactive docs.

try:
self.finish(json.dumps(self.manager.get(path, uri, row, col)))

self.finish(simplejson.dumps(self.manager.get(path, uri, select), ignore_nan=True))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NaN handling issue was a good catch.

Ultimately I want to try the plan that @kgryte sketched out in #22, but that can be a future PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trust that the reviver approach is probably the best option for handling this on the front end, but is there a reason to avoid handling it (or an option to) on the server?

msg = "Bad Request: invalid selection parameter (can't convert to int) for dimension: " + str(dim)
raise HTTPError(400, reason=msg)

if start < 0 or start > extent:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a good example of why you generally shouldn't write your own parsing function. If we're claiming to support Numpy-like syntax, then both negative indices and indices out of bounds should be treated as valid in slices.

@@ -55,3 +65,88 @@ def uriJoin(*parts):

def uriName(uri):
return uri.split('/')[-1]


def _getHyperslabSlices(dsetshape, select):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of my big design goals in how I put the frontend, API, and backend together was to avoid the need for any complex parsing function. They're never worth the trouble, there's almost always an edge case or three that you miss, and maintenance is a pain.

Instead, what I'd really like to do is make sure that the frontend is structured such that the slices it asks for don't require anything other than some simple splitting, if even that. I was even thinking that it might be good to split up the slice input box into n segments given a nD dataset (or it might be really annoying; I need to try that one out). The job of the API and backend is then to not lose the information that gets handed to them (by, eg concating all the info on all the slices into a single string).

If you do ever really want a parsing function like this, I'd strongly recommend against rolling your own. Instead, you should search for prior art, or even better, come up with something that works off of ast.literal_eval (which is safe, so long as you can tolerate the possibility of a user causing a segfault at will).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I think sending a list of slices in a post body as you've proposed is a cleaner approach. The idea for the select query param came directly from the HDF group's API, so it seemed like a good idea at the time. The complexity of the parser did get out of hand pretty quick, and as you mentioned is error prone.

// require the uri, row, and col query parameters
const { fpath, uri, row, col } = parameters;
// require the uri query parameter, select is optional
const { fpath, uri, ...select } = parameters;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ... doesn't really mean optional in typescript. Check out https://basarat.gitbooks.io/typescript/docs/spread-operator.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I'm familiar with spread syntax -- not sure why I have it there!


if not select.startswith('['):
msg = "Bad Request: selection query missing start bracket"
raise HTTPError(400, reason=msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good practice/encapsulation for a function like this to raise semantic errors (in this case, probably a ValueError). It would then be the responsibility of eg HdfBaseManager (the thing that's actually interacting directly with the HTTP requests) to catch said semantic errors and raise HTTPError as appropriate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that makes sense, good call!

@JonjonHays
Copy link
Contributor Author

Thanks for the very thoughtful feedback @telamonian! I'll look into this more and give a proper reply when I have a bit more time, cheers!

@JonjonHays
Copy link
Contributor Author

JonjonHays commented Dec 8, 2019

Let me know if I should make any fixes to the existing code added in this PR. I'm thinking not since we will be ditching the parser and select param.

@telamonian telamonian mentioned this pull request Oct 12, 2020
@telamonian
Copy link
Member

If you do ever really want a parsing function like this, I'd strongly recommend against rolling your own. Instead, you should search for prior art, or even better, come up with something that works off of ast.literal_eval (which is safe, so long as you can tolerate the possibility of a user causing a segfault at will).

So I finally went and wrote a proper AST parser for arbitrary numpy-style index expressions: Quansight-Labs/ndindex#91. Turns out that parsing a formal grammar is actually pretty straightforward, and I was able to do most of what I needed just by copying the implementation of ast.parse_literal and then making some tweaks. Best of all, since this will be a contribution to the ongoing ndindex project there will be a number of very component folks helping me to validate the basic functionality of literal_eval_index.

@JonjonHays So what I'm going to do is cherry pick a couple of the commits from this PR (in particular the json fixes, nice work btw), and then rewrite a bunch of stuff to take full advantage of literal_eval_index

@JonjonHays
Copy link
Contributor Author

So I finally went and wrote a proper AST parser for arbitrary numpy-style index expressions: Quansight/ndindex#91. Turns out that parsing a formal grammar is actually pretty straightforward, and I was able to do most of what I needed just by copying the implementation of ast.parse_literal and then making some tweaks. Best of all, since this will be a contribution to the ongoing ndindex project there will be a number of very component folks helping me to validate the basic functionality of literal_eval_index.

@JonjonHays So what I'm going to do is cherry pick a couple of the commits from this PR (in particular the json fixes, nice work btw), and then rewrite a bunch of stuff to take full advantage of literal_eval_index

Nice @telamonian! Great to see this work out and continue moving along. Hope all is well -- cheers Max!

@telamonian
Copy link
Member

The new PR over at #41 has now implemented fully working n-dimensional support in both frontend and backend, so I'll close this PR in favor of that one

@telamonian telamonian closed this Nov 20, 2020
@telamonian telamonian added this to the 0.5.0 milestone Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants