Progress on rasdaman (Deep Learning) UDFs #2
@KathiSchleidt as of right now we are still working on the following:
We will keep you updated with our results as they come.
More generally (and maybe contained in points 2&3), how can a user see what UDFs are available? Or can users only access their own UDFs?
On providing a listing of available UDFs: to my view, WCPS GetCapabilities would be my first candidate, in addition to exposing them via the processing resource metadata. Please include me on the call sorting this out! On all users being able to access existing UDFs: works for me. We should check with the UC partners just to be sure, but I'm fairly certain we won't have the same issues with sensitive models that we have with sensitive data.
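To make the GetCapabilities idea concrete, here is a minimal sketch of building such a request. The endpoint URL is hypothetical, and whether a given rasdaman deployment advertises UDFs in its capabilities document is exactly the open question above; the sketch only shows the standard OGC request shape:

```python
# Sketch: building a standard OGC GetCapabilities request URL for a WCS/WCPS
# endpoint. The base URL below is a made-up placeholder, not a real service.
from urllib.parse import urlencode

def get_capabilities_url(base_url: str) -> str:
    """Build a GetCapabilities request URL for a WCS 2.0 endpoint."""
    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "GetCapabilities",
    }
    return f"{base_url}?{urlencode(params)}"

# Hypothetical endpoint; the returned XML would then be inspected for
# whatever element the deployment uses to list UDFs (an assumption here).
print(get_capabilities_url("https://example.org/rasdaman/ows"))
```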
ML models trained on sensitive data might need restricted access as well. For instance, depending on the user agreement for the data (what derived products are allowed is often not clearly specified for ML models), or on whether the training of the model has sufficiently hidden the sensitive (input) data points (otherwise an ML expert might be able to extract them from the model, as a kind of reverse engineering).
@ocampos16 Out of curiosity (also relates to 'how to catalogue' and 'what might be restricted'): Do you intend to treat a trained model as a whole, or to split it up into the computational graph and the trained parameters?
@robknapen (chiming in here) dissecting a model is a rabbit hole from our perspective, and I can see no advantage - we would treat a model always as a black box.
Accepted, at some time access control will be necessary - just not at this stage where we have only one anyway :)
@robknapen turning @pebau's statement around, do you see a situation where we provide the same model with two sets of trained parameters?
Sure, for example the same CNN model that we used so far can be trained for other (semantic segmentation) tasks (similar though, since the model architecture expects 28 features as input), or it can be trained for a different region. Both would use the same model architecture (= computational graph) but learn different weights. Splitting these two is the basis for what is known as transfer learning in ML. So for inference you can have one model architecture and load it with matching weights and biases for a number of similar prediction tasks. [For sure this is more difficult to implement than a pure black-box approach, and there might be no short-term benefits.] Libraries such as TensorFlow, Keras, and PyTorch all have methods that support this way of working with deep learning models. The usually long training times make it a rather common approach to quickly start experimenting.
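The architecture/weights split can be illustrated with a deliberately simplified, library-free sketch. In PyTorch the analogous operations are `model.state_dict()` and `model.load_state_dict()`; the "architecture" and parameter values below are made up for illustration only:

```python
# Illustrative sketch (plain Python, no ML library): the same architecture
# (computational graph) can be loaded with different trained parameter sets,
# as in transfer learning. In PyTorch this split corresponds to the model
# class (graph) vs. its state_dict (weights).

def linear_model(x, weights):
    """A fixed architecture: y = w * x + b. Only the parameters vary."""
    return weights["w"] * x + weights["b"]

# Two hypothetical parameter sets, e.g. trained for different regions/tasks.
weights_region_a = {"w": 2.0, "b": 1.0}
weights_region_b = {"w": -0.5, "b": 3.0}

# Same computational graph, different learned weights -> different predictions.
print(linear_model(4.0, weights_region_a))  # 9.0
print(linear_model(4.0, weights_region_b))  # 1.0
```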
Status: PyTorch-based UDFs work, JupyterHub almost installed (need Rob's help for completion -> Mohit will contact).
@robknapen am I correct that if you have a model trained on 2 different datasets, you'd provide this as 2 different models (most of the info the same, but different input data, maybe different spatial validity)?
@KathiSchleidt Yes, the models learn to represent the different datasets. When they are 'too different', it will result in distinct models. When the datasets are different but still similar, a single, more robust, model can be trained on them. So there can be exceptions :-)
@robknapen any insight as to what impact these exceptions have on the a/p resource metadata? There, we have the following fields foreseen:
Can you use these to describe what you'd need to know?
@KathiSchleidt I think so. In some cases I would mention an existing (trained) model (or its saved weights) as ‘input data’, and use ‘characteristics’ to explain how it was used. (Maybe we need a better minimum length for ‘characteristics’? One character doesn’t seem very helpful to me. I would prefer either 0, or enforcing some longer text (200+ characters?).)
@KathiSchleidt Yes, we can split it into configuration/initialisation data and input (training) data, to make the difference in purpose clearer.
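As a rough illustration of that split (the field names below are hypothetical, not the actual a/p resource metadata schema):

```json
{
  "configuration_data": [
    {"name": "CNN weights (semantic segmentation)", "role": "model initialisation"}
  ],
  "input_data": [
    {"name": "regional training scenes", "role": "training data"}
  ],
  "characteristics": "CNN architecture initialised with pre-trained weights, then fine-tuned on regional training data."
}
```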
When the metadata is displayed in the catalog, this solution results in what can be seen in the picture below. @robknapen @KathiSchleidt does this work for you?
Summarizing the status of rasdaman UDFs:
Let me know if you feel something is missing on the PyTorch UDFs.
Jivitesh is now assigned to look into the Python UDF implementation (testing and verification). This will provide another UC view and can serve as validation.
In light of the new issue, which formulates the requirements for more ML models, I will close this ticket.
What's the status on creating rasdaman UDFs?
The requirements were discussed in Bremen, should be clear. If not, please ask!
Details in the UC2 presentation from Bremen.