-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ALTREP serialization and deserialization methods are ignored #87
Comments
Hi, you are 100% correct. Last time I checked, there was no public C API to make generic ALTREP serialization work with qs. However, that may have changed and I will look into it again. I am a happily surprised that someone has made use out of alt-rep serialization. If you could point me to a reproducible example, that would help. |
Here I've created a small example: To try it out, install with: remotes::install_github("david-cortes/altreppedpointer") Then generate an object that won't be qs-serializable as follows: library(altreppedpointer)
obj <- get_custom_object()
library(qs)
fname <- file.path(tempdir(), "custom_obj.qs")
qsave(obj, fname)
obj2 <- qread(fname) And compare with: library(altreppedpointer)
obj0 <- get_custom_object()
fname2 <- file.path(tempdir(), "custom_obj.Rds")
saveRDS(obj0, fname2)
obj3 <- readRDS(fname2) If you check If you modify the source code to not have an |
Thanks for the example! Looking into this again: Ideally, I'd like to get access to the serialized state directly but as far as I can tell there's now way to access it. In R internally, that's done by calling SEXP state = ALTREP_SERIALIZED_STATE(s); which is not an exported function. One other potential approach is if there is someway to tell if the Let me know your thoughts. |
I haven't looked very deeply into it, but following up on that |
I really hate going through those macros, but the hint that it calls
So we can get serialized state now. It's pretty hack-y, but technically only uses exported API. CRAN might still complain, but I'll give it a shot! |
I've worked on this during the weekend and have run into a road block. Getting the serialized_state function from the function table and re-creating the state during unserialization works fine, but converting the state back to the original object does not seem to work. It looks like this: Serialization: Unserialization: Read from disk -> Altrep_info + serialized_state -?> ALTREP_class/lookup table -> unserialize -> Altrep_object The issue is during unserialization you don't have access to the ALTREP_class or function table (both are dynamically registered), so you can't find the proper unserialization function. Everything in altrep.c is statically linked, so I see no way of getting the required pointers. |
I haven't looked into it, but aren't the classes and methods tied to a given package namespace? When registering them through a call to e.g. |
Actually it looks like it might not be so easy as some functions have |
Once you register methods from inside your package, they are stored to a static linked object in altrep.c. From outside the package and from outside R internals, the only link to those methods I can find is from an existing Altrep object (which I wouldn't have during unserialization). Anyway, it's possible I'm missing something and I'd appreciate if you could take a second look! Below is a simplified version of serialization:
|
I thought of a workaround, which would be to register serialization methods with qs on load. Something like this:
But of course that would require knowing about ALTREP classes ahead of time, and I don't know of any real packages that make good use of ALTREP. |
But doesn't that still lead to the same problem of not being able to get |
@kalibera Apologies if this is not the right medium, but it looks like you have been involved quite a lot in this mechanism for ALTREP serialization in R. Would you be able to provide some pointers on how to de-serialize an ALTREP'd class in a package for serialization of R objects? |
My understanding: I can get "ARclass" (and serialization methods) from an existing ALTREP object. I can't get ARclass from just knowing the namespace and classname. I also can't save ARclass directly as it's a dynamic internal object. During deserialization, since I don't have an existing ALTREP object, I can't get the serialization methods. Looking up ARclass from namespace and classname is not part of the public API (a chicken and egg problem). |
Since it doesn't seem to be possible to get the ALTREP class from just the package name, how about defaulting to |
Unfortunately there are cases where Without any support from R core, the only solution is some sort of whitelist / onLoad (the idea above) or registration by the user. E.g.: Serialization:
De-serialization (new R instance)
|
I'm wondering how this sort of registration could work in practice.
It feels like the only way this registration would avoid the problem of non-installed packages would be by adding the registration call in the serialization method, but that sort of mechanism feels like a very awkward workaround and much harder to implement correctly. Do you think perhaps that packages could define some global variable like |
I'd like to avoid
Yes, in general, package authors really shouldn't be changing altrep serialization format. Regardless of using
It could also be done by the user. If left entirely up to the user instead of package authors, would that also address the concern of having extra "Suggests" and tests? Regardless, I don't know of any packages that benefit from ALTREP serialization, so right now there's no concern of extra Suggests. |
The thing with leaving it to users is that users might not be aware that some object will not serialize correctly with In terms of particular packages, you might see xgboost's v2 release using the same serialization logic as the earlier example: dmlc/xgboost#9810 As for current packages, there's this one package from myself for example: https://cran.r-project.org/web/packages/outliertree/index.html But it's not a popular package and I'm not aware of any other package on CRAN currently having a non-qs-compatible serialization mechanism. If you think that falling back to |
I asked Luke Tierney by email on accessing ALTREP serialization methods, and he pretty firmly said don't. Given that, I've used
Registration is needed for serialization and only requires the class and package names. Since I'm using base::unserialize, deserialization does not need anything special. I'd be happy to add xgboost (I use it all the time) and outliertree classes to qs onLoad. Since I'm only registering strings (by adding them to a static unordered_set) there's no issues with adding extra Suggests or issues with package load order. |
Thanks for looking into this. Can confirm that serialization for these objects works as expected on the current master branch. |
If one defines an ALTREP'd object for which there are serialization and unserialization methods (
R_set_altrep_Serialized_state_method
andR_set_altrep_Unserialize_method
), but no DATAPTR method,qs
will throw an error:In contrast,
saveRDS
andreadRDS
will both correctly serialize and deserialize the object.Furthermore, if the ALTREP'd object does define
DATAPTR
, thenqs
will save whatever is in thatDATAPTR
but will not call the serialization or de-serialization methods, resulting inqread
potentially returning an invalid object.This usage of ALTREP is particularly useful when defining external pointers to non-R objects that have custom serialization methods (e.g. statistical models where the result is a C++ object which is not directly translatable to R objects, but which can be interfaced through
externalptr
).The text was updated successfully, but these errors were encountered: