-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: new YaDeserialize interface #84
Comments
@scottlamb As far as adaptation_set: [
AdaptationSet {
representation_base: RepresentationBase {
attribute: RepresentationBaseAttributes {
profiles: None,
width: None,
height: None,
sar: None,
frame_rate: None,
audio_sampling_rate: None,
mime_type: None,
segment_profiles: None,
codecs: None,
maximum_sap_period: None,
start_with_sap: None,
max_playout_rate: None,
coding_dependency: None,
scan_type: None,
},
element: RepresentationBaseElements {
frame_packing: [],
audio_channel_configuration: [],
content_protection: [],
essential_property: [],
supplemental_property: [],
inband_event_stream: [],
},
}, and its (partial) code: #[derive(Default, Debug, Clone, PartialEq, YaSerialize, YaDeserialize)]
#[yaserde(
prefix = "mpd",
namespace = "mpd: urn:mpeg:dash:schema:mpd:2011",
default_namespace = "mpd"
)]
pub struct AdaptationSet {
#[yaserde(flatten)]
pub representation_base: RepresentationBase,
#[yaserde(flatten)]
pub attribute: AdaptationSetAttributes,
#[yaserde(flatten)]
pub element: AdaptationSetElements,
} One recent discovery, however, made my assumptions about the approach question its safety. I assumed, that if the underlying elements/attributes yield no response or existence, then the library would not deserialize those, and the need for How would I be able to associate namespaces for same tag under this proposal? In the case where I have one namespace for the entire document, but then someone tries to use it with a slightly different version of the same document, which has a different namespace. For example, the document is designed against Will it provide support for handling "Xlink" or that stays mainly unchanged and up to my "custom" processing? I'm totally in support of your proposal, it would greatly enhance the library. How does it change "serialization", if at all? |
@MarcAntoine-Arnaud I held off until now because I hadn't heard from you. But this is holding back my project, so I'm working through it. My question is: are you interested in this sort of change? If so, I'm happy to work with you. If not, I'll release my own crate. |
I have no idea what this project is about @scottlamb @MarcAntoine-Arnaud but I thought I'd make you aware of eminence/xmltree-rs#33 as you mentioned eminence/xmltree-rs#13 repeatedly. |
Rough proposal for a new
YaDeserialize
interface below. This started from my more modest proposal in #76. I found some other issues as I dug in more, and I was encouraged by @MarcAntoine-Arnaud 's "I'm quite open to big changes in this library". I've tried to start from first principles. I'd be very happy to get feedback and revise.My goal is to support ONVIF well, and I'm trying to use @DmitrySamoylov 's library for that purpose, so the examples here will likely be quite familiar to him.
We propose a new
YaDeserialize
interface with several benefits over the present one:YaDeserialize
implementation errors.xml-rs
andYaDeserialize
implementations.Proposed interface
Example generated code
Given the input below:
yaserde
might generate the followingYaDeserialize
implementations:Design points
Compile time and code bloat
The onvif-rs/schema crate is a heavy user of yaserde derive macros (containing 1,968
#[derive]
lines mentioningYaDeserialize
orYaSerialize
). By usingcargo -Ztimings
, we can see the schema crate's contribution to compilation times.Compilation times, measured with
cargo +nightly-2020-06-23 -Ztimings build
at revision 4a654a4:--debug
--release
We can also examine its contribution to the final binary's size.
cargo +nightly-2020-06-23 bloat --release --example camera --crates
attributes 940KiB to this crate.LTO makes surprisingly little difference given that many of the schema types should be unused by the example binary. With
lto = "fat"
, the samecargo bloat
command attributes 855KiB to the schema crate.These numbers suggest it should be a priority to reduce the lines of generated code, improving both compile time and binary size.
TODO: once implemented, verify the new interface actually improves this significantly...currently this is a leap of faith...
Structure flattening
yaserde
supports "flattening" structures and enums together via the#[yaserde(flatten)]
field attribute. This support is similar to serde's struct flatten. The structopt crate has a similar flattening feature.The xsd-parser-rs crate uses yaserde's flatten attribute to support placing an
enum
that represents a<xs:choice>
within astruct
that represents a<xs:sequence>
. This allows implementing the struct and enum with#[derive(YaDeserialize)]
. Without this feature, the struct would likely need a customYaDeserialize
implementation. Thus it's important to keep flatten support.The current implementation is to serialize a new XML document into a
Vec<u8>
. This document contains all the "unused" elements in the current struct. TheVec<u8>
is then parsed and used to deserialize each flattened field. This method contributes to code bloat, as well as being inefficient at runtime.We propose instead that the monolithic
YaDeserialize::deserialize
be broken apart. With one call per attribute or element, the parent struct and all flattened fields can be deserialized simultaneously without buffering. Each could be given the option to consume an attribute or element or pass the buck to the next.TODO: do we really need to support multiple flattened fields in a struct, or is one sufficient? The latter would allow us to simplify the interface by getting rid of the
Option
returns.<xsd:any>
supportSome XML schemas rely on
<xsd:any>
to function. For example, an ONVIF analytics configuration might contain the following snippet:The
ElementItem
is generic. It doesn't refer to theCellLayout
type.This is a poor schema design, and any generated code from it will be subpar. Validation could be much stronger, and automatically generated code much better, if onvif.xsd instead contained a specific element and type for a
CellMotionEngine
that mentioned the type of each element, rather than using a genericAnanlyticsModule
and expressing the type information only in a table in a PDF document. (In this case, ONVIF Analytics Service Specification Table B-3.)However, XML libraries aren't being written in 2020 to support new schemas with perfect designs. They're for interoperating with legacy schemas regardless of their flaws.
yaserde
users are encouraged to hand-rollYaDeserialize
implementations for these cases. In this example, callers would benefit from anenum AnalyticsModule
written by a human who has examined the PDF specification.Nonetheless, it's desirable to support automatic parsing of such schemas into usable, if untyped and unergonomic, representations. This would have two benefits:
GetAnalyticsConfiguration
request, make some modification to the config, and write it back with aSetAnalyticsConfiguration
without damaging other portions of the configuration.Debug
output would show the missing information, aiding development.It's also common for schemas to support both an explicit list of elements and the
xs:any
type for extensibility (both forward compatibility and third-party extensions). In this case, it'd be most beneficial to support a flattened any representation.We propose that yaserde provide (behind a feature flag) serialization into the xmltree::Element type. While this type has some limitations (eg eminence/xmltree-rs#13), it's a reasonable way to represent a generic XML document fragment and can be improved as necessary.
This is analogous to handling of unknown fields in other serialization formats, eg. protobuf.
xs:QName
Some schemas require interpreting a QName given in attribute values or character data. For example, an ONVIF metadata stream might begin as follows:
In this message,
tns1
refers to the namespacehttp://www.onvif.org/ver10/topics
via thexmls:tns1
declaration on the<tt:MetaDataStream>
start tag.A
YaDeserialize
implementation which receives this character data needs to have the namespace declarations available to it to parse properly.yaserde
should ensure this is supplied when parsing each element.yaserde
doesn't need to supply the actual implementation of qname interpretation. In fact, an ONVIF metadata stream parser may need to parse non-compliant documents such as the following:This document above is missing the critical
tns1
namespace mapping. Parsing it successfully requires falling back to an assumed mapping. Having such a mapping for each type of document would clearly be out-of-scope for a general-purpose de/ser-ializer.SOAP bodies
SOAP defines a body element which contains either a fault element or arbitrary, message-specific content (which can include attributes, elements, and character data). Typically a given operation's content is of a particular type specified in the service's WSDL.
The onvif-rs crate contains logic for deserializing a SOAP envelope and body for an arbitrary operation. It doesn't use
yaserde
. Instead, it usesxmltree
to parse the document. If the body contains a fault element, it deserializes it and returns error. Otherwise, it serializes the doucment again for the caller to pass toyaserde
.A more efficient, ergonomic interface would avoid the redundant deserialization and serialization. Instead, it would directly deserialize a succesful body into a caller-chosen
YaDeserialize
implementation.To avoid code bloat, the soap logic should not have to be monomorphized for each operation. Instead,
YaDeserialize
should be object-safe, and it should be possible to deserialize into a pre-exisitngSoapBody
object.The simple generated class below meets these requirements. A hand-rolled
YaDeserialize
could additionally enforce that a fault is mutually exclusive with attributes, other elements, and character data.We additionally observe that at least in the case of ONVIF, every body consists of only one child. A similar technique would allow implementing this as a single wrapper type without requiring monomorphization.
YaDeserialize
implementation errorsAs discussed in #76, there's no clear contract between child and parent elements today for the state of the
Deserializer
. Even if there were, there's no enforcement of this contract. It's possible to have bugs in which the child consumes elements that belong to the parent. These are difficult to debug.By introducing the
ElementReader
interface, we can make such bugs entirely impossible.Error type
By passing back an error type that can wrap a
Box<::std::error::Error + Send + Sync + 'static>
, we can allow the caller to see more richer information fromYaDeserialize
implementations. This might include a backtrace or an error type enum.Similarly, by passing back an
::xml::reader::Error
, we can allow callers to see the XML parser's position information.We might also augment errors with the stack of XML elements as of when the error is generated.
Asynchronous incremental parsing
The Rust ecosystem is moving toward asynchronous handling of network streams. It'd be nice to match this by supporting parsing an XML stream in chunks without tying up a thread. Currently this doesn't seem to be possible with any of the popular XML parser libraries:
xml::reader::Events
pulls from a synchronousstd::io::Read
implementation.quick_xml::Reader
pulls from astd::io::BufReader
.xmlparser::Tokenizer
expects the full document in a single&str
.xml5ever's
xml5ever::driver::XmlParser
provides a push interface that could be useful for this purpose. However, this crate is advertised as "alpha quality".Short of using this alpha-quality crate, asynchronous operation doesn't seem possible today. Nonetheless, let's look at what an ideal interface might look like.
A half-parsed state is a stack of
YaDeserialize
implementations, each likely owned by the previous one. This can't be represented as aVec<&mut YaDeserialize>
due to the self-referential ownership. It can be represented as a stack of recursiveasync fn
invocations. Thus, the most logical interface for this use case is a straightforward adaptation of the synchronous recursive interface.We propose implementing the synchronous interface today. The interface can be converted to asynchronous later when it's needed and when the xml parsing libraries are ready.
The text was updated successfully, but these errors were encountered: