Implement XPath on traits instead of concrete types #120

Open
vandenoever opened this issue Sep 26, 2017 · 9 comments

@vandenoever

vandenoever commented Sep 26, 2017

sxd-xpath works only with sxd-document. Data needs to be converted to an sxd-document DOM before an XPath can be run on it.

If sxd-xpath worked on traits, it could be used on any data structure that implements those traits.

The traits might look something like this:

pub trait Node<'a> {
    fn as_attribute(&self) -> Option<&Self>
    where
        Self: Attribute<'a>,
    {
        None
    }
    fn as_element(&self) -> Option<&Self>
    where
        Self: Element<'a>,
    {
        None
    }
    fn as_text(&self) -> Option<&Self>
    where
        Self: Text<'a>,
    {
        None
    }
}

pub trait QName {
    fn namespace_uri(&self) -> Option<&str>;
    fn local_part(&self) -> &str;
}

pub trait NamedNode<'a>: Node<'a> {
    type QName: QName;
    // Return the associated type; a bare `&QName` would refer to the trait itself.
    fn name(&self) -> &Self::QName;
}

pub trait Attribute<'a>: NamedNode<'a> {
    type AttributeValue: Into<String>;
    fn value(&self) -> &Self::AttributeValue;
}

pub trait Element<'a>: NamedNode<'a> {
    type Attribute: Attribute<'a> + 'a;
    type AttributeIter: Iterator<Item = &'a Self::Attribute>;
    type Child: Node<'a> + 'a;
    type ChildIter: Iterator<Item = &'a Self::Child>;

    fn attributes(&'a self) -> Self::AttributeIter;
    fn children(&'a self) -> Self::ChildIter;
}

pub trait Text<'a>: Node<'a> {
    fn data(&self) -> &str;
}
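
A minimal sketch of how an evaluator could be written against traits like these rather than against a concrete DOM. It assumes a reduced trait set: the `Element` trait below drops the lifetimes, attributes, and text nodes, and the `Dir` tree, `SimpleName`, and `count_descendants` are hypothetical names used only for illustration, not part of sxd-xpath.

```rust
/// Qualified name, as in the proposal above.
pub trait QName {
    fn namespace_uri(&self) -> Option<&str>;
    fn local_part(&self) -> &str;
}

/// Reduced element trait (assumption: elements only, homogeneous children).
pub trait Element {
    type Name: QName;
    fn name(&self) -> &Self::Name;
    fn children(&self) -> Box<dyn Iterator<Item = &Self> + '_>;
}

/// Generic over any backend: counts descendants whose local name matches,
/// roughly what `count(.//name)` would compute.
pub fn count_descendants<E: Element>(node: &E, local: &str) -> usize {
    node.children()
        .map(|c| usize::from(c.name().local_part() == local) + count_descendants(c, local))
        .sum()
}

/// Toy backend that is not an XML document at all.
pub struct SimpleName(String);

impl QName for SimpleName {
    fn namespace_uri(&self) -> Option<&str> {
        None
    }
    fn local_part(&self) -> &str {
        &self.0
    }
}

pub struct Dir {
    name: SimpleName,
    entries: Vec<Dir>,
}

impl Element for Dir {
    type Name = SimpleName;
    fn name(&self) -> &SimpleName {
        &self.name
    }
    fn children(&self) -> Box<dyn Iterator<Item = &Dir> + '_> {
        Box::new(self.entries.iter())
    }
}

/// Convenience constructor for the toy tree.
pub fn dir(name: &str, entries: Vec<Dir>) -> Dir {
    Dir { name: SimpleName(name.to_string()), entries }
}
```

The point of the sketch is that `count_descendants` never mentions `Dir`; any type implementing the trait could be passed to it.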
@leoschwarz

I guess the biggest motivation for doing this would be decoupling the XPath parser and evaluator from sxd-document so that it could be used with other backends too. I suspect the main motivation would be to be able to parse huge documents not fitting into memory? I don't think running XPath against arbitrary data is common enough to justify such a change by itself (abstraction like this makes code more complex), and I think having anyone who wants something like that create a document manually is the most reasonable decision.

I wonder if this is the right approach here though, since I don't know how feasible it is to evaluate XPath without a DOM. I can see a lot of complications, as it's possible to have both forward and backward dependency in XPath queries. If there is already an example of a library providing this or a specification of how this would have to be done properly, that would be really valuable.

@shepmaster
Owner

> so that it could be used with other backends too

The biggest I've heard of would be html5ever, which is indeed a DOM structure.

> huge documents not fitting into memory

Having a "streaming XPath" is a truly interesting idea, but I'm not sure how one would go about it. As you mention:

> it's possible to have both forward and backward dependency in XPath queries

It's definitely not possible for an arbitrary XPath to be applied in such a manner, so we'd have to either limit the input or determine if a given XPath is "streamable".
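
One very rough way to approximate "streamable" is to reject any expression that steps backward in document order. The sketch below is a conservative text-level heuristic under stated assumptions: `uses_backward_axis` and `BACKWARD_AXES` are hypothetical names, the check works on the raw string rather than a parsed expression, and passing it is necessary but not sufficient (a predicate such as `a[following-sibling::b]` still needs lookahead).

```rust
/// Axes that select nodes located before the context node in document order.
const BACKWARD_AXES: [&str; 5] = [
    "ancestor",
    "ancestor-or-self",
    "parent",
    "preceding",
    "preceding-sibling",
];

/// Returns true if the expression text mentions a backward-looking axis or
/// the `..` abbreviation. String literals are not skipped, so this may
/// report false positives, but it should not miss an explicit backward step.
pub fn uses_backward_axis(xpath: &str) -> bool {
    // XPath allows whitespace between an axis name and `::`, so drop it first.
    let compact: String = xpath.chars().filter(|c| !c.is_whitespace()).collect();
    compact.contains("..")
        || BACKWARD_AXES
            .iter()
            .any(|axis| compact.contains(&format!("{}::", axis)))
}
```

A real implementation would more likely inspect the parsed expression tree than the source text; the string check only illustrates the kind of restriction being discussed.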

> an example of a library providing this or a specification of how this would have to be done properly, that would be really valuable.

Agreed.

@shepmaster
Owner

If someone really did want to apply these against html5ever, I think the strongest path would be to spin up a branch that just wildly hacks this crate to work against those nodes. That would give very concrete ideas of what kind of abstraction is needed.

@vandenoever
Author

The C++ library Qt supports XQuery (and XPath) on classes that derive from QAbstractXmlNodeModel.

http://doc.qt.io/qt-5/qabstractxmlnodemodel.html

http://doc.qt.io/qt-5/xquery-introduction.html

> I suspect the main motivation would be to be able to parse huge documents not fitting into memory?

A backend that can place cursors in enormous documents would allow this. Such a backend might keep indexes on nodes, as XML databases do.
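
A minimal sketch of what a cursor-style abstraction could look like in Rust, loosely in the spirit of a node model where nodes are cheap handles and the model answers navigation queries. Everything here is an assumption for illustration: `NodeModel`, `FlatModel`, `Row`, and `children_named` are hypothetical names, and the in-memory table stands in for whatever index a large-document backend would really use.

```rust
/// Hypothetical model trait: a cursor is a small copyable handle,
/// and all navigation goes through the model.
pub trait NodeModel {
    type Cursor: Copy;
    fn root(&self) -> Self::Cursor;
    fn first_child(&self, at: Self::Cursor) -> Option<Self::Cursor>;
    fn next_sibling(&self, at: Self::Cursor) -> Option<Self::Cursor>;
    fn local_name(&self, at: Self::Cursor) -> &str;
}

/// Toy backend: a flat table of rows linked by index.
pub struct Row {
    name: String,
    first_child: Option<usize>,
    next_sibling: Option<usize>,
}

pub struct FlatModel {
    rows: Vec<Row>,
}

/// Convenience constructor for a table row.
pub fn row(name: &str, first_child: Option<usize>, next_sibling: Option<usize>) -> Row {
    Row { name: name.to_string(), first_child, next_sibling }
}

impl NodeModel for FlatModel {
    type Cursor = usize;
    fn root(&self) -> usize {
        0
    }
    fn first_child(&self, at: usize) -> Option<usize> {
        self.rows[at].first_child
    }
    fn next_sibling(&self, at: usize) -> Option<usize> {
        self.rows[at].next_sibling
    }
    fn local_name(&self, at: usize) -> &str {
        &self.rows[at].name
    }
}

/// Child-axis step written only against the trait: collects the cursors of
/// all children of `at` whose local name equals `name`.
pub fn children_named<M: NodeModel>(model: &M, at: M::Cursor, name: &str) -> Vec<M::Cursor> {
    let mut found = Vec::new();
    let mut next = model.first_child(at);
    while let Some(cursor) = next {
        if model.local_name(cursor) == name {
            found.push(cursor);
        }
        next = model.next_sibling(cursor);
    }
    found
}
```

Because the evaluator only holds cursors, the backend is free to load, index, or discard the underlying data as it sees fit.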

@shepmaster
Owner

shepmaster commented Apr 26, 2018

> on classes that derive from QAbstractXmlNodeModel.

Do you know of any other concrete implementations of that base model? I see QSimpleXmlNodeModel, but is there a way to tell if this is used anywhere else?

@vandenoever
Author

There is one for HTML documents: https://github.com/jgehring/qhtmlnodemodel

Qt comes with an example for file trees: https://code.woboq.org/qt5/qtxmlpatterns/examples/xmlpatterns/filetree/filetree.cpp.html

Here's a blog with the rationale for the use of an abstract node model: https://englich.wordpress.com/2007/11/15/query-your-toaster/

@shepmaster
Owner

Cool, thank you! What was the specific use case that made you originally open this issue?

@vandenoever
Author

KDE has a few uses of it. One maps binary MS Office documents to a QAbstractXmlNodeModel.

https://lxr.kde.org/ident?_i=QAbstractXmlNodeModel

https://lxr.kde.org/source/playground/libs/binschema/cpp/msoxmlnodemodel.cpp

@vandenoever
Author

I was thinking of doing some XPath code and noticed quite a few XML implementations in Rust. Quite a few developers have started XML parsers and DOMs with different trade-offs. For each of them, adding XPath is quite a task. For developers who want to use XPath in Rust code, there is not much choice.

My concrete use case at the time was working with gigabyte-sized spreadsheets. I ended up parsing into a special struct and had to forgo the convenience of sxd-xpath.
