You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The current API allows customizing parsing behavior using the TreeAdapter and TreeAdapterTypeMap interfaces, which I think is very cool. However the type declarations on these interfaces are overly relaxed; in particular, the parsing logic implicitly expects certain node types to be related to each other (e.g., every element is a parentNode), but these relationships are not at all enforced by the TreeAdapter type declarations. Consequently, the types of arguments passed by the parse5 internal parsing code to the custom TreeAdapter may have completely inconsistent types with those expected by the TreeAdapter's own type declarations.
extreme example
To illustrate my point with an extreme example, consider the following, which is semantically not a faithful/good tree adapter but nevertheless type-checks without error (on parse5 version 7.1.2, TS 5.4.5). In particular, each "node type" is just a literal string type.
import*asP5from"parse5";constminimalAdapter: P5.TreeAdapter<{childNode: "child";commentNode: "comment";document: "document";documentFragment: "fragment";documentType: "doctype";element: `element/${string}`;node: "comment"|"text"|"doctype"| `element/${string}`;parentNode: "parent";template: "template";textNode: "text";}>={adoptAttributes(){},appendChild(){},createCommentNode(){return"comment";},createDocument(){return"document";},createDocumentFragment(){return"fragment";},createElement(tag){console.log("create element",tag);return`element/${tag}`;},// `createTextNode` is part of the interface in parse5 master branch// but not yet in 7.1.2 release, which is what i'm testing on// createTextNode(value: string){ return "text"; },detachNode(){},getAttrList(){return[];},getChildNodes(){return[];},getCommentNodeContent(){return"(commentContent)";},getDocumentMode(){returnP5.html.DOCUMENT_MODE.NO_QUIRKS;},getDocumentTypeNodeName(){return"(doctypeName)";},getDocumentTypeNodePublicId(){return"(docTypePublicId)";},getDocumentTypeNodeSystemId(){return"(docTypeSystemId)";},getFirstChild(){returnnull;},getNamespaceURI(){returnP5.html.NS.HTML;},getNodeSourceCodeLocation(){returnnull;},getParentNode(){returnnull;},getTagName(){return"(tagName)";},getTemplateContent(){return"fragment";},getTextNodeContent(){return"(textContent)";},insertBefore(){},insertText(parent: "parent",text){console.log("insert text:",{ parent, text });},insertTextBefore(){},isCommentNode(node): node is "comment"{returnnode==="comment";},isDocumentTypeNode(node): node is "doctype"{returnnode==="doctype";},isElementNode(node): node is `element/${string}` {returnnode.startsWith("element/");},isTextNode(node): node is "text"{returnnode==="text";},setDocumentMode(){},setDocumentType(){},setNodeSourceCodeLocation(){},setTemplateContent(){},updateNodeSourceCodeLocation(){},};P5.parseFragment("foobar",{treeAdapter: minimalAdapter});
extreme example explanation
Pay attention specifically to:
the parentNode type (which is just the literal string type, "parent")
which is inconsistent with the type declarations, even though there is no type-checking error. This happens because the actual way that this parsing behind the scenes looks something like this (not exactly; I'm omitting some details to keep this discussion a little simpler, but I'm pretty sure the basic idea is right):
First call createElement to create a "mock" html element.
Then call insertText with the element we just created, and the text "foobar" (from the input).
The problem here is that createElement produces a node of type element, while insertText expects a node of type parentNode, but there is no requirement that all elements are actually parentNodes (in technical lingo, element needs to be a subtype of a.k.a. extendparentNode), and indeed, that is the requirement we constructed this example to violate.
related problems
The example above showcases just one of many unspoken type requirements. There are probably many, but here are just a few more I can think of:
templates are created using createElement as well, but then passed into setTemplateContent, which expects its input to be type template. By the reasoning I gave above this would require that all elements are templates, which doesn't make sense, so something else needs to be fixed to address this type-soundness issue besides merely imposing constraints on the types.
documents are also parentNodes, I think
similar issues with childNode and appendChild
who cares?
You could argue that my example is pathological and unrealistic, and that's true. Indeed, most of these unspoken constraints are satisfied just fine by the default, built-in tree adapter, so for most people these type unsoundness errors are moot. Still, I think that if we're going to go to these lengths (custom tree adapters & adapter types) to make parse5 customizable anyway, then we might as well do it right, and currently the types are not right.
how do we fix it?
I'm not totally sure! Some of the issues are partially addressed by adding extends constraints to ensure that, e.g., Element extends ParentNode. However, adding these constraints to TreeAdapterTypeMap's type parameters is insufficient because this way they can be bypassed simply by directly constructing the type map via a literal, which is already what my example above does. So to really enforce these constraints they have to occur directly on TreeAdapter itself, which would require a very annoying rewrite that lifts all the node type parameters directly onto the TreeAdaptertype itself. That would make the code a lot more verbose and break backwards-compatibility.
On top of that, it also doesn't address the issue of templates, which arise from technically unsound use of createElement to create a template node type, given specific input arguments. Simply constraining Element extends Template can address the type errors but doesn't make sense and conceptually isn't what we want; the more correct fix would be instead to have a completely separate createTemplateNode method and use that in the parsing logic instead when constructing templates. Again, API-breaking change.
Anyway, I don't have a simple fix that doesn't break stability in lots of ways, but I'm raising this issue anyway for discussion because there is a real typing issue, and I do care a lot about types. If we ultimately don't implement/enforce these type constraints/requirements in TypeScript because of stability or other technical reasons, I think they deserve at least a mention in the documentation so other folks trying to write their own TreeAdapter know what compatibility requirements they need in order to avoid running into unexpected type errors. :)
The text was updated successfully, but these errors were encountered:
The types aren't quite right anyway, outside of all this. The concept of a type map has some problems and weakens types in some situations
A parent node is meant to be a union of all the possible types which have parents. That shouldn't really change but we could possibly introduce a base type (like WithParent)
Meanwhile, template inherits element currently (and probably should)
Ultimately I think the actual fix one day is to make tree adapters less dynamic/flexible and require them to implement certain fundamental types
Id be open to contributions here as I won't have much time over the next couple of months
summary
The current API allows customizing parsing behavior using the
TreeAdapter
andTreeAdapterTypeMap
interfaces, which I think is very cool. However the type declarations on these interfaces are overly relaxed; in particular, the parsing logic implicitly expects certain node types to be related to each other (e.g., everyelement
is aparentNode
), but these relationships are not at all enforced by theTreeAdapter
type declarations. Consequently, the types of arguments passed by the parse5 internal parsing code to the customTreeAdapter
may have completely inconsistent types with those expected by theTreeAdapter
's own type declarations.extreme example
To illustrate my point with an extreme example, consider the following, which is semantically not a faithful/good tree adapter but nevertheless type-checks without error (on parse5 version 7.1.2, TS 5.4.5). In particular, each "node type" is just a literal string type.
extreme example explanation
Pay attention specifically to:
parentNode
type (which is just the literal string type,"parent"
)insertText
function, which per the official parse5 documentation takes in two arguments:T["parentNode"]
is, per the last bullet point, the literal string type"parent"
. In the code example, we explicitly annotate this:However, running the code results in the following output:
which is inconsistent with the type declarations, even though there is no type-checking error. This happens because the actual way that this parsing behind the scenes looks something like this (not exactly; I'm omitting some details to keep this discussion a little simpler, but I'm pretty sure the basic idea is right):
createElement
to create a "mock"html
element.insertText
with the element we just created, and the text"foobar"
(from the input).The problem here is that
createElement
produces a node of typeelement
, whileinsertText
expects a node of typeparentNode
, but there is no requirement that allelement
s are actuallyparentNode
s (in technical lingo,element
needs to be a subtype of a.k.a. extendparentNode
), and indeed, that is the requirement we constructed this example to violate.related problems
The example above showcases just one of many unspoken type requirements. There are probably many, but here are just a few more I can think of:
template
s are created usingcreateElement
as well, but then passed intosetTemplateContent
, which expects its input to be typetemplate
. By the reasoning I gave above this would require that allelement
s aretemplate
s, which doesn't make sense, so something else needs to be fixed to address this type-soundness issue besides merely imposing constraints on the types.document
s are alsoparentNode
s, I thinkchildNode
andappendChild
who cares?
You could argue that my example is pathological and unrealistic, and that's true. Indeed, most of these unspoken constraints are satisfied just fine by the default, built-in tree adapter, so for most people these type unsoundness errors are moot. Still, I think that if we're going to go to these lengths (custom tree adapters & adapter types) to make parse5 customizable anyway, then we might as well do it right, and currently the types are not right.
how do we fix it?
I'm not totally sure! Some of the issues are partially addressed by adding
extends
constraints to ensure that, e.g.,Element extends ParentNode
. However, adding these constraints toTreeAdapterTypeMap
's type parameters is insufficient because this way they can be bypassed simply by directly constructing the type map via a literal, which is already what my example above does. So to really enforce these constraints they have to occur directly onTreeAdapter
itself, which would require a very annoying rewrite that lifts all the node type parameters directly onto theTreeAdapter
type itself. That would make the code a lot more verbose and break backwards-compatibility.On top of that, it also doesn't address the issue of
template
s, which arise from technically unsound use ofcreateElement
to create atemplate
node type, given specific input arguments. Simply constrainingElement extends Template
can address the type errors but doesn't make sense and conceptually isn't what we want; the more correct fix would be instead to have a completely separatecreateTemplateNode
method and use that in the parsing logic instead when constructing templates. Again, API-breaking change.Anyway, I don't have a simple fix that doesn't break stability in lots of ways, but I'm raising this issue anyway for discussion because there is a real typing issue, and I do care a lot about types. If we ultimately don't implement/enforce these type constraints/requirements in TypeScript because of stability or other technical reasons, I think they deserve at least a mention in the documentation so other folks trying to write their own
TreeAdapter
know what compatibility requirements they need in order to avoid running into unexpected type errors. :)The text was updated successfully, but these errors were encountered: