Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TreeAdapter interface types are unsound #1253

Open
kwshi opened this issue Sep 1, 2024 · 1 comment
Open

TreeAdapter interface types are unsound #1253

kwshi opened this issue Sep 1, 2024 · 1 comment

Comments

@kwshi
Copy link

kwshi commented Sep 1, 2024

summary

The current API allows customizing parsing behavior using the TreeAdapter and TreeAdapterTypeMap interfaces, which I think is very cool. However the type declarations on these interfaces are overly relaxed; in particular, the parsing logic implicitly expects certain node types to be related to each other (e.g., every element is a parentNode), but these relationships are not at all enforced by the TreeAdapter type declarations. Consequently, the types of arguments passed by the parse5 internal parsing code to the custom TreeAdapter may have completely inconsistent types with those expected by the TreeAdapter's own type declarations.

extreme example

To illustrate my point with an extreme example, consider the following, which is semantically not a faithful/good tree adapter but nevertheless type-checks without error (on parse5 version 7.1.2, TS 5.4.5). In particular, each "node type" is just a literal string type.

import * as P5 from "parse5";

const minimalAdapter: P5.TreeAdapter<{
  childNode: "child";
  commentNode: "comment";
  document: "document";
  documentFragment: "fragment";
  documentType: "doctype";
  element: `element/${string}`;
  node: "comment" | "text" | "doctype" | `element/${string}`;
  parentNode: "parent";
  template: "template";
  textNode: "text";
}> = {
  adoptAttributes() {},
  appendChild() {},
  createCommentNode() {
    return "comment";
  },
  createDocument() {
    return "document";
  },
  createDocumentFragment() {
    return "fragment";
  },
  createElement(tag) {
    console.log("create element", tag);
    return `element/${tag}`;
  },
  // `createTextNode` is part of the interface in parse5 master branch
  // but not yet in 7.1.2 release, which is what i'm testing on
  //   createTextNode(value: string){ return "text"; },
  detachNode() {},
  getAttrList() {
    return [];
  },
  getChildNodes() {
    return [];
  },
  getCommentNodeContent() {
    return "(commentContent)";
  },
  getDocumentMode() {
    return P5.html.DOCUMENT_MODE.NO_QUIRKS;
  },
  getDocumentTypeNodeName() {
    return "(doctypeName)";
  },
  getDocumentTypeNodePublicId() {
    return "(docTypePublicId)";
  },
  getDocumentTypeNodeSystemId() {
    return "(docTypeSystemId)";
  },
  getFirstChild() {
    return null;
  },
  getNamespaceURI() {
    return P5.html.NS.HTML;
  },
  getNodeSourceCodeLocation() {
    return null;
  },
  getParentNode() {
    return null;
  },
  getTagName() {
    return "(tagName)";
  },
  getTemplateContent() {
    return "fragment";
  },
  getTextNodeContent() {
    return "(textContent)";
  },
  insertBefore() {},
  insertText(parent: "parent", text) {
    console.log("insert text:", { parent, text });
  },
  insertTextBefore() {},
  isCommentNode(node): node is "comment" {
    return node === "comment";
  },
  isDocumentTypeNode(node): node is "doctype" {
    return node === "doctype";
  },
  isElementNode(node): node is `element/${string}` {
    return node.startsWith("element/");
  },
  isTextNode(node): node is "text" {
    return node === "text";
  },
  setDocumentMode() {},
  setDocumentType() {},
  setNodeSourceCodeLocation() {},
  setTemplateContent() {},
  updateNodeSourceCodeLocation() {},
};

P5.parseFragment("foobar", { treeAdapter: minimalAdapter });

extreme example explanation

Pay attention specifically to:

  • the parentNode type (which is just the literal string type, "parent")
  • the insertText function, which per the official parse5 documentation takes in two arguments:
    insertText(parentNode: T["parentNode"], text: string)
    in this case, T["parentNode"] is, per the last bullet point, the literal string type "parent". In the code example, we explicitly annotate this:
    insertText(parent: "parent", text)

However, running the code results in the following output:

insert text: { parent: 'element/html', text: 'foobar' }

which is inconsistent with the type declarations, even though there is no type-checking error. This happens because the actual way that this parsing behind the scenes looks something like this (not exactly; I'm omitting some details to keep this discussion a little simpler, but I'm pretty sure the basic idea is right):

  • First call createElement to create a "mock" html element.
  • Then call insertText with the element we just created, and the text "foobar" (from the input).

The problem here is that createElement produces a node of type element, while insertText expects a node of type parentNode, but there is no requirement that all elements are actually parentNodes (in technical lingo, element needs to be a subtype of a.k.a. extend parentNode), and indeed, that is the requirement we constructed this example to violate.

related problems

The example above showcases just one of many unspoken type requirements. There are probably many, but here are just a few more I can think of:

  • templates are created using createElement as well, but then passed into setTemplateContent, which expects its input to be type template. By the reasoning I gave above this would require that all elements are templates, which doesn't make sense, so something else needs to be fixed to address this type-soundness issue besides merely imposing constraints on the types.
  • documents are also parentNodes, I think
  • similar issues with childNode and appendChild

who cares?

You could argue that my example is pathological and unrealistic, and that's true. Indeed, most of these unspoken constraints are satisfied just fine by the default, built-in tree adapter, so for most people these type unsoundness errors are moot. Still, I think that if we're going to go to these lengths (custom tree adapters & adapter types) to make parse5 customizable anyway, then we might as well do it right, and currently the types are not right.

how do we fix it?

I'm not totally sure! Some of the issues are partially addressed by adding extends constraints to ensure that, e.g., Element extends ParentNode. However, adding these constraints to TreeAdapterTypeMap's type parameters is insufficient because this way they can be bypassed simply by directly constructing the type map via a literal, which is already what my example above does. So to really enforce these constraints they have to occur directly on TreeAdapter itself, which would require a very annoying rewrite that lifts all the node type parameters directly onto the TreeAdaptertype itself. That would make the code a lot more verbose and break backwards-compatibility.

On top of that, it also doesn't address the issue of templates, which arise from technically unsound use of createElement to create a template node type, given specific input arguments. Simply constraining Element extends Template can address the type errors but doesn't make sense and conceptually isn't what we want; the more correct fix would be instead to have a completely separate createTemplateNode method and use that in the parsing logic instead when constructing templates. Again, API-breaking change.

Anyway, I don't have a simple fix that doesn't break stability in lots of ways, but I'm raising this issue anyway for discussion because there is a real typing issue, and I do care a lot about types. If we ultimately don't implement/enforce these type constraints/requirements in TypeScript because of stability or other technical reasons, I think they deserve at least a mention in the documentation so other folks trying to write their own TreeAdapter know what compatibility requirements they need in order to avoid running into unexpected type errors. :)

@43081j
Copy link
Collaborator

43081j commented Sep 2, 2024

The types aren't quite right anyway, outside of all this. The concept of a type map has some problems and weakens types in some situations

A parent node is meant to be a union of all the possible types which have parents. That shouldn't really change but we could possibly introduce a base type (like WithParent)

Meanwhile, template inherits element currently (and probably should)

Ultimately I think the actual fix one day is to make tree adapters less dynamic/flexible and require them to implement certain fundamental types

Id be open to contributions here as I won't have much time over the next couple of months

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants