Correct disable-output-escaping implementation. #112

Merged
merged 1 commit into from
Nov 2, 2024
2 changes: 1 addition & 1 deletion README.md
@@ -83,7 +83,7 @@ const xslt = new Xslt(options);
```

- `cData` (`boolean`, default `true`): resolves CDATA elements in the output. Content under CDATA is resolved as text. This overrides `escape` for CDATA content.
- `escape` (`boolean`, default `true`): replaces symbols like `<`, `>`, `&` and `"` by the corresponding [XML entities](https://www.tutorialspoint.com/xml/xml_character_entities.htm).
- `escape` (`boolean`, default `true`): replaces symbols like `<`, `>`, `&` and `"` by the corresponding [HTML/XML entities](https://www.tutorialspoint.com/xml/xml_character_entities.htm). Can be overridden by `disable-output-escaping`, which also does the opposite: it unescapes `&lt;` and `&gt;` to `<` and `>`, respectively.
- `selfClosingTags` (`boolean`, default `true`): Self-closes tags that don't have inner elements, if `true`. For instance, `<test></test>` becomes `<test />`.
- `outputMethod` (`string`, default `xml`): Specifies the default output method. If `<xsl:output>` is declared in your XSLT file, this will be overridden.
- `parameters` (`array`, default `[]`): external parameters that you want to use.
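
The interaction between `escape` and `disable-output-escaping` described in the option list can be sketched as follows. This is a minimal, self-contained illustration for text content only, not the library's code: `TextNodeLike`, `escapeText`, `unescapeText` and `renderText` are hypothetical names, and the per-node `escape` flag is assumed to be `false` when the stylesheet sets `disable-output-escaping="yes"`.

```typescript
// Hypothetical model of a text node: `escape` is assumed to be false
// when the stylesheet sets disable-output-escaping="yes" on it.
interface TextNodeLike {
    value: string;
    escape: boolean;
}

// Replaces &, < and > with entities. `&` goes first so that the
// entities produced by the later replacements are not escaped again.
function escapeText(text: string): string {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Turns only the tag-delimiter entities back into literal characters.
function unescapeText(text: string): string {
    return text.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
}

// Escapes only when both the node and the global option ask for it;
// otherwise the text is unescaped.
function renderText(node: TextNodeLike, escapeOption: boolean): string {
    return node.escape && escapeOption ? escapeText(node.value) : unescapeText(node.value);
}

console.log(renderText({ value: 'a < b', escape: true }, true)); // a &lt; b
console.log(renderText({ value: '&lt;!DOCTYPE html&gt;', escape: false }, true)); // <!DOCTYPE html>
```

Under these assumptions, turning either the node flag or the global option off sends the text through the unescaping branch.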
9 changes: 6 additions & 3 deletions TODO.md
@@ -1,11 +1,14 @@
XSLT-processor TODO
=====

* Rethink match algorithm, as described in https://github.com/DesignLiquido/xslt-processor/pull/62#issuecomment-1636684453;
* Rethink match algorithm, as described in https://github.com/DesignLiquido/xslt-processor/pull/62#issuecomment-1636684453. There's a good number of issues open about this problem:
* https://github.com/DesignLiquido/xslt-processor/issues/108
* https://github.com/DesignLiquido/xslt-processor/issues/109
* https://github.com/DesignLiquido/xslt-processor/issues/110
* XSLT validation, besides the version number;
* XSL:number
* `attribute-set`, `decimal-format`, etc. (check `src/xslt.ts`)
* `/html/body//ul/li|html/body//ol/li` has `/html/body//ul/li` evaluated by this XPath implementation as "absolute", and `/html/body//ol/li` as "relative". Both should be evaluated as "absolute".
* Implement `<xsl:import>` with correct template precedence.
* `/html/body//ul/li|html/body//ol/li` has `/html/body//ul/li` evaluated by this XPath implementation as "absolute", and `/html/body//ol/li` as "relative". Both should be evaluated as "absolute". One idea is to rewrite the XPath logic entirely, since it is nearly impossible to debug.
* Implement `<xsl:import>` with correct template precedence.

Help is much appreciated. It seems to currently work for most of our purposes, but fixes and additions are always welcome!
1 change: 1 addition & 0 deletions package.json
@@ -48,6 +48,7 @@
"@rollup/plugin-typescript": "^11.1.1",
"@types/he": "^1.2.0",
"@types/jest": "^29.5.12",
"@types/node-fetch": "^2.6.11",
"@typescript-eslint/eslint-plugin": "^8.4.0",
"@typescript-eslint/parser": "^8.4.0",
"babel-jest": "^29.7.0",
1 change: 1 addition & 0 deletions src/dom/index.ts
@@ -3,4 +3,5 @@ export * from './xdocument';
export * from './xml-functions';
export * from './xml-output-options';
export * from './xml-parser';
export * from './xbrowser-node';
export * from './xnode';
10 changes: 10 additions & 0 deletions src/dom/xbrowser-node.ts
@@ -0,0 +1,10 @@
import { XNode } from "./xnode";

/**
* Special XNode class, that retains properties from browsers like
* IE, Opera, Safari, etc.
*/
export class XBrowserNode extends XNode {
innerText?: string;
textContent?: string;
}
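
The way this class is consumed later in the diff (a cast followed by strict `!== undefined` checks) can be sketched in isolation. The names `NodeLike`, `BrowserNodeLike` and `browserText` below are hypothetical stand-ins, not the project's `XNode` or `XBrowserNode`; the sketch only assumes that the browser-specific properties are optional strings.

```typescript
// Hypothetical stand-in for a plain node with a value.
class NodeLike {
    nodeValue: string;

    constructor(nodeValue: string) {
        this.nodeValue = nodeValue;
    }
}

// Hypothetical stand-in for a node that may carry browser-only properties.
class BrowserNodeLike extends NodeLike {
    innerText?: string;
    textContent?: string;
}

// Returns the browser-provided text if either property is set,
// preferring `innerText`, or undefined when neither exists.
function browserText(node: NodeLike): string | undefined {
    const browserNode = node as BrowserNodeLike;
    if (browserNode.innerText !== undefined) {
        return browserNode.innerText;
    }

    if (browserNode.textContent !== undefined) {
        return browserNode.textContent;
    }

    return undefined;
}

const withText = new BrowserNodeLike('ignored');
withText.textContent = 'hello';
console.log(browserText(withText)); // hello
console.log(browserText(new NodeLike('plain'))); // undefined
```

One consequence of the strict comparison is worth keeping in mind: `null != undefined` is `false` but `null !== undefined` is `true`, so a property holding `null` would be skipped by the loose check and returned by the strict one.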
104 changes: 72 additions & 32 deletions src/dom/xml-functions.ts
@@ -14,7 +14,7 @@
import { XNode } from './xnode';
import { XDocument } from './xdocument';
import { XmlOutputOptions } from './xml-output-options';

import { XBrowserNode } from './xbrowser-node';

/**
* Returns the text value of a node; for nodes without children this
@@ -25,15 +25,15 @@
* @param disallowBrowserSpecificOptimization A boolean, to avoid browser optimization.
* @returns The XML value as a string.
*/
export function xmlValue(node: XNode | any, disallowBrowserSpecificOptimization: boolean = false): string {
export function xmlValue(node: XNode, disallowBrowserSpecificOptimization: boolean = false): string {
if (!node) {
return '';
}

let ret = '';
switch (node.nodeType) {
case DOM_DOCUMENT_TYPE_NODE:
return `<!DOCTYPE ${node.nodeValue}>`
return `<!DOCTYPE ${node.nodeValue}>`;

case DOM_TEXT_NODE:
case DOM_CDATA_SECTION_NODE:
case DOM_ATTRIBUTE_NODE:
@@ -44,19 +44,22 @@
if (!disallowBrowserSpecificOptimization) {
// Only returns something if node has either `innerText` or `textContent` (not an XNode).
// IE, Safari, Opera, and friends (`innerText`)
const innerText = node.innerText;
if (innerText != undefined) {
const browserNode = node as XBrowserNode;
const innerText = browserNode.innerText;
if (innerText !== undefined) {
return innerText;
}

// Firefox (`textContent`)
const textContent = node.textContent;
if (textContent != undefined) {
const textContent = browserNode.textContent;
if (textContent !== undefined) {
return textContent;
}

}

if (node.transformedChildNodes.length > 0) {
const transformedTextNodes = node.transformedChildNodes.filter((n: XNode) => n.nodeType !== DOM_ATTRIBUTE_NODE);
const transformedTextNodes = node.transformedChildNodes.filter(
(n: XNode) => n.nodeType !== DOM_ATTRIBUTE_NODE
);
for (let i = 0; i < transformedTextNodes.length; ++i) {
ret += xmlValue(transformedTextNodes[i]);
}
@@ -71,8 +74,15 @@
}
}

// TODO: Give a better name to this.
export function xmlValue2(node: any, disallowBrowserSpecificOptimization: boolean = false) {
/**
* The older way of obtaining an XML value from a node.
* For now, this form is only used to get text from attribute nodes,
* and it should be removed in future versions.
* @param node The attribute node.
* @param disallowBrowserSpecificOptimization A boolean, to avoid browser optimization.
* @returns The XML value as a string.
*/
export function xmlValueLegacyBehavior(node: XNode, disallowBrowserSpecificOptimization: boolean = false) {
if (!node) {
return '';
}
@@ -91,15 +101,16 @@
case DOM_ELEMENT_NODE:
if (!disallowBrowserSpecificOptimization) {
// IE, Safari, Opera, and friends
const innerText = node.innerText;
if (innerText != undefined) {
const browserNode = node as XBrowserNode;
const innerText = browserNode.innerText;
if (innerText !== undefined) {
return innerText;
}

// Firefox
const textContent = node.textContent;
if (textContent != undefined) {
const textContent = browserNode.textContent;
if (textContent !== undefined) {
return textContent;
}

}

const len = node.transformedChildNodes.length;
@@ -121,60 +132,74 @@
* @returns The XML string.
* @see xmlTransformedText
*/
export function xmlText(node: XNode, options: XmlOutputOptions = {
cData: true,
escape: true,
selfClosingTags: true,
outputMethod: 'xml'
}) {
export function xmlText(
node: XNode,
options: XmlOutputOptions = {
cData: true,
escape: true,
selfClosingTags: true,
outputMethod: 'xml'
}
) {
const buffer: string[] = [];
xmlTextRecursive(node, buffer, options);
return buffer.join('');
}

/**
* The recursive logic to transform a node into XML text.
* It can be considered legacy, since it does not work with transformed nodes, and
* probably will be removed in the future.
* @param {XNode} node The node.
* @param {string[]} buffer The buffer, that will represent the transformed XML text.
* @param {XmlOutputOptions} options XML output options.
*/
function xmlTextRecursive(node: XNode, buffer: string[], options: XmlOutputOptions) {
if (node.nodeType == DOM_TEXT_NODE) {
buffer.push(xmlEscapeText(node.nodeValue));
} else if (node.nodeType == DOM_CDATA_SECTION_NODE) {
if (options.cData) {
buffer.push(node.nodeValue);
} else {
buffer.push(`<![CDATA[${node.nodeValue}]]>`);
}
} else if (node.nodeType == DOM_COMMENT_NODE) {
buffer.push(`<!--${node.nodeValue}-->`);
} else if (node.nodeType == DOM_ELEMENT_NODE) {
buffer.push(`<${xmlFullNodeName(node)}`);

for (let i = 0; i < node.childNodes.length; ++i) {
const childNode = node.childNodes[i];
if (!childNode || childNode.nodeType !== DOM_ATTRIBUTE_NODE) {
continue;
}

if (childNode.nodeName && childNode.nodeValue) {
buffer.push(` ${xmlFullNodeName(childNode)}="${xmlEscapeAttr(childNode.nodeValue)}"`);
}
}

if (node.childNodes.length === 0) {
if (options.selfClosingTags || (options.outputMethod === 'html' && ['hr', 'link'].includes(node.nodeName))) {
if (
options.selfClosingTags ||
(options.outputMethod === 'html' && ['hr', 'link'].includes(node.nodeName))
) {
buffer.push('/>');
} else {
buffer.push(`></${xmlFullNodeName(node)}>`);
}
} else {
buffer.push('>');
for (let i = 0; i < node.childNodes.length; ++i) {
xmlTextRecursive(node.childNodes[i], buffer, options);
}
buffer.push(`</${xmlFullNodeName(node)}>`);
}
} else if (node.nodeType == DOM_DOCUMENT_NODE || node.nodeType == DOM_DOCUMENT_FRAGMENT_NODE) {
for (let i = 0; i < node.childNodes.length; ++i) {
xmlTextRecursive(node.childNodes[i], buffer, options);
}
}

}
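
The rule `xmlTextRecursive` applies to elements without child nodes can be isolated into a small sketch. `EmptyElementOptions` and `renderEmptyElement` are hypothetical names introduced for illustration; the condition itself is copied from the function above, and attributes are left out for brevity.

```typescript
// Hypothetical subset of the output options that the rule depends on.
interface EmptyElementOptions {
    selfClosingTags: boolean;
    outputMethod: string;
}

// Renders an element that has no child nodes, either self-closed or
// as an explicit open/close pair.
function renderEmptyElement(nodeName: string, options: EmptyElementOptions): string {
    const selfClose =
        options.selfClosingTags ||
        (options.outputMethod === 'html' && ['hr', 'link'].includes(nodeName));
    return selfClose ? `<${nodeName}/>` : `<${nodeName}></${nodeName}>`;
}

console.log(renderEmptyElement('test', { selfClosingTags: true, outputMethod: 'xml' })); // <test/>
console.log(renderEmptyElement('test', { selfClosingTags: false, outputMethod: 'xml' })); // <test></test>
console.log(renderEmptyElement('hr', { selfClosingTags: false, outputMethod: 'html' })); // <hr/>
```

In this sketch, as in the condition it mirrors, `hr` and `link` are self-closed in HTML output even when `selfClosingTags` is `false`.') throw new Error('explicit close failed');
if (renderEmptyElement('hr', { selfClosingTags: false, outputMethod: 'html' }) !== '<hr/>') throw new Error('html void element failed');
if (renderEmptyElement('div', { selfClosingTags: false, outputMethod: 'html' }) !== '<div></div>') throw new Error('html non-void element failed');
</test>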

/**
@@ -197,15 +222,20 @@
return buffer.join('');
}

function xmlTransformedTextRecursive(node: XNode, buffer: any[], options: XmlOutputOptions) {
/**
* The recursive logic to transform a node into XML text.
* @param {XNode} node The node.
* @param {string[]} buffer The buffer, that will represent the transformed XML text.
* @param {XmlOutputOptions} options XML output options.
*/
function xmlTransformedTextRecursive(node: XNode, buffer: string[], options: XmlOutputOptions) {
if (node.visited) return;
const nodeType = node.transformedNodeType || node.nodeType;
const nodeValue = node.transformedNodeValue || node.nodeValue;
if (nodeType === DOM_TEXT_NODE) {
if (node.transformedNodeValue && node.transformedNodeValue.trim() !== '') {
const finalText = node.escape && options.escape?
xmlEscapeText(node.transformedNodeValue) :
node.transformedNodeValue;
const finalText =
node.escape && options.escape ? xmlEscapeText(node.transformedNodeValue) : xmlUnescapeText(node.transformedNodeValue);
buffer.push(finalText);
}
} else if (nodeType === DOM_CDATA_SECTION_NODE) {
@@ -246,9 +276,9 @@
function xmlElementLogicTrivial(node: XNode, buffer: string[], options: XmlOutputOptions) {
buffer.push(`<${xmlFullNodeName(node)}`);

let attributes = node.transformedChildNodes.filter(n => n.nodeType === DOM_ATTRIBUTE_NODE);
let attributes = node.transformedChildNodes.filter((n) => n.nodeType === DOM_ATTRIBUTE_NODE);
if (attributes.length === 0) {
attributes = node.childNodes.filter(n => n.nodeType === DOM_ATTRIBUTE_NODE);
attributes = node.childNodes.filter((n) => n.nodeType === DOM_ATTRIBUTE_NODE);

}

for (let i = 0; i < attributes.length; ++i) {
@@ -262,9 +292,9 @@
}
}

let childNodes = node.transformedChildNodes.filter(n => n.nodeType !== DOM_ATTRIBUTE_NODE);
let childNodes = node.transformedChildNodes.filter((n) => n.nodeType !== DOM_ATTRIBUTE_NODE);
if (childNodes.length === 0) {
childNodes = node.childNodes.filter(n => n.nodeType !== DOM_ATTRIBUTE_NODE);
childNodes = node.childNodes.filter((n) => n.nodeType !== DOM_ATTRIBUTE_NODE);

}

childNodes = childNodes.sort((a, b) => a.siblingPosition - b.siblingPosition);
@@ -317,7 +347,17 @@
}

/**
* Escape XML special markup chracters: tag delimiter < > and entity
* Replaces HTML/XML entities with their literal characters.
* Currently, only tag delimiters are implemented.
* @param text The text to be transformed.
* @returns The unescaped text.
*/
export function xmlUnescapeText(text: string): string {
return `${text}`.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
}
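
Because only `&lt;` and `&gt;` are handled, unescaping is a partial inverse of escaping: an escaped ampersand stays escaped. The sketch below illustrates this with two hypothetical helpers, `escapeMarkup` and `unescapeTagDelimiters`; the second copies the body of `xmlUnescapeText`, while the first is an assumed escaping routine, not the project's `xmlEscapeText`.

```typescript
// Assumed escaping routine for text content: &, < and > become entities.
function escapeMarkup(text: string): string {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Same body as xmlUnescapeText: only tag delimiters are restored.
function unescapeTagDelimiters(text: string): string {
    return `${text}`.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
}

const original = 'a & b < c';
const escaped = escapeMarkup(original);
const roundTripped = unescapeTagDelimiters(escaped);

console.log(escaped); // a &amp; b &lt; c
console.log(roundTripped); // a &amp; b < c
console.log(unescapeTagDelimiters('&amp;lt;')); // &amp;lt;
```

The last call shows that double-escaped input is left alone, since `&amp;lt;` does not contain the sequence `&lt;`.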

/**
* Escape XML special markup characters: tag delimiter <, >, and entity
* reference start delimiter &. The escaped string can be used in XML
* text portions (i.e. between tags).
* @param s The string to be escaped.
@@ -332,8 +372,8 @@
}

/**
* Escape XML special markup characters: tag delimiter < > entity
* reference start delimiter & and quotes ". The escaped string can be
* Escape XML special markup characters: tag delimiters < and >, entity
* reference start delimiter &, and double quotes ("). The escaped string can be
* used in double quoted XML attribute value portions (i.e. in
* attributes within start tags).
* @param s The string to be escaped.
4 changes: 2 additions & 2 deletions src/xslt/xslt.ts
@@ -24,7 +24,7 @@ import {
xmlGetAttribute,
xmlTransformedText,
xmlValue,
xmlValue2
xmlValueLegacyBehavior
} from '../dom';
import { ExprContext, XPath } from '../xpath';

@@ -366,7 +366,7 @@ export class Xslt {

const documentFragment = domCreateDocumentFragment(this.outputDocument);
await this.xsltChildNodes(context, template, documentFragment);
const value = xmlValue2(documentFragment);
const value = xmlValueLegacyBehavior(documentFragment);

if (output && output.nodeType === DOM_DOCUMENT_FRAGMENT_NODE) {
domSetTransformedAttribute(output, name, value);
28 changes: 27 additions & 1 deletion tests/xslt/xslt.test.tsx
@@ -199,7 +199,16 @@ describe('xslt', () => {
});

describe('xsl:text', () => {
it('disable-output-escaping', async () => {
// Apparently, this is not how `disable-output-escaping` works.
// Initial research shows that a literal `<!DOCTYPE html>` in
// the XSLT produces an error like:
// `Unable to generate the XML document using the provided XML/XSL input.
// org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 70;
// A DOCTYPE is not allowed in content.`
// Every example of `disable-output-escaping` usage found shows
// the opposite: `&lt;!DOCTYPE html&gt;` becomes `<!DOCTYPE html>`.
// This test is kept here, skipped, for historical purposes.
it.skip('disable-output-escaping', async () => {
const xml = `<anything></anything>`;
const xslt = `<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes" />
@@ -216,6 +225,23 @@ describe('xslt', () => {
assert.equal(html, '<!DOCTYPE html>');
});

it('disable-output-escaping, XML/HTML entities', async () => {
const xml = `<anything></anything>`;
const xslt = `<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes" />
<xsl:template match="/">
<xsl:text disable-output-escaping='yes'>&lt;!DOCTYPE html&gt;</xsl:text>
</xsl:template>
</xsl:stylesheet>`;

const xsltClass = new Xslt();
const xmlParser = new XmlParser();
const parsedXml = xmlParser.xmlParse(xml);
const parsedXslt = xmlParser.xmlParse(xslt);
const html = await xsltClass.xsltProcess(parsedXml, parsedXslt);
assert.equal(html, '<!DOCTYPE html>');
});

it('CDATA as JavaScript', async () => {
const xml = `<XampleXml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
8 changes: 8 additions & 0 deletions yarn.lock
@@ -1886,6 +1886,14 @@
resolved "https://registry.yarnpkg.com/@types/json-schema/-/json-schema-7.0.15.tgz#596a1747233694d50f6ad8a7869fcb6f56cf5841"
integrity sha512-5+fP8P8MFNC+AyZCDxrB2pkZFPGzqQWUzpSeuuVLvm8VMcorNYavBqoFcxK8bQz4Qsbn4oUEEem4wDLfcysGHA==

"@types/node-fetch@^2.6.11":
version "2.6.11"
resolved "https://registry.yarnpkg.com/@types/node-fetch/-/node-fetch-2.6.11.tgz#9b39b78665dae0e82a08f02f4967d62c66f95d24"
integrity sha512-24xFj9R5+rfQJLRyM56qh+wnVSYhyXC2tkoBndtY0U+vubqNsYXGjufB2nn8Q6gt0LrARwL6UBtMCSVCwl4B1g==
dependencies:
"@types/node" "*"
form-data "^4.0.0"

"@types/node@*":
version "22.5.4"
resolved "https://registry.yarnpkg.com/@types/node/-/node-22.5.4.tgz#83f7d1f65bc2ed223bdbf57c7884f1d5a4fa84e8"