Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

Commit

Permalink
Fix shared strings XML Entities auto decode (#411)
Browse files Browse the repository at this point in the history
When converting an XMLReader node to a SimpleXMLElement, the conversion would automatically decode the XML entities. This resulted in a double decode.
For example: """ was converted to """ when imported into a SimpleXMLElement and was again converted into " (quote).

This commit changes the way the XLSX Shared Strings file is processed. It also changes the unescaping logic for both XLSX and ODS.

Finally, it removes any usage of the SimpleXML library (yay!).
  • Loading branch information
adrilo authored Apr 28, 2017
1 parent 1eb01a3 commit 0481054
Show file tree
Hide file tree
Showing 17 changed files with 119 additions and 404 deletions.
1 change: 0 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,6 @@ require_once '[PATH/TO]/src/Spout/Autoloader/autoload.php'; // don't forget to c
* PHP version 5.4.0 or higher
* PHP extension `php_zip` enabled
* PHP extension `php_xmlreader` enabled
* PHP extension `php_simplexml` enabled


## Basic usage
Expand Down
3 changes: 1 addition & 2 deletions composer.json
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,7 @@
"require": {
"php": ">=5.4.0",
"ext-zip": "*",
"ext-xmlreader" : "*",
"ext-simplexml": "*"
"ext-xmlreader" : "*"
},
"require-dev": {
"phpunit/phpunit": "^4.8.0"
Expand Down
17 changes: 13 additions & 4 deletions src/Spout/Common/Escaper/ODS.php
Original file line number Diff line number Diff line change
Expand Up @@ -26,10 +26,13 @@ public function escape($string)
// 'ENT_DISALLOWED' ensures that invalid characters in the given document type are replaced.
// Otherwise control characters like a vertical tab "\v" will make the XML document unreadable by the XML processor
// @link https://github.com/box/spout/issues/329
$replacedString = htmlspecialchars($string, ENT_QUOTES | ENT_DISALLOWED);
$replacedString = htmlspecialchars($string, ENT_NOQUOTES | ENT_DISALLOWED);
} else {
// We are on hhvm or any other engine that does not support ENT_DISALLOWED
$escapedString = htmlspecialchars($string, ENT_QUOTES);
// We are on hhvm or any other engine that does not support ENT_DISALLOWED.
//
// @NOTE: Using ENT_NOQUOTES as only XML entities ('<', '>', '&') need to be encoded.
// Single and double quotes can be left as is.
$escapedString = htmlspecialchars($string, ENT_NOQUOTES);

// control characters values are from 0 to 1F (hex values) in the ASCII table
// some characters should not be escaped though: "\t", "\r" and "\n".
Expand All @@ -52,6 +55,12 @@ public function escape($string)
*/
public function unescape($string)
{
return htmlspecialchars_decode($string, ENT_QUOTES);
// ==============
// = WARNING =
// ==============
// It is assumed that the given string has already had its XML entities decoded.
// This is true if the string is coming from a DOMNode (as DOMNode already decode XML entities on creation).
// Therefore there is no need to call "htmlspecialchars_decode()".
return $string;
}
}
13 changes: 10 additions & 3 deletions src/Spout/Common/Escaper/XLSX.php
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,9 @@ protected function init()
public function escape($string)
{
$escapedString = $this->escapeControlCharacters($string);
$escapedString = htmlspecialchars($escapedString, ENT_QUOTES);
// @NOTE: Using ENT_NOQUOTES as only XML entities ('<', '>', '&') need to be encoded.
// Single and double quotes can be left as is.
$escapedString = htmlspecialchars($escapedString, ENT_NOQUOTES);

return $escapedString;
}
Expand All @@ -55,8 +57,13 @@ public function escape($string)
*/
public function unescape($string)
{
$unescapedString = htmlspecialchars_decode($string, ENT_QUOTES);
$unescapedString = $this->unescapeControlCharacters($unescapedString);
// ==============
// = WARNING =
// ==============
// It is assumed that the given string has already had its XML entities decoded.
// This is true if the string is coming from a DOMNode (as DOMNode already decode XML entities on creation).
// Therefore there is no need to call "htmlspecialchars_decode()".
$unescapedString = $this->unescapeControlCharacters($string);

return $unescapedString;
}
Expand Down
1 change: 1 addition & 0 deletions src/Spout/Reader/ODS/RowIterator.php
Original file line number Diff line number Diff line change
Expand Up @@ -202,6 +202,7 @@ protected function processCellStartingNode($xmlReader)
{
$currentNumColumnsRepeated = $this->getNumColumnsRepeatedForCurrentNode($xmlReader);

// NOTE: expand() will automatically decode all XML entities of the child nodes
$node = $xmlReader->expand();
$currentCellValue = $this->getCellValue($node);

Expand Down
159 changes: 0 additions & 159 deletions src/Spout/Reader/Wrapper/SimpleXMLElement.php

This file was deleted.

1 change: 1 addition & 0 deletions src/Spout/Reader/Wrapper/XMLReader.php
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
<?php

namespace Box\Spout\Reader\Wrapper;
use DOMNode;


/**
Expand Down
Loading

0 comments on commit 0481054

Please sign in to comment.