Commit 05b6c43

Allow Xlsx Reader to Specify ParseHuge Release1291 (#4515)
* Allow Xlsx Reader to Specify ParseHuge Release1291

Backport #4514.

A number of Security Advisories related to libxml_options were opened. In the end, we disabled the ability to specify any libxml_options. However, some users were adversely affected because they needed LIBXML_PARSEHUGE for some of their files. Having finally obtained access to a file demonstrating this problem, we can restore this ability.
- The operation is potentially dangerous, a vector for memory leaks and out-of-memory errors. It is not recommended unless absolutely needed.
- It will not be permitted as a global (static) property with the ability to adversely affect other users on the same server.
- It will instead be implemented as an instance property of Xlsx Reader (default to false), with a setter. I do not see a use case for a getter.
- People will need to set this property individually for each file which they think needs it.
- This change will be backported to all supported releases.
- The sheer size and processing time for the file involved makes it impractical to add a formal test case. It has, nevertheless, been tested satisfactorily.

* Update CHANGELOG.md
1 parent b94b4e9 commit 05b6c43

2 files changed: +25 -7 lines changed

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
@@ -5,7 +5,7 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com)
 and this project adheres to [Semantic Versioning](https://semver.org).
 
-# TBD - 1.29.11
+# 2025-06-22 - 1.29.11
 
 ### Changed
 
@@ -19,6 +19,7 @@ and this project adheres to [Semantic Versioning](https://semver.org).
 
 - TEXT and TIMEVALUE functions. [Issue #4249](https://github.com/PHPOffice/PhpSpreadsheet/issues/4249) [PR #4352](https://github.com/PHPOffice/PhpSpreadsheet/pull/4352)
 - Removing Columns/Rows Containing Merged Cells. Backport of [PR #4465](https://github.com/PHPOffice/PhpSpreadsheet/pull/4465)
+- Allow Xlsx Reader to Specify ParseHuge. [Issue #4260](https://github.com/PHPOffice/PhpSpreadsheet/issues/4260) [PR #4515](https://github.com/PHPOffice/PhpSpreadsheet/pull/4515)
 
 # 2025-02-07 - 1.29.10
 
src/PhpSpreadsheet/Reader/Xlsx.php

Lines changed: 23 additions & 6 deletions
@@ -66,6 +66,19 @@ class Xlsx extends BaseReader
      */
     private $sharedFormulae = [];
 
+    private bool $parseHuge = false;
+
+    /**
+     * Allow use of LIBXML_PARSEHUGE.
+     * This option can lead to memory leaks and failures,
+     * and is not recommended. But some very large spreadsheets
+     * seem to require it.
+     */
+    public function setParseHuge(bool $parseHuge): void
+    {
+        $this->parseHuge = $parseHuge;
+    }
+
     /**
      * Create a new Xlsx Reader instance.
      */
@@ -135,8 +148,8 @@ private function loadZip(string $filename, string $ns = '', bool $replaceUnclose
         }
         $rels = @simplexml_load_string(
             $this->getSecurityScannerOrThrow()->scan($contents),
-            'SimpleXMLElement',
-            0,
+            SimpleXMLElement::class,
+            $this->parseHuge ? LIBXML_PARSEHUGE : 0,
             $ns
         );
 
@@ -150,8 +163,8 @@ private function loadZipNonamespace(string $filename, string $ns): SimpleXMLElem
         $contents = $this->getFromZipArchive($this->zip, $filename);
         $rels = simplexml_load_string(
             $this->getSecurityScannerOrThrow()->scan($contents),
-            'SimpleXMLElement',
-            0,
+            SimpleXMLElement::class,
+            $this->parseHuge ? LIBXML_PARSEHUGE : 0,
             ($ns === '' ? $ns : '')
         );
 
@@ -273,7 +286,9 @@ public function listWorksheetInfo($filename)
                                 $this->zip,
                                 $fileWorksheetPath
                             )
-                        )
+                        ),
+                    null,
+                    $this->parseHuge ? LIBXML_PARSEHUGE : 0,
                 );
                 $xml->setParserProperty(2, true);
 
@@ -1951,7 +1966,9 @@ private function readRibbon(Spreadsheet $excel, string $customUITarget, ZipArchi
                     // exists and not empty if the ribbon have some pictures (other than internal MSO)
                     $UIRels = simplexml_load_string(
                         $this->getSecurityScannerOrThrow()
-                            ->scan($dataRels)
+                            ->scan($dataRels),
+                        SimpleXMLElement::class,
+                        $this->parseHuge ? LIBXML_PARSEHUGE : 0
                     );
                     if (false !== $UIRels) {
                         // we need to save id and target to avoid parsing customUI.xml and "guess" if it's a pseudo callback who load the image
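
The per-instance opt-in that this commit adds can be illustrated in isolation. The following is a minimal sketch, not code from the library: `TinyXmlReader`, `libxmlOptions()`, and `parse()` are hypothetical names invented for illustration. Only the `$parseHuge` property, the `setParseHuge()` setter, and the `$this->parseHuge ? LIBXML_PARSEHUGE : 0` expression mirror the diff.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a reader class. Only the parseHuge handling
// mirrors the diff; everything else is an assumption for illustration.
final class TinyXmlReader
{
    // Instance property, off by default, so one reader cannot affect another.
    private bool $parseHuge = false;

    public function setParseHuge(bool $parseHuge): void
    {
        $this->parseHuge = $parseHuge;
    }

    // Hypothetical helper: the libxml options this instance passes to the parser.
    public function libxmlOptions(): int
    {
        return $this->parseHuge ? LIBXML_PARSEHUGE : 0;
    }

    // Hypothetical helper: parse a string with this instance's options.
    public function parse(string $xml): SimpleXMLElement
    {
        $element = simplexml_load_string(
            $xml,
            SimpleXMLElement::class,
            $this->libxmlOptions()
        );
        if ($element === false) {
            throw new RuntimeException('Unable to parse XML');
        }

        return $element;
    }
}

// Two independent readers: only the second opts in.
$default = new TinyXmlReader();
$optIn = new TinyXmlReader();
$optIn->setParseHuge(true);
```

With the real reader, the equivalent opt-in would presumably be to create a `\PhpOffice\PhpSpreadsheet\Reader\Xlsx` instance and call `setParseHuge(true)` on it before `load()`, repeating this for each file that needs it, since the flag is deliberately not global.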
