Commit 268a620

unixnut, Jordan Hall, and k00ni authored
Ignore encryption (#653)
* Add ability to ignore PDF encryption check
* Switch to ! syntax
* Update src/Smalot/PdfParser/Parser.php
* Additional changes for #488
  doc/Usage.md:
  - Moved description of `setIgnoreEncryption` option to doc/CustomConfig.md
  - Added brief "PDF encryption" section
  doc/CustomConfig.md: added `setIgnoreEncryption` option and section to describe it.
  src/Smalot/PdfParser/Config.php: Doc comment for Config::setIgnoreEncryption()
  Added tests/PHPUnit/Integration/EncryptionTest.php
  Added samples/not_really_encrypted.pdf (thanks to @parijke who originally created this as test.pdf)
  See #653
* src/Smalot/PdfParser/Config.php: PHP-CS-Fixer issue fixed
* Update CustomConfig.md refined texts
* Config.php: use explicit PHP doc entities
* ParserTest.php: moved tests
* removed EncryptionTest.php

---------

Co-authored-by: Jordan Hall <[email protected]>
Co-authored-by: Konrad Abicht <[email protected]>
1 parent feaf39e commit 268a620

6 files changed, 83 additions and 1 deletion

doc/CustomConfig.md

Lines changed: 15 additions & 0 deletions
@@ -21,6 +21,7 @@ The `Config` class has the following options:
 |--------------------------|---------|-----------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
 | `setDecodeMemoryLimit` | Integer | `0` | If parsing fails because of memory exhaustion, you can set a lower memory limit for decoding operations. |
 | `setFontSpaceLimit` | Integer | `-50` | Changing font space limit can be helpful when `Parser::getText()` returns a text with too many spaces. |
+| `setIgnoreEncryption` | Boolean | `false` | Read PDFs that are not encrypted but have the encryption flag set. This is a temporary workaround, don't rely on it. |
 | `setHorizontalOffset` | String | ` ` | When words are broken up or when the structure of a table is not preserved, you may get better results when adapting `setHorizontalOffset`. |
 | `setPdfWhitespaces` | String | `\0\t\n\f\r ` | |
 | `setPdfWhitespacesRegex` | String | `[\0\t\n\f\r ]` | |
@@ -63,3 +64,17 @@ $config->setFontSpaceLimit(-60);
 $parser = new \Smalot\PdfParser\Parser([], $config);
 $pdf = $parser->parseFile('document.pdf');
 ```
+
+## option setIgnoreEncryption
+
+In some cases PDF files may be internally marked as encrypted even though the content is not encrypted and can be read.
+This can be caused by the PDF being created by a tool that does not properly set the encryption flag.
+If you are sure that the PDF is not encrypted, you can ignore the encryption flag by setting the `ignoreEncryption` flag to `true` in a custom `Config` instance.
+
+```php
+$config = new \Smalot\PdfParser\Config();
+$config->setIgnoreEncryption(true);
+
+$parser = new \Smalot\PdfParser\Parser([], $config);
+$pdf = $parser->parseFile('document.pdf');
+```

doc/Usage.md

Lines changed: 11 additions & 0 deletions
@@ -230,3 +230,14 @@ foreach ($pages as $page) {
     ];
 }
 ```
+
+## PDF encryption
+
+This library cannot currently read encrypted PDF files, i.e. those with
+a read password. Attempting to do so produces this error:
+```
+Exception: Secured pdf file are currently not supported.
+```
+
+See `setIgnoreEncryption` option in [CustomConfig.md](CustomConfig.md)
+for how to override the check in specific cases.
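
The two doc changes above describe an exception on flagged files and an option to skip the check. One way a caller might combine them is sketched below, for files already known to be unencrypted. `parseKnownUnencrypted()` and `FakeParser` are hypothetical names introduced only for this illustration; they are not part of the library, and matching on the exception message is an assumption that may break if the wording changes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of smalot/pdfparser): parse strictly first,
 * then retry with the encryption check disabled, but only for the
 * "secured" error. Use it only for files known to be unencrypted.
 *
 * $makeParser receives the desired ignoreEncryption value and must return
 * an object that has a parseFile(string) method.
 */
function parseKnownUnencrypted(string $filename, callable $makeParser)
{
    try {
        return $makeParser(false)->parseFile($filename);
    } catch (\Exception $e) {
        // The library throws a plain \Exception, so the message is the only
        // discriminator available here. Any other failure is passed on.
        if (false === strpos($e->getMessage(), 'Secured pdf file')) {
            throw $e;
        }
    }

    return $makeParser(true)->parseFile($filename);
}

// Stand-in parser so the sketch runs without the library installed.
final class FakeParser
{
    private $ignoreEncryption;

    public function __construct(bool $ignoreEncryption)
    {
        $this->ignoreEncryption = $ignoreEncryption;
    }

    public function parseFile(string $filename): string
    {
        if (!$this->ignoreEncryption) {
            throw new \Exception('Secured pdf file are currently not supported.');
        }

        return 'parsed:'.$filename;
    }
}

$result = parseKnownUnencrypted('not_really_encrypted.pdf', function (bool $ignore) {
    return new FakeParser($ignore);
});
```

With the real library, the factory would presumably build a `Config`, call `setIgnoreEncryption($ignore)` on it, and return `new Parser([], $config)`. The option does not decrypt anything, so a genuinely encrypted file would still not yield readable text.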

samples/not_really_encrypted.pdf

6.54 KB
Binary file not shown.

src/Smalot/PdfParser/Config.php

Lines changed: 21 additions & 0 deletions
@@ -82,6 +82,13 @@ class Config
      */
     private $dataTmFontInfoHasToBeIncluded = false;
 
+    /**
+     * Whether to attempt to read PDFs even if they are marked as encrypted.
+     *
+     * @var bool
+     */
+    private $ignoreEncryption = false;
+
     public function getFontSpaceLimit()
     {
         return $this->fontSpaceLimit;
@@ -151,4 +158,18 @@ public function setDataTmFontInfoHasToBeIncluded(bool $dataTmFontInfoHasToBeIncl
     {
         $this->dataTmFontInfoHasToBeIncluded = $dataTmFontInfoHasToBeIncluded;
     }
+
+    public function getIgnoreEncryption(): bool
+    {
+        return $this->ignoreEncryption;
+    }
+
+    /**
+     * @deprecated this is a temporary workaround, don't rely on it
+     * @see https://github.com/smalot/pdfparser/pull/653
+     */
+    public function setIgnoreEncryption(bool $ignoreEncryption): void
+    {
+        $this->ignoreEncryption = $ignoreEncryption;
+    }
 }

src/Smalot/PdfParser/Parser.php

Lines changed: 1 addition & 1 deletion
@@ -102,7 +102,7 @@ public function parseContent(string $content): Document
         // Create structure from raw data.
         list($xref, $data) = $this->rawDataParser->parseData($content);
 
-        if (isset($xref['trailer']['encrypt'])) {
+        if (isset($xref['trailer']['encrypt']) && false === $this->config->getIgnoreEncryption()) {
             throw new \Exception('Secured pdf file are currently not supported.');
         }
 
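
The one-line change above only throws when the trailer carries an `encrypt` entry and the option is off. The sketch below restates that condition as a standalone function to make the four possible cases explicit. `shouldRejectAsEncrypted()` and the sample arrays are hypothetical and exist only for illustration; the library performs this check inline in `Parser::parseContent()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical free function mirroring the new condition in
 * Parser::parseContent(); the library itself throws inline instead.
 */
function shouldRejectAsEncrypted(array $xref, bool $ignoreEncryption): bool
{
    return isset($xref['trailer']['encrypt']) && false === $ignoreEncryption;
}

// Illustrative xref arrays; the 'encrypt' value is a made-up placeholder.
$flagged = ['trailer' => ['encrypt' => 'placeholder']];
$unflagged = ['trailer' => []];

$outcomes = [
    'flagged, check active' => shouldRejectAsEncrypted($flagged, false),
    'flagged, check ignored' => shouldRejectAsEncrypted($flagged, true),
    'unflagged, check active' => shouldRejectAsEncrypted($unflagged, false),
    'unflagged, check ignored' => shouldRejectAsEncrypted($unflagged, true),
];

foreach ($outcomes as $case => $rejected) {
    echo $case.': '.($rejected ? 'exception' : 'parsed')."\n";
}
```

Only the first case leads to the exception. Files without the flag behave the same whether or not the option is set.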

tests/PHPUnit/Integration/ParserTest.php

Lines changed: 35 additions & 0 deletions
@@ -403,6 +403,41 @@ public function testRetainImageContentImpact(): void
         $this->assertLessThan($baselineMemory * 1.05, $usedMemory, 'Memory is '.$usedMemory);
         $this->assertTrue('' !== $document->getText());
     }
+
+    /**
+     * Tests handling of encrypted PDF.
+     *
+     * @see https://github.com/smalot/pdfparser/pull/653
+     */
+    public function testNoIgnoreEncryption(): void
+    {
+        $filename = $this->rootDir.'/samples/not_really_encrypted.pdf';
+        $threw = false;
+        try {
+            (new Parser([]))->parseFile($filename);
+        } catch (\Exception $e) {
+            // we expect an exception to be thrown if an encrypted PDF is encountered.
+            $threw = true;
+        }
+        $this->assertTrue($threw);
+    }
+
+    /**
+     * Tests behavior if encryption is ignored.
+     *
+     * @see https://github.com/smalot/pdfparser/pull/653
+     */
+    public function testIgnoreEncryption(): void
+    {
+        $config = new Config();
+        $config->setIgnoreEncryption(true);
+
+        $filename = $this->rootDir.'/samples/not_really_encrypted.pdf';
+
+        $this->assertTrue((new Parser([], $config))->parseFile($filename) instanceof Document);
+
+        // without the configuration option set, an exception would be thrown.
+    }
 }
 
 class ParserSub extends Parser
