Code to reproduce:
use \Wa72\HtmlPageDom\HtmlPageCrawler;
$html = <<<EOF
<!DOCTYPE html>
<html prefix="og: http://ogp.me/ns#"><head><meta charset="UTF-8">
<title>网友终于肉搜出「范冰冰」家族照片,没想到看见她奶奶才发现「范冰冰是全家最难看的」!</title>
<body>
网友终于肉搜出「范冰冰」家族照片,没想到看见她奶奶才发现「范冰冰是全家最难看的」!
</body>
</html>
EOF;
$document = new HtmlPageCrawler($html);
echo $document->saveHTML();
Result:
<!DOCTYPE html>
<html prefix="og: http://ogp.me/ns#"><head><meta charset="UTF-8"><title>网友终于肉搜出「范冰冰」家族照片,没想到看见她奶奶才发现「范冰冰是全家最难看的」!</title></head><body>
网友终于肉搜出「范冰冰」家族照片,没想到看见她奶奶才发现「范冰冰是全家最难看的」!
</body></html>
Expected Result:
<!DOCTYPE html>
<html prefix="og: http://ogp.me/ns#"><head><meta charset="UTF-8">
<title>网友终于肉搜出「范冰冰」家族照片,没想到看见她奶奶才发现「范冰冰是全家最难看的」!</title>
<body>
网友终于肉搜出「范冰冰」家族照片,没想到看见她奶奶才发现「范冰冰是全家最难看的」!
</body></html>
This is a known issue with PHP's DOMDocument: loadHTML() does not handle UTF-8 input correctly. Reference:
http://stackoverflow.com/questions/8218230/php-domdocument-loadhtml-not-encoding-utf-8-correctly
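A commonly cited workaround for this DOMDocument behaviour is to prepend an XML encoding declaration before calling loadHTML(), then remove the resulting processing instruction from the tree. The following is a minimal sketch of that idea using plain DOMDocument only. The helper name loadUtf8Html is hypothetical and not part of HtmlPageDom, and whether and how HtmlPageCrawler should apply such a fix internally is left open here.

```php
<?php
// Hypothetical helper (not part of HtmlPageDom): parse an HTML string as UTF-8
// with plain DOMDocument by prepending an XML encoding declaration.
function loadUtf8Html(string $html): DOMDocument
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove the processing instruction that the encoding hint left in the tree.
    foreach ($dom->childNodes as $node) {
        if ($node->nodeType === XML_PI_NODE) {
            $dom->removeChild($node);
            break;
        }
    }

    // Ask saveHTML() to serialize the document as UTF-8.
    $dom->encoding = 'UTF-8';

    return $dom;
}

// Usage: the title text should round-trip without being mangled.
$dom = loadUtf8Html('<!DOCTYPE html><html><head><meta charset="UTF-8"><title>范冰冰</title></head><body>范冰冰</body></html>');
echo $dom->getElementsByTagName('title')->item(0)->textContent, PHP_EOL;
```

The encoding declaration acts only as a hint to the parser, which is why it is removed again before the document is returned. Serialization details may still vary with the installed libxml2 version.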