Fix utf-8 encoding problem #19

Open. Wants to merge 18 commits into master.


shtse8 commented Feb 27, 2017

I needed this urgently for my own project; I hope it helps. Refs #18.

shtse8 changed the title from "Try to fix UTF-8 problem." to "Fix utf-8 encoding problem" on Feb 28, 2017
ventrec commented Oct 5, 2017

I've run into the same issue myself now, and this fix would be highly appreciated.

@wasinger Is it possible to get this merged?

shtse8 (Author) commented Oct 6, 2017

In the end I gave up on this project and wrote my own HTML DOM parser, but I am new to starting an open-source project.

kukungkung commented:

I think this can help you.

curl_setopt($ch, CURLOPT_ENCODING, 'UTF-8');

shtse8 (Author) commented Oct 24, 2017

@kukungkung Setting that cURL option only changes what the request asks for; you still have to decode the response yourself, and the server may not respond in the encoding you asked for. Besides, most sites already send UTF-8. The main problem here is that HtmlPageDom's parsing does not handle UTF-8 correctly; cURL is not the issue.
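A note on the cURL option above: CURLOPT_ENCODING sets the Accept-Encoding request header, which controls content compression (gzip, deflate) rather than the character set, so passing 'UTF-8' there does not change the charset of the body. To act on the point that the response may not be UTF-8, here is a minimal sketch that converts a body based on the charset declared in its Content-Type header. It assumes the charset is sent in that header (a charset declared only in an HTML meta tag is not handled) and that the iconv extension is available; `to_utf8` is a hypothetical helper, not part of cURL or HtmlPageDom.

```php
<?php
// Hypothetical helper (not part of cURL or HtmlPageDom): convert a response
// body to UTF-8 using the charset parameter of its Content-Type header.
// Assumes the iconv extension is loaded.
function to_utf8(string $body, ?string $contentType): string
{
    // No header, or no charset parameter: assume the body is already UTF-8.
    if ($contentType === null
        || !preg_match('/charset\s*=\s*"?([\w.:-]+)"?/i', $contentType, $m)) {
        return $body;
    }

    $charset = strtoupper($m[1]);

    // Already UTF-8: nothing to convert.
    if ($charset === 'UTF-8' || $charset === 'UTF8') {
        return $body;
    }

    // iconv() returns false on failure (e.g. unknown charset);
    // fall back to the raw body in that case.
    $converted = iconv($charset, 'UTF-8', $body);
    return $converted === false ? $body : $converted;
}
```

For example, after `$body = curl_exec($ch);` one could call `to_utf8($body, curl_getinfo($ch, CURLINFO_CONTENT_TYPE))` before handing the HTML to the parser.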

glensc (Contributor) commented Jun 11, 2018

The problem is not that UTF-8 fails to parse; it is that the result is encoded with HTML entities.

I solved this in my code with:

$html = html_entity_decode((string)$crawler, ENT_NOQUOTES, 'UTF-8');
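To illustrate the workaround above, here is a minimal sketch in which a made-up string stands in for the crawler's serialized output. It shows html_entity_decode() turning named and numeric entities back into UTF-8 characters. One caution: it also decodes &lt;, &gt; and &amp; inside text, so a page whose text contains literal angle brackets or ampersands can come out with altered markup.

```php
<?php
// Illustrative input only: non-ASCII characters serialized as HTML entities,
// standing in for the (string)$crawler output described above.
$serialized = '<p>caf&eacute; &#26085;&#26412;</p>';

// Decode entities to UTF-8. ENT_NOQUOTES, as in the snippet above, leaves
// &quot; and &#039; encoded so attribute quoting is not disturbed.
$html = html_entity_decode($serialized, ENT_NOQUOTES, 'UTF-8');

echo $html, "\n"; // <p>café 日本</p>
```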

glensc mentioned this pull request Jun 11, 2018

glensc (Contributor) commented Jun 11, 2018

These PRs seem broken because they were created from @shtse8's master branch, so the changes from #19 and #20 are mixed into both pull requests, perhaps along with changes unrelated to either PR.


4 participants