getText() returns text other drivers does not #153

Open
alexpott opened this issue Oct 8, 2020 · 4 comments

Comments

@alexpott
Contributor

alexpott commented Oct 8, 2020

\Behat\Mink\Driver\BrowserKitDriver::getText() returns text from the head section as well as any JSON contained in a script tag in the HTML body. \Behat\Mink\Driver\Selenium2Driver::getText(), for example, does not return text from the head section or from script tags in the body. This conflicts with the Mink documentation, which states:

getText() will strip tags and unprinted characters out of the response, including newlines. So it’ll basically return the text that the user sees on the page.

I'm not sure if this is a Symfony\DomCrawler issue or not.

For a discussion of the effects of this, see https://www.drupal.org/project/drupal/issues/3175718

@jonathanjfshaw

@alexpott
Contributor Author

alexpott commented Oct 9, 2020

@jonathanjfshaw yep, and it's returning what document.body.textContent returns in the browser console. The point is that this is not what \Behat\Mink\Driver\Selenium2Driver::getText() returns, and it is returning content that is not visible.

@aik099
Member

aik099 commented Oct 22, 2020

I see no issue here.

The Selenium driver talks to a real browser and can ask it to return only the text visible to a user. The BrowserKit driver is headless: it only looks at the HTML tags and parses them as best it can. Stripping all HTML tags therefore leaves their content in place, which produces the effect you're getting.
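To illustrate the point above, here is a minimal sketch that uses PHP's built-in DOMDocument rather than BrowserKit or Symfony DomCrawler, so it only approximates what the driver does. The sample markup and the collapse_whitespace() helper are hypothetical and not taken from Mink.

```php
<?php
// Hypothetical sample page: a title in <head> and JSON in a <script> in <body>.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<head><title>Page title</title></head>
<body>
<p>Visible paragraph</p>
<script type="application/json">{"hidden":"settings"}</script>
</body>
</html>
HTML;

// Hypothetical helper: collapse runs of whitespace, including newlines.
function collapse_whitespace(string $text): string {
    return trim(preg_replace('/\s+/', ' ', $text));
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

// Text of the whole document: includes the <title> and the script's JSON.
$documentText = collapse_whitespace($doc->documentElement->textContent);

// Text of <body> only: drops the <title> but still includes the script's JSON.
$bodyText = collapse_whitespace(
    $doc->getElementsByTagName('body')->item(0)->textContent
);

echo $documentText, "\n";
echo $bodyText, "\n";
```

Reading textContent concatenates every text node under the element, with no notion of what a browser would render, which is why both the title and the JSON show up.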

@alexpott, I recommend calling the getText method on the BODY NodeElement (a PHP class in Mink) of the document rather than on the whole document. That way you shouldn't get any extra content (at least I hope so).

The code below (untested) is how I would get the contents of a document.

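// find() returns null when nothing matches, so guard before calling getText().
$body = $session->getPage()->find('xpath', '//body');
$body_text = $body !== null ? $body->getText() : '';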

@alexpott
Contributor Author

@aik099 body can contain script tags. Adding script tags just before closing the body tag is often advocated for performance reasons.
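One possible workaround, sketched here under stated assumptions, is to drop script and style elements from the body before reading its text. visible_body_text() is a hypothetical helper built on PHP's DOMDocument and DOMXPath. It is not part of Mink or BrowserKit, and it does not account for elements hidden through CSS.

```php
<?php
/**
 * Hypothetical helper (not part of Mink): return the whitespace-collapsed
 * text of <body>, ignoring <script> and <style> elements inside it.
 */
function visible_body_text(string $html): string {
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy the matches into an array so removal cannot disturb iteration.
    $hidden = iterator_to_array($xpath->query('//body//script | //body//style'));
    foreach ($hidden as $node) {
        $node->parentNode->removeChild($node);
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    // Collapse whitespace, including newlines, as getText() is documented to do.
    return trim(preg_replace('/\s+/', ' ', $body->textContent));
}

// Hypothetical sample markup with a script placed just before </body>.
$sample = '<html><head><title>T</title></head><body><p>Hello</p>'
    . '<style>p{color:red}</style><script>var x = {"a": 1};</script></body></html>';
echo visible_body_text($sample), "\n";
```

With Mink this could be called as visible_body_text($session->getPage()->getContent()), assuming the raw page source is what you want to inspect.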


3 participants