Wa72\HtmlPageDom\HtmlPageCrawler

Extends \Symfony\Component\DomCrawler\Crawler with jQuery-inspired tree manipulation functions for HTML documents, such as setInnerHtml(), css(), append(), prepend(), before(), addClass() and removeClass().

Implements:

Countable, IteratorAggregate, Traversable, Stringable

Extends:

Symfony\Component\DomCrawler\Crawler

Methods

| Name | Description |
|------|-------------|
| __clone | |
| __get | |
| __toString | |
| addClass | Adds the specified class(es) to each element in the set of matched elements. |
| addHtmlFragment | |
| after | Insert content, specified by the parameter, after each element in the set of matched elements. |
| append | Insert HTML content as child nodes of each element after existing children. |
| appendTo | Insert every element in the set of matched elements to the end of the target. |
| before | Insert content, specified by the parameter, before each element in the set of matched elements. |
| create | Get an HtmlPageCrawler object from an HTML string, DOMNode, DOMNodeList or HtmlPageCrawler. |
| css | Get one CSS style property of the first element or set it for all elements in the list. |
| getAttribute | Returns the attribute value of the first node of the list. |
| getCombinedText | Get the combined text contents of each element in the set of matched elements, including their descendants. |
| getDOMDocument | Get the ownerDocument of the first element. |
| getInnerHtml | Alias for Crawler::html() for naming consistency with setInnerHtml(). |
| getStyle | Get one CSS style property of the first element. |
| hasClass | Determine whether any of the matched elements are assigned the given class. |
| insertAfter | Insert every element in the set of matched elements after the target. |
| insertBefore | Insert every element in the set of matched elements before the target. |
| isHtmlDocument | Checks whether the first node contains a complete HTML document (as opposed to a document fragment). |
| makeClone | Create a deep copy of the set of matched elements. |
| makeEmpty | Removes all child nodes and text from all nodes in the set. |
| prepend | Insert content, specified by the parameter, to the beginning of each element in the set of matched elements. |
| prependTo | Insert every element in the set of matched elements to the beginning of the target. |
| remove | Remove the set of matched elements from the DOM. |
| removeAttr | Remove an attribute from each element in the set of matched elements. |
| removeAttribute | Remove an attribute from each element in the set of matched elements. |
| removeClass | Remove a class from each element in the list. |
| replaceAll | Replace each target element with the set of matched elements. |
| replaceWith | Replace each element in the set of matched elements with the provided new content and return the set of elements that was removed. |
| saveHTML | Get the HTML code fragment of all elements and their contents. |
| setAttribute | Sets an attribute on each element. |
| setInnerHtml | Set the HTML contents of each element. |
| setStyle | Set one CSS style property for all elements in the list. |
| setText | Set the text contents of the matched elements. |
| toggleClass | Add or remove one or more classes from each element in the set of matched elements, depending on the class's presence. |
| unwrap | Remove the parents of the set of matched elements from the DOM, leaving the matched elements in their place. |
| unwrapInner | Remove the matched elements, but promote the children to take their place. |
| wrap | Wrap an HTML structure around each element in the set of matched elements. |
| wrapAll | Wrap an HTML structure around all elements in the set of matched elements. |
| wrapInner | Wrap an HTML structure around the content of each element in the set of matched elements. |

Inherited methods

| Name | Description |
|------|-------------|
| __construct | - |
| add | Adds a node to the current list of nodes. |
| addContent | Adds HTML/XML content. |
| addDocument | Adds a \DOMDocument to the list of nodes. |
| addHtmlContent | Adds HTML content to the list of nodes. |
| addNode | Adds a \DOMNode instance to the list of nodes. |
| addNodeList | Adds a \DOMNodeList to the list of nodes. |
| addNodes | Adds an array of \DOMNode instances to the list of nodes. |
| addXmlContent | Adds XML content to the list of nodes. |
| ancestors | Returns the ancestors of the current selection. |
| attr | Returns the attribute value of the first node of the list. |
| children | Returns the child nodes of the current selection. |
| clear | Removes all the nodes. |
| closest | Return first parents (heading toward the document root) of the Element that matches the provided selector. |
| count | - |
| each | Calls an anonymous function on each node of the list. |
| eq | Returns a node given its position in the node list. |
| evaluate | Evaluates an XPath expression. |
| extract | Extracts information from the list of nodes. |
| filter | Filters the list of nodes with a CSS selector. |
| filterXPath | Filters the list of nodes with an XPath expression. |
| first | Returns the first node of the current selection. |
| form | Returns a Form object for the first node in the list. |
| getBaseHref | Returns base href. |
| getIterator | - |
| getNode | - |
| getUri | Returns the current URI. |
| html | Returns the first node of the list as HTML. |
| image | Returns an Image object for the first node in the list. |
| images | Returns an array of Image objects for the nodes in the list. |
| innerText | Returns only the inner text that is the direct descendant of the current node, excluding any child nodes. |
| last | Returns the last node of the current selection. |
| link | Returns a Link object for the first node in the list. |
| links | Returns an array of Link objects for the nodes in the list. |
| matches | - |
| nextAll | Returns the next sibling nodes of the current selection. |
| nodeName | Returns the node name of the first node of the list. |
| outerHtml | - |
| previousAll | Returns the previous sibling nodes of the current selection. |
| reduce | Reduces the list of nodes by calling an anonymous function. |
| registerNamespace | - |
| selectButton | Selects a button by name or alt value for images. |
| selectImage | Selects images by alt value. |
| selectLink | Selects links by name or alt value for clickable images. |
| setDefaultNamespacePrefix | Overloads a default namespace prefix to be used with XPath and CSS expressions. |
| siblings | Returns the sibling nodes of the current selection. |
| slice | Slices the list of nodes by $offset and $length. |
| text | Returns the text of the first node of the list. |
| xpathLiteral | Converts string for XPath expressions. |

HtmlPageCrawler::__clone

Description

 __clone (void)

Parameters

This function has no parameters.

Return Values

void


HtmlPageCrawler::__get

Description

 __get ($name)

Parameters

  • $name : The name of the property being read

Return Values

mixed


HtmlPageCrawler::__toString

Description

 __toString (void)

Parameters

This function has no parameters.

Return Values

string


HtmlPageCrawler::addClass

Description

public addClass (string $name)

Adds the specified class(es) to each element in the set of matched elements.

Parameters

  • (string) $name : One or more space-separated classes to be added to the class attribute of each matched element.

Return Values

\HtmlPageCrawler

$this for chaining


HtmlPageCrawler::addHtmlFragment

Description

 addHtmlFragment (void)

Parameters

This function has no parameters.

Return Values

void


HtmlPageCrawler::after

Description

public after (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList $content)

Insert content, specified by the parameter, after each element in the set of matched elements.

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList) $content

Return Values

\HtmlPageCrawler

$this for chaining


HtmlPageCrawler::append

Description

public append (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList $content)

Insert HTML content as child nodes of each element after existing children

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList) $content : HTML code fragment or DOMNode to append

Return Values

\HtmlPageCrawler

$this for chaining


HtmlPageCrawler::appendTo

Description

public appendTo (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList $element)

Insert every element in the set of matched elements to the end of the target.

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList) $element

Return Values

\Wa72\HtmlPageDom\HtmlPageCrawler

A new Crawler object containing all elements appended to the target elements
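
The following sketch illustrates how appendTo() might be used with more than one target. It assumes the library was installed with Composer (autoloader at vendor/autoload.php) and uses made-up sample markup. The target is passed as a crawler obtained with the inherited filter() method, since a plain string argument goes through the same conversion as create().

```php
<?php
// Assumption: wa72/htmlpagedom was installed with Composer into ./vendor.
require __DIR__ . '/vendor/autoload.php';

use Wa72\HtmlPageDom\HtmlPageCrawler;

// Hypothetical page fragment with two target lists.
$page = HtmlPageCrawler::create('<div><ul></ul><ul></ul></div>');

// Build a new element and append it to every matched target.
$item = HtmlPageCrawler::create('<li>new</li>');
$inserted = $item->appendTo($page->filter('ul'));

// $inserted holds the elements that now live inside the targets.
echo count($inserted), PHP_EOL;
echo $page->saveHTML(), PHP_EOL;
```

Unlike append(), which returns the crawler it was called on, the return value here refers to the inserted elements, so further chained calls act on them rather than on the targets.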


HtmlPageCrawler::before

Description

public before (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList $content)

Insert content, specified by the parameter, before each element in the set of matched elements.

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList) $content

Return Values

\HtmlPageCrawler

$this for chaining


HtmlPageCrawler::create

Description

public static create (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList|array $content)

Get an HtmlPageCrawler object from a HTML string, DOMNode, DOMNodeList or HtmlPageCrawler

This is the equivalent to jQuery's $() function when used for wrapping DOMNodes or creating DOMElements from HTML code.

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList|array) $content

Return Values

\HtmlPageCrawler
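
As a rough illustration of the jQuery-like workflow, the sketch below wraps an HTML fragment with create(), selects a child with the inherited filter() method and modifies it. It assumes the library was installed with Composer and that the autoloader lives at vendor/autoload.php; the sample markup is made up.

```php
<?php
// Assumption: wa72/htmlpagedom was installed with Composer into ./vendor.
require __DIR__ . '/vendor/autoload.php';

use Wa72\HtmlPageDom\HtmlPageCrawler;

// Wrap an HTML fragment (hypothetical sample markup).
$c = HtmlPageCrawler::create('<div id="content"><p>Hello</p></div>');

// filter() is inherited from Symfony's Crawler; addClass() and append()
// return $this, so the calls can be chained.
$c->filter('p')
    ->addClass('intro')
    ->append('<em>world</em>');

// Serialize the modified fragment back to HTML.
$html = $c->saveHTML();
echo $html, PHP_EOL;
```

The changes made through the filtered crawler show up in $c because both crawlers refer to the same underlying DOM nodes.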


HtmlPageCrawler::css

Description

public css (string $key, null|string $value)

Get one CSS style property of the first element or set it for all elements in the list

This function exists for compatibility with jQuery; it combines getStyle() and setStyle().

Parameters

  • (string) $key : The name of the style property
  • (null|string) $value : The CSS value to set, or NULL to get the current value

Return Values

\HtmlPageCrawler|string

If $value is NULL, returns the value of the given style property of the first element; otherwise returns $this for chaining.
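
The getter and setter forms of css() can be sketched as follows. This is a minimal example that assumes a Composer autoloader at vendor/autoload.php; the markup and style values are invented for illustration.

```php
<?php
// Assumption: wa72/htmlpagedom was installed with Composer into ./vendor.
require __DIR__ . '/vendor/autoload.php';

use Wa72\HtmlPageDom\HtmlPageCrawler;

// Hypothetical element with an inline style attribute.
$p = HtmlPageCrawler::create('<p style="color: red;">Warning</p>');

// One argument: read a property from the first element.
$color = $p->css('color');

// Two arguments: set the property on all matched elements.
$p->css('font-weight', 'bold');

// getStyle() reads the same inline style that css() wrote.
$weight = $p->getStyle('font-weight');

var_dump($color, $weight);
```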


HtmlPageCrawler::getAttribute

Description

public getAttribute (string $name)

Returns the attribute value of the first node of the list.

This is just an alias for attr() for naming consistency with setAttribute()

Parameters

  • (string) $name : The attribute name

Return Values

string|null

The attribute value or null if the attribute does not exist

Throws Exceptions

\InvalidArgumentException

When current node is empty


HtmlPageCrawler::getCombinedText

Description

public getCombinedText (void)

Get the combined text contents of each element in the set of matched elements, including their descendants.

This is what the jQuery text() function does, contrary to the Crawler::text() method that returns only
the text of the first node.

Parameters

This function has no parameters.

Return Values

string
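
The difference from Crawler::text() can be seen with a crawler that holds more than one node. The sketch below assumes a Composer autoloader at vendor/autoload.php and uses made-up markup.

```php
<?php
// Assumption: wa72/htmlpagedom was installed with Composer into ./vendor.
require __DIR__ . '/vendor/autoload.php';

use Wa72\HtmlPageDom\HtmlPageCrawler;

// A fragment with two top-level elements yields a crawler with two nodes.
$paragraphs = HtmlPageCrawler::create('<p>one</p><p>two</p>');

// Inherited text(): only the first node is considered.
$first = $paragraphs->text();

// getCombinedText(): the text of every matched node is concatenated.
$all = $paragraphs->getCombinedText();

var_dump($first, $all);
```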


HtmlPageCrawler::getDOMDocument

Description

public getDOMDocument (void)

Get the ownerDocument of the first element.

Parameters

This function has no parameters.

Return Values

\DOMDocument|null


HtmlPageCrawler::getInnerHtml

Description

public getInnerHtml (void)

Alias for Crawler::html() for naming consistency with setInnerHtml()

Parameters

This function has no parameters.

Return Values

string


HtmlPageCrawler::getStyle

Description

public getStyle (string $key)

Get one CSS style property of the first element.

Parameters

  • (string) $key : name of the property

Return Values

string|null

value of the property


HtmlPageCrawler::hasClass

Description

public hasClass (string $name)

Determine whether any of the matched elements are assigned the given class.

Parameters

  • (string) $name

Return Values

bool


HtmlPageCrawler::insertAfter

Description

public insertAfter (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList $element)

Insert every element in the set of matched elements after the target.

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList) $element

Return Values

\Wa72\HtmlPageDom\HtmlPageCrawler

A new Crawler object containing all elements inserted after the target elements


HtmlPageCrawler::insertBefore

Description

public insertBefore (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList $element)

Insert every element in the set of matched elements before the target.

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList) $element

Return Values

\Wa72\HtmlPageDom\HtmlPageCrawler

A new Crawler object containing all elements inserted before the target elements


HtmlPageCrawler::isHtmlDocument

Description

public isHtmlDocument (void)

Checks whether the first node contains a complete HTML document (as opposed to a document fragment).

Parameters

This function has no parameters.

Return Values

bool


HtmlPageCrawler::makeClone

Description

public makeClone (void)

Create a deep copy of the set of matched elements.

Equivalent to clone() in jQuery (clone is not a valid PHP function name)

Parameters

This function has no parameters.

Return Values

\HtmlPageCrawler


HtmlPageCrawler::makeEmpty

Description

public makeEmpty (void)

Removes all child nodes and text from all nodes in set

Equivalent to jQuery's empty() function which is not a valid function name in PHP

Parameters

This function has no parameters.

Return Values

\HtmlPageCrawler

$this


HtmlPageCrawler::prepend

Description

public prepend (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList $content)

Insert content, specified by the parameter, to the beginning of each element in the set of matched elements.

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList) $content : HTML code fragment

Return Values

\HtmlPageCrawler

$this for chaining


HtmlPageCrawler::prependTo

Description

public prependTo (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList $element)

Insert every element in the set of matched elements to the beginning of the target.

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList) $element

Return Values

\Wa72\HtmlPageDom\HtmlPageCrawler

A new Crawler object containing all elements prepended to the target elements


HtmlPageCrawler::remove

Description

public remove (void)

Remove the set of matched elements from the DOM.

(as opposed to Crawler::clear() which detaches the nodes only from Crawler
but leaves them in the DOM)

Parameters

This function has no parameters.

Return Values

void
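
To make the contrast with Crawler::clear() concrete, here is a small sketch. It assumes a Composer autoloader at vendor/autoload.php; the markup and the class name "ad" are hypothetical.

```php
<?php
// Assumption: wa72/htmlpagedom was installed with Composer into ./vendor.
require __DIR__ . '/vendor/autoload.php';

use Wa72\HtmlPageDom\HtmlPageCrawler;

$c = HtmlPageCrawler::create(
    '<div><p class="ad">Buy now</p><p>Article text</p></div>'
);

// clear() only empties the selection; the DOM keeps both paragraphs.
$selection = $c->filter('p');
$selection->clear();
$afterClear = count($c->filter('p'));

// remove() detaches the matched element from the DOM itself.
$c->filter('.ad')->remove();
$afterRemove = count($c->filter('p'));

var_dump($afterClear, $afterRemove);
```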


HtmlPageCrawler::removeAttr

Description

public removeAttr (string $name)

Remove an attribute from each element in the set of matched elements.

Alias for removeAttribute for compatibility with jQuery

Parameters

  • (string) $name

Return Values

\HtmlPageCrawler


HtmlPageCrawler::removeAttribute

Description

public removeAttribute (string $name)

Remove an attribute from each element in the set of matched elements.

Parameters

  • (string) $name

Return Values

\HtmlPageCrawler


HtmlPageCrawler::removeClass

Description

public removeClass (string $name)

Remove a class from each element in the list

Parameters

  • (string) $name

Return Values

\HtmlPageCrawler

$this for chaining


HtmlPageCrawler::replaceAll

Description

public replaceAll (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList $element)

Replace each target element with the set of matched elements.

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList) $element

Return Values

\Wa72\HtmlPageDom\HtmlPageCrawler

A new Crawler object containing all elements that replaced the target elements


HtmlPageCrawler::replaceWith

Description

public replaceWith (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList $content)

Replace each element in the set of matched elements with the provided new content and return the set of elements that was removed.

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList) $content

Return Values

\Wa72\HtmlPageDom\HtmlPageCrawler

$this for chaining


HtmlPageCrawler::saveHTML

Description

public saveHTML (void)

Get the HTML code fragment of all elements and their contents.

If the first node contains a complete HTML document, only the full code of this document is returned.

Parameters

This function has no parameters.

Return Values

string

HTML code (fragment)


HtmlPageCrawler::setAttribute

Description

public setAttribute (string $name, string $value)

Sets an attribute on each element

Parameters

  • (string) $name
  • (string) $value

Return Values

\HtmlPageCrawler

$this for chaining


HtmlPageCrawler::setInnerHtml

Description

public setInnerHtml (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList $content)

Set the HTML contents of each element

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList) $content : HTML code fragment

Return Values

\HtmlPageCrawler

$this for chaining


HtmlPageCrawler::setStyle

Description

public setStyle (string $key, string $value)

Set one CSS style property for all elements in the list.

Parameters

  • (string) $key : name of the property
  • (string) $value : value of the property

Return Values

\HtmlPageCrawler

$this for chaining


HtmlPageCrawler::setText

Description

public setText (string $text)

Set the text contents of the matched elements.

Parameters

  • (string) $text

Return Values

\HtmlPageCrawler


HtmlPageCrawler::toggleClass

Description

public toggleClass (string $classname)

Add or remove one or more classes from each element in the set of matched elements, depending on the class's presence.

Parameters

  • (string) $classname : One or more classnames separated by spaces

Return Values

\Wa72\HtmlPageDom\HtmlPageCrawler

$this for chaining
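
A brief sketch of toggling several classes at once, assuming a Composer autoloader at vendor/autoload.php. The class names "a" and "b" are placeholders.

```php
<?php
// Assumption: wa72/htmlpagedom was installed with Composer into ./vendor.
require __DIR__ . '/vendor/autoload.php';

use Wa72\HtmlPageDom\HtmlPageCrawler;

// The element starts out with class "a" only.
$p = HtmlPageCrawler::create('<p class="a">text</p>');

// Each listed class is handled separately: "a" is present and gets
// removed, "b" is absent and gets added.
$p->toggleClass('a b');

var_dump($p->hasClass('a'), $p->hasClass('b'));
```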


HtmlPageCrawler::unwrap

Description

public unwrap (void)

Remove the parents of the set of matched elements from the DOM, leaving the matched elements in their place.

Parameters

This function has no parameters.

Return Values

\Wa72\HtmlPageDom\HtmlPageCrawler

$this for chaining


HtmlPageCrawler::unwrapInner

Description

public unwrapInner (void)

Remove the matched elements, but promote the children to take their place.

Parameters

This function has no parameters.

Return Values

\Wa72\HtmlPageDom\HtmlPageCrawler

$this for chaining


HtmlPageCrawler::wrap

Description

public wrap (string|\HtmlPageCrawler|\DOMNode $wrappingElement)

Wrap an HTML structure around each element in the set of matched elements

The HTML structure must contain only one root node, e.g.:

Works: `<div><div></div></div>`

Does not work: `<div></div><div></div>`

Parameters

  • (string|\HtmlPageCrawler|\DOMNode) $wrappingElement

Return Values

\HtmlPageCrawler

$this for chaining
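
The sketch below shows wrap() applied to several matched elements, using a wrapper with a single root node. It assumes a Composer autoloader at vendor/autoload.php; the markup and the class name "box" are invented for the example.

```php
<?php
// Assumption: wa72/htmlpagedom was installed with Composer into ./vendor.
require __DIR__ . '/vendor/autoload.php';

use Wa72\HtmlPageDom\HtmlPageCrawler;

$c = HtmlPageCrawler::create('<div><p>first</p><p>second</p></div>');

// Every matched paragraph gets its own copy of the wrapper element.
$c->filter('p')->wrap('<section class="box"></section>');

// Count the wrappers and the paragraphs that ended up inside them.
$wrappers = count($c->filter('section.box'));
$wrapped  = count($c->filter('section.box > p'));

var_dump($wrappers, $wrapped);
echo $c->saveHTML(), PHP_EOL;
```

To put one shared wrapper around all matched elements instead, wrapAll() is the method to look at.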


HtmlPageCrawler::wrapAll

Description

public wrapAll (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList $content)

Wrap an HTML structure around all elements in the set of matched elements.

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList) $content

Return Values

\Wa72\HtmlPageDom\HtmlPageCrawler

$this for chaining

Throws Exceptions

\LogicException


HtmlPageCrawler::wrapInner

Description

public wrapInner (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList $content)

Wrap an HTML structure around the content of each element in the set of matched elements.

Parameters

  • (string|\HtmlPageCrawler|\DOMNode|\DOMNodeList) $content

Return Values

\Wa72\HtmlPageDom\HtmlPageCrawler

$this for chaining