v2.0 #157

Open
Wants to merge 12 commits into main

otsch (Member) commented Aug 6, 2024

  • Restrict retrying cached error responses
    The HttpLoader::retryCachedErrorResponses() method now returns an instance of the new RetryManager class, providing the methods only() and except() that can be used to restrict retries to certain HTTP response status codes.
  • Remove the methods Step::addToResult(), Step::addLaterToResult() and everything connected to them.
  • It is no longer possible to provide multiple loaders via the crawler. Instead, you can now manually provide customized loaders directly to steps via the new Step::withLoader() method.
  • Remove the PaginatorInterface and the old version of the AbstractPaginator. Also remove an unnecessary argument from the processLoaded() method, as well as the default implementation of the getNextRequest() method (child classes now have to implement it themselves).
  • Remove 4 deprecated methods interacting with the headless browser helper. Users can call these methods directly on the browser helper instead.
  • Remove the Microseconds util class, which is now part of the crwlr/utils package.
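The retry restriction in the first item boils down to a filter on HTTP status codes. The following is a hypothetical Python model of that idea, not the library's PHP API: the names `RetryManager`, `only()`, `except_()` and `should_retry()` merely mirror the description (`except` is a reserved word in Python). It assumes that, without any restriction, every cached error response (status 400 or higher) is retried, and that `except_()` exclusions apply on top of an `only()` allow-list.

```python
from __future__ import annotations


class RetryManager:
    """Hypothetical model of restricting retries of cached error responses."""

    def __init__(self) -> None:
        self._only: set[int] | None = None  # None means "no allow-list"
        self._except: set[int] = set()

    def only(self, *status_codes: int) -> RetryManager:
        # Retry cached error responses exclusively for these status codes.
        self._only = set(status_codes)
        return self

    def except_(self, *status_codes: int) -> RetryManager:
        # Never retry these status codes.
        self._except = set(status_codes)
        return self

    def should_retry(self, status_code: int) -> bool:
        # Assumption: only error responses (4xx/5xx) are candidates for a retry.
        if status_code < 400:
            return False
        if self._only is not None and status_code not in self._only:
            return False
        return status_code not in self._except


# Usage: retry only "Too Many Requests" and "Service Unavailable".
manager = RetryManager().only(429, 503)
print(manager.should_retry(503))  # True
print(manager.should_retry(404))  # False
```

Returning `self` from `only()` and `except_()` keeps the calls chainable, which matches how the description presents them as methods on the returned object.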

otsch added 12 commits August 6, 2024 00:33
The `HttpLoader::retryCachedErrorResponses()` method now returns an
instance of the new `RetryManager` class, providing the methods `only()`
and `except()` that can be used to restrict retries to certain HTTP
response status codes.

Start v2.0 block and improve the note for the retry cached error
responses change.

Remove the methods addToResult(), addLaterToResult() and everything
connected to them.

Also, it is no longer possible to provide multiple loaders via the
crawler. Instead you can now manually provide customized loaders
directly to steps via the new withLoader() method.

Further, also remove the Microseconds util class, which is now part of
the crwlr/utils package.

Remove the `PaginatorInterface` and the old version of the
`AbstractPaginator`. Also remove an unnecessary argument from the
`processLoaded()` method, as well as the default implementation of the
`getNextRequest()` method (child classes now have to implement it
themselves).
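Without a default `getNextRequest()`, every paginator subclass has to decide for itself how the next request is built. A minimal Python sketch of that contract, using `abc.ABC` so that a class lacking the method cannot be instantiated, might look like this. It is not the library's PHP code; `AbstractPaginator`, `PageParamPaginator`, `process_loaded()`, `get_next_request()` and the `page` query parameter are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Optional


class AbstractPaginator(ABC):
    """Hypothetical base class without a default next-request implementation."""

    def __init__(self, max_pages: int) -> None:
        self.max_pages = max_pages
        self.loaded_pages = 0

    def process_loaded(self, url: str) -> None:
        # Called after each page was loaded; this sketch only counts pages.
        self.loaded_pages += 1

    def has_finished(self) -> bool:
        return self.loaded_pages >= self.max_pages

    @abstractmethod
    def get_next_request(self) -> Optional[str]:
        """Return the next URL to load, or None when pagination is done."""


class PageParamPaginator(AbstractPaginator):
    """Example child class: it must implement get_next_request() itself."""

    def __init__(self, base_url: str, max_pages: int) -> None:
        super().__init__(max_pages)
        self.base_url = base_url

    def get_next_request(self) -> Optional[str]:
        if self.has_finished():
            return None
        return f"{self.base_url}?page={self.loaded_pages + 1}"


paginator = PageParamPaginator("https://example.com/list", max_pages=2)
print(paginator.get_next_request())  # https://example.com/list?page=1
```

Calling `AbstractPaginator(2)` directly raises a `TypeError`, so a missing implementation surfaces immediately rather than at the point where the next request is needed.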

Remove 4 deprecated methods interacting with the headless browser
helper. Users can directly call the methods on the browser helper.

Also remove the deprecated method `RespondedRequest::cacheKeyFromRequest()`.

Add new methods `FileCache::prolong()` and `FileCache::prolongAll()` to
allow prolonging the time to live for cached responses.
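Prolonging a cached response's time to live means moving its expiry timestamp further into the future. The Python sketch below models this with an in-memory dictionary instead of files; `TtlCache`, `put()`, `prolong()` and `prolong_all()` are hypothetical stand-ins for `FileCache::prolong()` and `FileCache::prolongAll()`, not their actual signatures. It assumes the new expiry is counted from the current time rather than added to the old expiry, and the injectable `clock` parameter exists only to make the behaviour easy to test.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TtlCache:
    """Hypothetical in-memory stand-in for a response cache with prolongable TTLs."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # key -> (cached value, absolute expiry timestamp)
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def put(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None or entry[1] <= self._clock():
            return None  # missing or expired
        return entry[0]

    def prolong(self, key: str, ttl: float) -> bool:
        # Assumption: the new expiry is "now + ttl"; returns False for unknown keys.
        entry = self._entries.get(key)
        if entry is None:
            return False
        self._entries[key] = (entry[0], self._clock() + ttl)
        return True

    def prolong_all(self, ttl: float) -> None:
        # Apply the same new time to live to every stored entry.
        for key in list(self._entries):
            self.prolong(key, ttl)
```

Keeping expired entries in storage (as files would remain on disk until cleaned up) is what allows `prolong_all()` in this sketch to revive responses whose original time to live has already run out.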

Move most functionality from `Step` to `BaseStep` and make it work with
`Group` steps.

Add info about changed `Crawler::addStep()` signature and removal of the
`UnknownLoaderKeyException`.