
v0.2.1

@D4Vinci D4Vinci released this 15 Nov 16:27
· 141 commits to main since this release
773fcd5

What's changed

New features

  1. The Response object returned by all fetchers is now the same as the Adaptor object, except that it has these added attributes: status, reason, cookies, headers, and request_headers. The cookies, headers, and request_headers attributes are always dictionaries.
    So your code can now look like this:
    >>> from scrapling import Fetcher
    >>> page = Fetcher().get('https://example.com', stealthy_headers=True)
    >>> print(page.status)
    200
    >>> products = page.css('.product')
    Instead of the previous approach:
    >>> from scrapling import Fetcher
    >>> fetcher = Fetcher().get('https://example.com', stealthy_headers=True)
    >>> print(fetcher.status)
    200
    >>> page = fetcher.adaptor
    >>> products = page.css('.product')
    The .adaptor property still works, for backward compatibility.
  2. Both the StealthyFetcher and PlayWrightFetcher classes now accept a proxy argument in the fetch method, given as either a string or a dictionary.
  3. The StealthyFetcher class now has an os_randomize argument in the fetch method. If enabled, Scrapling randomizes the OS fingerprints it uses. By default, Scrapling matches the fingerprints to the current OS.
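
To make the first change more concrete, here is a rough sketch of the idea behind it: a response that is itself a parsed page, plus a few HTTP attributes. The Adaptor and Response classes below are simplified stand-ins written for this illustration, not Scrapling's actual code. The toy css method and the way the adaptor alias is implemented are assumptions.

```python
import re


class Adaptor:
    """Stand-in for a parsed page (hypothetical, NOT Scrapling's real class)."""

    def __init__(self, body: str):
        self.body = body

    def css(self, selector: str) -> list:
        # Toy matcher: handles only ".name" selectors and returns the
        # opening tags whose class attribute contains that name.
        name = re.escape(selector.lstrip("."))
        pattern = rf'<[^>]*class="[^"]*\b{name}\b[^"]*"[^>]*>'
        return re.findall(pattern, self.body)


class Response(Adaptor):
    """A parsed page that also carries the HTTP details of the request."""

    def __init__(self, body, status, reason,
                 cookies=None, headers=None, request_headers=None):
        super().__init__(body)
        self.status = status
        self.reason = reason
        # Always plain dictionaries, as the release notes describe.
        self.cookies = dict(cookies or {})
        self.headers = dict(headers or {})
        self.request_headers = dict(request_headers or {})

    @property
    def adaptor(self):
        # Assumed backward-compatibility alias: older code that calls
        # `.adaptor` still receives an object it can query.
        return self


html = ('<div class="product">A</div>'
        '<div class="product sale">B</div>'
        '<p class="note">C</p>')
page = Response(html, status=200, reason="OK",
                headers={"content-type": "text/html"})

print(page.status)                        # HTTP details live on the page
print(len(page.css(".product")))          # new style: query the response
print(len(page.adaptor.css(".product")))  # old style keeps working
```

With this shape, code written against the old two-step flow and code written against the new one-step flow can run side by side.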

Bugs Squashed

  1. Fixed a bug that occurred when passing headers with the Fetcher class.
  2. Fixed a bug with parsing JSON responses passed from the fetcher-type classes.

Quality of life changes

  1. The text functionality previously tried to remove HTML comments before returning the text, but that caused errors in some cases and made the code more complicated than needed. It has now reverted to the default lxml behavior, so you will notice a slight speed increase in all operations that rely on elements' text, such as selectors. If you want Scrapling to remove HTML comments from elements before returning the text, to avoid the weird text-splitting behavior in lxml/parsel/scrapy, leave the keep_comments argument set to False, as it is by default.

Note

A friendly reminder that maintaining and improving Scrapling takes a lot of time and effort which I have been happily doing for months even though it's becoming harder. So, if you like Scrapling and want it to keep improving, you can help by supporting me through the Sponsor button.