v0.2.1 #8

Merged: 15 commits, merged Nov 15, 2024

@D4Vinci (Owner) commented Nov 15, 2024

What's changed

New features

  1. The Response object returned by all fetchers is now the same as the Adaptor object, except that it adds these attributes: status, reason, cookies, headers, and request_headers. The cookies, headers, and request_headers attributes are always dictionaries.
    So your code can now look like this:
    >>> from scrapling import Fetcher
    >>> page = Fetcher().get('https://example.com', stealthy_headers=True)
    >>> print(page.status)
    200
    >>> products = page.css('.product')
    Instead of the previous approach:
    >>> from scrapling import Fetcher
    >>> response = Fetcher().get('https://example.com', stealthy_headers=True)
    >>> print(response.status)
    200
    >>> page = response.adaptor
    >>> products = page.css('.product')
    I have kept the .adaptor property working for backward compatibility.
  2. The fetch method of both the StealthyFetcher and PlayWrightFetcher classes now accepts a proxy argument, which can be a string or a dictionary.
  3. The fetch method of the StealthyFetcher class now accepts an os_randomize argument. If it is enabled, Scrapling randomizes the OS fingerprints it uses. By default, Scrapling matches the fingerprints to the current OS.
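The sketch below is a minimal Python illustration of the design behind the first change. A response class subclasses a parsed-page class, so one object can be both queried and inspected for HTTP metadata. PageSketch and ResponseSketch are hypothetical stand-ins, not Scrapling's actual Adaptor and Response classes. The dict normalization and the self-returning adaptor property are assumptions about how such a design can be built.

```python
from typing import Any, Mapping, Optional


class PageSketch:
    """Hypothetical stand-in for a parsed page (the real Adaptor adds css(), xpath(), etc.)."""

    def __init__(self, text: str, url: str = "") -> None:
        self.text = text
        self.url = url


class ResponseSketch(PageSketch):
    """Hypothetical sketch: a parsed page that also carries HTTP metadata."""

    def __init__(
        self,
        text: str,
        url: str,
        status: int,
        reason: str,
        cookies: Optional[Mapping[str, Any]] = None,
        headers: Optional[Mapping[str, Any]] = None,
        request_headers: Optional[Mapping[str, Any]] = None,
    ) -> None:
        super().__init__(text, url)
        self.status = status
        self.reason = reason
        # Copy into plain dicts so callers can always rely on dict methods
        self.cookies = dict(cookies or {})
        self.headers = dict(headers or {})
        self.request_headers = dict(request_headers or {})

    @property
    def adaptor(self) -> "ResponseSketch":
        # Backward compatibility: old code did `page = response.adaptor`
        return self


page = ResponseSketch("<html></html>", "https://example.com", 200, "OK",
                      headers={"content-type": "text/html"})
print(page.status, page.reason, page.adaptor is page)
```

The response is itself a page. Old code that reads .adaptor therefore keeps working, and new code can call selectors on the returned object directly.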
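A proxy can arrive as either a string or a dictionary, so it usually has to be normalized into one form before it reaches the browser. The sketch below assumes the dictionary form mirrors Playwright's proxy settings: a required server key, plus optional username and password. normalize_proxy is a hypothetical helper, not Scrapling's implementation.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlparse


def normalize_proxy(proxy: Union[str, Dict[str, str], None]) -> Optional[Dict[str, str]]:
    """Hypothetical helper: turn a proxy string or dict into one settings dict."""
    if proxy is None:
        return None
    if isinstance(proxy, dict):
        # Assumed dict shape: Playwright-style keys with a mandatory 'server'
        if "server" not in proxy:
            raise ValueError("proxy dict needs a 'server' key")
        return dict(proxy)
    if isinstance(proxy, str):
        parsed = urlparse(proxy)
        if not parsed.scheme or not parsed.hostname:
            raise ValueError(f"invalid proxy string: {proxy!r}")
        # Rebuild the server URL without the credentials
        server = f"{parsed.scheme}://{parsed.hostname}"
        if parsed.port:
            server += f":{parsed.port}"
        settings = {"server": server}
        if parsed.username:
            settings["username"] = parsed.username
        if parsed.password:
            settings["password"] = parsed.password
        return settings
    raise TypeError("proxy must be a str or a dict")


print(normalize_proxy("http://user:pass@127.0.0.1:8080"))
```

Splitting credentials out of the URL keeps both input styles equivalent once they reach the browser layer.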

Bugs Squashed

  1. Fixed a bug that occurred when passing headers to the Fetcher class.
  2. Fixed a bug in parsing JSON responses returned by the fetcher classes.

Quality of life changes

  1. Previously, the text functionality tried to remove HTML comments before returning the text. That caused errors in some cases and made the code more complicated than needed. It has now reverted to the default lxml behavior, so you will notice a slight speed increase in all operations that rely on elements' text, such as selectors. You may want Scrapling to remove HTML comments from elements before returning the text, which avoids the odd text-splitting behavior found in lxml/parsel/scrapy. In that case, just leave the keep_comments argument set to False, as it is by default.
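The text-splitting behavior comes from the text/tail model that lxml shares with Python's xml.etree.ElementTree. A comment inside an element becomes a child node, so the element's .text stops at the comment and the rest of the text lands in the comment's .tail. The stdlib sketch below shows both outcomes. It is not Scrapling's API, and parse_snippet is a hypothetical helper whose keep_comments flag only borrows the argument's name.

```python
import xml.etree.ElementTree as ET

SNIPPET = "<p>Hello <!-- note --> world</p>"


def parse_snippet(snippet: str, keep_comments: bool) -> ET.Element:
    # insert_comments=True keeps comments as child nodes (Python 3.8+)
    builder = ET.TreeBuilder(insert_comments=keep_comments)
    parser = ET.XMLParser(target=builder)
    parser.feed(snippet)
    return parser.close()


kept = parse_snippet(SNIPPET, keep_comments=True)
# The text is split around the comment: .text before it, .tail after it
print(repr(kept.text), repr(kept[0].tail))

dropped = parse_snippet(SNIPPET, keep_comments=False)
# The comment is discarded, so the text stays in one piece
print(repr(dropped.text))
```

Dropping comments while parsing keeps an element's text in one piece, at the cost of losing the comments themselves.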

Important

This is a friendly reminder that maintaining and improving Scrapling takes a lot of time and effort, which I have been happily putting in for months.
So, if you like Scrapling and want it to keep improving, you can support me through the Sponsor button.

@D4Vinci merged commit 773fcd5 into main on Nov 15, 2024
6 checks passed