v0.2.1 #8

D4Vinci · 2024-11-15T16:21:45Z

What's changed

New features

The Response object returned by every fetcher now behaves like the Adaptor object, with these added attributes: status, reason, cookies, headers, and request_headers. The cookies, headers, and request_headers attributes are always dictionaries.
So your code can now look like this:

>> from scrapling import Fetcher
>> page = Fetcher().get('https://example.com', stealthy_headers=True)
>> print(page.status)
200
>> products = page.css('.product')

Instead of the previous approach:

>> from scrapling import Fetcher
>> fetcher = Fetcher().get('https://example.com', stealthy_headers=True)
>> print(fetcher.status)
200
>> page = fetcher.adaptor
>> products = page.css('.product')

But I have left the .adaptor property working for backward compatibility.
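
As a rough illustration of the added attributes, the sketch below collects them into one dictionary. The helper name `describe_response` and the stand-in object are hypothetical and exist only so the snippet runs without a network call; a real object would come from a fetcher call such as `Fetcher().get(...)`.

```python
from types import SimpleNamespace

# Attribute names listed in the release notes above.
RESPONSE_FIELDS = ("status", "reason", "cookies", "headers", "request_headers")


def describe_response(page):
    """Collect the added Response attributes into one dictionary (hypothetical helper)."""
    return {name: getattr(page, name) for name in RESPONSE_FIELDS}


# Stand-in object for illustration only; it is NOT a real Scrapling Response.
fake_page = SimpleNamespace(
    status=200,
    reason="OK",
    cookies={},
    headers={"content-type": "text/html"},
    request_headers={"user-agent": "example"},
)

summary = describe_response(fake_page)
print(summary["status"], summary["reason"])
```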

Both the StealthyFetcher and PlayWrightFetcher classes now accept a proxy argument in the fetch method; it can be a string or a dictionary.
The StealthyFetcher class now has an os_randomize argument in the fetch method. If enabled, Scrapling randomizes the OS fingerprints it uses. By default, Scrapling matches the fingerprints to the current OS.
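
A minimal sketch of how the two new arguments might be combined. The helper names `build_proxy` and `fetch_stealthily` are hypothetical, and the dictionary keys `server`, `username`, and `password` are an assumption not stated in these notes, so check the README before relying on them. The scrapling import is deferred so the helper can be tried without a browser installed.

```python
def build_proxy(server, username=None, password=None):
    """Return the proxy as a plain string, or as a dictionary when credentials are given.

    The dictionary keys used here are an assumption, not confirmed by these notes.
    """
    if username is None and password is None:
        return server
    return {"server": server, "username": username, "password": password}


def fetch_stealthily(url, proxy=None, os_randomize=None):
    """Fetch a page with StealthyFetcher, passing only the arguments that were set."""
    from scrapling import StealthyFetcher  # imported lazily; needs scrapling installed

    kwargs = {}
    if proxy is not None:
        kwargs["proxy"] = proxy
    if os_randomize is not None:
        kwargs["os_randomize"] = os_randomize
    return StealthyFetcher().fetch(url, **kwargs)


plain_proxy = build_proxy("http://127.0.0.1:8080")
auth_proxy = build_proxy("http://127.0.0.1:8080", "user", "secret")
```

With scrapling installed, a call such as `fetch_stealthily('https://example.com', proxy=auth_proxy, os_randomize=True)` would then return the Response object described above.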

Bugs Squashed

Fixed a bug that occurred when passing headers with the Fetcher class.
Fixed a bug with parsing JSON responses passed from the fetcher-type classes.

Quality of life changes

Previously, the text functionality tried to remove HTML comments before returning the text, but that caused errors in some cases and made the code more complicated than needed. It has now reverted to the default lxml behavior, so you will notice a slight speed increase in all operations that rely on elements' text, such as selectors. If you want Scrapling to remove HTML comments from elements before returning the text, to avoid the odd text-splitting behavior found in lxml/parsel/scrapy, just keep the keep_comments argument set to True, as it is by default.
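
The text-splitting behavior mentioned above can be sketched without Scrapling. The snippet below uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml, since both share the same text/tail element model; it is an illustration of that model, not Scrapling's own code. When a comment is kept in the tree, the text before it stays on the parent element, while the text after it is stored as the comment node's tail. The helper name `parse_keeping_comments` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_keeping_comments(markup):
    """Parse markup while keeping comment nodes in the tree (Python 3.8+)."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    parser.feed(markup)
    return parser.close()


root = parse_keeping_comments("<p>Hello <!-- note -->World</p>")
comment = root[0]

leading_text = root.text      # only the text before the comment
trailing_text = comment.tail  # the text after the comment lives on the comment node

print(repr(leading_text), repr(trailing_text))
```

Reading only the element's own text therefore misses everything after the comment, which is the splitting effect that removing comments avoids.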

Important

This is a friendly reminder that maintaining and improving Scrapling takes a lot of time and effort, which I have been happily putting in for months.
So, if you like Scrapling and want it to keep improving, you can help by supporting me through the Sponsor button.


D4Vinci added 15 commits November 13, 2024 22:51

Merge pull request #5 from D4Vinci/main

4e5f7d3

Keeping `dev` up to date with `main` branch

StaticEngine - Fix the headers bug

ab7b0d4

Parser - New logic to handle JSON responses passed from Fetchers

d4b896b

Better logic to handle json responses

eaa7da2

Merge remote-tracking branch 'origin/main' into dev

81f4e18

Update tests Github action

ea81e02

Return to the default text behaviour

b0c2b18

The logic to remove comment tags before returning text was causing errors sometimes so from now on leave `keep_comments` set to True better as it is.

Making the Response returned from all fetchers same as Adaptor object

095d50c

but with added attributes

Reflecting the new changes in the README and adding small notes

c334755

Pumping the version up to 0.2.1

7dd9d30

Update README.md

ae7959b

Rephrasing some parts

b993adb

Adding the proxy support to browser-based Fetchers

2abb702

Adding the option to randomize the OS fingerprints with the StealthyFetcher

5e8275c

Correcting the return string for the Response object in all functions/methods

90e38af

D4Vinci merged commit 773fcd5 into main Nov 15, 2024
6 checks passed
