v0.2.9 #25
For easy copy-paste from Scrapy/parsel code when needed :)
This forces lxml to keep CDATA while parsing HTML, if you want that
It's not supported in the new version of Playwright and is problematic in some situations while being slower than newer versions.
If possible, otherwise returns `Adaptor.html_content`
So now you control the logging and the debugging from the shell through the logger with the name 'scrapling'
Dropped browserforge from the requirements here so its version gets controlled by camoufox
For the issue with geoip
By caching the `StaticEngine` class instance
Might give a slight performance increase too
The first step in fully supporting async
This will cause a slight performance increase
…d the async function later
The caching will give a slight performance increase with bulk requests
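The commit note above about the logger named 'scrapling' suggests the library's output can be controlled with Python's standard `logging` module. A minimal sketch, assuming only that logger name from the note; the handler format and the chosen levels are illustrative and not something the library prescribes:

```python
import logging

# Attach a basic handler to the root logger so records have somewhere to go.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s: %(message)s")

# The logger name "scrapling" comes from the commit note; everything else
# here is plain standard-library logging.
scrapling_logger = logging.getLogger("scrapling")

# Show debug-level records emitted through this logger...
scrapling_logger.setLevel(logging.DEBUG)

# ...or quiet it down to warnings and above instead.
# scrapling_logger.setLevel(logging.WARNING)
```

Because this only touches the named logger, the rest of an application's logging configuration stays as it was.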
What's changed
New features
- New `AsyncFetcher` class, an async version of `Fetcher`. Both `StealthyFetcher` and `PlayWrightFetcher` also have a new method called `async_fetch` with the same options.
- The `StealthyFetcher` class now has a `geoip` argument in its fetch methods. When enabled, the class automatically uses the IP's longitude, latitude, timezone, country, and locale, then spoofs the WebRTC IP address. It also calculates and spoofs the browser's language based on the distribution of language speakers in the target region.
- Added the `retries` argument to the `Fetcher`/`AsyncFetcher` classes, so you can now set the number of retries for each request made by `httpx`.
- Added the `url_join` method to `Adaptor` and Fetchers. It takes a relative URL and joins it with the current URL to generate an absolute URL.
- Added the `keep_cdata` method to `Adaptor` and Fetchers to stop the parser from removing CDATA when needed.
- The `Adaptor`/`Response` `body` method now returns the raw HTML response when possible (without processing it in the library).
- Added logging for the `Response` class, so when you use the Fetchers you now get a log with info about the response you got.
- Using any standard string method on a `TextHandler`, like `.replace()`, now returns another `TextHandler`. It was returning a standard string before.
- Big improvements to speed across the library and improvements to stealth in the Fetcher classes overall.
- Thanks to refactoring a lot of the code and using caching at the right positions, doing requests in bulk is now much faster.
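To make the `url_join` behaviour described above concrete, the sketch below uses the standard library's `urllib.parse.urljoin`. It is an assumption that `url_join` resolves relative URLs the same way; the page URL and the helper name `join_with_current` are made up for illustration and are not part of the library.

```python
from urllib.parse import urljoin


def join_with_current(current_url: str, relative_url: str) -> str:
    """Hypothetical stand-in for url_join: resolve relative_url against current_url."""
    return urljoin(current_url, relative_url)


# Illustrative "current URL" of a fetched page (not taken from the release notes).
current = "https://example.com/catalog/page1.html"

print(join_with_current(current, "page2.html"))  # https://example.com/catalog/page2.html
print(join_with_current(current, "/about"))      # https://example.com/about
print(join_with_current(current, "../contact"))  # https://example.com/contact
```

This is the usual case when following links scraped from a page: `href` values are often relative, and they need the page's own URL to become fetchable.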
Breaking changes
- Support for Python 3.8 has been dropped (mainly because Playwright stopped supporting it, but it was a problematic version anyway).
- The `debug` argument has been removed from the whole library. To turn on debugging, configure the logger named `scrapling` after importing the library.
Bugs Squashed
Quality of life changes
- All logging now goes through a logger named `scrapling` for easier and cleaner control. We were using the root logger before.
Note
A friendly reminder that maintaining and improving `Scrapling` takes a lot of time and effort, which I have been happily doing for months even though it's becoming harder. So, if you like `Scrapling` and want it to keep improving, you can help by supporting me through the Sponsor button.