Not working with some sites #4

Gujal00 · 2024-06-08T22:54:04Z

Thanks a lot for developing and sharing this solution; it works nicely for a lot of sites.
However, on some sites it doesn't solve the challenge and gives up with {"code":500,"message":"Request Timeout"}

This is one of the sites. Can you please check and advise?
https://apnetv.to/Hindi-Serials


zfcsoftware · 2024-06-09T06:34:29Z

It seems to be caused by a devtools detector. The browser has been tested against devtools detectors and is normally not caught, so this site must be taking additional precautions. I will investigate this in more detail, but in the meantime you can scrape the site with the following method.

Step 1:

cf-clearance-scraper/module/scrape.js

Line 93 in 4a3bb4e

waitUntil: ['load', 'networkidle0']

Replace this line with the following.

waitUntil: 'domcontentloaded'

Step 2:
Intercept the request. Add the following code at the location below.

cf-clearance-scraper/module/browser.js

Line 113 in 4a3bb4e

const { RequestInterceptionManager } = await import('puppeteer-intercept-and-modify-requests')
const client = await page.target().createCDPSession()
const interceptManager = new RequestInterceptionManager(client)
await interceptManager.intercept({
    urlPattern: `https://apnetv.to/Hindi-Serials`,
    resourceType: "Document",
    modifyResponse({ body }) {
        return {
            body: body.replace(`window.location.href = 'https://apnetv.to/indexnow.html';`, ''),
        };
    },
});
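
The string replacement done inside modifyResponse can be tried on its own, without a browser. The following is only a minimal sketch: stripRedirect, REDIRECT_SNIPPET, and the sample HTML (including the devtoolsOpen variable) are hypothetical names made up for illustration and are not part of cf-clearance-scraper. It assumes the page contains the redirect statement exactly as written in the snippet above.

```javascript
// Hypothetical helper, not part of cf-clearance-scraper.
// The redirect statement is copied from the interception snippet above.
const REDIRECT_SNIPPET = "window.location.href = 'https://apnetv.to/indexnow.html';";

function stripRedirect(body, snippet = REDIRECT_SNIPPET) {
  // split/join removes every occurrence of the snippet, whereas
  // String.prototype.replace with a string pattern removes only the first.
  return body.split(snippet).join('');
}

// Made-up sample page; the real page markup is an assumption here.
const sample =
  "<script>if (devtoolsOpen) { window.location.href = 'https://apnetv.to/indexnow.html'; }</script>";

console.log(stripRedirect(sample));
```

If the site changes the spacing or quoting of that statement, the exact-match replacement will silently do nothing, so it may be worth logging whether the body actually changed.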

Gujal00 · 2024-06-09T06:59:56Z

Thanks a lot for taking the time to look at this issue.
I made the changes and ran it on my Ubuntu server VM. It looks like some dependencies are still missing, even though I ran npm install:

gujal@tux:~/cf-clearance-scraper$ npm run start

> [email protected] start
> node index.js

Server running on port 3000
Failed to launch the browser process! undefined
[1590:1590:0609/185637.942529:ERROR:ozone_platform_x11.cc(243)] Missing X server or $DISPLAY
[1590:1590:0609/185637.942605:ERROR:env.cc(258)] The platform failed to initialize.  Exiting.


TROUBLESHOOTING: https://pptr.dev/troubleshooting

Anyway, I will wait for you to check and release the Docker image, then test with the image, as it will be fully self-contained. Thanks
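
The "Missing X server or $DISPLAY" line in the log above suggests the browser was started in headed mode on a machine without an X display. As a rough sketch only, a check like the one below could be run before launching; hasDisplay is a hypothetical helper name, not something the project provides. On a server without a desktop, one common workaround is to run under a virtual display such as Xvfb, though whether that is the right fix here is an assumption.

```javascript
// Hypothetical diagnostic helper, not part of cf-clearance-scraper.
// Reports whether the DISPLAY environment variable points at an X display.
function hasDisplay(env = process.env) {
  return typeof env.DISPLAY === 'string' && env.DISPLAY.trim() !== '';
}

if (!hasDisplay()) {
  // A headed Chromium launch is likely to fail with
  // "Missing X server or $DISPLAY" in this situation.
  console.warn('DISPLAY is not set; a headed browser launch will probably fail.');
}
```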

zfcsoftware · 2024-06-09T07:03:54Z

You need to install the puppeteer-intercept-and-modify-requests package. Just run:

npm i puppeteer-intercept-and-modify-requests

After making the changes and running it, scraping will work on the relevant site.
Thank you for your feedback.

jairoxyz · 2024-06-09T18:51:32Z

I added your code and tried it, and it gets cf_clearance for that site. Do you think there is a way to intercept the devtools detector for any site, rather than hard-coding the intercept-and-modify step for specific sites?
