How to Scrape Pages That Require Scrolling? #991
Replies: 1 comment
The issue is that JS-heavy sites like recess.studio load content dynamically on scroll, so the HTML is empty at initial load. Here are a few approaches, from simplest to most flexible:

**Option 1: Full-page scan**

```python
config = CrawlerRunConfig(
    scan_full_page=True,
    scroll_delay=0.5,              # wait between scroll steps
    delay_before_return_html=2.0,  # wait after scrolling finishes
    wait_until="networkidle",      # wait for network to settle
)
```

**Option 2: Targeted JS scrolling (more control)**

```python
config = CrawlerRunConfig(
    js_code="""
    await new Promise(r => {
        let totalHeight = 0;
        const distance = 300;
        const timer = setInterval(() => {
            window.scrollBy(0, distance);
            totalHeight += distance;
            if (totalHeight >= document.body.scrollHeight) {
                clearInterval(timer);
                r();
            }
        }, 200);
    });
    """,
    delay_before_return_html=2.0,
    wait_until="domcontentloaded",
)
```

**Option 3: Per-URL configs**

```python
configs = []
for url in urls:
    if needs_scrolling(url):  # your logic
        configs.append(CrawlerRunConfig(scan_full_page=True, scroll_delay=0.5))
    else:
        configs.append(CrawlerRunConfig())

results = await crawler.arun_many(urls=urls, config=configs)
```

Also note: the param is `scan_full_page`, not `scroll_full_page` as in your snippet.
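The `needs_scrolling(url)` check in Option 3 is left to the caller. A minimal sketch, assuming you maintain a set of domains known to lazy-load on scroll (the domain set below is illustrative):

```python
from urllib.parse import urlparse

# Domains known to lazy-load content on scroll (illustrative examples).
SCROLL_DOMAINS = {"recess.studio"}

def needs_scrolling(url: str) -> bool:
    """Return True if the URL's host is on the scroll-heavy list."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host in SCROLL_DOMAINS
```

Anything fancier, such as probing each page once and caching the verdict, can slot in behind the same function signature.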
Any tips for scraping the first-page content of addresses like this?
By default it returns an empty result in `result.markdown` when running asynchronously.

My goal with this scraper project is to scrape multiple addresses (not too worried about anti-bot measures) that aren't really big websites. I foresee I might run into a few problems, however, with pages that require a bit of scrolling, unless all the HTML is already available on page load before scrolling?
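One way to keep a plain config as the default and only pay the scrolling cost when needed is a retry-on-empty wrapper. A hedged sketch, with the actual crawler call abstracted behind a `crawl` coroutine you supply (the helper name and shape are illustrative, not crawl4ai API):

```python
import asyncio

async def crawl_with_fallback(url, crawl, plain_config, scroll_config):
    """Crawl with the plain config first; if the markdown comes back
    empty (content rendered only after scrolling), retry with the
    scrolling config.  `crawl(url, config)` is whatever coroutine
    wraps your crawler call and returns the markdown string."""
    markdown = await crawl(url, plain_config)
    if markdown and markdown.strip():
        return markdown
    return await crawl(url, scroll_config)
```

With crawl4ai, `crawl` would wrap something like `await crawler.arun(url=url, config=config)` and return `result.markdown`.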
By the way, I've tried the `scroll_full_page=True` setting on the crawler configuration, but that causes problems with other pages, breaking the flexibility and modularity of this application. I was also getting the crawler to raise errors and exceptions when trying to mimic a user with settings like `magic=True` or `simulate_user=True`, and even `adjust_viewport_to_content=True`. This is a snippet of my current script.
In the next step I'd do things like summarize the content with an LLM, but I'm failing to get any content at all during crawling. Does that make sense?
I'd love tips to be able to scrape as many URLs as I can. I'm already having decent success, but this would be a nice improvement.
Thank you!