
Tips on how to make Aioprocessing faster. #46

Open
XtrollerX opened this issue Aug 19, 2024 · 0 comments
XtrollerX commented Aug 19, 2024

I decided to test two pieces of code, each tasked with running one hundred requests asynchronously and printing the website title for each request. After all the requests finish, the elapsed time is printed.

AsyncIO

import asyncio
import time
from playwright.async_api import async_playwright, Browser

async def worker(browser: Browser, i: int):
    context = await browser.new_context()
    page = await context.new_page()
    await page.goto("https://www.google.com/")
    print(await page.title())
    await context.close()


async def main():
    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch(channel="chrome")
        await asyncio.wait(
            [asyncio.create_task(worker(browser, i)) for i in range(100)],
            return_when=asyncio.ALL_COMPLETED,
        )
        await browser.close()

start = time.perf_counter()
asyncio.run(main())
end = time.perf_counter()
print(end - start)

Time taken: 8.336619800000335 seconds

Aiomultiprocess

import asyncio
import time

from aiomultiprocess import Pool
from playwright.async_api import async_playwright

async def run(playwright, url):
    chromium = playwright.chromium # or "firefox" or "webkit".
    browser = await chromium.launch(channel="chrome")
    page = await browser.new_page()
    await page.goto(url)
   
    print(await page.title())
    await browser.close()

async def mains(url):
    async with async_playwright() as playwright:
        return await run(playwright, url)

async def main():
    urls = ["https://www.google.com/" for i in range(100)]
    async with Pool() as pool:
        async for result in pool.map(mains, urls):
            pass


if __name__ == '__main__':
    # Python 3.7
    start = time.perf_counter()
    asyncio.run(main())
    end = time.perf_counter()
    print(end - start, end="")

Time taken: 17.115715199999613 seconds

I am unsure whether I am doing this correctly; here are my questions.

  1. Am I doing this correctly? If not, how can I make it better and faster?
  2. Should I just stick with asyncio?
  3. If I were to scale up to thousands of requests, how should I do it with playwright or any other scraping libraries?
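On question 3, one common pattern for scaling plain asyncio to thousands of tasks is to bound concurrency with an `asyncio.Semaphore`, so you never have more than a fixed number of browser contexts open at once. This is a minimal sketch of that pattern, not a definitive answer: `fetch_title` is a hypothetical placeholder standing in for the real Playwright work (`new_context` / `goto` / `title`), so the example stays self-contained and runnable without a browser.

```python
import asyncio
import time

async def fetch_title(i: int) -> str:
    # Placeholder for the actual Playwright calls; the sleep stands in
    # for network and rendering time.
    await asyncio.sleep(0.01)
    return f"Google {i}"

async def worker(sem: asyncio.Semaphore, i: int) -> str:
    # At most `limit` workers execute this body at the same time.
    async with sem:
        return await fetch_title(i)

async def run_all(n: int, limit: int = 50) -> list[str]:
    # One semaphore shared by all tasks caps in-flight work at `limit`.
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(worker(sem, i) for i in range(n)))

if __name__ == "__main__":
    start = time.perf_counter()
    titles = asyncio.run(run_all(200))
    print(len(titles), time.perf_counter() - start)
```

With real Playwright you would launch one browser, pass it into `worker`, and open/close a context inside the `async with sem:` block; the semaphore limit then becomes the number of simultaneous pages, which you can tune instead of spawning a full browser per task as in the aiomultiprocess version above.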