I decided to test two sets of code. Each is tasked with running one hundred requests asynchronously and printing the website title for each request. After all of the requests finish, the total time taken is printed.
AsyncIO
import asyncio
import time

from playwright.async_api import async_playwright, Browser


async def worker(browser: Browser, i: int):
    # Each worker gets its own isolated context and page in the shared browser.
    context = await browser.new_context()
    page = await context.new_page()
    await page.goto("https://www.google.com/")
    print(await page.title())
    await context.close()


async def main():
    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch(channel="chrome")
        await asyncio.wait(
            [asyncio.create_task(worker(browser, i)) for i in range(100)],
            return_when=asyncio.ALL_COMPLETED,
        )
        await browser.close()


start = time.perf_counter()
asyncio.run(main())
end = time.perf_counter()
print(end - start)
Time taken: 8.336619800000335 seconds
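One way to keep the asyncio version from opening all one hundred contexts at once (which matters when scaling up) is to bound concurrency with asyncio.Semaphore and collect results with asyncio.gather. This is a minimal sketch of the pattern only: a sleeping dummy coroutine stands in for the Playwright worker, and the limit of 10 is an arbitrary assumption.

```python
import asyncio


async def worker(sem: asyncio.Semaphore, i: int) -> int:
    # At most `limit` workers proceed past this point at once; the sleep
    # stands in for the real page.goto() / page.title() round trip.
    async with sem:
        await asyncio.sleep(0.01)
        return i


async def main(limit: int = 10) -> list:
    sem = asyncio.Semaphore(limit)
    # gather preserves input order and propagates exceptions, which bare
    # asyncio.wait does not do by itself.
    return await asyncio.gather(*(worker(sem, i) for i in range(100)))


results = asyncio.run(main())
print(len(results))
```

With a real Playwright worker, the semaphore would wrap the new_context/goto/close sequence so only `limit` pages are live at any moment.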
Aiomultiprocess
import asyncio
import time

from aiomultiprocess import Pool
from playwright.async_api import async_playwright


async def run(playwright, url):
    chromium = playwright.chromium  # or playwright.firefox / playwright.webkit
    browser = await chromium.launch(channel="chrome")
    page = await browser.new_page()
    await page.goto(url)
    print(await page.title())
    await browser.close()


async def mains(url):
    async with async_playwright() as playwright:
        return await run(playwright, url)


async def main():
    urls = ["https://www.google.com/" for i in range(100)]
    async with Pool() as pool:
        # Consume the results; nothing else is done with them here.
        async for result in pool.map(mains, urls):
            pass


if __name__ == "__main__":
    # Requires Python 3.7+
    start = time.perf_counter()
    asyncio.run(main())
    end = time.perf_counter()
    print(end - start)
Time taken: 17.115715199999613 seconds
I am unsure whether I am doing this correctly; in any case, here are my questions:
Am I doing this correctly? If not, how can I make it better and faster?
Should I just stick with asyncio?
If I were to scale up to thousands of requests, how should I do it with Playwright or any other scraping library?