Add Web Crawler Multi-Agent Workflow (Crawler_Agent.yaml)#561

Open
Montanus-zyy wants to merge 1 commit into OpenBMB:main from Montanus-zyy:main

@Montanus-zyy
Description

Added a customized multi-agent workflow specifically designed for web scraping tasks (Crawler_Agent.yaml).

Features

This workflow introduces a 3-node architecture to improve code robustness and quality:

  1. CEO (node_ceo): Analyzes the user's scraping requirements and breaks them down into step-by-step plans.
  2. Engineer (node_engineer): Writes the Python crawler using requests and BeautifulSoup. The prompt strictly requires adding User-Agent headers (for basic anti-bot bypass) and try-except blocks for error handling.
  3. Reviewer (node_reviewer): Acts as QA, inspecting the generated code for syntax errors and checking that the anti-scraping and exception-handling requirements are met before the script is finalized.
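
To make the Engineer and Reviewer requirements concrete, here is a minimal sketch of the kind of check the Reviewer is asked to perform, written as plain Python rather than as an LLM prompt. It is an illustration only and is not part of Crawler_Agent.yaml: `SAMPLE_SCRIPT` is a hypothetical example of Engineer output, and `review_script` is an assumed helper name. The sample is only parsed, never executed, so the sketch needs nothing beyond the standard library.

```python
import ast

# Hypothetical example of a script the Engineer node might produce.
# It is stored as a string and only parsed, never run.
SAMPLE_SCRIPT = '''
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; ExampleCrawler/0.1)"}

def fetch_title(url):
    try:
        response = requests.get(url, headers=HEADERS, timeout=10)
        response.raise_for_status()
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return None
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.title.string if soup.title else None
'''


def review_script(source: str) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    # Check 1: the script must at least parse as valid Python.
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []
    nodes = list(ast.walk(tree))

    # Check 2: at least one try statement with an except handler.
    if not any(isinstance(n, ast.Try) and n.handlers for n in nodes):
        problems.append("no try-except block found")

    # Check 3: a "User-Agent" string literal appears somewhere in the code.
    has_user_agent = any(
        isinstance(n, ast.Constant)
        and isinstance(n.value, str)
        and n.value.lower() == "user-agent"
        for n in nodes
    )
    if not has_user_agent:
        problems.append("no User-Agent header found")

    return problems


if __name__ == "__main__":
    print(review_script(SAMPLE_SCRIPT))
```

These checks are purely static: they confirm that a try-except block and a User-Agent literal are present, not that the script actually fetches or parses a page correctly.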

Motivation

Web scraping is a common day-to-day task in data analysis, and the default configurations often produce fragile crawler code. This specialized workflow aims to make the generated code robust enough for real-world use.

Collaborator

@NA-Wen NA-Wen left a comment

Thank you for the contribution!

However, this workflow currently does not seem to actually perform web crawling — it mainly describes a planning/coding/review pipeline without a concrete mechanism for fetching or parsing web pages. As a result, it may not function as a real crawler workflow yet.

Could you please revise this and include an actual web crawling step (e.g., fetching pages, parsing content, or integrating a crawling tool)? The current version feels too limited for a “web crawler” workflow.

Once this is improved, we would be happy to merge it soon.

2 participants