docs: add ZenRows provider and tool integration #31648

73 changes: 73 additions & 0 deletions docs/docs/integrations/providers/zenrows.ipynb
@@ -0,0 +1,73 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ZenRows\n",
"\n",
"[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that provides advanced web data extraction capabilities at scale. ZenRows specializes in scraping modern websites, bypassing anti-bot systems, extracting structured data from any website, rendering JavaScript-heavy content, accessing geo-restricted websites, and more.\n",
"\n",
"[langchain-zenrows](https://pypi.org/project/langchain-zenrows/) provides tools that allow LLMs to access web data using ZenRows' powerful scraping infrastructure.\n",
"\n",
"## Installation and Setup\n",
"\n",
"```bash\n",
"pip install langchain-zenrows\n",
"```\n",
"\n",
"You'll need to set up your ZenRows API key:\n",
"\n",
"```python\n",
"import os\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"your-api-key\"\n",
"```\n",
"\n",
"Or you can pass it directly when initializing the tool:\n",
"\n",
"```python\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")\n",
"```\n",
"\n",
"## Tools\n",
"\n",
"### ZenRowsUniversalScraper\n",
"\n",
"The ZenRows integration provides comprehensive web scraping features:\n",
"\n",
"- **JavaScript Rendering**: Scrape modern SPAs and dynamic content\n",
"- **Anti-Bot Bypass**: Overcome sophisticated bot detection systems\n",
"- **Geo-Targeting**: Access region-specific content from 190+ countries\n",
"- **Multiple Output Formats**: HTML, Markdown, Plaintext, PDF, Screenshots\n",
"- **CSS Extraction**: Target specific data with CSS selectors\n",
"- **Structured Data Extraction**: Automatically extract emails, phone numbers, links, and more\n",
"- **Session Management**: Maintain consistent sessions across requests\n",
"- **Premium Proxies**: Residential IPs for maximum success rates\n",
"\n",
"See more in the [ZenRows tool documentation](/docs/integrations/tools/zenrows_universal_scraper/)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
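
The setup section of this notebook assigns the API key to `os.environ` as a string literal. A minimal sketch of one way to avoid hard-coding the key, assuming a hypothetical helper named `ensure_api_key` (it is not part of `langchain-zenrows`) with an injectable prompt function so it can also run non-interactively:

```python
import os
from getpass import getpass
from typing import Callable


def ensure_api_key(
    name: str = "ZENROWS_API_KEY",
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the key from the environment, prompting once if it is unset.

    Hypothetical helper for illustration only; langchain-zenrows does not
    provide it.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        # Ask the user instead of committing the key to the notebook.
        key = prompt(f"Enter {name}: ").strip()
        if not key:
            raise ValueError(f"{name} is required")
        os.environ[name] = key
    return key
```

Calling `ensure_api_key()` before constructing `ZenRowsUniversalScraper()` would populate the `ZENROWS_API_KEY` variable that the notebook sets by hand.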
266 changes: 266 additions & 0 deletions docs/docs/integrations/tools/zenrows_universal_scraper.ipynb
@@ -0,0 +1,266 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a6f91f20",
"metadata": {},
"source": [
"# ZenRowsUniversalScraper\n",
"\n",
"[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that provides advanced web data extraction capabilities at scale. For more information about ZenRows and its Universal Scraper API, visit the [official documentation](https://docs.zenrows.com/universal-scraper-api/).\n",
"\n",
"This document provides a quick overview for getting started with the ZenRowsUniversalScraper tool. For detailed documentation of all ZenRowsUniversalScraper features and configurations, head to the [API reference](https://github.com/ZenRows-Hub/langchain-zenrows?tab=readme-ov-file#api-reference).\n",
"\n",
"## Overview\n",
"\n",
"### Integration details\n",
"\n",
"| Class | Package | JS support | Package latest |\n",
"| :--- | :--- | :---: | :---: |\n",
"| [ZenRowsUniversalScraper](https://pypi.org/project/langchain-zenrows/) | [langchain-zenrows](https://pypi.org/project/langchain-zenrows/) | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-zenrows?style=flat-square&label=%20) |\n",
"\n",
"### Tool features\n",
"\n",
"| Feature | Support |\n",
"| :--- | :---: |\n",
"| **JavaScript Rendering** | ✅ |\n",
"| **Anti-Bot Bypass** | ✅ |\n",
"| **Geo-Targeting** | ✅ |\n",
"| **Multiple Output Formats** | ✅ |\n",
"| **CSS Extraction** | ✅ |\n",
"| **Screenshot Capture** | ✅ |\n",
"| **Session Management** | ✅ |\n",
"| **Premium Proxies** | ✅ |\n",
"\n",
"## Setup\n",
"\n",
"Install the required dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f85b4089",
"metadata": {},
"outputs": [],
"source": [
"%pip install --quiet -U langchain-zenrows"
]
},
{
"cell_type": "markdown",
"id": "b15e9266",
"metadata": {},
"source": [
"### Credentials\n",
"\n",
"You'll need a ZenRows API key to use this tool. You can sign up for free at [ZenRows](https://app.zenrows.com/register?prod=universal_scraper)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0b178a2-8816-40ca-b57c-ccdd86dde9c9",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Set your ZenRows API key\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\""
]
},
{
"cell_type": "markdown",
"id": "1c97218f-f366-479d-8bf7-fe9f2f6df73f",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Here's how to instantiate the ZenRowsUniversalScraper tool."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b3ddfe9-ca79-494c-a7ab-1f56d9407a64",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"\n",
"# Set your ZenRows API key\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
"\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper()"
]
},
{
"cell_type": "markdown",
"id": "a8f2ec3f",
"metadata": {},
"source": [
"You can also pass the ZenRows API key when initializing the ZenRowsUniversalScraper tool."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91150d3e",
"metadata": {},
"outputs": [],
"source": [
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")"
]
},
{
"cell_type": "markdown",
"id": "74147a1a",
"metadata": {},
"source": [
"## Invocation\n",
"\n",
"### Basic Usage\n",
"\n",
"The tool accepts a URL and various optional parameters to customize the scraping behavior:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "65310a8b-eb0c-4d9e-a618-4f4abe2414fc",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"\n",
"# Set your ZenRows API key\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
"\n",
"# Initialize the tool\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
"\n",
"# Scrape a simple webpage\n",
"result = zenrows_scraper_tool.invoke({\"url\": \"https://httpbin.io/html\"})\n",
"print(result)"
]
},
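{
"cell_type": "markdown",
"id": "1f80ee4d",
"metadata": {},
"source": [
"### Advanced Usage with Parameters\n",
"\n",
"The example below enables JavaScript rendering, routes the request through a premium proxy in the US, waits 2000 ms before capturing the page, and returns the content as Markdown:"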
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "advanced-invoke",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"\n",
"# Set your ZenRows API key\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
"\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
"\n",
"# Scrape with JavaScript rendering and premium proxies\n",
"result = zenrows_scraper_tool.invoke({\n",
"    \"url\": \"https://www.scrapingcourse.com/ecommerce/\",\n",
"    \"js_render\": True,\n",
"    \"premium_proxy\": True,\n",
"    \"proxy_country\": \"us\",\n",
"    \"response_type\": \"markdown\",\n",
"    \"wait\": 2000\n",
"})\n",
"\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"id": "659f9fbd-6fcf-445f-aa8c-72d8e60154bd",
"metadata": {},
"source": [
"### Use within an agent"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af3123ad-7a02-40e5-b58e-7d56e23e5830",
"metadata": {},
"outputs": [],
"source": [
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"from langchain_openai import ChatOpenAI # or your preferred LLM\n",
"from langgraph.prebuilt import create_react_agent\n",
"import os\n",
"\n",
"# Set your ZenRows and OpenAI API keys\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_OPENAI_API_KEY>\"\n",
"\n",
"\n",
"# Initialize components\n",
"llm = ChatOpenAI(model=\"gpt-4o-mini\")\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
"\n",
"# Create agent\n",
"agent = create_react_agent(llm, [zenrows_scraper_tool])\n",
"\n",
"# Use the agent\n",
"result = agent.invoke(\n",
"    {\n",
"        \"messages\": \"Scrape https://news.ycombinator.com/ and list the top 3 stories with title, points, comments, username, and time.\"\n",
"    }\n",
")\n",
"\n",
"print(\"Agent Response:\")\n",
"for message in result[\"messages\"]:\n",
"    print(message.content)"
]
},
{
"cell_type": "markdown",
"id": "4ac8146c",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all ZenRowsUniversalScraper features and configurations, head to the [**ZenRowsUniversalScraper API reference**](https://github.com/ZenRows-Hub/langchain-zenrows).\n",
"\n",
"For comprehensive information about the underlying API parameters and capabilities, see the [ZenRows Universal API documentation](https://docs.zenrows.com/universal-scraper-api/api-reference)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
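
The advanced invocation in this notebook passes a dictionary of options to `invoke`. A minimal sketch of assembling that dictionary, assuming a hypothetical helper named `build_scrape_input` (it is not part of `langchain-zenrows`). The option names are the ones used in the notebook and are passed through unchecked, since the accepted set is defined by the tool itself:

```python
from typing import Any
from urllib.parse import urlparse


def build_scrape_input(url: str, **options: Any) -> dict[str, Any]:
    """Build an input dict for the tool, dropping options left as None.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    payload: dict[str, Any] = {"url": url}
    # Keep only the options the caller actually set.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


payload = build_scrape_input(
    "https://www.scrapingcourse.com/ecommerce/",
    js_render=True,
    response_type="markdown",
    wait=None,  # unset, so it is omitted from the payload
)
print(payload)
```

The resulting dictionary could then be passed as `zenrows_scraper_tool.invoke(payload)`, which keeps URL validation in one place when several pages are scraped with the same options.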
5 changes: 4 additions & 1 deletion libs/packages.yml
@@ -665,6 +665,9 @@ packages:
  - name: langchain-featherless-ai
    repo: featherlessai/langchain-featherless-ai
    path: .
  - name: langchain-zenrows
    repo: ZenRows-Hub/langchain-zenrows
    path: .
  - name: langchain-nebius
    path: libs/nebius
    repo: nebius/langchain-nebius