diff --git a/docs/docs/integrations/providers/zenrows.ipynb b/docs/docs/integrations/providers/zenrows.ipynb new file mode 100644 index 0000000000000..cb8a5b59eecbf --- /dev/null +++ b/docs/docs/integrations/providers/zenrows.ipynb @@ -0,0 +1,73 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ZenRows\n", + "\n", + "[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that provides advanced web data extraction capabilities at scale. ZenRows specializes in scraping modern websites, bypassing anti-bot systems, extracting structured data from any website, rendering JavaScript-heavy content, accessing geo-restricted websites, and more.\n", + "\n", + "[langchain-zenrows](https://pypi.org/project/langchain-zenrows/) provides tools that allow LLMs to access web data using ZenRows' powerful scraping infrastructure.\n", + "\n", + "## Installation and Setup\n", + "\n", + "```bash\n", + "pip install langchain-zenrows\n", + "```\n", + "\n", + "You'll need to set up your ZenRows API key:\n", + "\n", + "```python\n", + "import os\n", + "os.environ[\"ZENROWS_API_KEY\"] = \"your-api-key\"\n", + "```\n", + "\n", + "Or you can pass it directly when initializing the tool:\n", + "\n", + "```python\n", + "from langchain_zenrows import ZenRowsUniversalScraper\n", + "zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")\n", + "```\n", + "\n", + "## Tools\n", + "\n", + "### ZenRowsUniversalScraper\n", + "\n", + "The ZenRows integration provides comprehensive web scraping features:\n", + "\n", + "- **JavaScript Rendering**: Scrape modern SPAs and dynamic content\n", + "- **Anti-Bot Bypass**: Overcome sophisticated bot detection systems\n", + "- **Geo-Targeting**: Access region-specific content from 190+ countries\n", + "- **Multiple Output Formats**: HTML, Markdown, Plaintext, PDF, Screenshots\n", + "- **CSS Extraction**: Target specific data with CSS selectors\n", + "- **Structured Data Extraction**: 
Automatically extract emails, phone numbers, links, and more\n", + "- **Session Management**: Maintain consistent sessions across requests\n", + "- **Premium Proxies**: Residential IPs for maximum success rates\n", + "\n", + "See more in the [ZenRows tool documentation](/docs/integrations/tools/zenrows_universal_scraper/)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/tools/zenrows_universal_scraper.ipynb b/docs/docs/integrations/tools/zenrows_universal_scraper.ipynb new file mode 100644 index 0000000000000..6d7f8c5b6e7d2 --- /dev/null +++ b/docs/docs/integrations/tools/zenrows_universal_scraper.ipynb @@ -0,0 +1,266 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "a6f91f20", + "metadata": {}, + "source": [ + "# ZenRowsUniversalScraper\n", + "\n", + "[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that provides advanced web data extraction capabilities at scale. For more information about ZenRows and its Universal Scraper API, visit the [official documentation](https://docs.zenrows.com/universal-scraper-api/).\n", + "\n", + "This document provides a quick overview for getting started with the ZenRowsUniversalScraper tool. 
For detailed documentation of all ZenRowsUniversalScraper features and configurations, head to the [API reference](https://github.com/ZenRows-Hub/langchain-zenrows?tab=readme-ov-file#api-reference).\n", + "\n", + "## Overview\n", + "\n", + "### Integration details\n", + "\n", + "| Class | Package | JS support | Package latest |\n", + "| :--- | :--- | :---: | :---: |\n", + "| [ZenRowsUniversalScraper](https://pypi.org/project/langchain-zenrows/) | [langchain-zenrows](https://pypi.org/project/langchain-zenrows/) | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-zenrows?style=flat-square&label=%20) |\n", + "\n", + "### Tool features\n", + "\n", + "| Feature | Support |\n", + "| :--- | :---: |\n", + "| **JavaScript Rendering** | ✅ |\n", + "| **Anti-Bot Bypass** | ✅ |\n", + "| **Geo-Targeting** | ✅ |\n", + "| **Multiple Output Formats** | ✅ |\n", + "| **CSS Extraction** | ✅ |\n", + "| **Screenshot Capture** | ✅ |\n", + "| **Session Management** | ✅ |\n", + "| **Premium Proxies** | ✅ |\n", + "\n", + "## Setup\n", + "\n", + "Install the required dependencies." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f85b4089", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install langchain-zenrows" + ] + }, + { + "cell_type": "markdown", + "id": "b15e9266", + "metadata": {}, + "source": [ + "### Credentials\n", + "\n", + "You'll need a ZenRows API key to use this tool. You can sign up for free at [ZenRows](https://app.zenrows.com/register?prod=universal_scraper)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e0b178a2-8816-40ca-b57c-ccdd86dde9c9", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# Set your ZenRows API key\n", + "os.environ[\"ZENROWS_API_KEY\"] = \"\"" + ] + }, + { + "cell_type": "markdown", + "id": "1c97218f-f366-479d-8bf7-fe9f2f6df73f", + "metadata": {}, + "source": [ + "## Instantiation\n", + "\n", + "Here's how to create an instance of the ZenRowsUniversalScraper tool." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "91150d3e", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_zenrows import ZenRowsUniversalScraper\n", + "\n", + "zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")" + ] + }, + { + "cell_type": "markdown", + "id": "74147a1a", + "metadata": {}, + "source": [ + "## Invocation\n", + "\n", + "### Basic Usage\n", + "\n", + "The tool accepts a URL and various optional parameters to customize the scraping behavior:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "65310a8b-eb0c-4d9e-a618-4f4abe2414fc", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from langchain_zenrows import ZenRowsUniversalScraper\n", + "\n", + "# Set your ZenRows API key\n", + "os.environ[\"ZENROWS_API_KEY\"] = \"\"\n", + "\n", + "# Initialize the tool\n", + "zenrows_scraper_tool = ZenRowsUniversalScraper()\n", + "\n", + "# Scrape a simple webpage\n", + "result = zenrows_scraper_tool.invoke({\"url\": \"https://httpbin.io/html\"})\n", + "print(result)" + ] + }, + { + "cell_type": "markdown", + "id": "1f80ee4d", + "metadata": {}, + "source": [ + "### Advanced Usage with Parameters" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "advanced-invoke", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from langchain_zenrows import ZenRowsUniversalScraper\n", + "\n", + "# Set your ZenRows API key\n", + "os.environ[\"ZENROWS_API_KEY\"] = \"\"\n", + "\n", + "zenrows_scraper_tool = ZenRowsUniversalScraper()\n", + "\n", + "# Scrape with JavaScript rendering and premium proxies\n", + "result = zenrows_scraper_tool.invoke({\n", + " \"url\": \"https://www.scrapingcourse.com/ecommerce/\",\n", + " \"js_render\": True,\n", + " \"premium_proxy\": True,\n", + " \"proxy_country\": \"us\",\n", + " \"response_type\": \"markdown\",\n", + " \"wait\": 2000\n", + "})\n", + "\n", + "print(result)" + ] + }, + { + "cell_type": 
"markdown", + "id": "659f9fbd-6fcf-445f-aa8c-72d8e60154bd", + "metadata": {}, + "source": [ + "### Use within an agent" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "af3123ad-7a02-40e5-b58e-7d56e23e5830", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_zenrows import ZenRowsUniversalScraper\n", + "from langchain_openai import ChatOpenAI # or your preferred LLM\n", + "from langgraph.prebuilt import create_react_agent\n", + "import os\n", + "\n", + "# Set your ZenRows and OpenAI API keys\n", + "os.environ[\"ZENROWS_API_KEY\"] = \"\"\n", + "os.environ[\"OPENAI_API_KEY\"] = \"\"\n", + "\n", + "\n", + "# Initialize components\n", + "llm = ChatOpenAI(model=\"gpt-4o-mini\")\n", + "zenrows_scraper_tool = ZenRowsUniversalScraper()\n", + "\n", + "# Create agent\n", + "agent = create_react_agent(llm, [zenrows_scraper_tool])\n", + "\n", + "# Use the agent\n", + "result = agent.invoke(\n", + " {\n", + " \"messages\": \"Scrape https://news.ycombinator.com/ and list the top 3 stories with title, points, comments, username, and time.\"\n", + " }\n", + ")\n", + "\n", + "print(\"Agent Response:\")\n", + "for message in result[\"messages\"]:\n", + " print(f\"{message.content}\")" + ] + }, + { + "cell_type": "markdown", + "id": "4ac8146c", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all ZenRowsUniversalScraper features and configurations head to the [**ZenRowsUniversalScraper API reference**](https://github.com/ZenRows-Hub/langchain-zenrows).\n", + "\n", + "For comprehensive information about the underlying API parameters and capabilities, see the [ZenRows Universal API documentation](https://docs.zenrows.com/universal-scraper-api/api-reference)." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "poetry-venv-311", + "language": "python", + "name": "poetry-venv-311" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/libs/packages.yml b/libs/packages.yml index ae0346964a9cc..6e3816a45c2df 100644 --- a/libs/packages.yml +++ b/libs/packages.yml @@ -665,6 +665,9 @@ packages: - name: langchain-featherless-ai repo: featherlessai/langchain-featherless-ai path: . +- name: langchain-zenrows + repo: ZenRows-Hub/langchain-zenrows + path: . - name: langchain-nebius path: libs/nebius - repo: nebius/langchain-nebius + repo: nebius/langchain-nebius \ No newline at end of file