alignment #993


Merged — 24 commits, merged Jun 24, 2025
c1cead0
Merge pull request #982 from ScrapeGraphAI/pre/beta
VinciGit00 Jun 6, 2025
3322f9d
Update README.md
VinciGit00 Jun 6, 2025
30e6b59
ci(release): 1.54.0 [skip ci]
semantic-release-bot Jun 6, 2025
e846a14
fix: bug on generate answer node
VinciGit00 Jun 6, 2025
38b3997
ci(release): 1.54.1 [skip ci]
semantic-release-bot Jun 6, 2025
cd29791
feat: add adv
VinciGit00 Jun 7, 2025
8c54162
feat: update logs
VinciGit00 Jun 7, 2025
27d5096
Merge pull request #983 from ScrapeGraphAI/add-adv
VinciGit00 Jun 7, 2025
17d9a72
ci(release): 1.55.0 [skip ci]
semantic-release-bot Jun 7, 2025
2a73821
Update README.md
VinciGit00 Jun 9, 2025
94e9ebd
feat: add scrapegraphai integration
VinciGit00 Jun 13, 2025
3f64f88
ci(release): 1.56.0 [skip ci]
semantic-release-bot Jun 13, 2025
7340375
feat: add markdownify endpoint
VinciGit00 Jun 13, 2025
e4ba4e2
Merge branch 'main' of https://github.com/ScrapeGraphAI/Scrapegraph-ai
VinciGit00 Jun 13, 2025
9a2c02d
ci(release): 1.57.0 [skip ci]
semantic-release-bot Jun 13, 2025
1d1e4db
Update README.md
VinciGit00 Jun 16, 2025
07dec35
docs: add links to other language versions of README
dowithless Jun 16, 2025
273c7d1
Merge pull request #987 from dowithless/patch-1
VinciGit00 Jun 16, 2025
0c2481f
feat: add new oss link
VinciGit00 Jun 21, 2025
aa72708
Merge branch 'main' of https://github.com/ScrapeGraphAI/Scrapegraph-ai
VinciGit00 Jun 21, 2025
45ad464
ci(release): 1.58.0 [skip ci]
semantic-release-bot Jun 21, 2025
288c69a
feat: removed sposnsors
VinciGit00 Jun 24, 2025
3f8bc88
Merge branch 'main' of https://github.com/ScrapeGraphAI/Scrapegraph-ai
VinciGit00 Jun 24, 2025
6989e1a
ci(release): 1.59.0 [skip ci]
semantic-release-bot Jun 24, 2025
72 changes: 72 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,75 @@
## [1.59.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.58.0...v1.59.0) (2025-06-24)


### Features

* removed sposnsors ([288c69a](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/288c69a862f34b999db476e669ff97c00afacde3))

## [1.58.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.57.0...v1.58.0) (2025-06-21)


### Features

* add new oss link ([0c2481f](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/0c2481fffebca355e542ae420ee1bf4cade8e5e3))


### Docs

* add links to other language versions of README ([07dec35](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/07dec35f1bf95842ee55b17796bb45f2db0f44b3))

## [1.57.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.56.0...v1.57.0) (2025-06-13)


### Features

* add markdownify endpoint ([7340375](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/73403755da1e4c3065e91d834c59f6d8c1825763))

## [1.56.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.55.0...v1.56.0) (2025-06-13)


### Features

* add scrapegraphai integration ([94e9ebd](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/94e9ebd28061f8313bb23074b4db3406cf4db0c9))

## [1.55.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.54.1...v1.55.0) (2025-06-07)


### Features

* add adv ([cd29791](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/cd29791894325c54f1dec1d2a5f6456800beb63e))
* update logs ([8c54162](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/8c541620879570c46f32708c7e488e9a4ca0ea3e))

## [1.54.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.54.0...v1.54.1) (2025-06-06)


### Bug Fixes

* bug on generate answer node ([e846a14](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/e846a1415506a58f7bc8b76ac56ba0b6413178ba))

## [1.54.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.53.0...v1.54.0) (2025-06-06)


### Features

* add grok integration ([0c476a4](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/0c476a4a7bbbec3883f505cd47bcffdcd2d9e5fd))


### Bug Fixes

* grok integration and add new grok models ([3f18272](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/3f1827274c60a2729233577666d2fa446c48c4ba))


### chore

* enhanced a readme ([68bb34c](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/68bb34cc5e63b8a1d5acc61b9b61f9ea716a2a51))


### CI

* **release:** 1.52.0-beta.1 [skip ci] ([7adb0f1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/7adb0f1df1efc4e6ada1134f6e53e4d6b072a608))
* **release:** 1.52.0-beta.2 [skip ci] ([386b46a](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/386b46a8692c8c18000bb071fc8f312adc3ad05e))
* **release:** 1.54.0-beta.1 [skip ci] ([77d4432](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/77d44321a1d41e10ac6aa13b526a49e718bd7c5d))

## [1.54.0-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.53.0...v1.54.0-beta.1) (2025-06-06)


26 changes: 8 additions & 18 deletions README.md
@@ -1,4 +1,4 @@
## 🚀 **Looking for an even faster and simpler way to scrape at scale (only 5 lines of code)? ** Check out our enhanced version at [**ScrapeGraphAI.com**](https://scrapegraphai.com/?utm_source=github&utm_medium=readme&utm_campaign=oss_cta&ut#m_content=top_banner)! 🚀
## 🚀 **Looking for an even faster and simpler way to scrape at scale (only 5 lines of code)?** Check out our enhanced version at [**ScrapeGraphAI.com**](https://scrapegraphai.com/?utm_source=github&utm_medium=readme&utm_campaign=oss_cta&ut#m_content=top_banner)! 🚀

---

@@ -7,6 +7,10 @@
[English](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/README.md) | [中文](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/chinese.md) | [日本語](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/japanese.md)
| [한국어](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/korean.md)
| [Русский](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/russian.md) | [Türkçe](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/turkish.md)
| [Deutsch](https://www.readme-i18n.com/ScrapeGraphAI/Scrapegraph-ai?lang=de)
| [Español](https://www.readme-i18n.com/ScrapeGraphAI/Scrapegraph-ai?lang=es)
| [français](https://www.readme-i18n.com/ScrapeGraphAI/Scrapegraph-ai?lang=fr)
| [Português](https://www.readme-i18n.com/ScrapeGraphAI/Scrapegraph-ai?lang=pt)


[![Downloads](https://img.shields.io/pepy/dt/scrapegraphai?style=for-the-badge)](https://pepy.tech/project/scrapegraphai)
@@ -39,7 +43,7 @@ You can find more informations at the following [link](https://scrapegraphai.com
- **API**: [Documentation](https://docs.scrapegraphai.com/introduction)
- **SDKs**: [Python](https://docs.scrapegraphai.com/sdks/python), [Node](https://docs.scrapegraphai.com/sdks/javascript)
- **LLM Frameworks**: [Langchain](https://docs.scrapegraphai.com/integrations/langchain), [Llama Index](https://docs.scrapegraphai.com/integrations/llamaindex), [Crew.ai](https://docs.scrapegraphai.com/integrations/crewai), [CamelAI](https://github.com/camel-ai/camel)
- **Low-code Frameworks**: [Pipedream](https://pipedream.com/apps/scrapegraphai), [Bubble](https://bubble.io/plugin/scrapegraphai-1745408893195x213542371433906180), [Zapier](https://zapier.com/apps/scrapegraphai/integrations), [n8n](http://localhost:5001/dashboard), [LangFlow](https://www.langflow.org)
- **Low-code Frameworks**: [Pipedream](https://pipedream.com/apps/scrapegraphai), [Bubble](https://bubble.io/plugin/scrapegraphai-1745408893195x213542371433906180), [Zapier](https://zapier.com/apps/scrapegraphai/integrations), [n8n](http://localhost:5001/dashboard), [LangFlow](https://www.langflow.org), [Dify](https://dify.ai)
- **MCP server**: [Link](https://smithery.ai/server/@ScrapeGraphAI/scrapegraph-mcp)

## 🚀 Quick install
@@ -183,22 +187,6 @@ We offer SDKs in both Python and Node.js, making it easy to integrate into your

The Official API Documentation can be found [here](https://docs.scrapegraphai.com/).

## 🏆 Sponsors
<div style="text-align: center;">
<a href="https://2ly.link/1zaXG">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/browserbase_logo.png" alt="Browserbase" style="width: 10%;">
</a>
<a href="https://2ly.link/1zNiz">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/serp_api_logo.png" alt="SerpAPI" style="width: 10%;">
</a>
<a href="https://2ly.link/1zNj1">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/transparent_stat.png" alt="Stats" style="width: 15%;">
</a>
<a href="https://scrape.do">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/scrapedo.png" alt="Stats" style="width: 11%;">
</a>
</div>

## 📈 Telemetry
We collect anonymous usage metrics to enhance our package's quality and user experience. The data helps us prioritize improvements and ensure compatibility. If you wish to opt-out, set the environment variable SCRAPEGRAPHAI_TELEMETRY_ENABLED=false. For more information, please refer to the documentation [here](https://scrapegraph-ai.readthedocs.io/en/latest/scrapers/telemetry.html).

@@ -235,3 +223,5 @@ ScrapeGraphAI is licensed under the MIT License. See the [LICENSE](https://githu
- ScrapeGraphAI is meant to be used for data exploration and research purposes only. We are not responsible for any misuse of the library.

Made with ❤️ by [ScrapeGraph AI](https://scrapegraphai.com)

[Scarf tracking](https://static.scarf.sh/a.png?x-pxid=102d4b8c-cd6a-4b9e-9a16-d6d141b9212d)
Binary file removed docs/assets/scrapedo.png
Binary file not shown.
Binary file removed docs/assets/scrapeless.png
Binary file not shown.
Binary file removed docs/assets/serp_api_logo.png
Binary file not shown.
Binary file removed docs/assets/transparent_stat.png
Binary file not shown.
1 change: 1 addition & 0 deletions examples/markdownify/.env.example
@@ -0,0 +1 @@
SCRAPEGRAPH_API_KEY=your SCRAPEGRAPH_API_KEY
35 changes: 35 additions & 0 deletions examples/markdownify/markdownify_scrapegraphai.py
@@ -0,0 +1,35 @@
"""
Example script demonstrating the markdownify functionality
"""

import os
from dotenv import load_dotenv
from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger

def main():
    # Load environment variables
    load_dotenv()

    # Set up logging
    sgai_logger.set_logging(level="INFO")

    # Initialize the client
    api_key = os.getenv("SCRAPEGRAPH_API_KEY")
    if not api_key:
        raise ValueError("SCRAPEGRAPH_API_KEY environment variable not found")
    sgai_client = Client(api_key=api_key)

    # Example 1: Convert a website to Markdown
    print("Example 1: Converting website to Markdown")
    print("-" * 50)
    response = sgai_client.markdownify(
        website_url="https://example.com"
    )
    print("Markdown output:")
    print(response["result"])  # Access the result key from the dictionary
    print("\nMetadata:")
    print(response.get("metadata", {}))  # Use get() with a default value
    print("\n" + "=" * 50 + "\n")


if __name__ == "__main__":
    main()
75 changes: 75 additions & 0 deletions examples/markdownify/readme.md
@@ -0,0 +1,75 @@
# Markdownify Graph Example

This example demonstrates how to use the Markdownify graph to convert HTML content to Markdown format.

## Features

- Convert HTML content to clean, readable Markdown
- Support for both URL and direct HTML input
- Maintains formatting and structure of the original content
- Handles complex HTML elements and nested structures

## Usage

```python
from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger

# Set up logging
sgai_logger.set_logging(level="INFO")

# Initialize the client
sgai_client = Client(api_key="your-api-key")

# Example 1: Convert a website to Markdown
response = sgai_client.markdownify(
    website_url="https://example.com"
)
print(response.markdown)

# Example 2: Convert HTML content directly
html_content = """
<div>
    <h1>Hello World</h1>
    <p>This is a <strong>test</strong> paragraph.</p>
</div>
"""
response = sgai_client.markdownify(
    html_content=html_content
)
print(response.markdown)
```

## Parameters

The `markdownify` method accepts the following parameters:

- `website_url` (str, optional): The URL of the website to convert to Markdown
- `html_content` (str, optional): Direct HTML content to convert to Markdown

Note: You must provide either `website_url` or `html_content`, but not both.
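The mutual-exclusivity constraint can also be enforced client-side before calling the API. A minimal sketch (the helper name is ours, not part of the SDK):

```python
def validate_markdownify_args(website_url=None, html_content=None):
    # Exactly one of the two inputs must be provided (hypothetical
    # pre-flight check mirroring the documented constraint).
    if bool(website_url) == bool(html_content):
        raise ValueError("Provide exactly one of website_url or html_content")
```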

## Response

The response object contains:

- `markdown` (str): The converted Markdown content
- `metadata` (dict): Additional information about the conversion process

## Error Handling

The graph handles various edge cases:

- Invalid URLs
- Malformed HTML
- Network errors
- Timeout issues

If an error occurs, it will be logged and raised with appropriate error messages.
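A thin wrapper like the following keeps the logging and re-raise in one place; this is a sketch that assumes the client surfaces network, timeout, and parsing failures as ordinary Python exceptions:

```python
import logging

def safe_markdownify(client, website_url):
    """Call client.markdownify with basic error handling.

    The client object and its error behavior are assumptions here,
    not guaranteed SDK semantics.
    """
    try:
        return client.markdownify(website_url=website_url)
    except Exception as exc:
        logging.error("markdownify failed for %s: %s", website_url, exc)
        raise
```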

## Best Practices

1. Always provide a valid URL or well-formed HTML content
2. Use appropriate logging levels for debugging
3. Handle the response appropriately in your application
4. Consider rate limiting for large-scale conversions
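For the last point, a simple client-side throttle is often enough when converting many pages. A sketch (the function and the delay default are our own, not SDK features):

```python
import time

def markdownify_batch(client, urls, delay_s=1.0):
    """Convert several URLs sequentially, pausing between calls as a
    naive client-side rate limit (delay_s is an assumed default)."""
    results = {}
    for url in urls:
        results[url] = client.markdownify(website_url=url)
        time.sleep(delay_s)
    return results
```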
1 change: 1 addition & 0 deletions examples/search_graph/scrapegraphai/.env.example
@@ -0,0 +1 @@
SCRAPEGRAPH_API_KEY=your SCRAPEGRAPH_API_KEY
Empty file.
83 changes: 83 additions & 0 deletions examples/search_graph/scrapegraphai/searchscraper_scrapegraphai.py
@@ -0,0 +1,83 @@
"""
Example implementation of search-based scraping using Scrapegraph AI.
This example demonstrates how to use the searchscraper to extract information from the web.
"""

import os
from typing import Dict, Any
from dotenv import load_dotenv
from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger

def format_response(response: Dict[str, Any]) -> None:
    """
    Format and print the search response in a readable way.

    Args:
        response (Dict[str, Any]): The response from the search API
    """
    print("\n" + "="*50)
    print("SEARCH RESULTS")
    print("="*50)

    # Print request ID
    print(f"\nRequest ID: {response['request_id']}")

    # Print number of sources
    urls = response.get('reference_urls', [])
    print(f"\nSources Processed: {len(urls)}")

    # Print the extracted information
    print("\nExtracted Information:")
    print("-"*30)
    if isinstance(response['result'], dict):
        for key, value in response['result'].items():
            print(f"\n{key.upper()}:")
            if isinstance(value, list):
                for item in value:
                    print(f"  • {item}")
            else:
                print(f"  {value}")
    else:
        print(response['result'])

    # Print source URLs
    if urls:
        print("\nSources:")
        print("-"*30)
        for i, url in enumerate(urls, 1):
            print(f"{i}. {url}")
    print("\n" + "="*50)


def main():
    # Load environment variables
    load_dotenv()

    # Get API key
    api_key = os.getenv("SCRAPEGRAPH_API_KEY")
    if not api_key:
        raise ValueError("SCRAPEGRAPH_API_KEY not found in environment variables")

    # Configure logging
    sgai_logger.set_logging(level="INFO")

    # Initialize client
    sgai_client = Client(api_key=api_key)

    try:
        # Basic search scraper example
        print("\nSearching for information...")

        search_response = sgai_client.searchscraper(
            user_prompt="Extract webpage information"
        )
        format_response(search_response)

    except Exception as e:
        print(f"\nError occurred: {str(e)}")
    finally:
        # Always close the client
        sgai_client.close()


if __name__ == "__main__":
    main()
30 changes: 0 additions & 30 deletions examples/smart_scraper_graph/README.md

This file was deleted.

4 changes: 2 additions & 2 deletions examples/smart_scraper_graph/ollama/smart_scraper_ollama.py
@@ -11,7 +11,7 @@

graph_config = {
"llm": {
"model": "ollama/llama3.2:3b",
"model": "ollama/llama3.2",
"temperature": 0,
# "base_url": "http://localhost:11434", # set ollama URL arbitrarily
"model_tokens": 4096,
@@ -24,7 +24,7 @@
# Create the SmartScraperGraph instance and run it
# ************************************************
smart_scraper_graph = SmartScraperGraph(
prompt="Find some information about what does the company do and the list of founders.",
prompt="Find some information about the founders.",
source="https://scrapegraphai.com/",
config=graph_config,
)
1 change: 1 addition & 0 deletions examples/smart_scraper_graph/scrapegraphai/.env.example
@@ -0,0 +1 @@
SCRAPEGRAPH_API_KEY=your SCRAPEGRAPH_API_KEY