To send parameters like ignore_links=True to ParseNode when using SmartScraperGraph, you can configure the node_config dictionary with the desired parameters. However, in the ParseNodeDepthK class, the Html2TextTransformer is already set with ignore_links=False in the code, which means links are not ignored by default.
If you want to change this behavior, you would need to modify the code where Html2TextTransformer is instantiated to set ignore_links=True. Unfortunately, this parameter is hardcoded in the current implementation, so you would need to adjust the source code directly to achieve this behavior.
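As a rough illustration of what such a source change could look like, the sketch below reads the flag from node_config and falls back to the current hardcoded value. It is not the library's code: the "ignore_links" config key, the build_transformer helper, and the StubHtml2Text stand-in class are all hypothetical names used so the snippet runs on its own. In a patched copy of the node you would pass the real Html2TextTransformer instead of the stub.

```python
class StubHtml2Text:
    """Hypothetical stand-in for Html2TextTransformer; it only records the flag."""

    def __init__(self, ignore_links=False):
        self.ignore_links = ignore_links


def build_transformer(node_config, transformer_cls=StubHtml2Text):
    """Create the transformer with ignore_links taken from node_config.

    "ignore_links" is an assumed config key, not one the current release reads.
    A missing key keeps the existing behaviour (links are kept).
    """
    ignore_links = (node_config or {}).get("ignore_links", False)
    return transformer_cls(ignore_links=ignore_links)


# Usage: an explicit flag overrides the default, an empty config does not.
with_flag = build_transformer({"ignore_links": True})
without_flag = build_transformer(None)
```

Reading the value with a default of False means existing graphs that never set the key would behave exactly as before.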
Here's an example of how you might configure other parameters using node_config:
node_config = {
    "verbose": True,  # Example of setting the verbose flag
    "parse_html": True,
    "parse_urls": False,
    "llm_model": "your_model_name",
    "chunk_size": 1024
}
parse_node = ParseNodeDepthK(
    input="your_input_expression",
    output=["your_output_key"],
    node_config=node_config
)
For the ignore_links parameter, you would need to modify the Html2TextTransformer instantiation in the code to set ignore_links=True directly [1][2].
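If editing the installed package is not an option, one possible workaround is to strip links from text you already have in hand. The sketch below is an assumption-laden example, not part of scrapegraphai: strip_links is a hypothetical helper, and it assumes the links appear either as inline Markdown links ([text](url)) or as bare http(s) URLs. It only cleans text after the fact and does not change what the parse node produces internally.

```python
import re

# Inline Markdown links and images: [text](url) or ![alt](url), optional "title".
_MD_LINK = re.compile(r'!?\[([^\]]*)\]\([^)\s]*(?:\s+"[^"]*")?\)')
# Bare http(s) URLs that are not wrapped in Markdown link syntax.
_BARE_URL = re.compile(r"https?://\S+")
# Runs of spaces or tabs left behind after a URL is removed.
_EXTRA_SPACES = re.compile(r"[ \t]{2,}")


def strip_links(text: str) -> str:
    """Keep the visible link text, drop the URL targets and bare URLs."""
    text = _MD_LINK.sub(r"\1", text)   # [docs](https://...) -> docs
    text = _BARE_URL.sub("", text)     # remove standalone URLs
    return _EXTRA_SPACES.sub(" ", text)


cleaned = strip_links("See [docs](https://example.com/a) now")
```

The regular expressions are deliberately simple. Reference-style links or URLs containing parentheses would need extra handling.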
I'm using SmartScraperGraph to scrape a URL and extract information from the HTML content.
However, the scraped content contains a lot of URLs, which I want to ignore.
I'm not sure how to pass parameters such as ignore_links=True to ParseNode.
I found this code in ParseNode:
if self.parse_html:
    docs_transformed = Html2TextTransformer(
        ignore_links=False
    ).transform_documents(input_data[0])
    docs_transformed = docs_transformed[0]
Version info:
scrapegraphai 1.36.0