how to send params to ParseNode when using SmartScraperGraph? #902

Open
reneix opened this issue Jan 21, 2025 · 1 comment
Labels
question Further information is requested

Comments


reneix commented Jan 21, 2025

I'm using SmartScraperGraph to scrape a URL and extract information from the HTML content.

The scraped content contains a lot of URLs, which I want to ignore.

I'm not sure how to pass parameters such as ignore_links=True to ParseNode.

I found this code in ParseNode:

if self.parse_html:
    docs_transformed = Html2TextTransformer(
        ignore_links=False
    ).transform_documents(input_data[0])
    docs_transformed = docs_transformed[0]

version info:
scrapegraphai 1.36.0


dosubot bot commented Jan 21, 2025

Node parameters are normally passed through the node_config dictionary, but ignore_links is not one of the values read from it. In the ParseNodeDepthK class (and in the ParseNode snippet quoted above), Html2TextTransformer is instantiated with ignore_links=False, so links are kept in the parsed text.

Because this value is hardcoded in the current implementation, the only way to change it is to edit the source where Html2TextTransformer is instantiated and set ignore_links=True there.
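If editing the installed package is not practical, one possible workaround is to strip link markup from the text after it has been converted. The sketch below is only an illustration and is not part of scrapegraphai. It assumes the converted text uses Markdown-style links such as `[text](url)` or `<url>`, and the function name `strip_links` is hypothetical.

```python
import re

# Inline Markdown links and images: [text](url) or ![alt](url).
# Assumption: the URL itself contains no ")" character.
MD_LINK = re.compile(r"!?\[([^\]]*)\]\([^)]*\)")

# Bare or angle-bracketed URLs such as https://example.com or <https://example.com>.
BARE_URL = re.compile(r"<?https?://[^\s>)]+>?")


def strip_links(text: str) -> str:
    """Keep the visible link text and drop the URLs (hypothetical helper)."""
    text = MD_LINK.sub(r"\1", text)  # "[docs](https://x)" becomes "docs"
    text = BARE_URL.sub("", text)    # remove URLs that have no link text
    return text


if __name__ == "__main__":
    sample = "See [the docs](https://example.com/docs) for details."
    print(strip_links(sample))
```

This only helps where you have access to the converted text yourself. It does not change what ParseNode hands to the LLM inside the graph, and URLs that contain parentheses are not handled.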

Here's an example of how you might configure other parameters using node_config:

node_config = {
    "verbose": True,  # Example of setting the verbose flag
    "parse_html": True,
    "parse_urls": False,
    "llm_model": "your_model_name",
    "chunk_size": 1024
}

parse_node = ParseNodeDepthK(
    input="your_input_expression",
    output=["your_output_key"],
    node_config=node_config
)

For the ignore_links parameter, you would need to modify the Html2TextTransformer instantiation in the code to set ignore_links=True directly [1][2].
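Another option that avoids touching the library is to remove the anchor tags from the HTML before it reaches the parser, so there are no links left to convert. The following is a minimal sketch using only the Python standard library. The class `AnchorUnwrapper` and the function `strip_anchor_tags` are hypothetical names, and it assumes you fetch the HTML yourself.

```python
from html.parser import HTMLParser


class AnchorUnwrapper(HTMLParser):
    """Re-emit HTML unchanged, except that <a> tags are dropped (their text is kept)."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "a":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_anchor_tags(html: str) -> str:
    """Return html with every <a ...> and </a> removed (hypothetical helper)."""
    parser = AnchorUnwrapper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


if __name__ == "__main__":
    page = '<p>See <a href="https://example.com">the docs</a> &amp; more</p>'
    print(strip_anchor_tags(page))
```

The cleaned string could then be given to SmartScraperGraph as the source in place of the URL, assuming your scrapegraphai version accepts raw HTML content there; that is not confirmed in this thread. URLs that appear as plain text or in other attributes (for example img src) are left untouched.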


@dosubot dosubot bot added the question Further information is requested label Jan 21, 2025