Clarification on JsonXPathExtractionStrategy Schema and type Parameter #1664
Replies: 1 comment
The docs page at docs.crawl4ai.com/extraction/no-llm-strategies covers all the types, but they're spread across examples rather than collected in one reference table. Here's a consolidated summary:
Simple types:
Compound types:
Pipeline:
{
  "name": "price_number",
  "selector": ".//span[@class='price']",
  "type": ["text", "regex"],
  "pattern": "\\d+\\.?\\d*",
  "group": 0
}
Each field can also have:
For XPath selectors:
{
  "baseSelector": "//div[contains(@class, 'product-card')]",
  "fields": [
    {"name": "title", "selector": ".//h2", "type": "text"},
    {"name": "link", "selector": ".//a", "type": "attribute", "attribute": "href"},
    {"name": "tags", "type": "list", "selector": ".//span[@class='tag']",
     "fields": [{"name": "tag", "type": "text"}]}
  ]
}
You're right that a dedicated reference page for this would help. The info exists in the docs, but it's not easy to find in one place.
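To make the schema semantics above a bit more concrete, here is a rough, standard-library-only sketch of how such a schema could be interpreted. It is not crawl4ai's implementation: the sample markup, the helper names (`apply_step`, `extract_field`, `extract`), and the reading of a list-valued `type` as "apply each step in order" are all assumptions for illustration. The `baseSelector` is also simplified, because `xml.etree.ElementTree` supports only a subset of XPath (no `contains()`, no absolute `//` paths on elements).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample markup (well-formed so ElementTree can parse it).
SAMPLE = """
<html><body>
  <div class="product-card">
    <h2>Blue Widget</h2>
    <a href="/p/1">details</a>
    <span class="price">$19.99</span>
    <span class="tag">new</span>
    <span class="tag">sale</span>
  </div>
  <div class="product-card">
    <h2>Red Widget</h2>
    <a href="/p/2">details</a>
    <span class="price">$5</span>
  </div>
</body></html>
"""

# Same shape as the schemas above; only baseSelector is simplified
# to stay within ElementTree's limited XPath subset (assumption).
SCHEMA = {
    "baseSelector": ".//div[@class='product-card']",
    "fields": [
        {"name": "title", "selector": ".//h2", "type": "text"},
        {"name": "link", "selector": ".//a", "type": "attribute", "attribute": "href"},
        {"name": "price_number", "selector": ".//span[@class='price']",
         "type": ["text", "regex"], "pattern": r"\d+\.?\d*", "group": 0},
        {"name": "tags", "selector": ".//span[@class='tag']", "type": "list",
         "fields": [{"name": "tag", "type": "text"}]},
    ],
}


def apply_step(step, value, field):
    """Apply one type step to an element, or to the previous step's string."""
    if step == "text":
        return "".join(value.itertext()).strip()
    if step == "attribute":
        return value.get(field["attribute"])
    if step == "regex":
        text = value if isinstance(value, str) else "".join(value.itertext())
        match = re.search(field["pattern"], text)
        # Defaulting to group 0 is this sketch's choice, not a library fact.
        return match.group(field.get("group", 0)) if match else None
    raise ValueError(f"unsupported type: {step}")


def extract_field(element, field):
    """Resolve one field relative to `element` (note the leading './/')."""
    ftype = field["type"]
    if ftype == "list":
        items = element.findall(field["selector"])
        return [{sub["name"]: extract_field(item, sub) for sub in field["fields"]}
                for item in items]
    # A field without a selector is evaluated against the current element.
    target = element.find(field["selector"]) if "selector" in field else element
    if target is None:
        return field.get("default")
    steps = ftype if isinstance(ftype, list) else [ftype]
    value = target
    for step in steps:
        value = apply_step(step, value, field)
        if value is None:
            return field.get("default")
    return value


def extract(markup, schema):
    """Find every base element, then build one dict per element."""
    root = ET.fromstring(markup)
    return [{f["name"]: extract_field(base, f) for f in schema["fields"]}
            for base in root.findall(schema["baseSelector"])]


rows = extract(SAMPLE, SCHEMA)
for row in rows:
    print(row)
```

The point of the sketch is that field selectors start with `.//` because they are evaluated relative to each element matched by `baseSelector`, not against the whole document. With crawl4ai itself, the schema would instead be passed to `JsonXPathExtractionStrategy`, which as far as I know evaluates selectors with lxml, so full XPath 1.0 expressions such as `contains()` should be usable there.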
Hello,
I'm currently using JsonXPathExtractionStrategy and have some questions regarding its schema definition:
The XPath expressions defined in the schema seem slightly different from standard XPath syntax. Could you clarify if there are any specific rules or limitations?
The type field in each schema entry is not fully documented. Could you provide a detailed explanation of the available types and how they affect data extraction?
A more comprehensive documentation or examples for schema usage would be very helpful.
Thank you for your time and support!