crawlio-js is a Node.js SDK for interacting with the Crawlio web scraping and crawling API. It provides programmatic access to scraping, crawling, and batch processing endpoints with built-in error handling.
## Installation

```bash
npm install crawlio.js
```

## Quick Start

```js
import { Crawlio } from 'crawlio.js'

const client = new Crawlio({ apiKey: 'your-api-key' })
const result = await client.scrape({ url: 'https://example.com' })
console.log(result.html)
```

## API

### new Crawlio(options)

Creates a new Crawlio client.
Options:

| Name | Type | Required | Description |
|---|---|---|---|
| apiKey | string | ✅ | Your Crawlio API key |
| baseUrl | string | ❌ | API base URL (default: `https://crawlio.xyz`) |
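For a non-default deployment, `baseUrl` can be overridden. A minimal sketch (the URL below is a placeholder, not a real endpoint):

```js
import { Crawlio } from 'crawlio.js'

// Point the client at a custom deployment instead of the
// default https://crawlio.xyz endpoint.
const client = new Crawlio({
  apiKey: process.env.CRAWLIO_API_KEY,
  baseUrl: 'https://crawlio.example.com', // placeholder self-hosted URL
})
```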
### scrape(options)

Scrapes a single page.
```js
await client.scrape({ url: 'https://example.com' })
```

ScrapeOptions:
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | ✅ | Target URL |
| exclude | string[] | ❌ | CSS selectors to exclude |
| includeOnly | string[] | ❌ | CSS selectors to include |
| markdown | boolean | ❌ | Convert HTML to Markdown |
| returnUrls | boolean | ❌ | Return all discovered URLs |
| workflow | Workflow[] | ❌ | Custom workflow steps to execute |
| normalizeBase64 | boolean | ❌ | Normalize base64 content |
| cookies | CookiesInfo[] | ❌ | Cookies to include in the request |
| userAgent | string | ❌ | Custom User-Agent header for the request |
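A sketch combining several of these options; the selector and User-Agent values are illustrative only:

```js
const result = await client.scrape({
  url: 'https://example.com',
  markdown: true,                   // also return a Markdown rendering
  exclude: ['nav', 'footer'],       // drop boilerplate elements (example selectors)
  returnUrls: true,                 // collect discovered links
  userAgent: 'crawlio-example/1.0', // illustrative custom User-Agent
})

console.log(result.markdown)
console.log(result.urls)
```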
### Crawl

Initiates a site-wide crawl.
CrawlOptions:

| Name | Type | Required | Description |
|---|---|---|---|
| url | string | ✅ | Root URL to crawl |
| count | number | ✅ | Number of pages to crawl |
| sameSite | boolean | ❌ | Limit crawl to same domain |
| patterns | string[] | ❌ | URL patterns to match |
| exclude | string[] | ❌ | CSS selectors to exclude |
| includeOnly | string[] | ❌ | CSS selectors to include |
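A minimal sketch of starting a crawl, assuming the method is exposed as `client.crawl` and returns a job object with an `id` (both assumptions, inferred from the job status type below):

```js
// Start a crawl of up to 25 pages on the same domain.
// `crawl` as the method name and the returned job shape are assumptions.
const job = await client.crawl({
  url: 'https://example.com',
  count: 25,
  sameSite: true,
})

console.log(job.id)
```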
### Crawl Status

Checks the status of a crawl job.
### Crawl Results

Gets results from a completed crawl.
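Continuing the crawl sketch above, a polling loop built on the documented status values; the method names `crawlStatus` and `crawlResults` are assumed, not confirmed by this README:

```js
// `crawlStatus` / `crawlResults` are hypothetical method names.
let status = await client.crawlStatus(job.id)
while (status.status === 'IN_QUEUE' || status.status === 'RUNNING') {
  // Wait a couple of seconds between polls.
  await new Promise((resolve) => setTimeout(resolve, 2000))
  status = await client.crawlStatus(job.id)
}

if (status.status === 'SUCCESS') {
  const results = await client.crawlResults(job.id)
  console.log(results)
}
```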
### Search

Performs a search on scraped content.
SearchOptions:

| Name | Type | Description |
|---|---|---|
| site | string | Limit search to a specific domain |
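A sketch assuming the method is `client.search`; only `site` is documented above, so the `query` field below is a hypothetical placeholder for the search term:

```js
// `search` as the method name and `query` as a field are assumptions;
// only the `site` option is documented here.
const hits = await client.search({
  query: 'pricing',    // hypothetical query field
  site: 'example.com', // limit search to one domain
})

console.log(hits)
```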
### Batch Scrape

Initiates scraping for multiple URLs in one request.
BatchScrapeOptions:

| Name | Type | Description |
|---|---|---|
| url | string[] | List of URLs |
| options | `Omit<ScrapeOptions, 'url'>` | Common options for all URLs |
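A sketch assuming the method is exposed as `client.batchScrape`:

```js
// Scrape several URLs with shared options in one request.
// `batchScrape` as the method name is an assumption.
const batch = await client.batchScrape({
  url: ['https://example.com/a', 'https://example.com/b'],
  options: { markdown: true, exclude: ['nav'] },
})
```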
### Batch Scrape Status

Checks the status of a batch scrape job.
### Batch Scrape Results

Fetches results from a completed batch scrape.
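A sketch assuming the methods are `client.batchScrapeStatus` and `client.batchScrapeResults`, and that the batch job exposes an `id` (all hypothetical names, mirroring the crawl endpoints above):

```js
// `batchScrapeStatus` / `batchScrapeResults` and `batch.id` are assumptions;
// the returned status matches the job status type below.
const batchStatus = await client.batchScrapeStatus(batch.id)
if (batchStatus.status === 'SUCCESS') {
  const results = await client.batchScrapeResults(batch.id)
  console.log(results)
}
```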
## Error Handling

All Crawlio errors extend `CrawlioError`. You can catch and inspect them for more context:
- `CrawlioError`
- `CrawlioRateLimit`
- `CrawlioLimitExceeded`
- `CrawlioAuthenticationError`
- `CrawlioInternalServerError`
- `CrawlioFailureError`
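A sketch of catching and inspecting these errors, assuming the error classes are exported from the package root:

```js
// Assumes the error classes are exported from 'crawlio.js'.
import { Crawlio, CrawlioError, CrawlioRateLimit } from 'crawlio.js'

const client = new Crawlio({ apiKey: 'your-api-key' })

try {
  await client.scrape({ url: 'https://example.com' })
} catch (err) {
  if (err instanceof CrawlioRateLimit) {
    // Rate limited: back off and retry later.
  } else if (err instanceof CrawlioError) {
    // Any other Crawlio-specific failure.
    console.error(err.message)
  } else {
    throw err // not a Crawlio error; rethrow
  }
}
```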
## Types

Scrape result:

```ts
{
  jobId: string
  html: string
  markdown: string
  meta: Record<string, string>
  urls?: string[]
  url: string
}
```

Job status:

```ts
{
  id: string
  status: 'IN_QUEUE' | 'RUNNING' | 'LIMIT_EXCEEDED' | 'ERROR' | 'SUCCESS'
  error: number
  success: number
  total: number
}
```

CookiesInfo:

```ts
{
  name: string
  value: string
  path: string
  expires?: number
  httpOnly: boolean
  secure: boolean
  domain: string
  sameSite: 'Strict' | 'Lax' | 'None'
}
```