38 commits
25bddb2
ci: test on node 22 and 24
B4nan May 17, 2025
e4b7f69
refactor: convert to native ESM
B4nan May 19, 2025
2436f7a
refactor: remove deprecated crawler options
B4nan May 20, 2025
dcf4b7a
refactor: make crawling context strict and remove the error fallback
B4nan May 20, 2025
418d546
refactor: remove `additionalBlockedStatusCodes` parameter of `Session…
B4nan May 20, 2025
a167817
refactor: remove `additionalBlockedStatusCodes` parameter of `Session…
B4nan May 20, 2025
5af07db
chore: skip docker image builds for v4
B4nan May 20, 2025
aaf3f05
chore: use `v4` dist tag
B4nan May 20, 2025
077d287
chore: run tests on v4 branch
B4nan May 20, 2025
85db2e4
chore: fix build
B4nan May 20, 2025
b18810e
chore: fix v4 publishing
B4nan May 20, 2025
69cbde6
chore: use node 22 in e2e tests and project templates
B4nan May 21, 2025
b78ee69
chore: use node 22 in e2e tests and project templates
B4nan May 21, 2025
fcc9d09
chore: improve types to get rid of some `as any`
B4nan Jun 4, 2025
c9e0b2c
chore: remove some deadcode
B4nan Jun 4, 2025
ae9a9d6
chore: bump a few more dependencies
B4nan Jun 11, 2025
8eb21a6
fix CLI
B4nan Jun 11, 2025
4a73c22
fix CLI 2
B4nan Jun 11, 2025
5bde5f7
fix: remove old system info implementation
B4nan Jun 11, 2025
0f707fe
chore: replace `lodash.isequal` with `util.isDeepStrictEqual`
B4nan Jun 17, 2025
0cf2273
Add tests and install dependencies
anghel9 Jul 21, 2025
ac6cfb6
updated test files
anghel9 Jul 21, 2025
0ad2812
test update
anghel9 Jul 22, 2025
40af642
rough draft solution for refactoring timeouts within browser crawler.…
anghel9 Aug 5, 2025
f30fbe3
Added functionality in BasicCrawler that allows the wrapper to not ca…
anghel9 Aug 6, 2025
d57c84d
Modified browserCrawler to include missing update lines and made a ne…
anghel9 Aug 7, 2025
b03648c
Included navigation hooks into navigation timeout
ezequiel38 Aug 8, 2025
334ddb6
m
anghel9 Aug 8, 2025
79885dc
m
anghel9 Aug 8, 2025
8b8a4ff
Added comments and removed some comments
ezequiel38 Aug 11, 2025
5fc3334
Added comments and removed some comments
ezequiel38 Aug 11, 2025
62bda85
removed test logs
ezequiel38 Aug 11, 2025
391f8aa
resolving merge conflict
ezequiel38 Aug 11, 2025
4def10c
changed crawler utils back to original after over-looking simple fix
anghel9 Aug 11, 2025
2b74503
changed timeout message and comment
anghel9 Aug 11, 2025
0d7afcf
Update package.json
anghel9 Aug 13, 2025
40b51c8
Made _handleNavgiations more readable and made timeout and manual-tim…
anghel9 Aug 14, 2025
da4e738
Made _handleNavgiations more readable and made timeout and manual-tim…
anghel9 Aug 14, 2025
4 changes: 2 additions & 2 deletions .github/workflows/docs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,10 +19,10 @@ jobs:
steps:
- uses: actions/checkout@v4

- name: Use Node.js 20
- name: Use Node.js 22
uses: actions/setup-node@v4
with:
node-version: 20
node-version: 22

- name: Enable corepack
run: |
Expand Down
14 changes: 7 additions & 7 deletions .github/workflows/release.yml
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ jobs:
matrix:
# We don't test on Windows as the tests are flaky
os: [ ubuntu-22.04 ]
node-version: [ 16, 18, 20, 22 ]
node-version: [ 22, 24 ]

runs-on: ${{ matrix.os }}

Expand Down Expand Up @@ -94,17 +94,17 @@ jobs:
token: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}
fetch-depth: 0

- name: Use Node.js 20
- name: Use Node.js 22
uses: actions/setup-node@v4
with:
node-version: 20
node-version: 22

- name: Enable corepack
run: |
corepack enable
corepack prepare yarn@stable --activate

- name: Activate cache for Node.js 20
- name: Activate cache for Node.js 22
uses: actions/setup-node@v4
with:
cache: 'yarn'
Expand Down Expand Up @@ -183,10 +183,10 @@ jobs:
token: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}
fetch-depth: 0

- name: Use Node.js 20
- name: Use Node.js 22
uses: actions/setup-node@v4
with:
node-version: 20
node-version: 22

- name: Install jq
run: sudo apt-get install jq
Expand All @@ -196,7 +196,7 @@ jobs:
corepack enable
corepack prepare yarn@stable --activate

- name: Activate cache for Node.js 20
- name: Activate cache for Node.js 22
uses: actions/setup-node@v4
with:
cache: 'yarn'
Expand Down
66 changes: 33 additions & 33 deletions .github/workflows/test-ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,9 @@ name: Check

on:
push:
branches: [ master, renovate/** ]
branches: [ master, v4, renovate/** ]
pull_request:
branches: [ master ]
branches: [ master, v4 ]

env:
YARN_IGNORE_NODE: 1
Expand All @@ -23,7 +23,7 @@ jobs:
# tests on windows are extremely unstable
# os: [ ubuntu-22.04, windows-2019 ]
os: [ ubuntu-22.04 ]
node-version: [ 16, 18, 20, 22 ]
node-version: [ 22, 24 ]

steps:
- name: Cancel Workflow Action
Expand Down Expand Up @@ -88,17 +88,17 @@ jobs:
- name: Checkout Source code
uses: actions/checkout@v4

- name: Use Node.js 20
- name: Use Node.js 22
uses: actions/setup-node@v4
with:
node-version: 20
node-version: 22

- name: Enable corepack
run: |
corepack enable
corepack prepare yarn@stable --activate

- name: Activate cache for Node.js 20
- name: Activate cache for Node.js 22
uses: actions/setup-node@v4
with:
cache: 'yarn'
Expand Down Expand Up @@ -132,17 +132,17 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v4

- name: Use Node.js 20
- name: Use Node.js 22
uses: actions/setup-node@v4
with:
node-version: 20
node-version: 22

- name: Enable corepack
run: |
corepack enable
corepack prepare yarn@stable --activate

- name: Activate cache for Node.js 20
- name: Activate cache for Node.js 22
uses: actions/setup-node@v4
with:
cache: 'yarn'
Expand All @@ -167,25 +167,25 @@ jobs:

release_next:
name: Release @next
if: github.event_name == 'push' && contains(github.event.ref, 'master') && (!contains(github.event.head_commit.message, '[skip ci]') && !contains(github.event.head_commit.message, 'docs:'))
if: github.event_name == 'push' && contains(github.event.ref, 'v4') && (!contains(github.event.head_commit.message, '[skip ci]') && !contains(github.event.head_commit.message, 'docs:'))
needs: build_and_test
runs-on: ubuntu-22.04

steps:
- name: Checkout repository
uses: actions/checkout@v4

- name: Use Node.js 20
- name: Use Node.js 22
uses: actions/setup-node@v4
with:
node-version: 20
node-version: 22

- name: Enable corepack
run: |
corepack enable
corepack prepare yarn@stable --activate

- name: Activate cache for Node.js 20
- name: Activate cache for Node.js 22
uses: actions/setup-node@v4
with:
cache: 'yarn'
Expand Down Expand Up @@ -219,7 +219,7 @@ jobs:
run: |
git config --global user.name 'Apify Release Bot'
git config --global user.email '[email protected]'
yarn turbo copy --force -- --canary --preid=beta
yarn turbo copy --force -- --canary=major --preid=beta
git commit -am "chore: bump canary versions [skip ci]"

echo "access=public" > .npmrc
Expand All @@ -230,22 +230,22 @@ jobs:
GIT_USER: '[email protected]:${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}'
GH_TOKEN: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}

- name: Collect versions for Docker images
id: versions
run: |
crawlee=`node -p "require('./packages/crawlee/package.json').version"`
echo "crawlee=$crawlee" >> $GITHUB_OUTPUT

- name: Trigger Docker image builds
uses: peter-evans/repository-dispatch@v3
# Trigger next images only if we have something new pushed
if: steps.changed-packages.outputs.changed_packages != '0'
with:
token: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}
repository: apify/apify-actor-docker
event-type: build-node-images
client-payload: >
{
"crawlee_version": "${{ steps.versions.outputs.crawlee }}",
"release_tag": "beta"
}
# - name: Collect versions for Docker images
# id: versions
# run: |
# crawlee=`node -p "require('./packages/crawlee/package.json').version"`
# echo "crawlee=$crawlee" >> $GITHUB_OUTPUT
#
# - name: Trigger Docker image builds
# uses: peter-evans/repository-dispatch@v3
# # Trigger next images only if we have something new pushed
# if: steps.changed-packages.outputs.changed_packages != '0'
# with:
# token: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}
# repository: apify/apify-actor-docker
# event-type: build-node-images
# client-payload: >
# {
# "crawlee_version": "${{ steps.versions.outputs.crawlee }}",
# "release_tag": "beta"
# }
6 changes: 3 additions & 3 deletions .github/workflows/test-e2e.yml
Original file line number Diff line number Diff line change
Expand Up @@ -29,17 +29,17 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v4

- name: Use Node.js 20
- name: Use Node.js 22
uses: actions/setup-node@v4
with:
node-version: 20
node-version: 22

- name: Enable corepack
run: |
corepack enable
corepack prepare yarn@stable --activate

- name: Activate cache for Node.js 20
- name: Activate cache for Node.js 22
uses: actions/setup-node@v4
with:
cache: 'yarn'
Expand Down
95 changes: 0 additions & 95 deletions docs/experiments/systemInfoV2.mdx

This file was deleted.

1 change: 0 additions & 1 deletion docs/guides/configuration.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,6 @@ Storage directories are purged by default. If set to `false` - local storage dir

#### `CRAWLEE_CONTAINERIZED`

This variable is only effective when the systemInfoV2 experiment is enabled.
Changes how crawlee measures its CPU and Memory usage and limits. If unset, crawlee will determine if it is containerised using common features of containerized environments using the `isContainerized` utility function.
- A file at `/.dockerenv`.
- A file at `/proc/self/cgroup` containing `docker`.
Expand Down
53 changes: 53 additions & 0 deletions docs/upgrading/upgrading_v4.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
---
id: upgrading-to-v4
title: Upgrading to v4
---

import ApiLink from '@site/src/components/ApiLink';

This page summarizes most of the breaking changes in Crawlee v4.

## ECMAScript modules

Crawlee v4 is now a native ESM package. It can still be consumed from a CJS project, as long as you use TypeScript and a Node.js version that supports `require(esm)`.

## Node 22+ required

Support for older Node.js versions was dropped.

## TypeScript 5.8+ required

Support for older TypeScript versions was dropped. Older versions might work too, but only if your project is also ESM.

## Cheerio v1

Previously, we kept the dependency on cheerio locked to the latest RC version, since there were many breaking changes introduced in v1.0. This release bumps cheerio to the stable v1. Also, we now use the default `parse5` internally.

## Deprecated crawler options are removed

The following crawler options are removed (each is listed with its replacement):

- `handleRequestFunction` -> `requestHandler`
- `handlePageFunction` -> `requestHandler`
- `handleRequestTimeoutSecs` -> `requestHandlerTimeoutSecs`
- `handleFailedRequestFunction` -> `failedRequestHandler`
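
The renames above can be applied mechanically. The following is a minimal sketch, assuming your crawler options are held in a plain object; `migrateCrawlerOptions` and `RENAMED_OPTIONS` are hypothetical names used for illustration and are not part of Crawlee.

```typescript
// Hypothetical helper, not part of Crawlee: maps the removed v3 option
// names to their v4 replacements, as listed above.
const RENAMED_OPTIONS = new Map<string, string>([
    ['handleRequestFunction', 'requestHandler'],
    ['handlePageFunction', 'requestHandler'],
    ['handleRequestTimeoutSecs', 'requestHandlerTimeoutSecs'],
    ['handleFailedRequestFunction', 'failedRequestHandler'],
]);

function migrateCrawlerOptions(options: Record<string, unknown>): Record<string, unknown> {
    const migrated: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(options)) {
        const newKey = RENAMED_OPTIONS.get(key) ?? key;
        // Refuse to silently overwrite when two input keys map to the same option.
        if (Object.prototype.hasOwnProperty.call(migrated, newKey)) {
            throw new Error(`Option "${newKey}" is set more than once (via "${key}")`);
        }
        migrated[newKey] = value;
    }
    return migrated;
}
```

Options that were not renamed are copied through unchanged, so the helper can be run over a whole options object before passing it to a crawler constructor.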

## Crawling context no longer includes Error for failed requests

The crawling context no longer includes the `Error` object for failed requests. Use the second parameter of the `errorHandler` or `failedRequestHandler` callbacks to access the error.
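
As a rough illustration of the handler shape, the sketch below uses simplified stand-in types (`SketchContext` and `SketchFailedRequestHandler` are assumptions made for this example; the real context and handler types come from Crawlee). The point is only that the error is read from the second argument rather than from the context.

```typescript
// Simplified stand-ins for illustration; not Crawlee's actual types.
interface SketchContext {
    request: { url: string };
}
type SketchFailedRequestHandler = (context: SketchContext, error: Error) => Promise<void> | void;

const failedUrls: string[] = [];

// The error arrives as the second argument; it is not a property of the context.
const failedRequestHandler: SketchFailedRequestHandler = ({ request }, error) => {
    failedUrls.push(`${request.url}: ${error.message}`);
};
```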

## Crawling context is strictly typed

Previously, the crawling context extended a `Record` type, allowing access to any property. This was changed to a strict type, which means that you can only access properties that are defined in the context.
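
The difference can be sketched with simplified stand-in types. `StrictContext` and `LooseContext` below are assumptions made for this example and are not Crawlee's actual type definitions.

```typescript
// Simplified stand-ins for illustration; not Crawlee's actual types.
interface StrictContext {
    request: { url: string };
}

// v3-style: intersecting with a Record lets any property name type-check.
type LooseContext = StrictContext & Record<string, unknown>;

function readLoose(ctx: LooseContext): unknown {
    // Compiles even though nothing declares this property.
    return ctx['notDeclared'];
}

function readStrict(ctx: StrictContext): string {
    // Accessing ctx.notDeclared here would be a compile-time error.
    return ctx.request.url;
}
```

With the strict shape, a misspelled or undeclared property is reported by the compiler instead of evaluating to `undefined` at runtime.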

## `additionalBlockedStatusCodes` parameter is removed

The `additionalBlockedStatusCodes` parameter of the `Session.retireOnBlockedStatusCodes` method is removed. Use the `blockedStatusCodes` crawler option instead.

## `experimentalContainers` option is removed

This experimental option relied on an outdated manifest version for browser extensions. It is not possible to achieve the same behavior with the currently supported versions.

## Available resource detection

In v3, we introduced a new way to detect available resources for the crawler, available via the `systemInfoV2` flag. In v4, this is the default way to detect available resources. The old way is removed completely, together with the `systemInfoV2` flag.