- ...
0.1.0 (2024-07-09)
- New project bootstrapping via pipx run crawlee create
- Improve error handling in project bootstrapping
0.0.7 (2024-06-27)
- Selector handling for RETRY_CSS_SELECTORS in _handle_blocked_request in BeautifulSoupCrawler
- Selector handling in enqueue_links in BeautifulSoupCrawler
- Improve AutoscaledPool state management
0.0.6 (2024-06-25)
- BREAKING: BasicCrawler.export_data helper method, which replaces BasicCrawler.export_to
- Configuration.get_global_configuration method
- Automatic logging setup
- Context helper for logging (context.log)
- Handling of relative URLs in add_requests
- Graceful exit in BasicCrawler.run
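The relative-URL entry can be pictured with a small sketch. It assumes that relative links passed to add_requests are resolved against the URL of the page currently being processed, using the standard library's urllib.parse.urljoin. resolve_request_urls is a hypothetical helper for illustration, not part of crawlee's API.

```python
from urllib.parse import urljoin


def resolve_request_urls(base_url: str, urls: list[str]) -> list[str]:
    """Hypothetical helper (not crawlee's API): make relative URLs absolute.

    Relative URLs are resolved against base_url, which is assumed here to be
    the URL of the page being processed; absolute URLs are returned unchanged.
    """
    return [urljoin(base_url, url) for url in urls]
```

For example, with the base https://crawlee.dev/python/docs/, "quick-start" resolves to https://crawlee.dev/python/docs/quick-start, "/blog" to https://crawlee.dev/blog, and "../api" to https://crawlee.dev/python/api. An already absolute URL such as https://example.com/a passes through unchanged.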
0.0.5 (2024-06-21)
- Add explicit error messages for missing package extras during import
- Better browser abstraction:
  - BrowserController: wraps a single browser instance and maintains its state.
  - BrowserPlugin: manages the browser automation framework and acts as a factory for controllers.
- Browser rotation with a maximum number of pages opened per browser.
- Add emitting of the persist state event to the event manager
- Add batched request addition in RequestQueue
- Add start requests option to BasicCrawler
- Add storage-related helpers get_data, push_data and export_to to BasicCrawler and BasicContext
- Add enqueue links helper to PlaywrightCrawler
- Add max requests per crawl option to BasicCrawler
- Fix type error in persist state of statistics
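The controller/plugin split and the browser rotation described for 0.0.5 can be illustrated with a minimal, framework-free sketch. The classes SketchBrowserController, SketchBrowserPlugin and SketchBrowserPool, and the parameter max_open_pages_per_browser, are hypothetical stand-ins. They simulate browsers and pages as plain strings and are not crawlee's BrowserController, BrowserPlugin or BrowserPool. The sketch also assumes the limit counts every page a browser has ever opened, so a browser is retired once it reaches the limit.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


# Hypothetical stand-ins, NOT crawlee's classes: browsers are simulated
# and pages are represented as plain strings.
@dataclass
class SketchBrowserController:
    """Wraps one (simulated) browser and tracks how many pages it has opened."""

    browser_id: int
    max_open_pages: int
    pages_opened: int = 0

    @property
    def has_capacity(self) -> bool:
        return self.pages_opened < self.max_open_pages

    def new_page(self) -> str:
        if not self.has_capacity:
            raise RuntimeError(f"browser {self.browser_id} reached its page limit")
        self.pages_opened += 1
        return f"browser-{self.browser_id}/page-{self.pages_opened}"


@dataclass
class SketchBrowserPlugin:
    """Factory that 'launches' a new simulated browser and returns its controller."""

    max_open_pages_per_browser: int
    _next_id: count = field(default_factory=lambda: count(1))

    def new_browser(self) -> SketchBrowserController:
        return SketchBrowserController(next(self._next_id), self.max_open_pages_per_browser)


class SketchBrowserPool:
    """Opens pages on the active browser and rotates to a fresh one at the limit."""

    def __init__(self, plugin: SketchBrowserPlugin) -> None:
        self._plugin = plugin
        self._active: Optional[SketchBrowserController] = None

    def new_page(self) -> str:
        # Rotate: launch a new browser when none exists or the current one is used up.
        if self._active is None or not self._active.has_capacity:
            self._active = self._plugin.new_browser()
        return self._active.new_page()
```

With a limit of two pages per browser, five calls to SketchBrowserPool(SketchBrowserPlugin(max_open_pages_per_browser=2)).new_page() return browser-1/page-1, browser-1/page-2, browser-2/page-1, browser-2/page-2 and browser-3/page-1. Keeping the page counter in the controller and the launch logic in the plugin mirrors the division of responsibilities listed above.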
0.0.4 (2024-05-30)
- Another internal release, adding statistics capturing, proxy configuration and the initial version of browser management and PlaywrightCrawler.
  - Statistics
  - ProxyConfiguration
  - BrowserPool
  - PlaywrightCrawler
0.0.3 (2024-05-15)
- Another internal release, mainly adding session management and BeautifulSoupCrawler.
  - HttpxClient
  - SessionPool
  - BeautifulSoupCrawler
  - BaseStorageClient, Storages and MemoryStorageClient were refactored
0.0.2 (2024-04-11)
- The first internal release, with BasicCrawler and HttpCrawler.
  - EventManager & LocalEventManager
  - Snapshotter
  - AutoscaledPool
  - MemoryStorageClient
  - Storages
  - BasicCrawler & HttpCrawler
0.0.1 (2024-01-30)
- Dummy package crawlee was released on PyPI.