Commit

Release 2.0.0
benoit74 committed Jun 4, 2024
1 parent a866229 commit d7cbe7e
Showing 3 changed files with 16 additions and 10 deletions.
13 changes: 8 additions & 5 deletions CHANGELOG.md
@@ -5,19 +5,22 @@ All notable changes to this project are documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) (as of version 1.4.0).

## [Unreleased]
## [2.0.0] - 2024-06-04

### Added

- Allow to specify a scraper suffix for the ZIM scraper metadata at the CLI (#168)
- Allow to specify a scraper suffix for the ZIM scraper metadata at the CLI (#168)
- New test website to test many known situations supposed to be handled (#166)

### Changed

- Replace **Service Worker** approach by **scraper-side rewriting** of static content (https://github.com/kiwix/overview/issues/95)
- Adopted Python bootstrap conventions (#152)
- Upgrade dependencies, especially move to **Python 3.12** (only) and zimscraperlib 3.3.2
- Change wording in logs about the return code 100 (which is not an error code)
- Added checks in `converter.py` to verify output directory existence, logging appropriate error messages and cleanly exit if checks fail.
- Added check for invalid zim file names
- Changed default publisher metadata from 'Kiwix' to 'openZIM'
- Added checks in `converter.py` to verify output directory existence, logging appropriate error messages and cleanly exit if checks fail. (#106)
- Added check for invalid zim file names (#232)
- Changed default publisher metadata from 'Kiwix' to 'openZIM' (#150)

## [1.5.5] - 2024-01-18

11 changes: 7 additions & 4 deletions README.md
@@ -7,10 +7,13 @@
[![PyPI - Supported Python versions](https://img.shields.io/pypi/pyversions/warc2zim.svg)](https://pypi.org/project/warc2zim)


warc2zim provides a way to convert WARC files to ZIM, storing the WARC payload and WARC+HTTP headers separately.
warc2zim converts WARC files to ZIM file. The resulting ZIM contains all WARC records, with "programming" records (HTML/CSS/JS/...) rewriten for proper offline operation.

Additionally, the [ReplayWeb.page](https://replayweb.page) is also added to the ZIM, creating a self-contained ZIM
that can render its content in a modern browser.
The resulting ZIM is self-contained and can render properly in offline situations.

Since warc2zim 2.0.0, service workers and HTTPs are not needed anymore for proper ZIM rendering (this was a big constraint of ZIM produced by warc2zim 1.x).

WARC format being an archive of any website property, warc2zim is the perfect companion to turn any website into an offline content (see e.g. https://www.github.com/openzim/zimit for a scraper bundling the approach, transform a website URL into an offline ZIM content in a single command).

## Capabilities

@@ -58,7 +61,7 @@ Scenario which are known to work well:
- Web workers are not yet supported (see https://github.com/openzim/warc2zim/issues/272)
- Service workers are not supported and will most probably never be
- Inline JS code inside an onxxx HTML event (e.g. onclick, onhover, ...) is rewritten, so for instance redirection to another handled with these events is working
- However since URL rewriting is performed with dynamic JS rewriting, at this stage scraper has no clue on what is inside the ZIM and what is external ; all URLs are hence supposed to be internal, which might break some dynamic redirection to an online website
- However since URL rewriting is performed with dynamic JS rewriting, at this stage scraper has no clue on what is inside the ZIM and what is external ; all URLs are hence supposed to be internal, which might break some dynamic redirection to an online website

It is also important to note that warc2zim is inherently limited to what is present inside the WARC. A bad WARC can only produce a bad ZIM. Garbage in, garbage out.

2 changes: 1 addition & 1 deletion src/warc2zim/__about__.py
@@ -1 +1 @@
__version__ = "2.0.0-dev9"
__version__ = "2.0.0"
