Merge pull request #418 from openzim/scraperlib_4_1
Use HTML/JS/CSS functions extracted in zimscraperlib and adapt to scraperlib 5
benoit74 authored Jan 7, 2025
2 parents 5040eee + 1218df0, commit 1ba5285
Showing 46 changed files with 128 additions and 8,887 deletions.
13 changes: 1 addition & 12 deletions .github/workflows/Publish.yaml
@@ -6,7 +6,7 @@ on:
 
 jobs:
 publish:
-runs-on: ubuntu-22.04
+runs-on: ubuntu-24.04
 permissions:
 id-token: write # mandatory for PyPI trusted publishing
 
@@ -24,17 +24,6 @@ jobs:
 pip install -U pip
 pip install -e .[scripts]
-- name: Generate fuzzy rules
-run: python rules/generate_rules.py
-
-- name: Build Javascript wombatSetup.js
-uses: addnab/docker-run-action@v3
-with:
-image: node:20-bookworm
-options: -v ${{ github.workspace }}/src/warc2zim/statics:/output -v ${{ github.workspace }}/rules:/src/rules -v ${{ github.workspace }}/javascript:/src/javascript -v ${{ github.workspace }}/build_js.sh:/src/build_js.sh
-run: |
-/src/build_js.sh
 - name: Build packages
 run: |
 pip install -U pip build
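Each hunk header in this commit (for example `@@ -24,17 +24,6 @@` in Publish.yaml) follows the unified diff convention `@@ -old_start,old_count +new_start,new_count @@`, so a hunk's net line change is its new count minus its old count (a count omitted from the header means 1). A minimal Python sketch, where the helper name `hunk_net_change` is hypothetical and not part of warc2zim, that checks this against the Publish.yaml stats:

```python
import re

# Unified diff hunk header: "@@ -old_start[,old_count] +new_start[,new_count] @@ [section]"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_net_change(header: str) -> int:
    """Return new_count - old_count for one hunk header (an omitted count means 1)."""
    match = HUNK_RE.match(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_count = int(match.group(2)) if match.group(2) is not None else 1
    new_count = int(match.group(4)) if match.group(4) is not None else 1
    return new_count - old_count


# The two hunk headers shown for .github/workflows/Publish.yaml
publish_hunks = ["@@ -6,7 +6,7 @@ on:", "@@ -24,17 +24,6 @@ jobs:"]
net = sum(hunk_net_change(h) for h in publish_hunks)
print(net)  # -11
```

The two hunks contribute 0 and -11 lines, which matches the reported 1 addition and 12 deletions for that file.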
10 changes: 1 addition & 9 deletions .github/workflows/PublishDockerDevImage.yaml
@@ -7,19 +7,11 @@ on:
 
 jobs:
 publish:
-runs-on: ubuntu-22.04
+runs-on: ubuntu-24.04
 
 steps:
 - uses: actions/checkout@v4
 
-- name: Build Javascript wombatSetup.js
-uses: addnab/docker-run-action@v3
-with:
-image: node:20-bookworm
-options: -v ${{ github.workspace }}/src/warc2zim/statics:/output -v ${{ github.workspace }}/rules:/src/rules -v ${{ github.workspace }}/javascript:/src/javascript -v ${{ github.workspace }}/build_js.sh:/src/build_js.sh
-run: |
-/src/build_js.sh
 - name: Build and push Docker image
 uses: openzim/docker-publish-action@v10
 with:
22 changes: 1 addition & 21 deletions .github/workflows/QA.yaml
@@ -8,7 +8,7 @@ on:
 
 jobs:
 check-qa:
-runs-on: ubuntu-22.04
+runs-on: ubuntu-24.04
 
 steps:
 - uses: actions/checkout@v4
@@ -24,9 +24,6 @@ jobs:
 pip install -U pip
 pip install -e .[lint,scripts,test,check]
-- name: Generate fuzzy rules
-run: python rules/generate_rules.py
-
 - name: Check black formatting
 run: inv lint-black
 
@@ -35,20 +32,3 @@ jobs:
 
 - name: Check pyright
 run: inv check-pyright
-
-- name: Set up Node.JS
-uses: actions/setup-node@v4
-with:
-node-version: 20
-
-- name: Install JS dependencies
-working-directory: javascript
-run: yarn install
-
-- name: Check prettier formatting
-working-directory: javascript
-run: yarn prettier-check
-
-- name: Check eslint rules
-working-directory: javascript
-run: yarn eslint
2 changes: 1 addition & 1 deletion .github/workflows/TestWebsite.yaml
@@ -7,7 +7,7 @@ on:
 
 jobs:
 publish:
-runs-on: ubuntu-22.04
+runs-on: ubuntu-24.04
 
 steps:
 - uses: actions/checkout@v4
30 changes: 3 additions & 27 deletions .github/workflows/Tests.yaml
@@ -8,7 +8,7 @@ on:
 
 jobs:
 run-tests:
-runs-on: ubuntu-22.04
+runs-on: ubuntu-24.04
 
 steps:
 - uses: actions/checkout@v4
@@ -24,9 +24,6 @@ jobs:
 pip install -U pip
 pip install -e .[test,scripts]
-- name: Generate fuzzy rules
-run: python rules/generate_rules.py
-
 - name: Run the tests
 run: inv coverage --args "-vvv"
 
@@ -35,21 +32,8 @@ jobs:
 with:
 token: ${{ secrets.CODECOV_TOKEN }}
 
-- name: Set up Node.JS
-uses: actions/setup-node@v4
-with:
-node-version: 20
-
-- name: Install JS dependencies
-working-directory: javascript
-run: yarn install
-
-- name: Run JS tests
-working-directory: javascript
-run: yarn test
 
 build_python:
-runs-on: ubuntu-22.04
+runs-on: ubuntu-24.04
 steps:
 - uses: actions/checkout@v4
 
@@ -59,21 +43,13 @@ jobs:
 python-version-file: pyproject.toml
 architecture: x64
 
-- name: Install dependencies (and project)
-run: |
-pip install -U pip build
-pip install -e .[scripts]
-- name: Generate fuzzy rules
-run: python rules/generate_rules.py
-
 - name: Ensure we can build Python targets
 run: |
 pip install -U pip build
 python3 -m build --sdist --wheel
 build_docker:
-runs-on: ubuntu-22.04
+runs-on: ubuntu-24.04
 steps:
 - uses: actions/checkout@v4
 
12 changes: 0 additions & 12 deletions .gitignore
@@ -495,18 +495,6 @@ pyrightconfig.json
 # ignore all vscode, this is not standard configuration in this place
 .vscode
 
-# installed at build time
-src/warc2zim/statics/wombat.js
-
 # temporary directories used during development
 output
 tmp
-
-# rule files are generated by rules/generate_rules.py
-src/warc2zim/rules.py
-tests/test_fuzzy_rules.py
-javascript/src/fuzzyRules.js
-javascript/test/fuzzyRules.js
-
-# wombatSetup.js is generated with rollup
-src/warc2zim/statics/wombatSetup.js
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -11,11 +11,11 @@ repos:
 hooks:
 - id: black
 - repo: https://github.com/astral-sh/ruff-pre-commit
-rev: v0.6.9
+rev: v0.8.4
 hooks:
 - id: ruff
 - repo: https://github.com/RobertCraigie/pyright-python
-rev: v1.1.383
+rev: v1.1.391
 hooks:
 - id: pyright
 name: pyright (system)
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -9,7 +9,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Changed
 
-- Upgrade to wombat 3.8.6 (#334)
+- Upgrade dependencies: zimscraperlib 5.0.0, warcio 1.7.5, cdxj_index 1.4.6 and others
+- Use all rewriting stuff from zimscraperlib
+- Remove most HTML / CSS / JS rewriting logic which is now part of zimscraperlib 5
 - Fix wombat setup settings (especially `isSW`) (#293)
 
 ### Fixed
6 changes: 2 additions & 4 deletions Dockerfile
@@ -1,5 +1,5 @@
 FROM python:3.12-slim-bookworm
-LABEL org.opencontainers.image.source https://github.com/openzim/warc2zim
+LABEL org.opencontainers.image.source=https://github.com/openzim/warc2zim
 
 RUN apt-get update -y \
 && apt-get install -y --no-install-recommends \
@@ -12,15 +12,13 @@ RUN apt-get update -y \
 WORKDIR /output
 
 # Copy pyproject.toml and its dependencies
-COPY pyproject.toml openzim.toml README.md /src/
-COPY rules/generate_rules.py /src/rules/generate_rules.py
+COPY pyproject.toml README.md /src/
 COPY src/warc2zim/__about__.py /src/src/warc2zim/__about__.py
 
 # Install Python dependencies
 RUN pip install --no-cache-dir /src
 
 # Copy code + associated artifacts
-COPY rules /src/rules
 COPY src /src/src
 COPY *.md /src/
 
21 changes: 4 additions & 17 deletions README.md
@@ -168,26 +168,13 @@ Start a hatch shell: this will install software including dependencies in an iso
 hatch shell
 ```
 
-### Regenerate wombatSetup.js
+### Rewriting logic and rewriting rules
 
-wombatSetup.js is the JS code used to setup wombat when the ZIM is used.
+Mostly all rewriting logic and rewriting rules now comes from the [python-scraperlib](https://github.com/openzim/python-scraperlib/).
 
-It is normally retrieved by Python build process (see openzim.toml for details).
+Should you need to add more rules or modify rewriting logic, this is the place to go.
 
-Recommended solution to develop this JS code is to install Node.JS on your system, and then
-
-```bash
-cd javascript
-yarn build-dev # or yarn build-prod
-```
-
-Should you want to regenerate this code without install Node.JS, you might simply run following command.
-
-```bash
-docker run -v $PWD/src/warc2zim/statics:/output -v $PWD/rules:/src/rules -v $PWD/javascript:/src/javascript -v $PWD/build_js.sh:/src/build_js.sh -it --rm --entrypoint /src/build_js.sh node:20-bookworm
-```
-
-It will install Python3 on-top of Node.JS in a Docker container, generate JS fuzzy rules and bundle JS code straight to `/src/warc2zim/statics/wombatSetup.js` where the file is expected to be placed.
+All resulting code (Python and Javascript) as well as wombat.js and wombat-setup.js comes from the python-scraperlib.
 
 ## License
 
26 changes: 0 additions & 26 deletions build_js.sh

This file was deleted.

84 changes: 0 additions & 84 deletions docs/functional_architecture.md

This file was deleted.
