diff --git a/.gitignore b/.gitignore
index 43e21fd85..5942f401a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,8 +1,8 @@
.DS_Store
node_modules/*
yarn.lock
-tesseract.dev.js
-worker.dev.js
+tesseract.min.js
+worker.min.js
*.traineddata
*.traineddata.gz
.nyc_output
diff --git a/README.md b/README.md
index dee50a29f..8493e83ea 100644
--- a/README.md
+++ b/README.md
@@ -31,82 +31,32 @@ Video Real-time Recognition
Tesseract.js wraps a [webassembly port](https://github.com/naptha/tesseract.js-core) of the [Tesseract](https://github.com/tesseract-ocr/tesseract) OCR Engine.
-It works in the browser using [webpack](https://webpack.js.org/) or plain script tags with a [CDN](#CDN) and on the server with [Node.js](https://nodejs.org/en/).
+It works in the browser using [webpack](https://webpack.js.org/), ESM, or plain script tags with a [CDN](#CDN), and on the server with [Node.js](https://nodejs.org/en/).
After you [install it](#installation), using it is as simple as:
-```javascript
-import Tesseract from 'tesseract.js';
-
-Tesseract.recognize(
- 'https://tesseract.projectnaptha.com/img/eng_bw.png',
- 'eng',
- { logger: m => console.log(m) }
-).then(({ data: { text } }) => {
- console.log(text);
-})
-```
-
-Or using workers (recommended for production use):
-
```javascript
import { createWorker } from 'tesseract.js';
-const worker = await createWorker({
- logger: m => console.log(m)
-});
-
(async () => {
- await worker.loadLanguage('eng');
- await worker.initialize('eng');
- const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
- console.log(text);
+ const worker = await createWorker('eng');
+ const { data } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+ console.log(data.text);
await worker.terminate();
})();
```
+When recognizing multiple images, users should create a worker once, run `worker.recognize` for each image, and then run `worker.terminate()` once at the end (rather than running the above snippet for every image).
-For a basic overview of the functions, including the pros/cons of different approaches, see the [intro](./docs/intro.md). [Check out the docs](#documentation) for a full explanation of the API.
-
-## Major changes in v4
-Version 4 includes many new features and bug fixes--see [this issue](https://github.com/naptha/tesseract.js/issues/662) for a full list. Several highlights are below.
-
-- Added rotation preprocessing options (including auto-rotate) for significantly better accuracy
-- Processed images (rotated, grayscale, binary) can now be retrieved
-- Improved support for parallel processing (schedulers)
-- Breaking changes:
-  - `createWorker` is now async
-  - `getPDF` function replaced by `pdf` recognize option
-
-## Major changes in v3
-- Significantly faster performance
-  - Runtime reduction of 84% for Browser and 96% for Node.js when recognizing the [example images](./examples/data)
-- Upgrade to Tesseract v5.1.0 (using emscripten 3.1.18)
-- Added SIMD-enabled build for supported devices
-- Added support:
-  - Node.js version 18
-- Removed support:
-  - ASM.js version, any other old versions of Tesseract.js-core (<3.0.0)
-  - Node.js versions 10 and 12
-
-## Major changes in v2
-- Upgrade to tesseract v4.1.1 (using emscripten 1.39.10 upstream)
-- Support multiple languages at the same time, eg: eng+chi\_tra for English and Traditional Chinese
-- Supported image formats: png, jpg, bmp, pbm
-- Support WebAssembly (fallback to ASM.js when browser doesn't support)
-- Support Typescript
-
-Read a story about v2: Why I refactor tesseract.js v2?
- Check the support/1.x branch for version 1
## Installation
Tesseract.js works with a `
+
+
```
-After including the script the `Tesseract` variable will be globally available.
+After including the script, the `Tesseract` variable is globally available, and a worker can be created using `Tesseract.createWorker`.
+Alternatively, an ESM build (used with `import` syntax) can be found at `https://cdn.jsdelivr.net/npm/tesseract.js@5/dist/tesseract.esm.min.js`.
### Node.js
@@ -122,16 +72,51 @@ npm install tesseract.js@3.0.3
yarn add tesseract.js@3.0.3
```
-
## Documentation
-* [Intro](./docs/intro.md)
+* [Workers vs. Schedulers](./docs/workers_vs_schedulers.md)
* [Examples](./docs/examples.md)
-* [Image Format](./docs/image-format.md)
+* [Supported Image Formats](./docs/image-format.md)
* [API](./docs/api.md)
* [Local Installation](./docs/local-installation.md)
* [FAQ](./docs/faq.md)
+## Major changes in v5
+Version 5 changes are documented in [this issue](https://github.com/naptha/tesseract.js/issues/820). Highlights are below.
+
+ - Significantly smaller files by default (54% smaller for English, 73% smaller for Chinese)
+    - This results in a ~50% reduction in runtime for first-time users (who do not have the files cached yet)
+ - Significantly lower memory usage
+ - Compatible with iOS 17 (using default settings)
+ - Breaking changes:
+    - `createWorker` arguments changed
+        - Setting non-default language and OEM now happens in `createWorker`
+            - E.g. `createWorker("chi_sim", 1)`
+    - `worker.initialize` and `worker.loadLanguage` functions now do nothing and can be deleted from code
+    - See [this issue](https://github.com/naptha/tesseract.js/issues/820) for full list
+
+## Major changes in v4
+Version 4 includes many new features and bug fixes--see [this issue](https://github.com/naptha/tesseract.js/issues/662) for a full list. Several highlights are below.
+
+- Added rotation preprocessing options (including auto-rotate) for significantly better accuracy
+- Processed images (rotated, grayscale, binary) can now be retrieved
+- Improved support for parallel processing (schedulers)
+- Breaking changes:
+  - `createWorker` is now async
+  - `getPDF` function replaced by `pdf` recognize option
+
+## Major changes in v3
+- Significantly faster performance
+  - Runtime reduction of 84% for Browser and 96% for Node.js when recognizing the [example images](./examples/data)
+- Upgrade to Tesseract v5.1.0 (using emscripten 3.1.18)
+- Added SIMD-enabled build for supported devices
+- Added support:
+  - Node.js version 18
+- Removed support:
+  - ASM.js version, any other old versions of Tesseract.js-core (<3.0.0)
+  - Node.js versions 10 and 12
+
+
## Use tesseract.js the way you like!
- Electron Version: https://github.com/Balearica/tesseract.js-electron
@@ -167,7 +152,7 @@ npm start
```
The development server will be available at http://localhost:3000/examples/browser/demo.html in your favorite browser.
-It will automatically rebuild `tesseract.dev.js` and `worker.dev.js` when you change files in the **src** folder.
+It will automatically rebuild `tesseract.min.js` and `worker.min.js` when you change files in the **src** folder.
### Online Setup with a single Click
diff --git a/benchmarks/browser/auto-rotate-benchmark.html b/benchmarks/browser/auto-rotate-benchmark.html
index ac97ed125..dcc1f003f 100644
--- a/benchmarks/browser/auto-rotate-benchmark.html
+++ b/benchmarks/browser/auto-rotate-benchmark.html
@@ -1,7 +1,7 @@