Not compatible (?) with Next.js projects using turbopack #8

Open
chuguystyr opened this issue Sep 23, 2024 · 6 comments

@chuguystyr

When I try to use the library inside a server component in a Next.js 14 project using Turbopack, like this:

import scribe from "scribe.js-ocr"
const UplaodScheduleForm: React.FC = async () => {
  return (
    <form action={submitHandler}>
      <input type="file" name="schedule" id="schedule" accept=".pdf" />
      <button type="submit">Upload</button>
    </form>
  )
}

export default UplaodScheduleForm

const submitHandler = async (e: FormData) => {
  "use server"
  const file = e.get("schedule") as File
  console.log("Extracting text from file")
  scribe.extractText([file]).then((text: string) => {
    console.log("Extracted text")
    console.log(text)
  })
}

I get errors like these:

 ⨯ unhandledRejection: TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string or an instance of Buffer or URL. Received an instance of URL
    at open (node:internal/fs/promises:635:10)
    at readFile (node:internal/fs/promises:1238:20)
    at loadBuiltInFontsRaw (C:\scripti\.next\server\chunks\ssr\node_modules_scribe_js-ocr_5d715d._.js:10953:26)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Promise.all (index 3)
    at async init (C:\scripti\.next\server\chunks\ssr\node_modules_scribe_js-ocr_5d715d._.js:17308:5) {
  code: 'ERR_INVALID_ARG_TYPE'
}

It seems the error happens here, when the library isn't able to load the fonts:

 } else {
        const { readFile } = await Promise.resolve().then(()=>__turbopack_external_require__('fs/promises', true));
        carlitoNormal = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/Carlito-Regular.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        carlitoItalic = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/Carlito-Italic.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        carlitoBold = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/Carlito-Bold.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        centuryNormal = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/C059-Roman.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        centuryItalic = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/C059-Italic.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        centuryBold = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/C059-Bold.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        garamondNormal = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/EBGaramond-Regular.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        garamondItalic = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/EBGaramond-Italic.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        garamondBold = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/EBGaramond-Bold.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        palatinoNormal = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/P052-Roman.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        palatinoItalic = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/P052-Italic.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        palatinoBold = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/P052-Bold.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        nimbusRomNo9LNormal = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/NimbusRoman-Regular.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        nimbusRomNo9LItalic = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/NimbusRoman-Italic.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        nimbusRomNo9LBold = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/NimbusRoman-Bold.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        nimbusSansNormal = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/NimbusSans-Regular.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        nimbusSansItalic = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/NimbusSans-Italic.ttf [app-rsc] (static)"))).then((res)=>res.buffer);
        nimbusSansBold = readFile(new __turbopack_relative_url__(__turbopack_require__("[project]/node_modules/scribe.js-ocr/fonts/all_ttf/NimbusSans-Bold.ttf [app-rsc] (static)"))).then((res)=>res.buffer);

Is there a way to fix this?

@Balearica
Contributor

It looks like Turbopack does not currently support defining arbitrary variables at build time (in Webpack this is done using DefinePlugin), which is how the other builds are forced to use the browser version rather than the Node.js version. Supporting Turbopack therefore appears to require an update. I will try to implement a fix that improves support for build systems in the next couple of days.

@Balearica
Contributor

I was not familiar with Turbopack, but I looked into it, and I don't think it will be possible to detect the environment under Turbopack until Turbopack is further along in development. It appears that Turbopack is currently in beta and still lacks some fairly basic features and behaviors present in other build systems. Namely, there does not appear to be any mechanism for having the code detect that it's being built for the browser (outside of Turbopack-specific environment variables).

  1. As noted above, Turbopack does not appear to support arbitrarily assigning globals used to detect environment (e.g. document, window, process, etc.).
  2. The current version also does not detect that the typeof window === 'undefined' condition should resolve to false in the browser version.
    1. This condition has been recommended for use within Webpack builds for years to determine environment--my guess is that Turbopack is broken for a non-trivial number of other libraries as well due to not handling this.
    2. This appears to have been fixed in a development branch but not master (Implement typeof window inlining for Turbopack vercel/next.js#66128). Therefore, it is likely that a future version of Turbopack will work.
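The environment check discussed above can be illustrated with a minimal sketch. It assumes the library branches on `typeof process === 'object'`, which is the check visible in the mupdf-worker.js excerpt quoted in the build output in this thread. The helper name `detectEnvironment` and the `scope` parameter are hypothetical and are not part of Scribe.js; the scope object stands in for the global object so both cases can be shown in one runtime.

```javascript
// Hypothetical helper, not part of Scribe.js.
// `scope` stands in for the global object of the target environment.
function detectEnvironment(scope) {
  // Node.js exposes a global `process` object. When a bundler replaces
  // `process` with `undefined` (for example via Webpack's DefinePlugin),
  // `typeof process` evaluates to 'undefined' and the browser branch is taken.
  return typeof scope.process === 'object' && scope.process !== null
    ? 'node'
    : 'browser';
}

console.log(detectEnvironment({ process: { versions: {} } })); // prints "node"
console.log(detectEnvironment({ process: undefined }));        // prints "browser"
```

A bundler that cannot define or inline such globals leaves the check to be evaluated at runtime, so code bundled for the server always takes the Node.js branch.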

@chuguystyr
Author

Thanks for the in-depth explanation, I really appreciate it. I tried both the Next.js 15 RC and canary builds, but the result is the same. When I moved from Turbopack back to Webpack, I ran into another problem. The build fails with this message:

Failed to compile.

static/media/generalWorker.7a8d3e20.js from Terser
  x 'import', and 'export' cannot be used outside of module code
   ,-[1:1]
 1 | import { convertPageAbbyy } from '../import/convertPageAbbyy.js';
   : ^^^^^^
 2 | import { convertPageBlocks } from '../import/convertPageBlocks.js';
 3 | import { convertPageHocr } from '../import/convertPageHocr.js';
 4 | import { convertPageStext } from '../import/convertPageStext.js';
   `----

Caused by:
    0: failed to parse input file
    1: Syntax Error
Error:
  x 'import', and 'export' cannot be used outside of module code
   ,-[1:1]
 1 | import { convertPageAbbyy } from '../import/convertPageAbbyy.js';
   : ^^^^^^
 2 | import { convertPageBlocks } from '../import/convertPageBlocks.js';
 3 | import { convertPageHocr } from '../import/convertPageHocr.js';
 4 | import { convertPageStext } from '../import/convertPageStext.js';
   `----

Caused by:
    0: failed to parse input file
    1: Syntax Error

static/media/mupdf-worker.14e8c874.js from Terser
  x 'import', and 'export' cannot be used outside of module code
    ,-[74:1]
 74 |   return base64;
 75 | }
 76 |
 77 | export const mupdf = {};
    : ^^^^^^
 78 | let ready = false;
 79 |
 80 | if (typeof process === 'object') {
    `----

Caused by:
    0: failed to parse input file
    1: Syntax Error
Error:
  x 'import', and 'export' cannot be used outside of module code
    ,-[74:1]
 74 |   return base64;
 75 | }
 76 |
 77 | export const mupdf = {};
    : ^^^^^^
 78 | let ready = false;
 79 |
 80 | if (typeof process === 'object') {
    `----

Caused by:
    0: failed to parse input file
    1: Syntax Error


> Build failed because of webpack errors

While running the dev server I get these warnings (the same happens with the cloned sample repo, but there it doesn't prevent the build, of course):

⚠ ./node_modules/scribe.js-ocr/js/worker/generalWorker.js
The generated code contains 'async/await' because this module is using "topLevelAwait".
However, your target environment does not appear to support 'async/await'.
As a result, the code may not run as expected or may cause runtime errors.

Import trace for requested module:
./node_modules/scribe.js-ocr/js/worker/generalWorker.js
./node_modules/scribe.js-ocr/js/generalWorkerMain.js
./node_modules/scribe.js-ocr/scribe.js
./components/courses/UploadSchedule.tsx

./node_modules/scribe.js-ocr/mupdf/mupdf-worker.js
The generated code contains 'async/await' because this module is using "topLevelAwait".
However, your target environment does not appear to support 'async/await'.
As a result, the code may not run as expected or may cause runtime errors.

Import trace for requested module:
./node_modules/scribe.js-ocr/mupdf/mupdf-worker.js
./node_modules/scribe.js-ocr/mupdf/mupdf-async.js
./node_modules/scribe.js-ocr/js/containers/imageContainer.js
./node_modules/scribe.js-ocr/scribe.js
./components/courses/UploadSchedule.tsx

./node_modules/web-worker/cjs/node.js
Critical dependency: the request of a dependency is an expression

Import trace for requested module:
./node_modules/web-worker/cjs/node.js
./node_modules/scribe.js-ocr/js/generalWorkerMain.js
./node_modules/scribe.js-ocr/scribe.js
./components/courses/UploadSchedule.tsx

I used the config you suggested in the sample repo:

/** @type {import('next').NextConfig} */
const nextConfig = {
  reactStrictMode: true,
  webpack: (config, { buildId, dev, isServer, defaultLoaders, webpack }) => {
    if (!isServer) {
      // Set the 'process' to undefined in the client-side bundle
      config.plugins.push(
        new webpack.DefinePlugin({
          process: 'undefined'
        })
      );
    }
    return config;
  },

};

export default nextConfig;

I also use TypeScript, so here's my tsconfig (I assume it might be causing trouble, but I have no idea how exactly):

{
  "compilerOptions": {
    "target": "es5",
    "lib": ["dom", "dom.iterable", "esnext"],
    "allowJs": true,
    "skipLibCheck": true,
    "strict": true,
    "noEmit": true,
    "esModuleInterop": true,
    "module": "esnext",
    "moduleResolution": "bundler",
    "resolveJsonModule": true,
    "isolatedModules": true,
    "jsx": "preserve",
    "incremental": true,
    "plugins": [
      {
        "name": "next"
      }
    ],
    "baseUrl": "."
  },
  "include": [
    "next-env.d.ts",
    "**/*.ts",
    "**/*.tsx",
    ".next/types/**/*.ts",
    "postcss.config.cjs",
    "next.config.js",
    "vitest.config.js",
    "__tests__/components/submitButton.test.tsx",
    "eslint.config.js"
  ],
  "exclude": ["node_modules"]
}

@Balearica
Contributor

@chuguystyr Please provide a minimal example repo that produces this error while being as close to the Next.js example code as possible. Your message above seems to indicate you're using the same Next.js config file as the working example code but getting an error only in your project, so that's not enough to go off of.

@chuguystyr
Author

Of course, here's the repo.

@Balearica
Contributor

> Of course, here's the repo.

@chuguystyr Thanks. It looks like this can be resolved by removing the if (!isServer) condition from the next.config.js file so the webpack.DefinePlugin configuration is always applied.

This repo appears to differ from the working example repo I posted because it uses server components, as a result of using the app directory instead of pages. This causes the isServer argument in the next.config.js file to sometimes be true, and when this occurs, the webpack.DefinePlugin configuration is not applied and the Node.js code for Scribe.js is loaded, which causes the errors.

I'm going to update the Next.js example to remove this condition to avoid future confusion. Thinking about this more, I don't think there is ever a reason for the Node.js code to be used within a Next.js build. Even if you wanted to implement OCR server-side, I don't think executing within a React Server Component would ever be how you would implement that.
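The change described above could look roughly like the following sketch, which is the config posted earlier in this thread with the `if (!isServer)` condition removed. It is an illustration rather than the exact file from the updated example repo, and its effect on other server-side code that reads `process` has not been verified here.

```javascript
/** @type {import('next').NextConfig} */
const nextConfig = {
  reactStrictMode: true,
  webpack: (config, { webpack }) => {
    // Replace `process` with `undefined` in every compilation (server and
    // client), so the library's `typeof process === 'object'` check fails
    // and the browser code path is used throughout the Next.js build.
    config.plugins.push(
      new webpack.DefinePlugin({
        process: 'undefined',
      })
    );
    return config;
  },
};

// In next.config.js (ESM), finish the file with: export default nextConfig;
```

Because the plugin is no longer guarded by `isServer`, the same definition applies whether Next.js is compiling server components or client bundles.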
