Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace lodash with native ES2015 functions #1871

Merged
merged 3 commits into from
Dec 30, 2022

Conversation

joliss
Copy link

@joliss joliss commented Oct 22, 2022

Fixes #1870.

This patch brings packages/chevrotain/lib/chevrotain.min.js from 201 KB down to 158 KB.

It's split into two commits for better reviewability: one to use native ES2015 functions instead of lodash, and one to reformat everything with prettier (which causes a lot of lines to be reindented) and run pnpm install.

I tried to keep track of what features I'm using to replace lodash:

  • ES2015:

    • Array.prototype: reduce, map, filter, forEach, every, some, slice, find
    • Array.isArray
    • Object.keys
    • Object.assign
    • Object.prototype.hasOwnProperty
    • Set
  • ES2018:

    • spread in object literals { ...options, foo: true }; I'm using this and relying on the compiler to make it ES2015-compatible
  • ES2020:

    • null-ish coalescing operator ??; relying on the compiler for this as well -- I verified that both ?? and ... are indeed compiled away in packages/chevrotain/lib/chevroain.js

The following features I'm not using, to preserve ES2015 compatibility. They might be worth revisiting in the future if we bump the minimum ES version:

  • ES2016:

    • includes: using indexOf instead
  • ES2019:

    • flat: using ([] as SomeType[]).concat(...arr) instead
    • flatMap: using map followed by ([] as SomeType[]).concat(...arr), or a helper function (see comment)
  • not (yet?) in ES:

    • groupBy: using a three-liner instead

I'll leave a few comments on the diff right after posting this PR, to point out a few things you might want to check are OK before merging this. If there's anything you think needs to be changed, just let me know and I'll try to update the PR; or feel free to amend the commit as you see fit.

@@ -13,7 +12,7 @@ export class PerformanceTracer {
traceInitIndent: number

initPerformanceTracer(config: IParserConfig) {
if (has(config, "traceInitPerf")) {
if (config.traceInitPerf) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I turned this into a check for truthiness, so that it doesn't run with traceInitPerf: false. I assume that's the intended behavior.

@@ -911,15 +878,16 @@ export function performWarningRuntimeChecks(
): ILexerDefinitionError[] {
const warnings = []
let hasAnyLineBreak = false
const allTokenTypes = compact(flatten(values(lexerDefinition.modes)))
const allTokenTypes = ([] as TokenType[])
.concat(...Object.values(lexerDefinition.modes ?? {}))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lexerDefinition.modes: MultiModesDefinition, so it should not be null or undefined, but the ?? {} is still necessary to make the test suite pass. So there might be a bad typecast somewhere.

Lodash's values function hides this issue, as values(undefined) yields [], whereas Object.values(undefined) throws an error.

useSticky: SUPPORT_STICKY,
debug: false as boolean,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This debug flag appeared unused and caused TypeScript to complain, so I removed it.

([currModeName, currModeValue]) => {
// Make currModeValue a non-sparse array
currModeValue = Array.from(currModeValue)
currModeValue.forEach((currTokType, currIdx) => {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lodash's forEach calls the callback for holes in sparse arrays (with argument undefined), whereas Array.prototype.forEach skips holes. This was the only place where this difference caused a test to fail, so I inserted the Array.from call above to make the array non-sparse. I haven't investigated to see whether the sparse array comes from the test code or whether it's created somewhere in the production code.

singleAssignCategoriesToksMap(newPath, nextCategory)
}
})
}

export function hasShortKeyProperty(tokType: TokenType): boolean {
return has(tokType, "tokenTypeIdx")
return tokType !== null && tokType.hasOwnProperty("tokenTypeIdx")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tokType shouldn't ever be null according to the type signature, but it still seems to be happening for some reason. Lodash's has handles null fine, but we cannot call null.hasOwnProperty, so I had to check for it to make the tests pass.

assign(
this,
pickBy(options, (v) => v !== undefined)
)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function, as well as the ones in the chunks above, seem to work fine without pickBy. It would've been noisy to insert an inline version of pickBy several times, so opted to go with Object.assign(this, options), thus assigning fields even if they're undefined in options

// ES2019 Array.prototype.flatMap
function flatMap<U, R>(arr: U[], callback: (x: U, idx: number) => R[]): R[] {
return Array.prototype.concat.apply([], arr.map(callback))
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flatMap is called repeatedly in this file (checks.ts), and I found that inlining the above definition would've made the code less readable. So I opted to put a tiny non-exported helper function at the top of the file.

The same definition is also repeated in llk_lookahead.ts. I considered putting it into parse/parser/utils, but as it's just a one-liner, and it can go away once we bump the minimum ES version to 2019, I figured it's easier to just define this one-liner in each file.

@@ -147,5 +136,5 @@ export function validateMissingCstMethods(
}
)

return compact<IVisitorDefinitionError>(errors)
return errors.filter((err) => !!err)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that this compact isn't covered by the test suite, or perhaps it's unnecessary. Using just return errors, the tests still pass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see why the compact would be needed.
Perhaps it is legacy code...

@joliss
Copy link
Author

joliss commented Oct 22, 2022

I'll leave a few comments on the diff right after posting this PR

Ok, all done. I think it's ready for review!

@mattbishop
Copy link

That's a lot of work, thanks!

Can you run web_benchmark and see if there's a perf difference?

@bd82
Copy link
Member

bd82 commented Nov 2, 2022

I'm leaning towards not accepting this PR.

a ~20% reduction in bundle size simply does not seem to be worth (for me) the:

  • Reduction is code readability (less declarative code).
  • Added complexity of checking / validating availability of built-in functions in future features.
  • Poorer set of available utilities for future features.

Maybe this is because my use cases for Chevrotain are less sensitive in regards to bundle size...
Or maybe because the amount of time can dedicate to maintain Chevrotain is smaller than in the past
so I weigh maintainability related features higher...

@mattbishop
Copy link

It might be early, true.

How about performance? @joliss csn your I run the benchmark suite?

this.tokTypesThatCannotBeInsertedInRecovery,
tokType as unknown
return (
this.tokTypesThatCannotBeInsertedInRecovery.indexOf(tokType as any) === -1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also be

return !this.tokTypesThatCannotBeInsertedInRecovery.includes(tokType as any)

?

@bradenmacdonald
Copy link

Wow, awesome work! I was reviewing my project's dependencies today and annoyed by the 235 (!) lodash source files that are in the deno.lock dependency file (pre-bundling) because of Chevrotain.

I haven't yet contributed to Chevrotain's codebase myself, but as someone who works with modern TypeScript code all the time, I find the version in this PR much easier to read and follow because it uses standard functions and syntax - the same ones I'm used to. I am not familiar with assign, identity, compact, drop, flatMap, values, uniq, noop, etc. Even though those functions may follow common conventions, they're still an extra layer of indirection that I would need to learn and think about compared to built-in things. So I want to point out that I would disagree with regards to code readability - I think this change generally makes the code more readable (except in a few small instances), and thus has the benefit of making the codebase more accessible to potential contributors. However, it is a stylistic difference with the key function name (like map etc.) being moved to the right of the value, not preceding it, so I understand why it may be less appealing.

@joliss
Copy link
Author

joliss commented Dec 23, 2022

@bd82 I wasn't able to figure out how to run the benchmark, and I don't have my development machine with me over the holidays. But if you're up for it, I was wondering you might be able to try and benchmark to compare this PR's commit with its parent, to see if it makes any difference?

@bd82
Copy link
Member

bd82 commented Dec 23, 2022

I wasn't able to figure out how to run the benchmark, and I don't have my development machine with me over the holidays. But if you're up for it, I was wondering you might be able to try and benchmark to compare this PR's commit with its parent, to see if it makes any difference?

@joliss I will try to find some time to look into this again.

I doubt there will be a performance regression as the performance critical paths normally avoid the high order functions (e.g map / forEach / reduce / ...).

@bd82
Copy link
Member

bd82 commented Dec 23, 2022

So I want to point out that I would disagree with regards to code readability - I think this change generally makes the code more readable (except in a few small instances), and thus has the benefit of making the codebase more accessible to potential contributors

@bradenmacdonald:

It is quite possible I have certain biases and gotten "used" to certain styles.
I will have another look at the PR and re-evaluate it.

@@ -22,7 +21,7 @@ Object.freeze(RECOGNITION_EXCEPTION_NAMES)
// hacks to bypass no support for custom Errors in javascript/typescript
export function isRecognitionException(error: Error) {
// can't do instanceof on hacked custom js exceptions
return includes(RECOGNITION_EXCEPTION_NAMES, error.name)
return RECOGNITION_EXCEPTION_NAMES.indexOf(error.name) !== -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bd82 bd82 changed the base branch from master to remove_lodash December 30, 2022 16:49
@bd82 bd82 merged commit f74a792 into Chevrotain:remove_lodash Dec 30, 2022
bd82 pushed a commit that referenced this pull request Jun 29, 2023
Replace lodash with native ES2015 functions (#1871)

Co-authored-by: Jo Liss <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove (or Replace) lodash dependency to reduce bundle size?
4 participants