@Alvaro-Kothe
Member

This is just an idea for using simdjson in pandas, partially following #58278.

  • Uses simdjson with meson wraps
  • Does not use nanoarrow yet; I don't know how that would be possible without knowing the JSON schema beforehand.
  • Deletes vendored ujson decoder.

I also explored pysimdjson, which would be viable and would do exactly what this PR does. However, it only decodes, whereas writing our own extension would also allow building an encoder on top of simdjson.

Compared to ultrajson, the current PR shows no performance improvement and similar memory consumption, while pysimdjson seems to increase memory consumption slightly.
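For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a timing and memory harness. It is not the benchmark used for the numbers above: the helper name `measure_decoder`, the synthetic payload, and the use of `json.loads` as a stand-in decoder are all assumptions made for illustration.

```python
import json
import time
import tracemalloc


def measure_decoder(decode, payload, repeats=5):
    """Return (best_seconds, peak_bytes) for calling decode(payload).

    Hypothetical helper for illustration; not part of pandas.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decode(payload)
        best = min(best, time.perf_counter() - start)

    # tracemalloc only sees allocations made through Python's allocators,
    # so memory used internally by a native parser may be undercounted.
    tracemalloc.start()
    decode(payload)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


# Synthetic records-oriented payload; a stand-in for a real dataset.
payload = json.dumps([{"a": i, "b": str(i), "c": i / 3} for i in range(10_000)])

# json.loads is only a placeholder; swap in the decoders being compared.
seconds, peak = measure_decoder(json.loads, payload)
print(f"json.loads: {seconds:.4f} s, peak {peak / 1e6:.2f} MB")
```

Running the same helper against each decoder on the same payload gives comparable numbers, with the caveat about native allocations noted in the comment.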

Contributor

@divya1974 left a comment

  1. pandas/_libs/src/parser/json.cpp: There is no explicit GIL handling and no try/catch around code that can throw, so a C++ exception could escape into Python and leak resources.
  2. _json.py: There is no clear fallback or warning path when the native simdjson extension fails to import or compile; users may get a bare import error instead of a documented fallback.
  3. pyproject.toml: The PR adds native code and new subprojects but does not update the build/test instructions or add the meson steps to CI and docs, so packaging may fail for users installing from source.
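To make point 2 concrete, a fallback path could look roughly like the sketch below. This is only an illustration of the pattern: the module name `_hypothetical_simdjson_ext`, the assumption that it exposes a `loads` function, and the choice of the standard-library `json` module as the fallback are all made up for the example and are not taken from the PR.

```python
import importlib
import json
import warnings


def load_decoder(native_module="_hypothetical_simdjson_ext"):
    """Return a loads-like callable, preferring the native extension.

    The module name and its `loads` attribute are hypothetical.
    """
    try:
        module = importlib.import_module(native_module)
    except ImportError as exc:
        # Tell the user what happened instead of failing at import time.
        warnings.warn(
            f"native JSON decoder {native_module!r} unavailable ({exc}); "
            "falling back to the standard-library json module",
            RuntimeWarning,
            stacklevel=2,
        )
        return json.loads
    return module.loads


# Record the warning so the fallback can be observed explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loads = load_decoder()

print(loads('{"a": 1}'), len(caught))
```

Whether a silent fallback, a warning, or a hard error is the right behavior is a design decision for the PR; the sketch only shows that the warning path is cheap to add.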

