Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Add json_extract function to PPL #3211

Open
acarbonetto opened this issue Dec 18, 2024 · 2 comments
Open

[FEATURE] Add json_extract function to PPL #3211

acarbonetto opened this issue Dec 18, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@acarbonetto
Copy link
Collaborator

acarbonetto commented Dec 18, 2024

Is your feature request related to a problem?

As part of the RFC to add JSON functions, the json_extract function would be useful to extract specific objects and scalars from an existing json_object.

What solution would you like?

### `JSON_EXTRACT`

**Description**

`json_extract(doc, path [, path]...)` Extracts a json value or scalar from a json document based on the path(s) specified.

**Argument type:** STRING, STRING

**Return type:** STRING | NULL | JSON ARRAY

- Returns a JSON array for multiple paths or if the path leads to an array.  
- Return null if path is not valid.

Example:

    os> source=people | eval `json_extract(json('{"a":"b"}'), '$.a')` = json_extract('{"a":"b"}', '$a')
    fetched rows / total rows = 1/1
    +----------------------------------+
    | json_extract('{"a":"b"}', 'a')   |
    +----------------------------------+
    | b                                |
    +----------------------------------+

    os> source=people | eval `json_extract(json('{"a":[{"b":1},{"b":2}]}'), '$.a[1].b')` = json_extract('{"a":[{"b":1},{"b":2}]}', '$.a[1].b')
    fetched rows / total rows = 1/1
    +-----------------------------------------------------------+
    | json_extract('{"a":[{"b":1.0},{"b":2.0}]}', '$.a[1].b')   |
    +-----------------------------------------------------------+
    | 2.0                                                       |
    +-----------------------------------------------------------+

    os> source=people | eval `json_extract('{"a":[{"b":1},{"b":2}]}', '$.a[*].b')` = json_extract(json('{"a":[{"b":1},{"b":2}]}'), '$.a[*].b')
    fetched rows / total rows = 1/1
    +-----------------------------------------------------------+
    | json_extract('{"a":[{"b":1.0},{"b":2.0}]}', '$.a[*].b')   |
    +-----------------------------------------------------------+
    | [1.0,2.0]                                                 |
    +-----------------------------------------------------------+

    os> source=people | eval `invalid_json` = json_extract(json('{"invalid": "json"'))
    fetched rows / total rows = 1/1
    +----------------+
    | invalid_json   |
    +----------------+
    | null           |
    +----------------+

What alternatives have you considered?

N/A

Do you have any additional context?

Related spark PPL PR: opensearch-project/opensearch-spark#780

@kenrickyap
Copy link
Contributor

kenrickyap commented Jan 17, 2025

Hi @YANG-DB @LantaoJin currently working on this ticket and was wondering if I could import the following library: https://github.com/json-path/JsonPath. It has APACHE 2.0 licence and would be ideal in handling the path traversal that needs to be done.

This would be more robust than implementing the functionality ourselves and would be useful to other json functions.

@YANG-DB
Copy link
Member

YANG-DB commented Jan 17, 2025

@kenrickyap I agree

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants