IS NULL filter operator #300

Open
jamesfisher-geo opened this issue Sep 18, 2024 · 4 comments

@jamesfisher-geo
Collaborator

jamesfisher-geo commented Sep 18, 2024

Describe the bug
IS NULL is included in the filter extension logic. However, I cannot get it to work with cql2-json or cql2-text.

To Reproduce
Steps to reproduce the behavior:

  1. docker compose up app-opensearch or docker compose up app-elasticsearch
  2. python3 data_loader.py --base-url http://localhost:8082
  3. Try the following:
    POST http://localhost:8082/search
    body
{
  "filter-lang": "cql2-json",
  "filter": {
    "op": "isNull",
    "args": [ { "property": "sentinel:data_coverage" } ]
  }
}
    response
{
    "detail": "Error with cql2_json filter: Q() can only accept dict with a single query ({\"match\": {...}}). Instead it got ({})"
}

GET http://localhost:8082/search?filter=sentinel:data_coverage > 50 OR landsat:coverage_percent < 10 OR (sentinel:data_coverage IS NULL AND landsat:coverage_percent IS NULL)

{
    "detail": "Error with cql2_json filter: Q() can only accept dict with a single query ({\"match\": {...}}). Instead it got ({})"
}

Expected behavior
Both requests should return a valid ItemCollection.
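
For anyone scripting the reproduction, here is a minimal sketch that builds the two requests above without sending them. The helper names `build_post_body` and `build_get_url` are hypothetical, and the base URL is the one from the steps above. The GET filter contains spaces, `>` and `:`, so the sketch percent-encodes it; whether the client used for the report encoded it the same way is an assumption.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:8082"  # taken from the reproduction steps


def build_post_body(prop: str) -> str:
    """Serialize the cql2-json isNull filter used in the POST request."""
    body = {
        "filter-lang": "cql2-json",
        "filter": {"op": "isNull", "args": [{"property": prop}]},
    }
    return json.dumps(body)


def build_get_url(cql2_text: str) -> str:
    """Build the GET /search URL with the filter text percent-encoded."""
    return f"{BASE_URL}/search?" + urlencode({"filter": cql2_text})


if __name__ == "__main__":
    print(build_post_body("sentinel:data_coverage"))
    print(build_get_url("sentinel:data_coverage IS NULL"))
```

The output can be passed to any HTTP client; nothing here depends on the server being up.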

@jamesfisher-geo
Collaborator Author

A full implementation is waiting on a new version of pygeofilter.

@jamesfisher-geo
Collaborator Author

jamesfisher-geo commented Nov 7, 2024

We could possibly migrate the cql2 parsing logic to the cql-rs Python bindings.

@louisstuart96

In

elif query["op"] in [
ComparisonOp.EQ,
ComparisonOp.NEQ,
ComparisonOp.LT,
ComparisonOp.LTE,
ComparisonOp.GT,
ComparisonOp.GTE,
]:
ComparisonOp.IS_NULL seems to be absent from this list, so the check that follows inside the branch, if query["op"] == ComparisonOp.IS_NULL:, can never succeed. Could this be the problem?
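
A minimal sketch of that control flow, assuming the IS_NULL check really is nested inside the comparison branch as described. The enum is a trimmed, hypothetical stand-in for the project's ComparisonOp (its string values are assumptions), and `to_es_before` is an illustrative name, not the project's function.

```python
from enum import Enum
from typing import Any, Dict


class ComparisonOp(str, Enum):
    # Hypothetical stand-in for the project's enum; values are assumed.
    EQ = "="
    GT = ">"
    IS_NULL = "isNull"


def to_es_before(query: Dict[str, Any]) -> Dict[str, Any]:
    """Mimic the suspected structure: IS_NULL is tested inside a branch
    whose guard list does not contain IS_NULL."""
    if query["op"] in [ComparisonOp.EQ, ComparisonOp.GT]:
        field = query["args"][0]["property"]
        if query["op"] == ComparisonOp.IS_NULL:
            # Unreachable: the guard above already excluded IS_NULL.
            return {"bool": {"must_not": {"exists": {"field": field}}}}
        return {"term": {field: query["args"][1]}}
    # An isNull query falls through to the empty dict.
    return {}


if __name__ == "__main__":
    query = {"op": "isNull", "args": [{"property": "sentinel:data_coverage"}]}
    print(to_es_before(query))
```

If the real code has this shape, the empty dict returned for isNull would be consistent with the "Instead it got ({})" part of the reported error.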

@louisstuart96

louisstuart96 commented Dec 17, 2024

Possible quick fix:

def to_es(query: Dict[str, Any]) -> Dict[str, Any]:
    """
    Transform a simplified CQL2 query structure to an Elasticsearch compatible query DSL.

    Args:
        query (Dict[str, Any]): The query dictionary containing 'op' and 'args'.

    Returns:
        Dict[str, Any]: The corresponding Elasticsearch query in the form of a dictionary.
    """
    if query["op"] in [LogicalOp.AND, LogicalOp.OR, LogicalOp.NOT]:
        bool_type = {
            LogicalOp.AND: "must",
            LogicalOp.OR: "should",
            LogicalOp.NOT: "must_not",
        }[query["op"]]
        return {"bool": {bool_type: [to_es(sub_query) for sub_query in query["args"]]}}

    elif query["op"] == ComparisonOp.IS_NULL:
        # IS NULL matches documents in which the field is missing or null.
        field = to_es_field(query["args"][0]["property"])
        return {"bool": {"must_not": {"exists": {"field": field}}}}

    elif query["op"] in [
        ComparisonOp.EQ,
        ComparisonOp.NEQ,
        ComparisonOp.LT,
        ComparisonOp.LTE,
        ComparisonOp.GT,
        ComparisonOp.GTE,
    ]:
        field = to_es_field(query["args"][0]["property"])
        value = query["args"][1]
        if isinstance(value, dict) and "timestamp" in value:
            # Handle timestamp fields specifically
            value = value["timestamp"]

        if query["op"] == ComparisonOp.EQ:
            return {"term": {field: value}}
        elif query["op"] == ComparisonOp.NEQ:
            return {"bool": {"must_not": [{"term": {field: value}}]}}
        else:
            range_op = {
                ComparisonOp.LT: "lt",
                ComparisonOp.LTE: "lte",
                ComparisonOp.GT: "gt",
                ComparisonOp.GTE: "gte",
            }[query["op"]]
            return {"range": {field: {range_op: value}}}

    elif query["op"] == AdvancedComparisonOp.BETWEEN:
        field = to_es_field(query["args"][0]["property"])
        gte, lte = query["args"][1], query["args"][2]
        if isinstance(gte, dict) and "timestamp" in gte:
            gte = gte["timestamp"]
        if isinstance(lte, dict) and "timestamp" in lte:
            lte = lte["timestamp"]
        return {"range": {field: {"gte": gte, "lte": lte}}}

    elif query["op"] == AdvancedComparisonOp.IN:
        field = to_es_field(query["args"][0]["property"])
        values = query["args"][1]
        if not isinstance(values, list):
            raise ValueError(f"Arg {values} is not a list")
        return {"terms": {field: values}}

    elif query["op"] == AdvancedComparisonOp.LIKE:
        field = to_es_field(query["args"][0]["property"])
        pattern = cql2_like_to_es(query["args"][1])
        return {"wildcard": {field: {"value": pattern, "case_insensitive": True}}}

    elif query["op"] == SpatialIntersectsOp.S_INTERSECTS:
        field = to_es_field(query["args"][0]["property"])
        geometry = query["args"][1]
        return {"geo_shape": {field: {"shape": geometry, "relation": "intersects"}}}

    return {}
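
To check the IS_NULL branch without a running cluster, here is a trimmed, self-contained sketch. The enums and `to_es_field` are hypothetical stand-ins for the project's own (the enum string values and the identity field mapping are assumptions), and `to_es_sketch` keeps only the branches needed for the filter from the report.

```python
from enum import Enum
from typing import Any, Dict


class LogicalOp(str, Enum):
    # Hypothetical stand-in; values are assumed.
    AND = "and"
    OR = "or"


class ComparisonOp(str, Enum):
    # Hypothetical stand-in; values are assumed.
    IS_NULL = "isNull"
    GT = ">"
    LT = "<"


def to_es_field(name: str) -> str:
    # Assumption: identity mapping; the real helper may rewrite field names.
    return name


def to_es_sketch(query: Dict[str, Any]) -> Dict[str, Any]:
    """Reduced version of the quick fix: IS_NULL gets its own branch."""
    op = query["op"]
    if op in (LogicalOp.AND, LogicalOp.OR):
        bool_type = "must" if op == LogicalOp.AND else "should"
        return {"bool": {bool_type: [to_es_sketch(q) for q in query["args"]]}}
    if op == ComparisonOp.IS_NULL:
        field = to_es_field(query["args"][0]["property"])
        return {"bool": {"must_not": {"exists": {"field": field}}}}
    if op in (ComparisonOp.GT, ComparisonOp.LT):
        field = to_es_field(query["args"][0]["property"])
        range_op = "gt" if op == ComparisonOp.GT else "lt"
        return {"range": {field: {range_op: query["args"][1]}}}
    return {}


if __name__ == "__main__":
    is_null = {"op": "isNull", "args": [{"property": "sentinel:data_coverage"}]}
    print(to_es_sketch(is_null))
```

With the dedicated branch in place, the isNull filter no longer falls through to the empty dict, and it also composes inside and/or expressions.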
