Error: failed to parse field [...] of type [date] when painless script updates unrelated field #108977

@pmishev

Description

Elasticsearch Version

7.17.12

Installed Plugins

No response

Java Version

bundled

OS Version

Linux aa933ae49f18 5.15.49-linuxkit #1 SMP Tue Sep 13 07:51:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

When using a script to remove a field from documents in an index that also contains an epoch_second date field with a fractional value, an unexpected parsing error is raised for the date field and the removal does not happen.

It appears that, in this scenario, ES does not recognise decimal numbers in the scientific notation that it itself serialises the data into.

Steps to Reproduce

PUT /test_ts
{
  "mappings": {
    "properties": {
      "update_datetime" : {
        "type" : "date",
        "format" : "epoch_second"
      },
      "is_private" : {
        "type" : "boolean"
      }
    }
  }
}
POST test_ts/_doc/1
{
  "update_datetime": 1716462600.37034
}
POST test_ts/_update_by_query
{
  "script": {
    "source": "ctx._source.remove('is_private');",
    "lang": "painless"
  }
}

Results in:

failed to parse field [update_datetime] of type [date] in document with id '1'. Preview of field's value: '1.71646260037034E9'

Strangely, reindexing works fine with no errors:

POST _reindex
{
  "source": {
    "index": "test_ts"
  },
  "dest": {
    "index": "test_ts_1"
  }
}

Logs (if relevant)

No response

Activity

elasticsearchmachine added the :Core/Infra/Scripting label (Scripting abstractions, Painless, and Mustache) and removed needs:triage (Requires assignment of a team area label) on May 31, 2024.

elasticsearchmachine (Collaborator) commented on May 31, 2024

Pinging @elastic/es-core-infra (Team:Core/Infra)

rjernst (Member) commented on May 31, 2024

Can you please add error_trace=true to your test_ts/_update_by_query request? i.e.:

POST test_ts/_update_by_query?error_trace=true

That should give more details about where the error is actually occurring.

pmishev (Author) commented on Jun 4, 2024

{
  "took" : 5,
  "timed_out" : false,
  "total" : 1,
  "updated" : 0,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [
    {
      "index" : "test_ts",
      "type" : "_doc",
      "id" : "1",
      "cause" : {
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse field [update_datetime] of type [date] in document with id '1'. Preview of field's value: '1.71646260037034E9'",
        "caused_by" : {
          "type" : "illegal_argument_exception",
          "reason" : "failed to parse date field [1.71646260037034E9] with format [epoch_second]",
          "caused_by" : {
            "type" : "date_time_parse_exception",
            "reason" : "Failed to parse with all enclosed parsers"
          }
        }
      },
      "status" : 400
    }
  ]
}
rjernst (Member) commented on Jun 18, 2024

Thanks for the info, I see what is happening.

Your update_datetime is passed as a JSON number. When this is parsed in Java (as it is when reindexing), it is placed in a double type. When that double is serialized back out, it uses scientific notation. Yet the epoch_second date format can't handle scientific notation.
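
For illustration, a minimal, standalone Java sketch of that round trip (the class name is hypothetical; the switch to scientific notation at magnitudes >= 10^7 is the documented behaviour of Double.toString):

public class RoundTripDemo {
    public static void main(String[] args) {
        // The JSON number exactly as written in the original document.
        double value = 1716462600.37034;

        // Double.toString uses scientific notation for any magnitude
        // >= 10^7, so the re-serialized source no longer matches the
        // epoch_second format.
        System.out.println(Double.toString(value)); // 1.71646260037034E9
    }
}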

While understandably confusing, I think fixing this would be difficult. When reindexing we don't know about the mapped types when parsing the source; it's just a JSON object. It might be possible to rework reindexing to use the original source bytes, but not without a bit of rework.

One workaround that should work is to use a string. So when indexing your original document, try this:

POST test_ts/_doc/1
{
  "update_datetime": "1716462600.37034"
}

That should retain the original formatting when the document is parsed as JSON and then serialized again as a string to be reindexed.

pmishev (Author) commented on Jun 24, 2024

Thanks for the workaround. So far it seems to work, after fixing my existing data with:

POST test_ts/_update_by_query
{
  "script": {
    "source": """
      if (ctx._source.update_datetime instanceof Double) {
        double updateDatetime = ctx._source.update_datetime;
        // Convert double to String
        String updateDateTimeString = updateDatetime + "";
        // Remove the E9 suffix
        updateDateTimeString = updateDateTimeString.splitOnToken('E')[0];
        // Remove decimal point
        String[] splitString = updateDateTimeString.splitOnToken('.');
        updateDateTimeString = splitString[0] + splitString[1];
        // Insert the decimal point in the correct place
        String part1 = updateDateTimeString.substring(0, 10);
        String part2 = updateDateTimeString.substring(10);
        ctx._source.update_datetime = part1 + "." + part2;
      }
    """
  }
}
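
As an aside, a shorter conversion may be possible via java.math.BigDecimal, which parses the exponent form and can print plain notation. A minimal Java sketch (the class name is hypothetical; BigDecimal also appears in the default Painless class allowlist, so the same calls should carry over into a script, though that variant is untested here):

import java.math.BigDecimal;

public class PlainNotationDemo {
    public static void main(String[] args) {
        // The scientific-notation form the double serializes to.
        String sci = Double.toString(1716462600.37034); // "1.71646260037034E9"

        // BigDecimal accepts the exponent form, and toPlainString()
        // expands it back out without one, preserving the digits.
        String plain = new BigDecimal(sci).toPlainString();
        System.out.println(plain); // 1716462600.37034
    }
}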
mosche (Contributor) commented on Jan 10, 2025

The underlying issue is in XContentHelper.convertToMap, which infers the field type based on the JSON token type. Any number containing a decimal point is converted to a double, possibly causing precision loss compared to the string representation: a double has 52 stored mantissa bits (53 bits of effective precision), which can only precisely store a count of nanoseconds until about 6am on April 15th, 1970. Additionally, the string representation becomes scientific notation, which the parsers don't know how to deal with.
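
To make the precision limit concrete, a minimal, standalone Java sketch (the class name is hypothetical; 2^53 is the effective mantissa precision noted above):

import java.time.Instant;

public class DoubleEpochDemo {
    public static void main(String[] args) {
        // 2^53 is the largest count a double can step through in exact
        // unit increments; beyond it, adjacent long values collapse
        // onto the same double.
        long maxExactNanos = 1L << 53; // 9007199254740992 ns
        System.out.println((double) maxExactNanos == (double) (maxExactNanos + 1)); // true

        // Interpreted as nanoseconds since the epoch, that limit falls
        // at about 6am on April 15th, 1970.
        long seconds = maxExactNanos / 1_000_000_000L; // 9007199 s
        System.out.println(Instant.ofEpochSecond(seconds));
        // 1970-04-15T05:59:59Z
    }
}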
