Hello piotrrzysko. In many business scenarios, extracting several values from a JSON document only requires an array of paths as parameters, and each value is read once and returned as a string, much like Hive's json_tuple UDF. For example: parseValue(json, 'path1', 'path2', 'path3', ...) returns (value1, value2, value3, ...).
We can therefore read the values directly from the bit index that simdjson builds. This avoids creating a Java object for every JSON node, which removes the associated garbage collection overhead, and it allows subtrees that no requested path touches to be pruned. Both improve performance.
A simple example:
The JSON value is: {"field1":{"field2":"value2","field3":3},"field4":["value4","value5"]}
The paths we want are: [$.field1.field2, $.field4.0, $.field4]. ($.field4 serializes the whole list to a string; $.field4.0 returns the first element of the list.)
The expected return value is: [value2, value4, '["value4","value5"]']
Solution Implementation
First, convert the path array to a tree. A blue node marks a path whose value we want to return. If such a node holds a container type, the container is serialized to a string, as with $.field4.
Second, loop through the bit index and fill the values into the path tree.
For the example above, the bit index is [0, 1, 9, 10, 11, 19, 20, 28, 29, 37, 38, 39, 40, 41, 49, 50, 51, 59, 60, 68, 69].
In the picture below, each position recorded in the bit index is marked with ‘#’.
The bit index records the start and end positions of map and list types ([ ] { }), the start position of every map key and every value, the ':' between a key and its value, and the ',' between elements.
For the example above, we loop through the bit index and fill in the value of each node of the JSON path tree step by step. The following is a simple flow chart.
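To make the rule above concrete, here is a minimal scalar sketch that records the same positions for a JSON string. It is only an illustration: simdjson computes these positions with SIMD instructions, and the class and method names here (`StructuralIndexSketch`, `structuralIndexes`) are hypothetical, not part of the library. It assumes valid JSON input.

```java
import java.util.ArrayList;
import java.util.List;

class StructuralIndexSketch {
    /**
     * Returns the offsets of: { } [ ] : , outside strings, the opening quote
     * of every string (keys and values), and the first character of every
     * non-string scalar (numbers, true, false, null).
     */
    static List<Integer> structuralIndexes(String json) {
        List<Integer> out = new ArrayList<>();
        boolean inString = false;  // currently between two quotes
        boolean inScalar = false;  // currently inside a number/true/false/null
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                    // skip the escaped character
                } else if (c == '"') {
                    inString = false;       // closing quote is not recorded
                }
            } else if (c == '"') {
                out.add(i);                 // opening quote of a key or value
                inString = true;
                inScalar = false;
            } else if ("{}[]:,".indexOf(c) >= 0) {
                out.add(i);                 // structural character
                inScalar = false;
            } else if (Character.isWhitespace(c)) {
                inScalar = false;
            } else if (!inScalar) {
                out.add(i);                 // first character of a scalar
                inScalar = true;
            }
        }
        return out;
    }
}
```

Applied to the example JSON, this yields the 21 positions listed above, from 0 for the opening '{' to 69 for the closing '}'.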
The JSON path tree can be reused across documents. When parsing many JSON values, there is no need to build a node tree for each one; only a single tree for the required paths is needed, which can improve parsing performance. The approach also supports serializing container-type values to strings, extracts multiple values in one pass, and handles the case where the JSON value at a path is null.
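The two steps can be sketched as follows. This is a rough illustration under stated assumptions, not the proposed implementation: every name here (`PathTreeWalkSketch`, `Node`, `tree`, `parse`) is hypothetical, the bit index is passed in as a plain `int[]` rather than taken from simdjson, the input is assumed to be valid JSON, and string escapes are returned as-is.

```java
import java.util.HashMap;
import java.util.Map;

class PathTreeWalkSketch {
    /** Path tree node; slot >= 0 marks a requested path (a "blue" node). */
    static final class Node {
        final Map<String, Node> kids = new HashMap<>();
        int slot = -1;
    }

    /** Builds the reusable tree from paths such as "$.field4.0". */
    static Node tree(String... paths) {
        Node root = new Node();
        for (int i = 0; i < paths.length; i++) {
            Node n = root;
            String[] parts = paths[i].trim().split("\\.");
            for (int p = 1; p < parts.length; p++) {   // parts[0] is "$"
                n = n.kids.computeIfAbsent(parts[p], k -> new Node());
            }
            n.slot = i;                                // result position
        }
        return root;
    }

    private final String json;
    private final int[] idx;     // bit index: offsets of structural positions
    private final String[] out;  // one result per path, null if absent
    private int pos = 0;         // cursor into idx

    private PathTreeWalkSketch(String json, int[] idx, int pathCount) {
        this.json = json;
        this.idx = idx;
        this.out = new String[pathCount];
    }

    static String[] parse(String json, int[] bitIndex, Node tree, int pathCount) {
        PathTreeWalkSketch w = new PathTreeWalkSketch(json, bitIndex, pathCount);
        w.visit(tree);
        return w.out;
    }

    private static Node child(Node n, String key) {
        return n == null ? null : n.kids.get(key);
    }

    /** Consumes one JSON value starting at idx[pos]; node may be null. */
    private void visit(Node node) {
        int start = idx[pos];
        char c = json.charAt(start);
        boolean container = c == '{' || c == '[';
        int end;                                       // exclusive end offset
        if (container && (node == null || node.kids.isEmpty())) {
            int depth = 0;                             // pruning: no path goes
            do {                                       // deeper, so skip to the
                char s = json.charAt(idx[pos++]);      // matching bracket
                if (s == '{' || s == '[') depth++;
                else if (s == '}' || s == ']') depth--;
            } while (depth > 0);
            end = idx[pos - 1] + 1;
        } else if (c == '{') {
            pos++;                                     // step past '{'
            while (json.charAt(idx[pos]) != '}') {
                int keyStart = idx[pos] + 1;
                int keyEnd = json.lastIndexOf('"', idx[pos + 1] - 1);
                pos += 2;                              // skip key and ':'
                visit(child(node, json.substring(keyStart, keyEnd)));
                if (json.charAt(idx[pos]) == ',') pos++;
            }
            end = idx[pos++] + 1;
        } else if (c == '[') {
            pos++;                                     // step past '['
            for (int i = 0; json.charAt(idx[pos]) != ']'; i++) {
                visit(child(node, Integer.toString(i)));
                if (json.charAt(idx[pos]) == ',') pos++;
            }
            end = idx[pos++] + 1;
        } else {
            pos++;                                     // string or scalar: the
            end = pos < idx.length ? idx[pos] : json.length(); // next index bounds it
        }
        if (node != null && node.slot >= 0) {
            String raw = json.substring(start, end).trim();
            if (c == '"') out[node.slot] = raw.substring(1, raw.length() - 1);
            else out[node.slot] = raw.equals("null") ? null : raw;
        }
    }
}
```

With the example JSON, its bit index, and a tree built once from `$.field1.field2`, `$.field4.0` and `$.field4`, `parse` returns `value2`, `value4` and `["value4","value5"]`. The same tree can be passed to `parse` for every subsequent document. No per-node objects are allocated during the walk; the only allocations are the key substrings and the returned values.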
Benchmark, simdjson2 vs jackson: throughput is more than 6 times higher. The improvement is particularly pronounced when only a few JSON fields are extracted. reference
simdjson: 95.936 ops
jackson: 15.833
I think the benchmark is flawed due to the current setup, see #60 (comment) for details.