Conversation

@EPMatt
Collaborator

@EPMatt EPMatt commented Nov 12, 2025

Changes

Support &-based escaped string literals

Implement &-based escaping for Character References and Predefined Character References in String Literals, Key Specifiers (XQuery 4.0), BracedURILiterals, Text Node content, and Attribute Content Values.

The escaping is implemented at the static-analysis level, that is, after the automated parsing step.
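For illustration only, resolving these references in a literal could look roughly like the sketch below. It is not the code in this PR: `CharRefDecoder` and `decode` are hypothetical names, and the sketch leaves a stray `&` untouched, whereas a conforming processor would reject it.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of RumbleDB: decodes &-based references in a raw literal.
final class CharRefDecoder {
    // The five predefined entity references of XML/XQuery.
    private static final Map<String, String> PREDEFINED = Map.of(
        "lt", "<", "gt", ">", "amp", "&", "quot", "\"", "apos", "'");

    // Matches &#xHEX; or &#DEC; or &name;
    private static final Pattern REF = Pattern.compile("&(#x[0-9a-fA-F]+|#[0-9]+|[a-z]+);");

    static String decode(String raw) {
        Matcher m = REF.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(raw, last, m.start());   // copy the text before the reference
            String body = m.group(1);
            if (body.startsWith("#x")) {
                out.appendCodePoint(Integer.parseInt(body.substring(2), 16));
            } else if (body.startsWith("#")) {
                out.appendCodePoint(Integer.parseInt(body.substring(1), 10));
            } else {
                String replacement = PREDEFINED.get(body);
                if (replacement == null) {
                    throw new IllegalArgumentException("Unknown entity reference: &" + body + ";");
                }
                out.append(replacement);
            }
            last = m.end();
        }
        out.append(raw, last, raw.length());    // copy the remaining tail
        return out.toString();
    }
}
```

Because the input is scanned once from left to right, the output of one replacement is never decoded again, so `&amp;lt;` yields the four characters `&lt;` rather than `<`.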

Refactor schema detection in RuntimeIterator for creating DataFrames directly from an RDD

Replace the current JSON-based schema detection mechanism (which uses Spark's schema_of_variant_agg and only supports JSON types) with a native Java/JSONiq implementation that works directly with Item types using the findLeastCommonSuperTypeWith method.
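The idea can be sketched as a fold: start from the type of the first item and repeatedly take the least common supertype with the next one. The toy hierarchy below is an assumption made for the example; `SchemaInferenceSketch`, `ToyType`, `leastCommonSuperTypeWith` (on this toy enum) and `infer` are hypothetical stand-ins, not RumbleDB's actual `ItemType` API.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: infer one common type for a collection of items.
final class SchemaInferenceSketch {
    // Toy hierarchy (an assumption for this example): each type names its parent.
    enum ToyType {
        ITEM(null), ATOMIC(ITEM), STRING(ATOMIC), BOOLEAN(ATOMIC), DECIMAL(ATOMIC), INTEGER(DECIMAL);

        final ToyType parent;

        ToyType(ToyType parent) {
            this.parent = parent;
        }

        // Walk up from `this`, then find the first ancestor of `other` on that path.
        ToyType leastCommonSuperTypeWith(ToyType other) {
            Set<ToyType> ancestors = EnumSet.noneOf(ToyType.class);
            for (ToyType t = this; t != null; t = t.parent) {
                ancestors.add(t);
            }
            for (ToyType t = other; t != null; t = t.parent) {
                if (ancestors.contains(t)) {
                    return t;
                }
            }
            throw new IllegalStateException("unreachable: ITEM is an ancestor of every type");
        }
    }

    // Fold the pairwise operation over all observed item types.
    static ToyType infer(List<ToyType> itemTypes) {
        if (itemTypes.isEmpty()) {
            throw new IllegalArgumentException("cannot infer a schema from no items");
        }
        ToyType result = itemTypes.get(0);
        for (ToyType t : itemTypes.subList(1, itemTypes.size())) {
            result = result.leastCommonSuperTypeWith(t);
        }
        return result;
    }
}
```

In a tree-shaped hierarchy like this one the pairwise operation is associative and commutative, so the same fold could in principle be expressed as a reduce over a distributed collection.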

Testing

See the improved test coverage for the XQuery parser in the PR CI comment.

@EPMatt EPMatt marked this pull request as ready for review November 12, 2025 15:15
@github-actions

Test Results (qt3tests)

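RumbleDB, XQuery parser

| Test Suite | Passing | Failing | Errors | Skipped | Total |
|---|---|---|---|---|---|
| MathTest | 147 | 0 | 2 | 0 | 149 |
| MiscTest | 181 | 373 | 180 | 137 | 871 |
| Prod1Test | 4836 | 851 | 1808 | 743 | 8238 |
| Fn1Test | 2588 | 610 | 1730 | 367 | 5295 |
| SerTest | 4 | 2 | 1 | 336 | 343 |
| Fn2Test | 3156 | 956 | 1264 | 464 | 5840 |
| AppTest | 989 | 46 | 1084 | 38 | 2157 |
| Prod2Test | 1733 | 543 | 1169 | 524 | 3969 |
| ArrayTest | 0 | 45 | 155 | 9 | 209 |
| XsTest | 89 | 14 | 12 | 49 | 164 |
| OpTest | 4012 | 117 | 194 | 43 | 4366 |
| MapTest | 4 | 23 | 193 | 0 | 220 |
| Total | 17739 | 3580 | 7792 | 2710 | 31821 |

RumbleDB, JSONiq parser

| Test Suite | Passing | Failing | Errors | Skipped | Total |
|---|---|---|---|---|---|
| MiscTest | 162 | 284 | 114 | 311 | 871 |
| ArrayTest | 0 | 0 | 0 | 209 | 209 |
| Fn1Test | 2400 | 177 | 118 | 2600 | 5295 |
| XsTest | 89 | 0 | 0 | 75 | 164 |
| Prod1Test | 3902 | 201 | 324 | 3811 | 8238 |
| SerTest | 4 | 0 | 0 | 339 | 343 |
| Fn2Test | 2659 | 288 | 85 | 2808 | 5840 |
| MapTest | 3 | 1 | 14 | 202 | 220 |
| AppTest | 971 | 17 | 20 | 1149 | 2157 |
| Prod2Test | 1320 | 221 | 129 | 2299 | 3969 |
| OpTest | 3742 | 28 | 21 | 575 | 4366 |
| MathTest | 147 | 0 | 1 | 1 | 149 |
| Total | 15399 | 1217 | 826 | 14379 | 31821 |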

Download detailed test results

@ghislainfourny
Member

ghislainfourny commented Nov 20, 2025

I tested this with the Python library and the DataFrame query output, and some cases that were previously converted to DataFrames successfully now fail. Would it be possible to extend the error message with the inferred common-denominator schema, to help understand what the issue is? I will then run it again and see whether there is an obvious oversight. Thanks!

My guess right now is that, upon merging two different object types, RumbleDB just outputs the topmost "object" primitive type instead of merging field by field. The reason is that the type hierarchy in JSound is explicitly based on the given base types and does not follow logically from the object content layout alone. We might need to add an option, "lax" or "strict", to the method that computes the least common super type: if it is "lax", we merge field by field into a new anonymous type; if it is "strict", we output the topmost object primitive type.
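To make the two modes concrete, here is a toy sketch of what I have in mind. Everything in it is hypothetical (`LaxMergeSketch`, `ToyType`, `merge`, the boolean `lax` flag) and not RumbleDB's API. It also keeps a field that appears on only one side as-is, whereas a real implementation would probably have to mark such fields as optional.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "lax" vs "strict" least-common-supertype computation for objects.
final class LaxMergeSketch {
    static final class ToyType {
        final String name;                  // primitive name; "object" for any object type
        final Map<String, ToyType> fields;  // non-null only for anonymous object layouts

        private ToyType(String name, Map<String, ToyType> fields) {
            this.name = name;
            this.fields = fields;
        }

        static ToyType prim(String name) {
            return new ToyType(name, null);
        }

        static ToyType obj(Map<String, ToyType> fields) {
            return new ToyType("object", new TreeMap<>(fields)); // sorted keys for stable printing
        }

        @Override
        public String toString() {
            return fields == null ? name : fields.toString();
        }
    }

    static ToyType merge(ToyType a, ToyType b, boolean lax) {
        if (a.toString().equals(b.toString())) {
            return a;                              // identical types need no widening
        }
        boolean bothObjects = a.name.equals("object") && b.name.equals("object");
        if (!bothObjects) {
            return ToyType.prim("item");           // toy fallback for unrelated types
        }
        if (!lax || a.fields == null || b.fields == null) {
            return ToyType.prim("object");         // strict: topmost object primitive type
        }
        // lax: union of the keys, shared keys merged recursively
        Map<String, ToyType> merged = new TreeMap<>(a.fields);
        for (Map.Entry<String, ToyType> e : b.fields.entrySet()) {
            merged.merge(e.getKey(), e.getValue(), (x, y) -> merge(x, y, true));
        }
        return ToyType.obj(merged);
    }
}
```

With layouts `{id: integer, name: string}` and `{id: integer, age: integer}`, strict mode collapses to `object`, while lax mode yields an anonymous type with the three fields `age`, `id` and `name`.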

@github-actions

Test Results (qt3tests)

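RumbleDB, XQuery parser

| Test Suite | Passing | Failing | Errors | Skipped | Total |
|---|---|---|---|---|---|
| MathTest | 147 | 0 | 2 | 0 | 149 |
| MiscTest | 181 | 373 | 180 | 137 | 871 |
| Prod1Test | 4836 | 851 | 1808 | 743 | 8238 |
| Fn1Test | 2588 | 610 | 1730 | 367 | 5295 |
| SerTest | 4 | 2 | 1 | 336 | 343 |
| Fn2Test | 3156 | 948 | 1272 | 464 | 5840 |
| AppTest | 989 | 46 | 1084 | 38 | 2157 |
| Prod2Test | 1733 | 543 | 1169 | 524 | 3969 |
| ArrayTest | 0 | 45 | 155 | 9 | 209 |
| XsTest | 89 | 14 | 12 | 49 | 164 |
| OpTest | 4012 | 117 | 194 | 43 | 4366 |
| MapTest | 4 | 23 | 193 | 0 | 220 |
| Total | 17739 | 3572 | 7800 | 2710 | 31821 |

RumbleDB, JSONiq parser

| Test Suite | Passing | Failing | Errors | Skipped | Total |
|---|---|---|---|---|---|
| MiscTest | 162 | 284 | 114 | 311 | 871 |
| ArrayTest | 0 | 0 | 0 | 209 | 209 |
| Fn1Test | 2400 | 177 | 118 | 2600 | 5295 |
| XsTest | 89 | 0 | 0 | 75 | 164 |
| Prod1Test | 3902 | 201 | 324 | 3811 | 8238 |
| SerTest | 4 | 0 | 0 | 339 | 343 |
| Fn2Test | 2659 | 282 | 91 | 2808 | 5840 |
| MapTest | 3 | 1 | 14 | 202 | 220 |
| AppTest | 971 | 17 | 20 | 1149 | 2157 |
| Prod2Test | 1320 | 221 | 129 | 2299 | 3969 |
| OpTest | 3742 | 28 | 21 | 575 | 4366 |
| MathTest | 147 | 0 | 1 | 1 | 149 |
| Total | 15399 | 1211 | 832 | 14379 | 31821 |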

Download detailed test results

@ghislainfourny ghislainfourny self-requested a review November 20, 2025 16:06
Member

@ghislainfourny ghislainfourny left a comment

I am setting the review to "Request changes" so we can investigate why some outputs no longer get validated (see my comment above).

…ode (#3)

* feat(types): implement common super type lax mode, custom implementations for objects and arrays

* test(types): add tests for common supertype lax mode

* fix(validatetypeiterator): use new lax common supertype method for rdd schema inference
@EPMatt
Collaborator Author

EPMatt commented Nov 28, 2025

@ghislainfourny I’ve added the common supertype lax mode as requested (see EPMatt#3). Please let me know if this resolves the issue.

Regarding the detailed error log: could you share the output (logs and/or stack trace) you’re seeing with the failing test cases? This will help me determine where to improve the error reporting in the code.

Thanks!

@github-actions

github-actions bot commented Dec 6, 2025

Test Results (qt3tests)

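RumbleDB, XQuery parser

| Test Suite | Passing | Failing | Errors | Skipped | Total |
|---|---|---|---|---|---|
| MathTest | 147 | 0 | 2 | 0 | 149 |
| MiscTest | 181 | 373 | 180 | 137 | 871 |
| Prod1Test | 4836 | 851 | 1808 | 743 | 8238 |
| Fn1Test | 2588 | 610 | 1730 | 367 | 5295 |
| SerTest | 4 | 2 | 1 | 336 | 343 |
| Fn2Test | 3156 | 948 | 1272 | 464 | 5840 |
| AppTest | 989 | 46 | 1084 | 38 | 2157 |
| Prod2Test | 1733 | 543 | 1169 | 524 | 3969 |
| ArrayTest | 0 | 45 | 155 | 9 | 209 |
| XsTest | 89 | 14 | 12 | 49 | 164 |
| OpTest | 4012 | 117 | 194 | 43 | 4366 |
| MapTest | 4 | 23 | 193 | 0 | 220 |
| Total | 17739 | 3572 | 7800 | 2710 | 31821 |

RumbleDB, JSONiq parser

| Test Suite | Passing | Failing | Errors | Skipped | Total |
|---|---|---|---|---|---|
| MiscTest | 162 | 284 | 114 | 311 | 871 |
| ArrayTest | 0 | 0 | 0 | 209 | 209 |
| Fn1Test | 2400 | 177 | 118 | 2600 | 5295 |
| XsTest | 89 | 0 | 0 | 75 | 164 |
| Prod1Test | 3902 | 201 | 324 | 3811 | 8238 |
| SerTest | 4 | 0 | 0 | 339 | 343 |
| Fn2Test | 2659 | 282 | 91 | 2808 | 5840 |
| MapTest | 3 | 1 | 14 | 202 | 220 |
| AppTest | 971 | 17 | 20 | 1149 | 2157 |
| Prod2Test | 1320 | 221 | 129 | 2299 | 3969 |
| OpTest | 3742 | 28 | 21 | 575 | 4366 |
| MathTest | 147 | 0 | 1 | 1 | 149 |
| Total | 15399 | 1211 | 832 | 14379 | 31821 |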

Download detailed test results
