Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28262:Single column use MultiDelimitSerDe parse column error #5252

Conversation

LaughingVzr
Copy link
Contributor

@LaughingVzr LaughingVzr commented May 16, 2024

What changes were proposed in this pull request?

modify LazyStruct#findIndexes function and LazyStruct#parseMultiDelimit function, change fields.length Conditional judgment:

public void parseMultiDelimit(byte[] rawRow, byte[] fieldDelimit) {
     - if (fields.length > 1 && delimitIndexes[i - 1] != -1) {
     + if (delimitIndexes[i - 1] != -1) {
  }

 private int[] findIndexes(byte[] array, byte[] target) {
    - if (fields.length <= 1) {
    + if (fields.length < 1) {
   ...
    - for (int i = 1; i < indexes.length; i++) {
    + for (int i = 1; i <= indexes.length; i++) {
    ...
    }
    return indexes;
  }

I add an test for this fix:

@Test
    public void testParseMultiDelimit() throws Throwable {
        try {
            // single field named id
            List<String> structFieldNames = new ArrayList<>();
            structFieldNames.add("id");
            // field type is string
            List<TypeInfo> fieldTypes = new ArrayList<>();
            PrimitiveTypeInfo primitiveTypeInfo = new PrimitiveTypeInfo();
            primitiveTypeInfo.setTypeName("string");
            fieldTypes.add(primitiveTypeInfo);

            // separators + escapeChar => "|�����"
            byte[] separators = new byte[]{124, 2, 3, 4, 5, 6, 7, 8};

            // sequence =>"\N"
            Text sequence = new Text();
            sequence.set(new byte[]{92, 78});

            // create lazy object inspector parameters
            LazyObjectInspectorParameters lazyObjectInspectorParameters = new LazyObjectInspectorParametersImpl(false, (byte) '0',
                    false, null, separators, sequence);
            // create a lazy struct inspector
            ObjectInspector lazyStructInspector = LazyFactory.createLazyStructInspector(structFieldNames, fieldTypes, lazyObjectInspectorParameters);
            LazyStruct lazyStruct = (LazyStruct) LazyFactory.createLazyObject(lazyStructInspector);

            // origin row data
            String rowData = "1|@|";
            // row field delimiter
            String fieldDelimiter = "|@|";

            // parse row use multi delimit
            lazyStruct.parseMultiDelimit(rowData.getBytes(StandardCharsets.UTF_8),
                    fieldDelimiter.getBytes(StandardCharsets.UTF_8));

            // check the first field and second field start position index
            // before fix result: 0,1
            // after fix result: 0,2
            Assert.assertArrayEquals(new int[]{0, 2}, lazyStruct.startPosition);
        } catch (Throwable e) {
            e.printStackTrace();
            throw e;
        }
    }

Why are the changes needed?

If a table only have one column field with multidelimit,query this column data is error data.
When I use this data to do other operation(e.g cast use UDFToLong function),get result is NULL.

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

test class: serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyStruct.java
test function: org.apache.hadoop.hive.serde2.lazy.TestLazyStruct#testParseMultiDelimit
test command: mvn test -Dtest=TestLazyStruct --pl serde

Copy link

sonarcloud bot commented May 19, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@LaughingVzr LaughingVzr force-pushed the BUG-FIX-one-fields-multidelimit-parse-error branch from 87df75f to 3cae7af Compare June 27, 2024 07:52
@LaughingVzr
Copy link
Contributor Author

Hi, @LaughingVzr Thanks for you contribution. Failed tests seemed to be related your change, Please take a look: http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5252/7/tests

Ok, I will check it

@zhangbutao
Copy link
Contributor

LGTM. Pending some nit comments. :)
Thanks for the fix!

LaughingVzr

This comment was marked as duplicate.

LaughingVzr

This comment was marked as spam.

Copy link

sonarcloud bot commented Jul 2, 2024

Copy link
Contributor

@zhangbutao zhangbutao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@zhangbutao zhangbutao merged commit b15ce91 into apache:master Jul 3, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants